<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
 <front>
    <journal-meta>
	<journal-id journal-id-type="publisher-id">Jemr</journal-id>
      <journal-title-group>
        <journal-title>Journal of Eye Movement Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1995-8692</issn>
	  <publisher>								
	  <publisher-name>Bern Open Publishing</publisher-name>
	  <publisher-loc>Bern, Switzerland</publisher-loc>
	</publisher>
    </journal-meta>
    <article-meta>
	<article-id pub-id-type="doi">10.16910/jemr.11.6.5</article-id> 
	  <article-categories>								
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
		</article-categories>
      <title-group>
        <article-title>Representative Scanpath Identification for Group Viewing Pattern Analysis</article-title>
      </title-group>
	   <contrib-group> 
				<contrib contrib-type="author">
					<name>
						<surname>Li</surname>
						<given-names>Aoqi</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Chen</surname>
						<given-names>Zhenzhong</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>				
        <aff id="aff1">
		<institution>Wuhan University, Wuhan</institution>,   <country>China</country>
        </aff>
		</contrib-group>   

		
	  <pub-date date-type="pub" publication-format="electronic"> 
		<day>22</day>  
		<month>11</month>
        <year>2018</year>
      </pub-date>
	  <pub-date date-type="collection" publication-format="electronic"> 
	  <year>2018</year>
	</pub-date>
      <volume>11</volume>
      <issue>6</issue>
	 <elocation-id>10.16910/jemr.11.6.5</elocation-id> 
	<permissions> 
	<copyright-year>2018</copyright-year>
	<copyright-holder>Li, A. &#x26; Chen, Z.</copyright-holder>
	<license license-type="open-access">
  <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License
  (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">
    https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
	</permissions>
      <abstract>
        <p>Scanpaths are composed of fixations and saccades. Viewing trends reflected by scanpaths
play an important role in scientific studies like saccadic model evaluation and real-life applications
like artistic design. Several scanpath synthesis methods have been proposed to
obtain a scanpath that is representative of the group viewing trend, but most of them
either target a specific category of viewing materials, such as webpages, or discard useful
information such as gaze duration. Our previous work defined the representative scanpath as the
barycenter of a group of scanpaths, which actually shows the averaged shape of multiple
scanpaths. In this paper, we extend our previous framework to take gaze duration into account,
obtaining representative scanpaths that describe not only attention distribution and
shift but also attention span. The extended framework consists of three steps: eye-gaze data
preprocessing, scanpath aggregation, and gaze duration analysis. Experiments demonstrate
that the framework serves the purpose of mining viewing patterns well and that
“barycenter”-based representative scanpaths better characterize the pattern.</p>
      </abstract>
      <kwd-group>
        <kwd>eye movement</kwd>
        <kwd>eye tracking</kwd>
        <kwd>representative scanpath</kwd>
        <kwd>attention</kwd>
        <kwd>viewing pattern</kwd>
        <kwd>barycenter</kwd>	
        <kwd>gaze duration</kwd>        	
      </kwd-group>
    </article-meta>
  </front>	
  <body>

    <sec id="S1">
      <title>Introduction</title>

<p>Vision is the main channel through which humans acquire external information.
Based on the eye-mind hypothesis (<xref ref-type="bibr" rid="b1">1</xref>), what subjects see can help to
predict human cognitive activities such as user intentions (<xref ref-type="bibr" rid="b2">2</xref>),
sarcasm understandability (<xref ref-type="bibr" rid="b3">3</xref>), risky decision making (<xref ref-type="bibr" rid="b4">4</xref>) and reading
effort (<xref ref-type="bibr" rid="b5">5</xref>). Eye trackers record human viewing behavior in the form of
raw eye tracking data, which can be processed into scanpaths (<xref ref-type="bibr" rid="b6">6</xref>)
composed of fixations and saccades.</p>

<p>Scanpaths reflect the ebbs and flows of visual attention. According
to Yarbus’ research (<xref ref-type="bibr" rid="b7">7</xref>), scanpaths from different observers for the same
visual stimuli in free viewing conditions are similar but not identical.
The scanning order of one subject is not perfectly congruent with that
of others as shown in Figure 1 (a), so it remains a challenging task to
identify from multiple scanpaths a pattern that reflects the attention
synchrony of different subjects as shown in Figure 1 (b). Such a pattern
not only plays an important role in understanding how humans perceive
and explore their surrounding scenes but also reveals some important
properties of visual stimuli, so it has a wide range of applications in
many fields. For example, in psychology, it can be used to identify
reading habits of experts and detect reading disorders; in marketing, it
can tell us which parts of an advertisement first grab customer
attention and help to design a more user-friendly interface; in computer
vision, it can be regarded as the group viewing pattern to train a
network for scanpath prediction.</p>

<fig id="fig01" fig-type="figure" position="float">
					<label>Figure 1.</label>
					<caption>
<p>An example illustrating the viewing pattern for a natural
image from the MIT1003 dataset.</p>
					</caption>
					<graphic id="graph01" xlink:href="jemr-11-06-e-figure-01.png"/>
				</fig>

    </sec>
	
    <sec id="S2">
      <title>Related Work</title>

<p>Several methods have been proposed to analyze scanpaths. For example,
T-pattern is a tool to discover repetitive scan patterns in each
individual scanpath (<xref ref-type="bibr" rid="b8 b9">8, 9</xref>). Others attempt to characterize complex
scanning patterns in dynamic tasks such as air traffic control (<xref ref-type="bibr" rid="b10">10</xref>).
However, to obtain the group viewing pattern, we need to take into account
all individual scanpaths rather than focus on a single one. We call the
result, like the identified scanpath in Figure 1(b), the representative
scanpath. The surge of interest in dynamic visual attention has given rise
to various methods for representative scanpath identification, most of
which either stem from sequence mining algorithms or target a specific
category of visual stimuli such as web pages (<xref ref-type="bibr" rid="b11 b12 b13 b14 b15 b16">11, 12, 13, 14, 15, 16</xref>),
so they have limitations when applied to general scanpath analysis.</p>

<p>One line of existing work extracts common subsequences
shared by all the subjects (<xref ref-type="bibr" rid="b11 b17 b18 b19">11, 17, 18, 19</xref>). However, when
there is no common component shared by the individual scanpaths, methods in
this category fail to produce any pattern. To be more tolerant of
individual differences, sequential pattern mining algorithms can be used
to obtain frequent subsequences supported by a specified number of
subjects (<xref ref-type="bibr" rid="b20">20</xref>). But a fixed threshold on the number of subjects can
hardly suit all images, because the degree of scanpath inconsistency
incurred by personal viewing habits and stimulus properties varies;
the produced subsequences may therefore still be too short to
reflect the complete viewing pattern. Instead of focusing on
subsequences, scanpath trend analysis (STA) (<xref ref-type="bibr" rid="b13">13</xref>) acquires
the viewing pattern from a whole new perspective. STA first selects
representatively trending instances from scanpath components and then
rearranges them based on their average rank in all the individual
scanpaths. To make STA more tolerant, a new parameter, the <italic>tolerance
level</italic>, which allows trending instances to be shared by a subset
of scanpaths rather than all of them, has been added to the original STA
algorithm (<xref ref-type="bibr" rid="b16">16</xref>), but it is difficult to choose a suitable
<italic>tolerance level</italic>. The main limitation of STA and its
variant is that they target web pages and rely on the natural
segmentation of visual elements (e.g., navigation bar, text box, etc.)
to denote scanpaths by character strings.</p>

<p>Apart from the above studies, researchers in the computer vision
community are also interested in eye tracking data. Saliency models
predicting fixation distribution and saccadic models predicting
scanpaths are two important topics in computer vision. While the
fixation density map (<xref ref-type="bibr" rid="b21">21</xref>) has been widely
accepted as the baseline for evaluating saliency model performance, few
efforts have been dedicated to finding an appropriate baseline for saccadic
models. Generally, researchers obtain the upper bound of scanpath
prediction performance based on inter-observer consistency and, for
visualization, choose from the individual scanpaths the one closest to
the rest as a stand-in for all of them (<xref ref-type="bibr" rid="b22">22</xref>). Similar to STA, the
inter-observer consistency (IOC) method also preprocesses recorded
individual scanpaths into sequences based on clustering results, and
scanpath similarity is measured by the Needleman-Wunsch string-matching
algorithm. Such simplification retains the viewing order but abandons
the spatial distribution of scanpaths.</p>

<p>However, it is fixation order and fixation distribution that jointly
determine scanpath shape. So some researchers adopted Dynamic Time
Warping (DTW) (<xref ref-type="bibr" rid="b23">23</xref>) to directly compare scanpaths without preprocessing
or simplification (<xref ref-type="bibr" rid="b24">24</xref>). In our previous work (<xref ref-type="bibr" rid="b25">25</xref>), we proposed the
Candidate-constrained DTW Barycenter Averaging (CDBA) algorithm to take
spatial distribution into account when analyzing the viewing trend. Still,
there has been little discussion of the important role that gaze
duration plays in characterizing scanpaths. Hence, in this paper we
extend the framework to generalize viewing trends in not only scanpath
shape but also gaze duration. Experiments are conducted to assess the
ability of obtained scanpaths to reflect viewing patterns.</p>
    </sec>
	
    <sec id="S3">
      <title>Methodology</title>


<p>The overall framework to obtain the representative scanpath is shown
in Figure 2. It consists of three steps: eye-gaze data preprocessing,
scanpath aggregation and gaze duration analysis. Fixation position,
order and duration are fully exploited to identify the viewing pattern.
The preprocessing step is divided into three substeps: outlier removal,
AOI extraction and center identification. The second step focuses on
scanpath shape, in which multiple scanpaths are aggregated into a single
one. Finally, based on the aggregated scanpath, we analyze the pattern
from the perspective of gaze duration and combine the analysis results
from all three aspects to obtain the representative scanpath.</p>

<fig id="fig02" fig-type="figure" position="float">
					<label>Figure 2.</label>
					<caption>
<p>The extended framework to find a representative scanpath
that shows attention distribution, attention shift, and attention
span.</p>
					</caption>
					<graphic id="graph02" xlink:href="jemr-11-06-e-figure-02.png"/>
				</fig>

    <sec id="S3a">
      <title>Eye-gaze Data Preprocessing</title>

<p>Eye-gaze data are generally expressed by sequences of fixations. Each
fixation is recorded as a point with coordinates and gaze duration. The
preprocessing step prepares for the subsequent pattern mining
procedures: outlier removal ensures the consistency of the remaining
scanpaths, AOI extraction facilitates a higher-level representation, and
center identification retains the spatial distribution of scanpath
components.</p>
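<p>For concreteness, the eye-gaze data format assumed throughout this section can be sketched as follows. The field names here are illustrative rather than taken from any particular eye tracker's export.</p>

```python
# Minimal sketch of the eye-gaze data format assumed in this section:
# a scanpath is an ordered list of fixations, each with screen
# coordinates (in pixels) and a gaze duration (in milliseconds).
# Field names are illustrative, not from the original study.
from dataclasses import dataclass
from typing import List

@dataclass
class Fixation:
    x: float          # horizontal position in pixels
    y: float          # vertical position in pixels
    duration: float   # gaze duration in milliseconds

Scanpath = List[Fixation]

# One subject's scanpath over a stimulus:
scanpath: Scanpath = [
    Fixation(412.0, 305.5, 210.0),
    Fixation(640.2, 298.1, 180.0),
    Fixation(655.7, 410.9, 250.0),
]
```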

					<graphic id="graph09" xlink:href="jemr-11-06-e-figure-09.png"/>

<p><bold>Outlier Removal</bold>. With different preferences, subjects
allocate fixations in irregular and idiosyncratic manners. In addition,
inevitable errors in eye tracking and data processing increase the
uncertainty of recorded fixations. Therefore, fixations that are
isolated might come from interesting viewing behaviors of subjects or
measurement errors of eye trackers, leading to discrepancy among
scanpaths. Even if fixation distributions are similar, how fixations are
sequentially arranged to reflect the actual viewing process still varies
across individuals. Therefore, both fixation position and order
are potential causes of scanpath inconsistency.</p>

<p>To eliminate the influence of outlier scanpaths on both spatial
distribution and temporal order, we exclude outlier scanpaths with
boxplot at the very beginning. Boxplot is a statistical tool that
enables us to detect outliers and observe the dispersion degree of data.
Algorithm 1 explains how the boxplot works in detail. In Algorithm 1, we
use Dynamic Time Warping (DTW) (<xref ref-type="bibr" rid="b23">23</xref>) to calculate the distance or
dissimilarity between any two scanpaths. Outlier removal guarantees inter-observer
consistency to some degree, so the resulting pattern can reflect the common
trend of the compatible majority.</p>
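<p>A minimal sketch of this outlier-removal step (not the exact Algorithm 1): given a scanpath-dissimilarity function (DTW in our framework), compute each scanpath's average distance to the others and discard those above the standard boxplot upper fence Q3 + 1.5 × IQR.</p>

```python
import numpy as np

def remove_outlier_scanpaths(scanpaths, dist):
    """Drop scanpaths whose mean dissimilarity to the others lies above
    the upper boxplot fence Q3 + 1.5 * IQR.  `dist` is any scanpath
    dissimilarity function (DTW in our framework)."""
    k = len(scanpaths)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = dist(scanpaths[i], scanpaths[j])
    avg = d.sum(axis=1) / (k - 1)      # mean distance to the other scanpaths
    q1, q3 = np.percentile(avg, [25, 75])
    fence = q3 + 1.5 * (q3 - q1)       # standard boxplot upper fence
    return [s for s, a in zip(scanpaths, avg) if a <= fence]
```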

<p><bold>AOI Extraction</bold>. According to Gestalt theory (<xref ref-type="bibr" rid="b26">26</xref>), the
nature of a unified whole is not simply the sum of its parts, so
visual attraction comes not from a single pixel but from a whole region of
interest. It is possible that, for the same visual target, fixations
scatter over different locations due to the high degree of viewing
freedom. As a result, fixation-based scanpaths do not facilitate an
abstract expression, making it hard to identify what is common in eye
tracking data. Therefore, we should express the representative scanpath
by higher-level components such as AOIs. For example, the ScanMatch algorithm (<xref ref-type="bibr" rid="b27">27</xref>)
uses a grid mask to transform fixation-based scanpaths into AOI
sequences, but the number of grid cells is chosen arbitrarily and the AOIs are
not associated with image content. Considering that fixations are
stimulus-driven, the clustering structure of fixations is closely
related to the distribution of visual attraction. Hence, the
representative scanpaths we discuss in this paper are composed of AOIs
that are associated with fixation clusters.</p>

<p>All the fixation points are clustered by the algorithm proposed by
Rodriguez et al. (<xref ref-type="bibr" rid="b28">28</xref>), which considers two properties of points: local
density <inline-formula>

<mml:math id="m1"><mml:mi>ρ</mml:mi></mml:math></inline-formula>
and distance from points with higher density
<inline-formula>

<mml:math id="m2"><mml:mi>δ</mml:mi></mml:math></inline-formula>.</p>

<p>Fixations with large values of <inline-formula>

<mml:math id="m3"><mml:mi>ρ</mml:mi></mml:math></inline-formula>
and <inline-formula>

<mml:math id="m4"><mml:mi>δ</mml:mi></mml:math></inline-formula>
are recognized as cluster exemplars. To determine the number of
clusters, <inline-formula>

<mml:math id="m5"><mml:mrow><mml:mi>γ</mml:mi><mml:mo>=</mml:mo><mml:mi>ρ</mml:mi><mml:mo>×</mml:mo><mml:mi>δ</mml:mi></mml:mrow></mml:math></inline-formula>
is calculated for each fixation and all the values are sorted in
decreasing order. Then a threshold is set so that fixations with
<inline-formula>

<mml:math id="m6"><mml:mi>γ</mml:mi></mml:math></inline-formula>
larger than the threshold stand out and the cluster number is determined
accordingly. The threshold can be empirically set as the arithmetic or
geometric mean. In our experiment, we used the weighted geometric
mean, which is calculated as follows:</p>

<fig id="eq01" fig-type="equation" position="anchor">
					<label>(1)</label>
					<graphic id="equation01" xlink:href="jemr-11-06-e-equation-01.png"/>
				</fig>

<p>where <inline-formula>

<mml:math id="m7"><mml:msub><mml:mi>γ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>,
<inline-formula>

<mml:math id="m8"><mml:msub><mml:mi>γ</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula>,…,
<inline-formula>

<mml:math id="m9"><mml:msub><mml:mi>γ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math></inline-formula>
have been sorted in decreasing order. The weighted geometric mean puts
more emphasis on larger <inline-formula>

<mml:math id="m10"><mml:mi>γ</mml:mi></mml:math></inline-formula>
and leads to fewer and less overlapped clusters than the geometric
mean.</p>
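<p>The clustering step can be sketched as follows. The density and distance quantities follow the method of Rodriguez et al.; the rank-based weights used in the threshold below are an illustrative choice, since the actual weights are defined by Equation (1) above.</p>

```python
import numpy as np

def density_peak_gamma(points, dc):
    """Sketch of the density-peaks quantities for each fixation: local
    density rho (Gaussian kernel with cutoff distance dc) and delta,
    the distance to the nearest point of strictly higher density."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # exclude the self-term
    delta = np.zeros(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho * delta                                # gamma = rho * delta

def cluster_count(gamma, weights=None):
    """Number of cluster exemplars: fixations whose gamma exceeds a
    weighted geometric mean of the sorted gamma values.  The weighting
    scheme here (weight 1/rank, emphasizing larger gamma) is
    illustrative; the paper's Equation (1) defines the actual weights."""
    g = np.sort(gamma)[::-1]                          # decreasing order
    w = weights if weights is not None else 1.0 / np.arange(1, len(g) + 1)
    threshold = np.exp(np.sum(w * np.log(g)) / np.sum(w))
    return int(np.sum(g > threshold))
```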

<p><bold>Center Identification</bold>. Now all the fixations are
assigned to different AOIs. To retain the spatial information of
scanpaths, we need to take into account the locations of AOIs. Instead
of simply averaging coordinates or choosing points with large
<inline-formula>

<mml:math id="m11"><mml:mi>γ</mml:mi></mml:math></inline-formula>
as centers, we adopt a random-walk-based method (<xref ref-type="bibr" rid="b29">29</xref>) to identify AOI
centers, which is more robust and less likely to be affected by the edge
points of a cluster. The random-walk-based method aims to obtain a
coefficient <inline-formula>

<mml:math id="m12"><mml:mi>l</mml:mi></mml:math></inline-formula>
for each fixation in the AOI and calculates the weighted center as the
final AOI center.</p>

<p>The coefficient <inline-formula>

<mml:math id="m13"><mml:mi>l</mml:mi></mml:math></inline-formula>
is updated by the following formula:</p>

<fig id="eq02" fig-type="equation" position="anchor">
					<label>(2)</label>
					<graphic id="equation02" xlink:href="jemr-11-06-e-equation-02.png"/>
				</fig>

<p>where <inline-formula>

<mml:math id="m14"><mml:mrow><mml:mi>w</mml:mi><mml:mo stretchy="false" form="prefix">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false" form="postfix">)</mml:mo></mml:mrow></mml:math></inline-formula>
is the initial coefficient of fixation <inline-formula>

<mml:math id="m15"><mml:mi>i</mml:mi></mml:math></inline-formula>
defined by fixation density, <inline-formula>

<mml:math id="m16"><mml:mi>η</mml:mi></mml:math></inline-formula>
is the normalizing parameter, <inline-formula>

<mml:math id="m17"><mml:mrow><mml:mi>q</mml:mi><mml:mo stretchy="false" form="prefix">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false" form="postfix">)</mml:mo></mml:mrow></mml:math></inline-formula>
is the transition probability from fixation
<inline-formula>

<mml:math id="m18"><mml:mi>i</mml:mi></mml:math></inline-formula>
to fixation <italic>j</italic>.</p>

<fig id="eq03" fig-type="equation" position="anchor">
					<label>(3)</label>
					<graphic id="equation03" xlink:href="jemr-11-06-e-equation-03.png"/>
				</fig>

<p>where <inline-formula>

<mml:math id="m19"><mml:mrow><mml:mi>D</mml:mi><mml:mo stretchy="false" form="prefix">(</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false" form="postfix">)</mml:mo></mml:mrow></mml:math></inline-formula>
is the Euclidean distance from fixation <inline-formula>

<mml:math id="m20"><mml:mi>j</mml:mi></mml:math></inline-formula>
to fixation <inline-formula>

<mml:math id="m21"><mml:mi>i</mml:mi></mml:math></inline-formula>,
<inline-formula>

<mml:math id="m22"><mml:mi>σ</mml:mi></mml:math></inline-formula>
is introduced to influence the center distribution subtly.</p>
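<p>Since Equations (2) and (3) appear above as images, the following is an illustrative reconstruction rather than the exact update: a PageRank-style iteration in which each fixation's coefficient mixes its initial density-based weight with mass received through a Gaussian transition kernel, with the normalizing parameter absorbed into the column normalization. The damping factor alpha is an assumption of this sketch.</p>

```python
import numpy as np

def aoi_center(points, sigma=1.0, alpha=0.85, iters=100):
    """Random-walk-style AOI center (illustrative sketch; the paper's
    Equations (2)-(3) define the exact update).  Each fixation i gets a
    coefficient l(i) mixing an initial density-based weight w(i) with
    mass received from neighbours via a Gaussian transition kernel
    q(j, i); the AOI center is the l-weighted mean of the fixations."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Transition probability q(j, i): Gaussian in distance, column-normalized.
    q = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(q, 0.0)
    q /= q.sum(axis=0, keepdims=True)
    # Initial coefficient w(i): local density estimate, normalized to sum 1.
    w = np.exp(-d ** 2 / (2.0 * sigma ** 2)).sum(axis=1)
    w /= w.sum()
    l = w.copy()
    for _ in range(iters):                 # power-iteration-style update
        l = (1.0 - alpha) * w + alpha * (q @ l)
    return (l[:, None] * points).sum(axis=0) / l.sum()
```

Because isolated edge points receive little transition mass, the resulting center is pulled toward the dense core of the cluster rather than toward stray fixations.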

<p>Unlike simple segmentation or a grid mask, which only allow
scanpaths to be treated as character strings, AOI centers make it
possible to denote scanpaths by sequences of coordinates and thus can
also be regarded as indicators of AOI distribution. AOIs with identified
centers are considered as candidate components for the representative
scanpath in the aggregation stage.</p>
    </sec>
	
    <sec id="S3b">
      <title>Scanpath Aggregation</title>

<p>Generally speaking, the barycenter of the points in a cluster is regarded
as the representative or exemplar of the cluster. Likewise, we aggregate
multiple scanpaths into a single one by computing the “barycenter” of
the scanpath set. In other words, we try to calculate a representative
scanpath that is the closest to individual scanpaths in terms of average
distance. Mathematically, the representative scanpath is defined as
follows:</p>

<fig id="eq04" fig-type="equation" position="anchor">
					<label>(4)</label>
					<graphic id="equation04" xlink:href="jemr-11-06-e-equation-04.png"/>
				</fig>

<p>where <inline-formula>

<mml:math id="m23"><mml:mi>r</mml:mi></mml:math></inline-formula>
is the representative scanpath, <inline-formula>

<mml:math id="m24"><mml:msup><mml:mi>s</mml:mi><mml:mi>′</mml:mi></mml:msup></mml:math></inline-formula>
is any scanpath that may become the representative scanpath,
<inline-formula>

<mml:math id="m25"><mml:mi>s</mml:mi></mml:math></inline-formula>
is an individual scanpath in the given scanpath set
<inline-formula>

<mml:math id="m26"><mml:mtext mathvariant="normal">sps</mml:mtext></mml:math></inline-formula>,
and <inline-formula>

<mml:math id="m27"><mml:mtext mathvariant="normal">Dist</mml:mtext></mml:math></inline-formula>
is a function calculating the distance or dissimilarity between two
scanpaths.</p>

<p>Here we utilize Dynamic Time Warping (DTW) to measure scanpath
distance. DTW was first put forward for speech recognition and then
widely used in time series analysis (<xref ref-type="bibr" rid="b30">30</xref>). Traditional string matching
algorithms like Needleman-Wunsch algorithm (<xref ref-type="bibr" rid="b31">31</xref>) and Levenshtein Distance
(<xref ref-type="bibr" rid="b32">32</xref>) simply treat scanpaths as strings and need to additionally
construct a cost matrix to take spatial proximity into account, whereas
DTW already incorporates it. In most cases, scanpaths are
recorded as sequences of components with coordinates. Given two
scanpaths <inline-formula>

<mml:math id="m28"><mml:mrow><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x3C;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.222em"></mml:mspace><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>&#x3E;</mml:mo></mml:mrow></mml:math></inline-formula>
and <inline-formula>

<mml:math id="m29"><mml:mrow><mml:mi>B</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x3C;</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.222em"></mml:mspace><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>&#x3E;</mml:mo></mml:mrow></mml:math></inline-formula>,
the DTW distance is recursively computed by the following formula:</p>

<fig id="eq05" fig-type="equation" position="anchor">
					<label>(5)</label>
					<graphic id="equation05" xlink:href="jemr-11-06-e-equation-05.png"/>
				</fig>

<p>where <inline-formula>

<mml:math id="m30"><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>,
<inline-formula>

<mml:math id="m31"><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula>
are the subsequences of <inline-formula>

<mml:math id="m32"><mml:mi>A</mml:mi></mml:math></inline-formula>
and B, <inline-formula>

<mml:math id="m33"><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>
and <inline-formula>

<mml:math id="m34"><mml:msub><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula>
are components of scanpaths <inline-formula>

<mml:math id="m35"><mml:mi>A</mml:mi></mml:math></inline-formula>
and <inline-formula>

<mml:math id="m36"><mml:mi>B</mml:mi></mml:math></inline-formula>
respectively, <inline-formula>

<mml:math id="m37"><mml:mrow><mml:mi>δ</mml:mi><mml:mo stretchy="false" form="prefix">(</mml:mo><mml:mo stretchy="false" form="postfix">)</mml:mo></mml:mrow></mml:math></inline-formula>
is the Euclidean distance function. The distance or dissimilarity
between scanpath <inline-formula>

<mml:math id="m38"><mml:mi>A</mml:mi></mml:math></inline-formula>
and <inline-formula>

<mml:math id="m39"><mml:mi>B</mml:mi></mml:math></inline-formula>
is:</p>

<fig id="eq06" fig-type="equation" position="anchor">
					<label>(6)</label>
					<graphic id="equation06" xlink:href="jemr-11-06-e-equation-06.png"/>
				</fig>
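<p>The recursion of Equation (5) translates directly into code. Below is a straightforward O(mn) dynamic-programming implementation over coordinate sequences; any normalization applied by Equation (6) is omitted here.</p>

```python
import numpy as np

def dtw_distance(A, B):
    """DTW between scanpaths A and B, each an (n, 2) array of component
    coordinates, following the recursion of Equation (5): the cost of
    aligning prefixes A_i and B_j is the Euclidean distance between
    a_i and b_j plus the cheapest of the three predecessor alignments."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])   # delta(a_i, b_j)
            D[i, j] = cost + min(D[i - 1, j],      # a_i matched again
                                 D[i, j - 1],      # b_j matched again
                                 D[i - 1, j - 1])  # a_i aligned with b_j
    return D[n, m]
```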

<p>It is difficult to directly obtain the optimal solution of Equation (4).
Hence, we add the following constraints to make the problem feasible:</p>
<list list-type="bullet">
  <list-item>
    <p>The representative scanpath must be composed of abstract scanpath
    components such as AOIs;</p>
  </list-item>
  <list-item>
    <p>Any two contiguous components in the representative scanpath must
    be contiguous in at least one individual scanpath;</p>
  </list-item>
  <list-item>
    <p>The occurrence count of each component in the representative
    scanpath does not exceed the maximum occurrence count of the
    component in all the individual scanpaths.</p>
  </list-item>
</list>
<p>These constraints not only simplify the aggregation but also force
the obtained scanpath to be more reasonable. The first constraint
guarantees that the aggregated scanpath is expressed at a higher level. The
second and the third constraints ensure that the aggregated scanpath
will not deviate too far from individual scanpaths. We propose two
methods for scanpath aggregation.</p>

<p><bold>Heuristic Method</bold>. The heuristic method first constructs
a candidate set for each AOI. The candidate set contains all the
potential subsequent AOIs for a certain AOI. In other words, AOIs in the
candidate set for <inline-formula>

<mml:math id="m40"><mml:msub><mml:mtext mathvariant="normal">AOI</mml:mtext><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>
must follow <inline-formula>

<mml:math id="m41"><mml:msub><mml:mtext mathvariant="normal">AOI</mml:mtext><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>
in at least one individual scanpath. Then all the possible scanpaths are
enumerated by extending scanpaths of 1 fixation to scanpaths of
<inline-formula>

<mml:math id="m42"><mml:mi>n</mml:mi></mml:math></inline-formula>
fixations. A scanpath is extended by choosing an AOI from the candidate
set of the last AOI on the scanpath and adding it to the end. When the
occurrence count of a certain AOI equals its maximum occurrence
count in the individual scanpaths, the AOI is removed from the candidate set
and thus will not appear in later enumerated scanpaths. Finally, the
scanpath with the smallest average DTW distance to the individual scanpaths
is chosen from all the enumerated scanpaths as the representative. <italic>n</italic>
is the specified maximum fixation number. When <italic>n</italic> is
large enough, we get the theoretically optimal result for Equation
(4), which provides a lower bound on the average distance.</p>
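<p>The heuristic enumeration can be sketched as follows, with scanpaths abstracted to AOI-index sequences. The dissimilarity function is left as a parameter (DTW over AOI-center coordinates in our framework); allowing every AOI as a starting component is an assumption of this sketch.</p>

```python
from collections import Counter, defaultdict

def heuristic_representative(scanpaths, n_max, dist):
    """Illustrative sketch of the heuristic method over AOI-index
    sequences.  `dist` is any scanpath dissimilarity function.
    Candidate sets record which AOI may follow which in at least one
    individual scanpath, and each AOI's occurrence count is capped by
    its maximum count in any individual scanpath."""
    followers = defaultdict(set)
    for sp in scanpaths:
        for a, b in zip(sp, sp[1:]):
            followers[a].add(b)                 # b may follow a
    max_count = Counter()
    for sp in scanpaths:
        for aoi, c in Counter(sp).items():
            max_count[aoi] = max(max_count[aoi], c)

    best, best_cost = None, float("inf")
    def extend(path):
        nonlocal best, best_cost
        cost = sum(dist(path, sp) for sp in scanpaths) / len(scanpaths)
        if cost < best_cost:
            best, best_cost = list(path), cost
        if len(path) >= n_max:
            return
        for nxt in followers[path[-1]]:
            if path.count(nxt) < max_count[nxt]:  # occurrence-count cap
                extend(path + [nxt])

    for start in sorted({a for sp in scanpaths for a in sp}):
        extend([start])
    return best
```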

<p><bold>Candidate-constrained DTW Barycenter Averaging (CDBA)
algorithm</bold>. Since the heuristic method is time- and space-consuming,
we propose another algorithm for scanpath aggregation by
imposing constraints on the DTW Barycenter Averaging (DBA)
algorithm (<xref ref-type="bibr" rid="b33">33</xref>) as an approximation (<xref ref-type="bibr" rid="b25">25</xref>). CDBA also constructs a
candidate set for each AOI and adjusts the set members like the heuristic
method. It then defines an initial average scanpath as the reference
scanpath and updates the reference scanpath iteratively. Each
iteration of CDBA consists of two steps: computing DTW between every
individual scanpath and the reference scanpath, and updating the
components of the reference scanpath.</p>
<list list-type="bullet">
  <list-item>
    <p>DTW computation. When computing DTW between two sequences, we can
    obtain the accumulation matrix and find the path of cost
    accumulation, which indicates the optimal alignment between
    sequences. The process of DTW computation is repeated between every
    actual scanpath and the reference scanpath.</p>
  </list-item>
  <list-item>
    <p>Scanpath update. In the update step, each component of the
    reference scanpath is updated by the “constrained barycenter” of
    fixations that are aligned to it during the computation process. The
    “constrained barycenter” means an AOI belonging to the candidate set
    and having the minimum average distance with all the aligned
    fixations.</p>
  </list-item>
</list>
<p>The above two steps are repeated until the reference scanpath does
not change. The process of CDBA is summarized in Algorithm 2.</p>

					<graphic id="graph10" xlink:href="jemr-11-06-e-figure-10.png"/>
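<p>A simplified sketch of the CDBA iteration follows; the full candidate-set bookkeeping of Algorithm 2 is omitted here for brevity. Each pass aligns every individual scanpath against the reference via DTW, then replaces each reference component with the AOI center closest on average to the fixations aligned to it.</p>

```python
import numpy as np

def dtw_path(A, B):
    """Optimal DTW alignment path between coordinate sequences A and B."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = np.linalg.norm(A[i-1] - B[j-1]) + min(
                D[i-1, j], D[i, j-1], D[i-1, j-1])
    path, i, j = [], n, m
    while i > 0 and j > 0:                    # backtrack the cheapest path
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        if step == 0: i, j = i - 1, j - 1
        elif step == 1: i -= 1
        else: j -= 1
    return path[::-1]

def cdba(scanpaths, aoi_centers, reference, iters=20):
    """Simplified sketch of the CDBA iteration: align every scanpath to
    the reference, then update each reference component with the AOI
    center having minimum average distance to its aligned fixations."""
    ref = np.asarray(reference, dtype=float)
    for _ in range(iters):
        aligned = [[] for _ in ref]
        for sp in scanpaths:
            for i, j in dtw_path(ref, np.asarray(sp, dtype=float)):
                aligned[i].append(sp[j])
        new_ref = []
        for pts in aligned:
            pts = np.asarray(pts, dtype=float)
            costs = [np.linalg.norm(pts - c, axis=1).mean() for c in aoi_centers]
            new_ref.append(aoi_centers[int(np.argmin(costs))])
        new_ref = np.asarray(new_ref, dtype=float)
        if np.allclose(new_ref, ref):         # converged: reference unchanged
            break
        ref = new_ref
    return ref
```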
    </sec>
	
    <sec id="S3c">
      <title>Gaze Duration Analysis</title>

<p>After scanpath aggregation, we obtain an aggregated scanpath that can
tell us not only which areas draw our attention but also the priority of
attraction. In this section, we aim to embed gaze duration into the
aggregated scanpath. To specify how long an AOI can hold our attention,
we transform each individual scanpath (of fixations) into an AOI
sequence (of clusters) and statistically analyze the gaze duration of
each AOI for all the individual scanpaths. The gaze duration of each AOI
in the aggregated scanpath is obtained by averaging the gaze duration of
the same AOI in all the individual sequences. Note that, when we analyze
AOI duration, an AOI appearing more than once in a sequence is treated
as distinct AOIs, distinguished by their order of appearance in the
sequence.</p>
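<p>This averaging step can be sketched as follows, with each individual scanpath already mapped to a sequence of (AOI, duration) pairs; repeat visits to the same AOI are keyed by their occurrence order, so the first and second visits to an AOI are averaged separately.</p>

```python
from collections import defaultdict

def aoi_durations(aoi_sequences, aggregated):
    """Mean gaze duration for each component of the aggregated scanpath.
    Each individual sequence is a list of (aoi, duration) pairs; repeat
    visits to the same AOI are keyed by (AOI id, occurrence index)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq in aoi_sequences:
        seen = defaultdict(int)
        for aoi, dur in seq:
            key = (aoi, seen[aoi])      # (AOI id, occurrence index)
            seen[aoi] += 1
            sums[key] += dur
            counts[key] += 1
    result, seen = [], defaultdict(int)
    for aoi in aggregated:
        key = (aoi, seen[aoi])
        seen[aoi] += 1
        result.append(sums[key] / counts[key] if counts[key] else 0.0)
    return result
```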
    </sec>
    </sec>

    <sec id="S4">
      <title>Eye Tracking Study</title>
    <sec id="S4a">
      <title>Eye Tracking Data</title>

<p>To investigate the rationality of representative scanpaths, we
conduct experiments on two large public eye-tracking data sets, namely
the OSIE data set (<xref ref-type="bibr" rid="b34">34</xref>) and the MIT1003 data set (<xref ref-type="bibr" rid="b35">35</xref>).</p>
<list list-type="bullet">
  <list-item>
    <p>OSIE Data Set contains 700 images. Each image is freely viewed by
    15 subjects for 3 seconds. All the images are of the size
    <inline-formula>

    <mml:math id="m43"><mml:mrow><mml:mn>800</mml:mn><mml:mo>×</mml:mo><mml:mn>600</mml:mn></mml:mrow></mml:math></inline-formula>
    pixels.</p>
  </list-item>
  <list-item>
    <p>MIT1003 Data Set includes 1003 scenes freely viewed by 15
    subjects for 3 seconds. The longest dimension of each image is 1024
    pixels.</p>
  </list-item>
</list>
    </sec>

    <sec id="S4b">
      <title>Procedure</title>

<p>The key process in our framework is scanpath aggregation, which can
be substituted by other methods such as eMine (<xref ref-type="bibr" rid="b11">11</xref>), STA (<xref ref-type="bibr" rid="b13">13</xref>), SPAM (<xref ref-type="bibr" rid="b20">20</xref>) and
IOC (<xref ref-type="bibr" rid="b22">22</xref>). The first three cannot directly operate on scanpaths
consisting of fixations with coordinates and need to convert scanpaths
into character strings. IOC also relies on some preprocessing steps for
scanpath quantization. To ensure a fair comparison, we adopt the
same preprocessing steps in our framework. The outlier removal process
excludes on average 0.61 and 0.86 scanpaths per image for the OSIE and
MIT1003 data sets, respectively. In addition, even after outlier removal,
eMine still fails to produce any result for some images, so for
eMine we only consider cases in which the algorithm produces a
final output. For SPAM, we set the minimum supporting number
of subjects to half the total number, which may yield more than
one frequent subsequence; from these frequent subsequences we choose
the one that is optimal with regard to Equation (1) as the
representative scanpath. For IOC, we adapt it to our
framework by taking DTW as its distance function and choosing the
scanpath with the smallest average DTW. For the heuristic method, we
need to determine the specified maximum number
<inline-formula>

<mml:math id="m44"><mml:mi>n</mml:mi></mml:math></inline-formula>
when enumerating all the possible scanpaths. Figure 3 shows the average
DTW varying with the given maximum length <inline-formula>

<mml:math id="m45"><mml:mi>n</mml:mi></mml:math></inline-formula>.
For both data sets, when <inline-formula>

<mml:math id="m46"><mml:mi>n</mml:mi></mml:math></inline-formula>
is equal to or larger than 8, the average DTW no longer changes and the
heuristic method attains the theoretically best results. The maximum
number is therefore set to 8 in the later discussion for the heuristic
method unless otherwise stated.</p>
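<p>A sketch of this selection step, under the assumption that Equation (1) amounts to the average DTW from a candidate to all recorded scanpaths (a textbook DTW with Euclidean point costs; the function names are illustrative, not the authors' implementation):</p>

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two scanpaths,
    each a sequence of (x, y) fixation coordinates."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def pick_representative(candidates, recorded):
    """Return the candidate scanpath minimizing the average DTW to all
    actually recorded scanpaths, plus that average (the IOC-style rule)."""
    avg = [np.mean([dtw_distance(c, r) for r in recorded]) for c in candidates]
    return candidates[int(np.argmin(avg))], float(min(avg))
```

<p>The same routine covers the SPAM adaptation above: feed it the set of frequent subsequences as candidates and keep the one with the smallest average DTW.</p>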

<fig id="fig03" fig-type="figure" position="float">
					<label>Figure 3.</label>
					<caption>
						<p>Average DTW varying with the specified maximum fixation
number.</p>
					</caption>
					<graphic id="graph03" xlink:href="jemr-11-06-e-figure-03.png"/>
				</fig>

<p>Due to the high degree of viewing freedom, it is hard to define
ground-truth representative scanpaths. A basic way to evaluate the
rationality of the obtained scanpath is to compare it against each
individual scanpath with the standard string-edit algorithm, as suggested
by Eraslan et al. (<xref ref-type="bibr" rid="b13 b14">13, 14</xref>). More sophisticated methods for comparing
scanpaths, such as ScanMatch (<xref ref-type="bibr" rid="b27">27</xref>), MultiMatch (<xref ref-type="bibr" rid="b36">36</xref>), and ScanGraph (<xref ref-type="bibr" rid="b37">37</xref>), have
also been developed, further facilitating the evaluation.</p>
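<p>The standard string-edit comparison reduces to a Levenshtein distance over AOI strings, commonly normalized by the longer string's length; a minimal sketch:</p>

```python
def string_edit_distance(s1, s2):
    """Levenshtein distance between two AOI strings, e.g. 'ABCA' vs 'ABD',
    computed with a rolling one-row dynamic-programming table."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (c1 != c2))) # substitution
        prev = curr
    return prev[-1]

def string_edit_similarity(s1, s2):
    """Normalized similarity in [0, 1]: 1 - distance / max length."""
    if not s1 and not s2:
        return 1.0
    return 1.0 - string_edit_distance(s1, s2) / max(len(s1), len(s2))
```

<p>For example, 'ABCA' vs 'ABD' requires one substitution and one deletion, giving distance 2 and similarity 0.5.</p>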

<p>In our experiment, the evaluation of representative scanpaths is
conducted at three different levels:</p>
<list list-type="bullet">
  <list-item>
    <p>Scanpath length: scanpath length reflects the frequency of
    attention shift, so we compare the length distribution to check
    whether representative scanpaths can reflect this property;</p>
  </list-item>
  <list-item>
    <p>Scanpath shape: scanpath shape, partly influenced by scanpath
    length, is related to both spatial distribution and temporal order,
    which is measured by DTW in our experiment;</p>
  </list-item>
  <list-item>
    <p>Overall scanpath similarity: overall scanpath similarity
    comprehensively considers scanpath shape and gaze duration.
    ScanMatch and MultiMatch can provide such comparison.</p>
  </list-item>
</list>
    </sec>

    <sec id="S4d">
      <title>Results</title>

<p><bold>Analysis of Scanpath Length</bold>. Scanpath length reflects
the frequency of attention shift. Figure 4 and Figure 5 analyze the
length of representative scanpaths for both the OSIE and MIT1003 data
sets. From Figure 4 (a) and Figure 5 (a), we find that the length
distributions of individual scanpaths are approximately normal,
which indicates that only for a small number of images do people
concentrate on certain areas (hardly shifting) or roam over the
whole image (frequently shifting), while for most images the shift
frequency is relatively stable, neither too high nor too low. This
bell-shaped property should therefore also be reflected by representative
scanpaths. Considering that all the representative scanpaths are AOI
based while individual scanpaths are fixation based, the absolute
scanpath lengths may differ, but the bell-shaped distribution
of scanpath length should be preserved. However, eMine, STA and
SPAM fail to retain this property and produce right-tailed distributions.
All of them are more likely to yield shorter representative scanpaths,
which would suggest that for most images subjects concentrate on certain
areas and hardly shift their attention. IOC, CDBA and the heuristic
method preserve the bell-shaped distributions.</p>

<fig id="fig04" fig-type="figure" position="float">
					<label>Figure 4.</label>
					<caption>
						<p>Length distribution of individual scanpaths and aggregated
scanpaths for OSIE data set.</p>
					</caption>
					<graphic id="graph04" xlink:href="jemr-11-06-e-figure-04.png"/>
				</fig>

<p><bold>Analysis of Scanpath Shape</bold>. In this part, we evaluate
the ability of representative scanpaths to reflect attention
distribution and attention shift, that is, the shape of representative
scanpaths. We measure this ability by computing the average distance
(DTW) between the representative scanpath and all the actually recorded
scanpaths as suggested by Le Meur et al. (<xref ref-type="bibr" rid="b24">24</xref>). Quantitative results are
shown in Table 1. A smaller DTW means a better result. The average DTW
between representative scanpaths obtained by the heuristic method and
all the recorded scanpaths is the smallest. In other words, the
heuristic method produces the best solutions for Equation (1), followed
by CDBA and IOC. The results of statistical analysis are presented in
Table 2, which shows there is a significant difference between the
results of our proposed “barycenter” based methods (CDBA and heuristic)
and other methods.</p>

<fig id="fig05" fig-type="figure" position="float">
					<label>Figure 5.</label>
					<caption>
						<p>Length distribution of individual scanpaths and aggregated
scanpaths for MIT1003 dataset.</p>
					</caption>
					<graphic id="graph05" xlink:href="jemr-11-06-e-figure-05.png"/>
				</fig>

<p><bold>Analysis of Overall Scanpath Similarity</bold>. In this part,
we estimate and assign gaze duration to the scanpaths obtained in the
aggregation step. Note that no existing algorithm except
STA has discussed representative scanpaths with gaze duration. Even
though STA employs duration information when identifying trending
elements, it still focuses on the analysis of trending scanpaths and
does not further analyze gaze duration. To make the comparisons fair, we
combine our gaze duration analysis method with all the methods proposed
for scanpath aggregation, i.e., eMine, SPAM, STA and IOC. The overall
scanpath similarity is evaluated by MultiMatch (<xref ref-type="bibr" rid="b36">36</xref>)
and ScanMatch (<xref ref-type="bibr" rid="b27">27</xref>). MultiMatch compares scanpaths
on five aspects: vector similarity, direction similarity, length
similarity, position similarity and duration similarity. ScanMatch
outputs a single integrated score reflecting order consistency, spatial
proximity and duration similarity. The parameters of the ScanMatch
implementation are set as follows: Xbin = 24, Ybin = 18, Threshold =
3.5, GapValue = 0, TempBin = 100 (TempBin = 0 when duration is not taken
into account). We compare the representative scanpath with each actually
recorded scanpath using both algorithms and compute the average scores.
Table 3 shows the results on both data sets; the larger the scores, the
better the results. Our proposed methods (CDBA* and Heuristic*) still
outperform eMine, STA and SPAM, but their advantage over
IOC is less pronounced. We further conduct a statistical test on the
ScanMatch results (with duration). The difference between the proposed
methods and the first three methods, i.e., eMine, STA and SPAM, is
significant on both data sets, but this is not the case for IOC. Although
the heuristic method achieves a smaller average distance in terms of
DTW, its MultiMatch and ScanMatch scores are neck and neck with those of
CDBA and IOC on both data sets. This may be caused by the
fact that DTW directly takes Euclidean distances as elements of the cost
matrix, while both MultiMatch and ScanMatch perform scanpath
simplification or quantization before comparison.</p>
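<p>A hedged sketch of the ScanMatch-style quantization implied by the parameters listed above (the real ScanMatch toolbox additionally uses a letter-pair encoding and a substitution matrix built from the Threshold; this only illustrates the spatial and temporal binning):</p>

```python
def quantize_scanpath(fixations, width, height, xbin=24, ybin=18, tempbin=100):
    """Quantize a scanpath into a token sequence, ScanMatch-style:
    each fixation maps to one of xbin * ybin spatial grid cells, and its
    token is repeated once per `tempbin` ms of gaze duration
    (tempbin = 0 ignores duration).  fixations: (x, y, duration_ms)."""
    tokens = []
    for x, y, dur in fixations:
        col = min(int(x / width * xbin), xbin - 1)   # clamp to last column
        row = min(int(y / height * ybin), ybin - 1)  # clamp to last row
        cell = row * xbin + col
        repeats = max(1, round(dur / tempbin)) if tempbin > 0 else 1
        tokens.extend([cell] * repeats)
    return tokens
```

<p>On an 800 × 600 OSIE image this yields 24 × 18 = 432 possible spatial tokens, and a 300 ms fixation contributes three tokens when TempBin = 100.</p>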

<table-wrap id="t01" position="float">
					<label>Table 1.</label>
					<caption>
						<p>Average DTW on two data sets <inline-formula>

<mml:math id="m47"><mml:mo>↓</mml:mo></mml:math></inline-formula></p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">

    <thead>
      <tr>
        <th>Dataset</th>
        <th>eMine</th>
        <th>STA</th>
        <th>SPAM</th>
        <th>IOC</th>
        <th>CDBA</th>
        <th>heuristic</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>OSIE</td>
        <td>1644</td>
        <td>1418</td>
        <td>1050</td>
        <td>921</td>
        <td>899</td>
        <td>891</td>
      </tr>
      <tr>
        <td>MIT1003</td>
        <td>1319</td>
        <td>1467</td>
        <td>1007</td>
        <td>910</td>
        <td>882</td>
        <td>876</td>
      </tr>
    </tbody>
  </table>
</table-wrap>

<table-wrap id="t02" position="float">
					<label>Table 2.</label>
					<caption>
<p>The Statistical Test Results of DTW. NA: Not applicable
because df is not related to the Wilcoxon test. N: the number of images
for which both compared algorithms can find the representative
scanpaths. ***: p&#x3C;0.0001.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">

    <thead>
      <tr>
        <th>Dataset</th>
        <th>Algorithm</th>
        <th>Test</th>
        <th>N</th>
        <th>df</th>
        <th>T or Z value</th>
        <th>Effect Size</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>OSIE</td>
        <td>CDBA-eMine</td>
        <td>Wilcoxon</td>
        <td>531</td>
        <td>NA</td>
        <td>-19.9092***</td>
        <td>-1.3536</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-STA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-22.8043***</td>
        <td>-1.1528</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-SPAM</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-21.7021***</td>
        <td>-0.5653</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-IOC</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-14.4308***</td>
        <td>-0.1025</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-eMine</td>
        <td>Wilcoxon</td>
        <td>531</td>
        <td>NA</td>
        <td>-19.3612***</td>
        <td>-1.3612</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-SPAM</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-21.8317***</td>
        <td>-0.5999</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-STA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-22.8062***</td>
        <td>-1.1689</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-IOC</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-18.2585***</td>
        <td>-0.1443</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-CDBA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-13.6244***</td>
        <td>-0.0420</td>
      </tr>
      <tr>
        <td>MIT1003</td>
        <td>CDBA-eMine</td>
        <td>Wilcoxon</td>
        <td>484</td>
        <td>NA</td>
        <td>-18.6447***</td>
        <td>-0.9658</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-STA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-27.0153***</td>
        <td>-1.0299</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-SPAM</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-25.1192***</td>
        <td>-0.4019</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-IOC</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-19.7691***</td>
        <td>-0.1033</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-eMine</td>
        <td>Wilcoxon</td>
        <td>484</td>
        <td>NA</td>
        <td>-18.6447***</td>
        <td>-0.9761</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-STA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-27.1454***</td>
        <td>-1.0398</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-SPAM</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-25.3895***</td>
        <td>-0.4236</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-IOC</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-22.3411***</td>
        <td>-0.1273</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-CDBA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-15.3338***</td>
        <td>-0.0243</td>
      </tr>
    </tbody>
  </table>
</table-wrap>

<table-wrap id="t03" position="float">
					<label>Table 3.</label>
					<caption>
						<p>Evaluating the representative scanpath by MultiMatch and
ScanMatch <inline-formula>

<mml:math id="m48"><mml:mo>↑</mml:mo></mml:math></inline-formula>
(* means the aggregation algorithm combined with the proposed duration
analysis method)</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">

    <thead>
      <tr>
        <th>Dataset</th>
        <th>Algorithm</th>
        <th></th>        
        <th colspan="3">MultiMatch</th>
        <th></th>        
        <th colspan="2">ScanMatch</th>
      </tr>

      <tr>
        <td></td>
        <td></td>
        <td>vector</td>
        <td>direction</td>
        <td>length</td>
        <td>position</td>
        <td>duration</td>
        <td>without duration</td>
        <td>with duration</td>
      </tr>
    </thead>
    <tbody>      
      <tr>
        <td>OSIE</td>
        <td>eMine*</td>
        <td>0.181</td>
        <td>0.123</td>
        <td>0.191</td>
        <td>0.176</td>
        <td>0.130</td>
        <td>0.120</td>
        <td>0.219</td>
      </tr>
      <tr>
        <td></td>
        <td>STA*</td>
        <td>0.567</td>
        <td>0.402</td>
        <td>0.604</td>
        <td>0.550</td>
        <td>0.409</td>
        <td>0.199</td>
        <td>0.311</td>
      </tr>
      <tr>
        <td></td>
        <td>SPAM*</td>
        <td>0.853</td>
        <td>0.655</td>
        <td>0.891</td>
        <td>0.837</td>
        <td>0.602</td>
        <td>0.244</td>
        <td>0.386</td>
      </tr>
      <tr>
        <td></td>
        <td>IOC*</td>
        <td>0.881</td>
        <td>0.744</td>
        <td><bold>0.906</bold></td>
        <td>0.871</td>
        <td>0.613</td>
        <td>0.348</td>
        <td>0.474</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA*</td>
        <td><bold>0.882</bold></td>
        <td><bold>0.749</bold></td>
        <td>0.905</td>
        <td><bold>0.875</bold></td>
        <td><bold>0.614</bold></td>
        <td><bold>0.351</bold></td>
        <td><bold>0.476</bold></td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic*</td>
        <td><bold>0.882</bold></td>
        <td><bold>0.749</bold></td>
        <td>0.905</td>
        <td>0.874</td>
        <td><bold>0.614</bold></td>
        <td>0.344</td>
        <td>0.474</td>
      </tr>
      <tr>
        <td>MIT1003</td>
        <td>eMine*</td>
        <td>0.083</td>
        <td>0.058</td>
        <td>0.088</td>
        <td>0.080</td>
        <td>0.060</td>
        <td>0.149</td>
        <td>0.224</td>
      </tr>
      <tr>
        <td></td>
        <td>STA*</td>
        <td>0.524</td>
        <td>0.399</td>
        <td>0.542</td>
        <td>0.503</td>
        <td>0.398</td>
        <td>0.251</td>
        <td>0.275</td>
      </tr>
      <tr>
        <td></td>
        <td>SPAM*</td>
        <td>0.734</td>
        <td>0.539</td>
        <td>0.754</td>
        <td>0.711</td>
        <td>0.555</td>
        <td>0.254</td>
        <td>0.324</td>
      </tr>
      <tr>
        <td></td>
        <td>IOC*</td>
        <td>0.842</td>
        <td>0.696</td>
        <td>0.849</td>
        <td>0.819</td>
        <td>0.620</td>
        <td><bold>0.355</bold></td>
        <td><bold>0.419</bold></td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA*</td>
        <td><bold>0.843</bold></td>
        <td>0.701</td>
        <td>0.849</td>
        <td>0.820</td>
        <td>0.617</td>
        <td>0.352</td>
        <td>0.416</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic*</td>
        <td><bold>0.843</bold></td>
        <td><bold>0.702</bold></td>
        <td><bold>0.850</bold></td>
        <td><bold>0.821</bold></td>
        <td><bold>0.620</bold></td>
        <td>0.349</td>
        <td>0.415</td>
      </tr>
    </tbody>
  </table>
</table-wrap>

<table-wrap id="t04" position="float">
					<label>Table 4.</label>
					<caption>
						<p>The statistical test results of ScanMatch scores (with
duration). NA: Not applicable because df is not related to the Wilcoxon
Test. N: the number of images for which both comparison algorithms can
find the representative scanpaths. *: p&#x3C;0.05; ***: p&#x3C;0.0001.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">

    <thead>
      <tr>
        <th>Dataset</th>
        <th>Algorithm</th>
        <th>Test</th>
        <th>N</th>
        <th>df</th>
        <th>T or Z value</th>
        <th>Effect Size</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>OSIE</td>
        <td>CDBA-eMine</td>
        <td>Paired t-test</td>
        <td>531</td>
        <td>530</td>
        <td>54.1696***</td>
        <td>1.6125</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-STA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>22.6372***</td>
        <td>1.2585</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-SPAM</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>20.9515***</td>
        <td>0.8052</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-IOC</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>1.7021</td>
        <td>0.0279</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-eMine</td>
        <td>Wilcoxon</td>
        <td>531</td>
        <td>NA</td>
        <td>19.8712***</td>
        <td>1.6068</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-STA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>22.5733***</td>
        <td>1.2461</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-SPAM</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>20.9413***</td>
        <td>0.7875</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-IOC</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>0.1103</td>
        <td>0.0057</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-CDBA</td>
        <td>Wilcoxon</td>
        <td>700</td>
        <td>NA</td>
        <td>-1.8394</td>
        <td>-0.2222</td>
      </tr>
      <tr>
        <td>MIT1003</td>
        <td>CDBA-eMine</td>
        <td>Wilcoxon</td>
        <td>484</td>
        <td>NA</td>
        <td>18.6006***</td>
        <td>1.3676</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-STA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>26.6705***</td>
        <td>1.0802</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-SPAM</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>24.0124***</td>
        <td>0.7038</td>
      </tr>
      <tr>
        <td></td>
        <td>CDBA-IOC</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-2.0228*</td>
        <td>-0.0253</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-eMine</td>
        <td>Wilcoxon</td>
        <td>484</td>
        <td>NA</td>
        <td>18.6003***</td>
        <td>1.3736</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-STA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>26.5807***</td>
        <td>1.0766</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-SPAM</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>23.9828***</td>
        <td>0.6980</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-IOC</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-2.4115*</td>
        <td>-0.0353</td>
      </tr>
      <tr>
        <td></td>
        <td>Heuristic-CDBA</td>
        <td>Wilcoxon</td>
        <td>1003</td>
        <td>NA</td>
        <td>-1.1309</td>
        <td>-0.0099</td>
      </tr>
    </tbody>
  </table>
</table-wrap>

    </sec>

    <sec id="S4e">
      <title>Summary</title>

<p>In our experiment, the adaptation of IOC can be regarded as
constructing a candidate set that contains AOI-level scanpaths
transformed from individual fixation-level scanpaths. In other words,
IOC actually finds an optimal solution of Equation (1) under stricter
constraints. In addition, CDBA and the heuristic method are also based
on Equation (1), and the outputs of CDBA can be regarded as
approximations of the heuristic results. Compared with the heuristic
method, IOC chooses from a smaller candidate set while CDBA searches the
set more efficiently, but these three algorithms share the same
idea: choosing a scanpath from a candidate scanpath set as the
representative. In this sense, all the algorithms discussed above can
be categorized as follows: (1) “barycenter” based: IOC, CDBA, heuristic;
(2) subsequence based: eMine, SPAM; (3) others: STA.</p>

<p>When evaluated by scanpath length, the “barycenter” based methods
preserve the bell-shaped distribution of scanpath length well. The
comparison by DTW also indicates that all the “barycenter” based methods
produce representative scanpaths similar in shape to the actually
recorded individual scanpaths. As for overall scanpath
similarity, the “barycenter” based methods outperform the others by a
large margin, which confirms that “barycenter” based
aggregated scanpaths are more suitable for combination with gaze duration
to obtain final representative scanpaths. In summary, representative
scanpaths obtained by “barycenter” based methods describe
viewing patterns better.</p>
    </sec>

    <sec id="S4f">
      <title>Interpretation of Representative Scanpaths</title>

<p>Figure 6 shows the aggregated scanpaths obtained by different
algorithms. In Figure 6, red circles represent AOIs, yellow arrows
indicate the direction, and numbers indicate the order. Image 1009
contains only one conspicuous foreground object and image 1033 contains
three objects without many distractors in the background, while images
1263 and 1270 both contain multiple objects against complex backgrounds.
eMine, STA and SPAM clearly produce shorter scanpaths that
may not be able to reflect complete viewing patterns. In particular,
eMine identifies only one common AOI in all the individual scanpaths and
fails to provide any information about attention shift for images 1009,
1033 and 1263. The “barycenter” based methods (IOC, CDBA and the
heuristic method) produce identical results for images 1009 and 1263.
For image 1009, the representative scanpaths show that attention is
first attracted by the dog's head, then transferred to the body, and
finally returns to the head. For image 1263, the pattern is that
subjects are first attracted by faces, then linger between faces, next
explore the objects with which the woman and the man are interacting
(the food they are eating), and finally redirect their attention to the
faces. For images 1033 and 1270, the representative scanpaths obtained
by IOC, CDBA and the heuristic method differ slightly. It is difficult
to conclude which scanpath better describes the viewing pattern since
they share some common segments. Take image 1270 for example: all
three representative scanpaths start from an AOI located near the image
center, which is consistent with the well-known center bias. The main
difference between the obtained patterns lies in the priority of the AOI
on the zip-top can and the AOI on the computer screen. The heuristic
method and CDBA prioritize the AOI on the zip-top can, while IOC
reverses this order. Note that there are some letters on the can.
Considering that text is a top-down factor capable of guiding visual
attention (<xref ref-type="bibr" rid="b38">38</xref>), the
pattern obtained by the heuristic method and CDBA may be
more reasonable. In addition, although we do not have any so-called
ground truth viewing pattern, the identified patterns appear
congruent with human intuition and with verified findings such as center
bias and the top-down effect, whether there are one or several
foreground objects and simple or complex backgrounds. However, in cases
where the priorities of different visual stimuli are not clear (e.g.,
image 1033), the identified patterns can provide only limited
knowledge.</p>

<fig id="fig06" fig-type="figure" position="float">
					<label>Figure 6.</label>
					<caption>
						<p>Aggregated scanpaths for four different images from OSIE
data set. From top to bottom: individual scanpaths, eMine, STA, SPAM,
IOC, CDBA, heuristic.</p>
					</caption>
					<graphic id="graph06" xlink:href="jemr-11-06-e-figure-06.png"/>
				</fig>

<p>Figure 7 visualizes the representative scanpaths obtained by our
proposed methods (CDBA and heuristic), together with the duration
pattern, for image i1182314083 from the MIT1003 data set (<xref ref-type="bibr" rid="b35">35</xref>). The radius
of each red circle is proportional to the total gaze duration on the
corresponding AOI. Figure 8 shows the duration patterns of the
individual scanpaths. It can be seen that the duration pattern of the
representative scanpath is visually consistent with that of the
individual scanpaths and reflects the group trend from an overall
perspective.</p>

<fig id="fig07" fig-type="figure" position="float">
					<label>Figure 7.</label>
					<caption>
						<p>Representative scanpaths with gaze duration of image
i1182314083 from MIT data set.</p>
					</caption>
					<graphic id="graph07" xlink:href="jemr-11-06-e-figure-07.png"/>
				</fig>
    </sec>
    </sec>

    <sec id="S5">
      <title>Discussion</title>

<p>In this article, we extend our previous framework to identify
representative scanpaths from multiple individual scanpaths for natural
images. Different from most existing work, we also analyze the duration
pattern. The proposed framework consists of three steps: eye-gaze data
preprocessing, scanpath aggregation and gaze duration analysis.
Experiments demonstrate that our proposed framework is able to identify
representative scanpaths reflecting group viewing patterns on natural
images.</p>

<p>Based on the algorithms for scanpath aggregation, we further
categorize representative scanpaths as follows: (1) “barycenter” based;
(2) subsequence based; (3) others. Some algorithms are specially
designed to identify viewing patterns on a specific kind of visual
stimulus, so their performance degrades when the visual
stimuli change. For natural images, we find that “barycenter” based
representative scanpaths are the closest to individual scanpaths. Such
representative scanpaths for natural images are useful in various
fields. For example, computer vision researchers attempt to build
plausible saccadic models to predict human scanpaths, and they need a
reliable ground truth scanpath against which predicted scanpaths can be
compared. In addition, it is much easier to visualize and analyze
one representative scanpath than multiple, largely overlapping
individual scanpaths, which makes it possible to validate assumptions
about visual attention and eye movements such as center bias and
top-down bias. The representative scanpath with its duration pattern can
also hint at what first grabs visual attention and what
holds attention for a long period, providing knowledge about what kinds
of image content are strong visual attractors.</p>

<p>However, our work has some limitations. For example, each eye
tracking data set involves only 15 participants, which means there are
at most 15 scanpaths per image, so it would be worthwhile to construct a
much larger data set with more participants. A larger data set also
poses challenges for the proposed algorithms, such as how to
efficiently determine the initial reference scanpath for CDBA and how to
reduce the space and time cost of the heuristic method. In addition, we
use a data-driven approach to obtain AOIs, but it could be better to
associate AOIs with semantically meaningful objects. The incorporation
of semantic segmentation into the preprocessing step needs further
investigation.</p>

<fig id="fig08" fig-type="figure" position="float">
					<label>Figure 8.</label>
					<caption>
						<p>Gaze duration of individual scanpaths of image i1182314083
from MIT data set.</p>
					</caption>
					<graphic id="graph08" xlink:href="jemr-11-06-e-figure-08.png"/>
				</fig>

    </sec>

    <sec id="S6">
      <title>Conclusions</title>


<p>Eye tracking data provide insights into how humans perceive and
explore their surroundings. Traditional scanpath analysis methods target
a specific kind of viewing stimulus, such as web pages, and neglect the
duration pattern, so the scanpaths they produce cannot reflect viewing
patterns on natural images correctly or comprehensively. In this paper,
we extend our previous framework to identify representative scanpaths,
considering temporal order, spatial distribution and gaze duration. The
framework consists of three steps: eye-gaze data preprocessing, scanpath
aggregation and gaze duration analysis. The second step is the key to
representative scanpath identification and can be replaced by
traditional methods such as eMine. Based on the algorithms chosen, we
further categorize the obtained representative scanpaths as
subsequence-based, “barycenter”-based and others. Experiments
demonstrate that our framework serves the purpose of generalizing
viewing patterns well, and that the “barycenter”-based representative
scanpaths describe these patterns best.</p>
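<p>The “barycenter”-based aggregation step can be illustrated with a minimal sketch of DTW barycenter averaging (Petitjean, Ketterlin, &amp; Gancarski, 2011) over 2-D fixation sequences. This is a simplified stand-in, not our CDBA implementation: the constraint scheme, reference-initialization strategy and duration handling are omitted, and all names and values are illustrative.</p>

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW between two scanpaths (arrays of (x, y) fixations).
    Returns the optimal warping path as a list of index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # backtrack from the end of both sequences
    path, (i, j) = [], (n, m)
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j],
                              cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def dba(scanpaths, n_iter=10):
    """Plain DTW barycenter averaging: iteratively refine a reference
    scanpath toward the set's "barycenter". Simplified sketch only."""
    ref = scanpaths[0].copy()  # naive initialization (a design choice)
    for _ in range(n_iter):
        assoc = [[] for _ in range(len(ref))]
        for sp in scanpaths:
            for i, j in dtw_path(ref, sp):
                assoc[i].append(sp[j])
        # move each reference fixation to the mean of aligned fixations
        ref = np.array([np.mean(pts, axis=0) for pts in assoc])
    return ref
```

<p>Starting from the first scanpath as the reference, each iteration warps every scanpath onto the reference via DTW and moves each reference fixation to the mean of the fixations aligned with it.</p>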
    </sec>

    <sec id="S7">
      <title>Ethics and Conflict of Interest</title>

<p>The author(s) declare(s) that the contents of the article are in
agreement with the ethics described in
<ext-link ext-link-type="uri" xlink:href="http://biblio.unibe.ch/portale/elibrary/BOP/jemr/ethics.html">http://biblio.unibe.ch/portale/elibrary/BOP/jemr/ethics.html</ext-link>
and that there is no conflict of interest regarding the publication of
this paper.</p>
    </sec>

    <sec id="S8">
      <title>Acknowledgements</title>

<p>This work is supported in part by National Natural Science Foundation
of China under Grant 61771348 and 61471273, and Wuhan Morning Light Plan
of Youth Science and Technology under Grant 2017050304010302.</p>
    </sec>
</body>
<back>
<ref-list>
<ref id="b30"><mixed-citation publication-type="book" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Berndt</surname> <given-names>DJ</given-names></name>, <name><surname>Clifford</surname> <given-names>J</given-names></name></person-group> (<year>1994</year>). <article-title>Using dynamic time warping to find patterns in time series.</article-title><source>In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshop.</source></mixed-citation></ref>
<ref id="b9"><mixed-citation publication-type="unknown" specific-use="linked"><person-group person-group-type="author"><name><surname>Burmester</surname> <given-names>M</given-names></name>, <name><surname>Mast</surname> <given-names>M.</given-names></name></person-group> <article-title>Repeated Web Page Visits and the Scanpath Theory: A Recurrent Pattern Detection Approach.</article-title> Journal of Eye Movement Research. <year>2010</year>; 3(4):5, 1-20. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.16910/jemr.3.4.5</pub-id></mixed-citation></ref>
<ref id="b29"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Chen</surname>, <given-names>X.</given-names></name>, &#x26; <name><surname>Chen</surname>, <given-names>Z.</given-names></name></person-group> (<year>2017</year>). <article-title>Exploring visual attention using random walks based eye tracking protocols.</article-title> <source>Journal of Visual Communication and Image Representation</source>, <volume>45</volume>, <fpage>147</fpage>&#8211;<lpage>155</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1016/j.jvcir.2017.02.005</pub-id><issn>1047-3203</issn></mixed-citation></ref>
<ref id="b27"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Cristino</surname>, <given-names>F.</given-names></name>, <name><surname>Math&#244;t</surname>, <given-names>S.</given-names></name>, <name><surname>Theeuwes</surname>, <given-names>J.</given-names></name>, &#x26; <name><surname>Gilchrist</surname>, <given-names>I. D.</given-names></name></person-group> (<year>2010</year>). <article-title>ScanMatch: A novel method for comparing fixation sequences.</article-title> <source>Behavior Research Methods</source>, <volume>42</volume>(<issue>3</issue>), <fpage>692</fpage>&#8211;<lpage>700</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.3758/BRM.42.3.692</pub-id><pub-id pub-id-type="pmid">20805591</pub-id><issn>1554-351X</issn></mixed-citation></ref>
<ref id="b37"><mixed-citation publication-type="unknown" specific-use="linked"><person-group person-group-type="author"><name><surname>Dolezalova</surname> <given-names>J</given-names></name>, <name><surname>Popelka</surname> <given-names>S.</given-names></name></person-group> <article-title>ScanGraph: A Novel Scanpath Comparison Method Using Visualization of Graph Cliques.</article-title> Journal of Eye Movement Research. <year>2016</year>; 9(4), 5:1-13. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.16910/jemr.9.4.5</pub-id></mixed-citation></ref>
<ref id="b21"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Engelke</surname>, <given-names>U.</given-names></name>, <name><surname>Liu</surname>, <given-names>H.</given-names></name>, <name><surname>Wang</surname>, <given-names>J.</given-names></name>, <name><surname>Le Callet</surname>, <given-names>P.</given-names></name>, <name><surname>Heynderickx</surname>, <given-names>I.</given-names></name>, <name><surname>Zepernick</surname>, <given-names>H. J.</given-names></name>, &#x26; <name><surname>Maeder</surname>, <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Comparative study of fixation density maps.</article-title> <source>IEEE Transactions on Image Processing</source>, <volume>22</volume>(<issue>3</issue>), <fpage>1121</fpage>&#8211;<lpage>1133</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1109/TIP.2012.2227767</pub-id><pub-id pub-id-type="pmid">23193452</pub-id><issn>1057-7149</issn></mixed-citation></ref>
<ref id="b12"><mixed-citation publication-type="unknown" specific-use="linked"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S.</given-names></name></person-group> Eye Tracking Scanpath Analysis Techniques on Web Pages: A Survey, Evaluation and Comparison. Journal of Eye Movement Research. <year>2016</year>; 9(1):2, 1-19. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.16910/jemr.9.1.2</pub-id></mixed-citation></ref>
<ref id="b15"><mixed-citation publication-type="unknown" specific-use="linked"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S.</given-names></name></person-group> Less Users More Confidence: How AOIs don&#8217;t Affect Scanpath Trend Analysis. Journal of Eye Movement Research. <year>2017</year>; 10(4):6, 1-18. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.16910/jemr.10.4.6</pub-id></mixed-citation></ref>
<ref id="b13"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S.</given-names></name></person-group> Scanpath Trend Analysis on Web Pages: Clustering Eye Tracking Scanpaths. ACM Transactions on the Web. <year>2016</year>a; 10(4), 20:1-20:35. <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/2970818">http://doi.acm.org/10.1145/2970818</ext-link></mixed-citation></ref>
<ref id="b14"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S</given-names></name></person-group>. Trends in eye tracking scanpaths: Segmentation effect? HT 2016: Proceedings of the 27th ACM Conference on Hypertext and Social Media. <year>2016</year>b. p. 15-25. <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/2914586.2914591">http://doi.acm.org/10.1145/2914586.2914591</ext-link></mixed-citation></ref>
<ref id="b11"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S</given-names></name></person-group>. <article-title>Identifying patterns in eyetracking scanpaths in terms of visual elements of web pages.</article-title> <source>ICWE 2014: Proceedings of the 14th International Conference on Web Engineering</source>. <year>2014</year>. p. <fpage>163</fpage>-<lpage>180</lpage>.</mixed-citation></ref>
<ref id="b16"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Eraslan</surname> <given-names>S</given-names></name>, <name><surname>Yesilada</surname> <given-names>Y</given-names></name>, <name><surname>Harper</surname> <given-names>S</given-names></name></person-group>. (<year>2017</year>b) Engineering Web-based Interactive Systems: Trend Analysis in Eye Tracking Scanpaths with a Tolerance. EICS 2017: Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. http://<ext-link ext-link-type="uri" xlink:href="doi.acm.org/10.1145/3102113.3102116">doi.acm.org/10.1145/3102113.3102116</ext-link></mixed-citation></ref>
<ref id="b17"><mixed-citation publication-type="web-page" specific-use="linked"><person-group person-group-type="author"><name><surname>Goldberg</surname> <given-names>JH</given-names></name>, <name><surname>Helfman</surname> <given-names>JI</given-names></name></person-group>. <article-title>Scanpath clustering and aggregation.</article-title> ETRA 2010: Proceedings of the 2010 Symposium on Eye Tracking Research &#x26; Applications. <year>2010</year>. p. 227-234. http://<ext-link ext-link-type="uri" xlink:href="doi.acm.org/10.1145/1743666.1743721">doi.acm.org/10.1145/1743666.1743721</ext-link> <pub-id pub-id-type="doi" specific-use="author">10.1145/1743666.1743721</pub-id></mixed-citation></ref>
<ref id="b20"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Hejmady</surname> <given-names>P</given-names></name>, <name><surname>Narayanan</surname> <given-names>NH</given-names></name></person-group>. <article-title>Visual attention patterns during program debugging with an IDE.</article-title> ETRA 2012: Proceedings of the 2012 Symposium on Eye Tracking Research &#x26; Applications. <year>2012</year>. p. 197-200. <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/2168556.2168592">http://doi.acm.org/10.1145/2168556.2168592</ext-link></mixed-citation></ref>
<ref id="b18"><mixed-citation publication-type="web-page" specific-use="linked"><person-group person-group-type="author"><name><surname>Hembrooke</surname> <given-names>H</given-names></name>, <name><surname>Feusner</surname> <given-names>M</given-names></name>, <name><surname>Gay</surname> <given-names>G</given-names></name></person-group>. <article-title>Averaging scan patterns and what they can tell us.</article-title> ETRA 2006: Proceedings of the 2006 Symposium on Eye Tracking Research &#x26; Applications. <year>2006</year>. p. 41-41. <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/1117309.1117325">http://doi.acm.org/10.1145/1117309.1117325</ext-link> <pub-id pub-id-type="doi" specific-use="author">10.1145/1117309.1117325</pub-id></mixed-citation></ref>
<ref id="b36"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Jarodzka</surname> <given-names>H</given-names></name>, <name><surname>Holmqvist</surname> <given-names>K</given-names></name>, <name><surname>Nyström</surname> <given-names>M.</given-names></name></person-group> <article-title>A vector-based, multidimensional scanpath similarity measure.</article-title> ETRA 2010: Proceedings of the 2010 Symposium on Eye-Tracking Research &#x26; Applications. <year>2010</year>. p. 211-218. http://<ext-link ext-link-type="uri" xlink:href="doi.acm.org/10.1145/1743666.1743718">doi.acm.org/10.1145/1743666.1743718</ext-link></mixed-citation></ref>
<ref id="b22"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Jiang</surname>, <given-names>M.</given-names></name>, <name><surname>Boix</surname>, <given-names>X.</given-names></name>, <name><surname>Roig</surname>, <given-names>G.</given-names></name>, <name><surname>Xu</surname>, <given-names>J.</given-names></name>, <name><surname>Van Gool</surname>, <given-names>L.</given-names></name>, &#x26; <name><surname>Zhao</surname>, <given-names>Q.</given-names></name></person-group> (<year>2016</year>). <article-title>Learning to predict sequences of human visual fixations.</article-title> <source>IEEE Transactions on Neural Networks and Learning Systems</source>, <volume>27</volume>(<issue>6</issue>), <fpage>1241</fpage>&#8211;<lpage>1252</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1109/TNNLS.2015.2496306</pub-id><pub-id pub-id-type="pmid">26761903</pub-id><issn>2162-237X</issn></mixed-citation></ref>
<ref id="b35"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Judd</surname> <given-names>T</given-names></name>, <name><surname>Ehinger</surname> <given-names>K</given-names></name>, <name><surname>Durand</surname> <given-names>F</given-names></name>, <name><surname>Torralba</surname> <given-names>A</given-names></name></person-group>. <article-title>Learning to predict where humans look.</article-title> ICCV <year>2009</year>: <source>Proceedings of 2009 IEEE 12th International Conference on Computer Vision</source>. <conf-date>2009</conf-date>. p. <fpage>2106</fpage>-<lpage>2113</lpage>. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/ICCV.2009.5459462</pub-id></mixed-citation></ref>
<ref id="b1"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Just</surname>, <given-names>M. A.</given-names></name>, &#x26; <name><surname>Carpenter</surname>, <given-names>P. A.</given-names></name></person-group> (<year>1980</year>). <article-title>A theory of reading: From eye fixations to comprehension.</article-title> <source>Psychological Review</source>, <volume>87</volume>(<issue>4</issue>), <fpage>329</fpage>&#8211;<lpage>354</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1037/0033-295X.87.4.329</pub-id><pub-id pub-id-type="pmid">7413885</pub-id><issn>0033-295X</issn></mixed-citation></ref>
<ref id="b26"><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Kanizsa</surname>, <given-names>G.</given-names></name></person-group> (<year>1979</year>). <source>Organization in vision: Essays on Gestalt perception</source>. <publisher-name>Praeger Publishers</publisher-name>.</mixed-citation></ref>
<ref id="b24"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Le Meur</surname>, <given-names>O.</given-names></name>, &#x26; <name><surname>Liu</surname>, <given-names>Z.</given-names></name></person-group> (<year>2015</year>). <article-title>Saccadic model of eye movements for free-viewing condition.</article-title> <source>Vision Research</source>, <volume>116</volume>(<supplement>Pt B</supplement>), <fpage>152</fpage>&#8211;<lpage>164</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1016/j.visres.2014.12.026</pub-id><pub-id pub-id-type="pmid">25724662</pub-id><issn>0042-6989</issn></mixed-citation></ref>
<ref id="b32"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Levenshtein</surname>, <given-names>V. I.</given-names></name></person-group> (<year>1965</year>). <article-title>Binary codes capable of correcting deletions, insertions, and reversals.</article-title> <source>Soviet Physics, Doklady</source>, <volume>10</volume>, <fpage>707</fpage>&#8211;<lpage>710</lpage>.<issn>0038-5689</issn></mixed-citation></ref>
<ref id="b25"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>A</given-names></name>, <name><surname>Zhang</surname> <given-names>Y</given-names></name>, <name><surname>Chen</surname> <given-names>Z</given-names></name></person-group>. <article-title>Scanpath mining of eye movement trajectories for visual attention analysis.</article-title> ICME 2017: Proceedings of 2017 IEEE International Conference on Multimedia and Expo. <year>2017</year>. p. 535&#8211;540. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/ICME.2017.8019507</pub-id></mixed-citation></ref>
<ref id="b8"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Magnusson</surname>, <given-names>M. S.</given-names></name></person-group> (<year>2000</year>). <article-title>Discovering hidden time patterns in behavior: T-patterns and their detection.</article-title> <source>Behavior Research Methods, Instruments, &#x26; Computers</source>, <volume>32</volume>(<issue>1</issue>), <fpage>93</fpage>&#8211;<lpage>110</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.3758/BF03200792</pub-id><pub-id pub-id-type="pmid">10758668</pub-id><issn>0743-3808</issn></mixed-citation></ref>
<ref id="b10"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>McClung</surname>, <given-names>S. N.</given-names></name>, &#x26; <name><surname>Kang</surname>, <given-names>Z.</given-names></name></person-group> (<year>2016</year>). <article-title>Characterization of visual scanning patterns in air traffic control.</article-title> <source>Computational Intelligence and Neuroscience</source>, <volume>2016</volume>, <fpage>8343842</fpage>. <pub-id pub-id-type="doi" specific-use="author">10.1155/2016/8343842</pub-id><pub-id pub-id-type="pmid">27239190</pub-id><issn>1687-5265</issn></mixed-citation></ref>
<ref id="b3"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><name><surname>Mishra</surname> <given-names>A</given-names></name>, <name><surname>Kanojia</surname> <given-names>D</given-names></name>, <name><surname>Bhattacharyya</surname> <given-names>P</given-names></name></person-group>. (<year>2016</year>). <article-title>Predicting readers sarcasm understandability by modeling gaze behavior.</article-title><source>AAAI 2016: Proceedings of the 30th AAAI Conference on Artificial Intelligence</source>. <conf-date>2016</conf-date>. p. <fpage>3747</fpage>-<lpage>3753</lpage>.</mixed-citation></ref>
<ref id="b5"><mixed-citation publication-type="conference" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Mishra</surname> <given-names>A</given-names></name>, <name><surname>Kanojia</surname> <given-names>D</given-names></name>, <name><surname>Nagar</surname> <given-names>S</given-names></name>, <name><surname>Dey</surname> <given-names>K</given-names></name>, <name><surname>Bhattacharyya</surname> <given-names>P.</given-names></name></person-group> Scanpath complexity: Modeling reading effort using gaze information. AAAI 2017: Proceedings of the 31st AAAI Conference on Artificial Intelligence. <year>2017</year>. p. 4429-4436.</mixed-citation></ref>
<ref id="b31"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Needleman</surname>, <given-names>S. B.</given-names></name>, &#x26; <name><surname>Wunsch</surname>, <given-names>C. D.</given-names></name></person-group> (<year>1970</year>). <article-title>A general method applicable to the search for similarities in the amino acid sequence of two proteins.</article-title> <source>Journal of Molecular Biology</source>, <volume>48</volume>(<issue>3</issue>), <fpage>443</fpage>&#8211;<lpage>453</lpage>. <pub-id pub-id-type="doi">10.1016/0022-2836(70)90057-4</pub-id><pub-id pub-id-type="pmid">5420325</pub-id><issn>0022-2836</issn></mixed-citation></ref>
<ref id="b6"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Noton</surname>, <given-names>D.</given-names></name>, &#x26; <name><surname>Stark</surname>, <given-names>L.</given-names></name></person-group> (<year>1971</year>). <article-title>Scanpaths in eye movements during pattern perception.</article-title> <source>Science</source>, <volume>171</volume>(<issue>3968</issue>), <fpage>308</fpage>&#8211;<lpage>311</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1126/science.171.3968.308</pub-id><pub-id pub-id-type="pmid">5538847</pub-id><issn>0036-8075</issn></mixed-citation></ref>
<ref id="b33"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Petitjean</surname>, <given-names>F.</given-names></name>, <name><surname>Ketterlin</surname>, <given-names>A.</given-names></name>, &#x26; <name><surname>Gancarski</surname>, <given-names>P.</given-names></name></person-group> (<year>2011</year>). <article-title>A global averaging method for dynamic time warping with applications to clustering.</article-title> <source>Pattern Recognition</source>, <volume>44</volume>(<issue>3</issue>), <fpage>678</fpage>&#8211;<lpage>693</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1016/j.patcog.2010.09.013</pub-id><issn>0031-3203</issn></mixed-citation></ref>
<ref id="b38"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Ramanishka</surname> <given-names>V</given-names></name>, <name><surname>Das</surname> <given-names>A</given-names></name>, <name><surname>Zhang</surname> <given-names>J</given-names></name>, <name><surname>Saenko</surname> <given-names>K</given-names></name></person-group>. (<year>2017</year>) <article-title>Top-down visual saliency guided by captions.</article-title> CVPR 2017: Proceedings of 2017 IEEE International Conference on Computer Vision and Pattern Recognition. http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/CVPR.2017.334</pub-id></mixed-citation></ref>
<ref id="b2"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><name><surname>Razin</surname> <given-names>Y</given-names></name>, <name><surname>Feigh</surname> <given-names>K</given-names></name></person-group>. <article-title>Learning to predict intent from gaze during robotic hand-eye coordination.</article-title> <source>AAAI 2017: Proceedings of the 31st AAAI Conference on Artificial Intelligence</source>.<year>2017</year>. p. <fpage>4596</fpage>-<lpage>4602</lpage>.</mixed-citation></ref>
<ref id="b28"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Rodriguez</surname>, <given-names>A.</given-names></name>, &#x26; <name><surname>Laio</surname>, <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Machine learning. Clustering by fast search and find of density peaks.</article-title> <source>Science</source>, <volume>344</volume>(<issue>6191</issue>), <fpage>1492</fpage>&#8211;<lpage>1496</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1126/science.1242072</pub-id><pub-id pub-id-type="pmid">24970081</pub-id><issn>0036-8075</issn></mixed-citation></ref>
<ref id="b23"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Sakoe</surname>, <given-names>H.</given-names></name>, &#x26; <name><surname>Chiba</surname>, <given-names>S.</given-names></name></person-group> (<year>1978</year>). <article-title>Dynamic programming algorithm optimization for spoken word recognition.</article-title> <source>IEEE Transactions on Acoustics, Speech, and Signal Processing</source>, <volume>26</volume>(<issue>1</issue>), <fpage>43</fpage>&#8211;<lpage>49</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1109/TASSP.1978.1163055</pub-id><issn>0096-3518</issn></mixed-citation></ref>
<ref id="b19"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><name><surname>West</surname> <given-names>JM</given-names></name>, <name><surname>Haake</surname> <given-names>AR</given-names></name>, <name><surname>Rozanski</surname> <given-names>EP</given-names></name>, <name><surname>Karn</surname> <given-names>KS</given-names></name></person-group>. eyePatterns: software for identifying patterns and similarities across fixation sequences. ETRA 2006: Proceedings of the 2006 Symposium on Eye Tracking Research &#x26; Applications. <year>2006</year>. p. 149-154. <ext-link ext-link-type="uri" xlink:href="http://doi.acm.org/10.1145/1117309.1117360">http://doi.acm.org/10.1145/1117309.1117360</ext-link></mixed-citation></ref>
<ref id="b34"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Xu</surname>, <given-names>J.</given-names></name>, <name><surname>Jiang</surname>, <given-names>M.</given-names></name>, <name><surname>Wang</surname>, <given-names>S.</given-names></name>, <name><surname>Kankanhalli</surname>, <given-names>M. S.</given-names></name>, &#x26; <name><surname>Zhao</surname>, <given-names>Q.</given-names></name></person-group> (<year>2014</year>). <article-title>Predicting human gaze beyond pixels.</article-title> <source>Journal of Vision (Charlottesville, Va.)</source>, <volume>14</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>20</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1167/14.1.28</pub-id><pub-id pub-id-type="pmid">24474825</pub-id><issn>1534-7362</issn></mixed-citation></ref>
<ref id="b7"><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Yarbus</surname>, <given-names>A. L.</given-names></name></person-group> (<year>1967</year>). <source>Eye Movements and Vision</source>. <publisher-name>Plenum Press</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-1-4899-5379-7</pub-id></mixed-citation></ref>
<ref id="b4"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Zhou</surname>, <given-names>L.</given-names></name>, <name><surname>Zhang</surname>, <given-names>Y.</given-names></name>, <name><surname>Wang</surname>, <given-names>Z.</given-names></name>, <name><surname>Rao</surname>, <given-names>L.</given-names></name>, <name><surname>Wang</surname>, <given-names>W.</given-names></name>, <name><surname>Li</surname>, <given-names>S.</given-names></name>, <etal>. . .</etal> <name><surname>Liang</surname>, <given-names>Z.</given-names></name></person-group> (<year>2016</year>). <article-title>A Scanpath Analysis of the Risky Decision-Making Process.</article-title> <source>Journal of Behavioral Decision Making</source>, <volume>29</volume>(<issue>2-3</issue>), <fpage>169</fpage>&#8211;<lpage>182</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1002/bdm.1943</pub-id><issn>0894-3257</issn></mixed-citation></ref>
</ref-list>
</back>
</article>
