<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
     <journal-meta>
	<journal-id journal-id-type="publisher-id">Jemr</journal-id>
      <journal-title-group>
        <journal-title>Journal of Eye Movement Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1995-8692</issn>
	  <publisher>								
	  <publisher-name>Bern Open Publishing</publisher-name>
	  <publisher-loc>Bern, Switzerland</publisher-loc>
	</publisher>
    </journal-meta>
    <article-meta><article-id pub-id-type="doi">10.16910/jemr.11.1.2</article-id> 
	  <article-categories>								
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
		</article-categories>
      <title-group>
        <article-title>A Method to Compensate Head Movements for Mobile Eye Tracker Using Invisible Markers</article-title>
      </title-group>
         <contrib-group> 
				<contrib contrib-type="author">
					<name>
						<surname>Osawa</surname>
						<given-names>Rie</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Shirayama</surname>
						<given-names>Susumu</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				
        <aff id="aff1">
		<institution>The University of Tokyo</institution>, <country>Japan</country>
        </aff>
		</contrib-group>
     
	  <pub-date date-type="pub" publication-format="electronic"> 
		<day>6</day>  
		<month>1</month>
        <year>2018</year>
      </pub-date>
	  <pub-date date-type="collection" publication-format="electronic"> 
	  <year>2018</year>
	</pub-date>
      <volume>11</volume>
      <issue>1</issue> 
	  <elocation-id>10.16910/jemr.11.1.2</elocation-id>
	
	<permissions> 
	<copyright-year>2018</copyright-year>
	<copyright-holder>Osawa et al.</copyright-holder>
	<license license-type="open-access">
  <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License, 
  (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">
    https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
	</permissions>
      <abstract>
        <p>Although mobile eye-trackers have a wide measurement range and high flexibility, it is difficult to judge what a subject is actually looking at based only on the obtained coordinates, due to the influence of head movement. In this paper, a method is proposed to compensate for head movements while viewing a large screen with a mobile eye-tracker, through the use of NIR-LED markers embedded in the screen. Head movements are compensated for by performing template matching on the images of the view camera to detect the actual eye position on the screen. In the verification experiment, the detection rate of template matching was 98.6%, and the average distance between the actual eye position and the corrected eye position was approximately 16 pixels on the projected image (1920 x 1080).</p>
      </abstract>
      <kwd-group>
        <kwd>eye movements</kwd>
        <kwd>gaze behavior</kwd>
        <kwd>eye tracking</kwd>
        <kwd>head movements</kwd>
        <kwd>template matching</kwd>
        <kwd>invisible marker</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
  
  
  
  
    <sec id="s1">
      <title>Introduction</title>
      <p>Recently, various eye-tracking devices have been
introduced to the market, and eye-movement analyses are
being conducted in many domains. The difference in gaze
behavior between novices and experts can be utilized to
develop efficient training methods [
        <xref ref-type="bibr" rid="b17 b8">17, 8</xref>
        ]. Likewise, differences observed when changing the color
or arrangement of objects can inform product development
or marketing [
        <xref ref-type="bibr" rid="b2 b18">2, 18</xref>
        ].</p>
      <p>There are generally two types of eye-gaze
measurement devices based on the pupil center corneal reflection
method, which uses near-infrared (NIR) illuminators.
One is the display-mounted type, where the NIR illuminators
are installed on a PC display to obtain the eye position. The other is
the head-mounted type, which obtains the coordinates by
identifying the gaze position on a viewed image or movie.</p>
      <p>In psychological studies, it is common for the subjects&#x2019;
heads to be fixed in order to obtain accurate
eye-movement measurements. However, in experiments that measure
human gaze behavior realistically, restricting the subjects&#x2019;
head motion is far from the actual conditions, because
humans are known to move their heads, consciously or
unconsciously. Head motion is one of the major
human physiological behaviors and is essential in daily life
[
        <xref ref-type="bibr" rid="b4">4</xref>
        ], which is why we decided against any motion
restriction.</p>


      <p>For eye tracking, a head-mounted device is
suitable when considering realism and flexibility. However,
there is one problem specific to such devices: the output
eye-position data are affected by head movements. Sun et al.
[<xref ref-type="bibr" rid="b12">12</xref>] mention that it is important to remove noise such as
head movements from the obtained gaze data in order to
detect the driver&#x2019;s degree of concentration. Therefore,
several methods have been developed to detect the exact
eye position while excluding the effect of head movements.</p>

      <p>In our study, eye tracking is performed using a large
screen with artificial feature points created by NIR-LEDs,
which are invisible to the naked human eye. Image
processing is performed on the view-camera images
in which the feature points are recorded, thereby
compensating for the head movements. Finally, a method is
proposed to automatically output the exact part of the large
screen being viewed by the subject.</p>





    </sec>
    <sec id="s2">
      <title>Related Work</title>
      <p>Four major solutions have been proposed
to address the issue of matching eye positions in the view
camera with actual eye positions on the screen, or of
compensating for head movements in the eye-tracking data.</p>

 <sec id="s2a">
      <title>Methods based on features in images</title>

   
      <p> Toyama et al. [
        <xref ref-type="bibr" rid="b16">16</xref>
        ] proposed using SIFT (Scale-Invariant
Feature Transform) features. Points with high-contrast
characteristics, or points at corners, are regarded as key
points with highly noticeable features. These are suitable
for matching because they are not affected by rotation or
scaling. Jensen et al. [
        <xref ref-type="bibr" rid="b7">7</xref>
        ] applied SIFT features to construct
a 3D AOI (Area Of Interest) from eye-tracking data
obtained by a head-mounted eye-tracker. Takemura et al. [
        <xref ref-type="bibr" rid="b14">14</xref>
		] proposed using PTAM (Parallel Tracking and
Mapping) and Chanijani et al. [
        <xref ref-type="bibr" rid="b3">3</xref>
        ] applied LLAH (Locally
Likely Arrangement Hashing) to find feature points.
However, these methods require a sufficient number of feature
points in the image for accuracy, which may or may not be
present depending on the contents of the view camera
image.
      </p> </sec>
      <sec id="s2b">
        <title>Methods with markers</title>
        <p>
          NAC Image Technology [
          <xref ref-type="bibr" rid="b11">11</xref>
          ] offers a method using
AR markers to create artificial feature points, where AR
markers captured in the view camera are matched with
spatial coordinates. Tomi and Rambli [
          <xref ref-type="bibr" rid="b15">15</xref>
          ] also proposed
using an eye-tracker with an AR application in the
calibration of a head-mounted display. Huang and Tan [
          <xref ref-type="bibr" rid="b5">5</xref>
          ] used
circular patterns as markers. However, large markers could
influence eye movements due to their size and appearance.
Kocejko et al. [
          <xref ref-type="bibr" rid="b9">9</xref>
          ] proposed an algorithm to compensate
for head movements with three cameras (to observe the
eye, scene, and head angle) and LED markers. However,
the viewed objects were limited to the monitor, as was the
movement of the subjects.
        </p> </sec>
		 <sec id="s2c">
        <title>A method with infrared data communication</title>
		
        <p>Tobii Technology offered a solution that uses infrared
data communication markers. Eight such markers
(approximately 30 mm<sup>3</sup>) are required for position detection, where
each marker communicates with the eye-tracking device
and matches the image of the view camera with the
respective spatial coordinates. However, the size of such markers
could have a significant impact on eye movements. Note
that this device is not currently available.</p> </sec>
  <sec id="s2d">
        <title>Methods sensing head movements</title>       
	
        <p>
          Ahlstrom et al. [
          <xref ref-type="bibr" rid="b1">1</xref>
          ] proposed compensating for head
movements using gaze behaviors recorded with a video
camera in actual driving scenes. However, detection of
head movement is performed manually for each frame,
which is costly. Larsson et al. [
          <xref ref-type="bibr" rid="b10">10</xref>
          ] applied a gyro, an
accelerometer, and a magnetometer. Even though the
accuracy has been improved, the synchronization of
eye-tracking data and other sensors still remains an issue.
        </p>
      </sec>
    </sec>
    <sec id="s3">
      <title>Proposed Methodology</title>
      <p>As mentioned above, feature-based methods
require a sufficient number of feature points in the image for
accuracy. Markers could influence eye movements in the
methods with AR markers or infrared data communication.
In the methods sensing head movements, the
synchronization of eye-tracking data and other sensors remains an
issue. In our proposed methodology, eye tracking is performed
using a large screen with artificial feature points created
by NIR-LEDs, which are invisible to the naked human
eye. The methodology does not rely on the content of the
visual stimuli and can therefore be applied even when
sufficient features are not present. Furthermore, markers created
by NIR-LEDs do not affect eye movements. Image
processing is performed on the view-camera images in
which the feature points are recorded, thereby compensating
for the head movements. Since template matching is
performed automatically by image processing, the cost is low
and the processing is relatively fast. Finally, a method is
proposed to automatically output the exact part of the large
screen being viewed by the subject.</p>



  <sec id="s3a">
        <title>Overview of the experimental apparatus</title>  
 <p><xref ref-type="fig" rid="fig01">Figure 1</xref> illustrates the overview of the experimental
apparatus devised to measure the subjects’ gaze behavior
while watching the large screen. <xref ref-type="fig" rid="fig02">Figure 2</xref> illustrates the
actual experimental environment.</p> 
<fig id="fig01" fig-type="figure" position="float">
					<label>Figure 1</label>
					<caption>
						<p>Overview of the experimental apparatus.</p>
					</caption>
					<graphic id="graph01" xlink:href="jemr-11-01-b-figure-01.png"/>
				</fig>
				
				
<fig id="fig02" fig-type="figure" position="float">
					<label>Figure 2</label>
					<caption>
						<p>Experimental apparatus for eye tracking.</p>
					</caption>
					<graphic id="graph02" xlink:href="jemr-11-01-b-figure-02.png"/>
				</fig>				
 </sec>
    <sec id="s3b">
      <title>Eye-tracking apparatus</title>
	 <p>To record the eye-position data, we selected
NAC Image Technology’s EMR-9 as the eye-tracking device,
which includes a view camera attached to the subject’s
forehead for video recording. The eye position is indicated
by the x&#x2013;y coordinates in the area recorded by the
view camera (<xref ref-type="fig" rid="fig03">Figure 3</xref>). Even if the eye position is fixed
on a specific item, head movements will cause shifts in the
view-camera area and the x&#x2013;y coordinates, leading to difficulties
in identifying the target object, as seen in <xref ref-type="fig" rid="fig04">Figure 4a</xref>, <xref ref-type="fig" rid="fig05">Figure 4b</xref>, and <xref ref-type="fig" rid="fig06">Figure 4c</xref>.</p>


<fig id="fig03" fig-type="figure" position="float">
					<label>Figure 3</label>
					<caption>
						<p>Output image of the view camera.</p>
					</caption>
					<graphic id="graph03" xlink:href="jemr-11-01-b-figure-03.png"/>
				</fig>
<fig id="fig04" fig-type="figure" position="float">
					<label>Figure 4a</label>
					<caption>
						<p>Variations in the view and axis caused by head movements.</p>
					</caption>
					<graphic id="graph04" xlink:href="jemr-11-01-b-figure-04.png"/>
				</fig>	
				<fig id="fig05" fig-type="figure" position="float">
					<label>Figure 4b</label>
					<caption>
						<p>Variations in the view and axis caused by head movements.</p>
					</caption>
					<graphic id="graph05" xlink:href="jemr-11-01-b-figure-05.png"/>
				</fig>	
				<fig id="fig06" fig-type="figure" position="float">
					<label>Figure 4c</label>
					<caption>
						<p>Variations in the view and axis caused by head movements.</p>
					</caption>
					<graphic id="graph06" xlink:href="jemr-11-01-b-figure-06.png"/>
				</fig>				

</sec>
    <sec id="s3c">
      <title>New method using artificial feature points
with infrared LED markers</title>
	 <p>In this paper, a new eye-tracking method is proposed
via the creation of artificial feature points made of
invisible NIR (near-infrared) LED markers and image
processing. NIR-LEDs are invisible to the naked human eye,
which reduces their effect on eye tracking despite their
presence. At the same time, NIR-LEDs are visible through
IR filters, as seen in <xref ref-type="fig" rid="fig07">Figure 5a</xref> and <xref ref-type="fig" rid="fig08">Figure 5b</xref>. In robot technology, it is
popular to use NIR-LEDs to detect locations or to follow
target objects [
            <xref ref-type="bibr" rid="b13">13</xref>
            ]. However, to the authors&#x2019; best
knowledge, there have been no NIR-LED applications
for eye tracking, an approach that has the potential to enable
eye-movement detection even in the presence of head movements.</p>
<fig id="fig07" fig-type="figure" position="float">
					<label>Figure 5a</label>
					<caption>
						<p>IR markers with the naked eye (left) and through the
filter (right).</p>
					</caption>
					<graphic id="graph07" xlink:href="jemr-11-01-b-figure-07.png"/>
				</fig>
				
				
				<fig id="fig08" fig-type="figure" position="float">
					<label>Figure 5b</label>
					<caption>
						<p>IR markers with the naked eye (left) and through the
filter (right).</p>
					</caption>
					<graphic id="graph08" xlink:href="jemr-11-01-b-figure-08.png"/>
				</fig>
		  
		  
		  
		  
		  
		  
		  
		  
		  
		  
		  
          <p>The view camera with IR filters captures the feature points
of NIR-LEDs installed on the projection screen. This
image can be used to verify the eye position relative to the
NIR-LED feature points, which can then be used to
calculate exactly what the subject is looking at on the screen by
image processing. We call these invisible NIR-LED
markers &#x201C;IR markers&#x201D; hereafter.
          </p>
		  
		  
          <p>Image processing is another question that requires
attention. SIFT features could be a potential option.
However, such methods are not adequate for images of IR
markers received through the IR filter, because a single
NIR-LED marker is homogeneous and has few distinguishing
characteristics, as shown in the image on the right side of
Figure 5. As a countermeasure, several patterns composed of
multiple NIR-LEDs have been developed as matching
templates, as described below. The overall flow is described
later. </p>
 </sec>
      <sec id="s3d">
        <title>Patterns of IR markers</title>
        <p>IR marker patterns have been created taking the
four following conditions into account. </p>
		  
		  
          <p>1. Patterns should have a sufficient number of features.

</p>
		  
		  
          <p>2. Patterns should be composed of the smallest number
of markers possible.</p>
		  
		  
          <p>3. Patterns should be sufficiently differentiable from
one another.</p>
		  
		  
          <p>4. Patterns should be easily produced.</p>
		  
		  
          <p>To decide on the exact patterns, the similarities
between patterns of filtered IR markers (<xref ref-type="fig" rid="fig09">Figure 6</xref>) were
calculated. Taking condition 2 into
consideration, a three-point pattern was selected from a 5 &#xD7; 5 dot
matrix for each pattern, which offered the best balance for
noticeable differentiation. Similarities were calculated
with the Hu invariant moment algorithm (Hu, 1962).</p>


<fig id="fig09" fig-type="figure" position="float">
					<label>Figure 6</label>
					<caption>
						<p>Patterns used for the experiment.</p>
					</caption>
					<graphic id="graph09" xlink:href="jemr-11-01-b-figure-09.png"/>
				</fig>
		  
		  
<p>For a two-dimensional continuous function &#x192;(x,y)
the moment of order (p + q) is defined as <xref ref-type="fig" rid="eq01">Equation 1</xref>.</p>


<fig id="eq01" fig-type="equation" position="anchor">
					<graphic id="equation01" xlink:href="jemr-11-01-b-equation-01.png"/>
				</fig>

















     
        <p>The image moment is a weighted sum of the pixel
values about the origin of the image, where the suffixes
p and q represent the weights in the axial directions. Subsequently, the
centroid is obtained by <xref ref-type="fig" rid="eq02">Equation 2</xref>.</p>

<fig id="eq02" fig-type="equation" position="anchor">
					<graphic id="equation02" xlink:href="jemr-11-01-b-equation-02.png"/>
				</fig>

  
        <p>The pixel point (x&#x305;,y&#x305;) is the centroid of the image
&#x192;(x,y). Based on the coordinates of this centroid, the
moment about the centroid is obtained by <xref ref-type="fig" rid="eq03">Equation 3</xref>.</p>
<fig id="eq03" fig-type="equation" position="anchor">
					<graphic id="equation03" xlink:href="jemr-11-01-b-equation-03.png"/>
				</fig>





        
        <p>Further, this central moment is normalized by
<xref ref-type="fig" rid="eq04">Equation 4</xref> to obtain the normalized central moment.</p>
        
		<fig id="eq04" fig-type="equation" position="anchor">
					<graphic id="equation04" xlink:href="jemr-11-01-b-equation-04.png"/>
				</fig>
		
		<p>where <xref ref-type="fig" rid="eq05">Equation 5</xref></p>
		<fig id="eq05" fig-type="equation" position="anchor">
					<graphic id="equation05" xlink:href="jemr-11-01-b-equation-05.png"/>
				</fig>
           <p>By normalizing, scale no longer affects the moment
value; the moment is therefore scale invariant.</p> 


   <p>Seven kinds of Hu invariant moments are defined
using the normalized central moments; in this study, the
moment is calculated by <xref ref-type="fig" rid="eq06">Equation 6</xref>.</p>
<fig id="eq06" fig-type="equation" position="anchor">
					<graphic id="equation06" xlink:href="jemr-11-01-b-equation-06.png"/>
				</fig>






 <p>This is the sum of variances in the x-axis direction and
the y-axis direction.</p>



 <p><xref ref-type="fig" rid="fig27">Table 1</xref> shows the result of the similarity calculation
using the Hu invariant moment.</p> 
<fig id="fig27" fig-type="figure" position="float">
					<label>Table 1</label>
					<caption>
						<p>Result of the similarity calculation between the IR marker patterns using the Hu invariant moment.</p>
					</caption>
					<graphic id="graph27" xlink:href="jemr-11-01-b-figure-27.png"/>
				</fig>

<p>The template images are along the top of the table and the
searched images along the side; lower matching
evaluation scores indicate higher similarity and are
represented with red cells. The Hu invariant moment is
invariant to both rotation and scale; therefore,
combinations of patterns with high similarity
scores can be identified.</p> 
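As a minimal illustration of Equations 1&#x2013;6, the similarity score between two marker patterns can be sketched in Python with numpy. The helper names are ours, and the paper does not specify its implementation; this is a sketch of the first Hu moment only.

```python
import numpy as np

def first_hu_moment(img):
    """phi_1 = eta_20 + eta_02: the sum of the normalized variances in
    the x- and y-axis directions (Equations 1-6)."""
    ys, xs = np.indices(img.shape)
    m00 = img.sum()                          # zeroth-order moment (Equation 1)
    xc = (xs * img).sum() / m00              # centroid (Equation 2)
    yc = (ys * img).sum() / m00
    mu20 = ((xs - xc) ** 2 * img).sum()      # central moments (Equation 3)
    mu02 = ((ys - yc) ** 2 * img).sum()
    eta20 = mu20 / m00 ** 2                  # normalization (Equations 4-5)
    eta02 = mu02 / m00 ** 2
    return eta20 + eta02                     # Equation 6

def pattern_similarity(a, b):
    """Lower score = more similar, as in Table 1."""
    return abs(first_hu_moment(a) - first_hu_moment(b))
```

Because the moments are central and normalized, the score is unchanged by translating a pattern, and (up to discretization) by scaling it.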


 <p>Based on the findings, several patterns were chosen
and created with IR markers. Specifically, NIR-LEDs and
resistors were attached to a solderless breadboard and
mounted onto a polystyrene board. To ensure
high accuracy of the template matching,
twelve patterns were required on the board so that at least
three patterns would be within the view camera at any given time
for image processing. The layout of the IR markers was
decided based on the similarity results in Table 1, and
the actual implementation can be seen in <xref ref-type="fig" rid="fig10">Figure 7a</xref> and <xref ref-type="fig" rid="fig11">Figure 7b</xref>.</p>




<fig id="fig10" fig-type="figure" position="float">
					<label>Figure 7a</label>
					<caption>
						<p>IR Marker-embedded screen as seen with the naked eye
(top) and through a filter (bottom).</p>
					</caption>
					<graphic id="graph10" xlink:href="jemr-11-01-b-figure-10.png"/>
				</fig>
				
				
				
				<fig id="fig11" fig-type="figure" position="float">
					<label>Figure 7b</label>
					<caption>
						<p>IR Marker-embedded screen as seen with the naked eye
(top) and through a filter (bottom).</p>
					</caption>
					<graphic id="graph11" xlink:href="jemr-11-01-b-figure-11.png"/>
				</fig>
        </sec>
		
		
		
        <sec id="s3e">
          <title>Procedure</title>
          <p>The operation principle and pattern-creation method of
the NIR-LEDs were described in the previous section. In this
section, we introduce the process and the algorithm for
calculating the subject&#x2019;s view point on the screen, derived
from the LED points on the screen and the eye positions.
</p>
          <p>1. Distortion of the image is caused by the lens of
the view camera; therefore, calibration is
performed for each frame of the obtained movie.</p>
          <p>2. Apply template matching to the
distortion-corrected images of the view camera to detect the IDs
of the IR markers and their coordinates.</p>
          <p>3. Detect three points with high matching rates, and
obtain their coordinates. In order to calculate the
line-of-sight positions on the screen, apply affine
transformation to the known coordinates of the
markers on the screen.</p>
          <p>4. Map the corrected eye coordinates on the image
projected on the screen (<xref ref-type="fig" rid="fig12">Figure 8</xref>).</p>


<fig id="fig12" fig-type="figure" position="float">
					<label>Figure 8</label>
					<caption>
						<p>Affine transformation between the image of the view
camera and the screen.</p>
					</caption>
					<graphic id="graph12" xlink:href="jemr-11-01-b-figure-12.png"/>
				</fig>
          <p>5. Output the image or movie with the mapped eye
positions (format depends on the visual source).</p>
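Step 2 above can be sketched as an exhaustive sum-of-squared-differences search. This is a simplified stand-in, since the paper does not specify the matching criterion, and the function name is ours.

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return the (row, col) of the
    best match together with its SSD score (lower is better)."""
    H, W = image.shape
    h, w = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = ((image[r:r + h, c:c + w] - template) ** 2).sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

In practice a library routine such as OpenCV's matchTemplate would replace the double loop; a marker's ID is then the template with the lowest score at its best position.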





          <p>Affine transformation is used to map the coordinates of
eye positions in the view-camera images onto the
screen. Specifically, scaling is required to adjust for the
difference in resolution between the view camera and the
image projected on the screen, while rotation and translation are
required to compensate for the head movements. An affine
transformation is a movement and deformation of a shape
that preserves collinearity; it includes geometric
contraction, expansion, dilation, reflection, rotation, shear,
similarity transformations, spiral similarities, translations, and
compositions of them in any combination and sequence.</p>



          <p>These transformations, for a point p(x,y) on one plane
mapped to a point p&#x2032;(x&#x2032;,y&#x2032;)
on another plane, are expressed as <xref ref-type="fig" rid="eq07">Equation 7</xref>.</p>

<fig id="eq07" fig-type="equation" position="anchor">
					<graphic id="equation07" xlink:href="jemr-11-01-b-equation-07.png"/>
				</fig>

<p>where <xref ref-type="fig" rid="eq08">Equation 8</xref></p><fig id="eq08" fig-type="equation" position="anchor">
					<graphic id="equation08" xlink:href="jemr-11-01-b-equation-08.png"/>
				</fig>



<p>A represents the linear transformation, and t represents the
translation. Scaling can be expressed as <xref ref-type="fig" rid="eq09">Equation 9</xref>.</p>

<fig id="eq09" fig-type="equation" position="anchor">
					<graphic id="equation09" xlink:href="jemr-11-01-b-equation-09.png"/>
				</fig>

<p>&#x3B1; and &#x3B2; are the scale factors in the x-axis and y-axis
directions, respectively. Similarly, rotation can be expressed as
<xref ref-type="fig" rid="eq10">Equation 10</xref>.</p>

<fig id="eq10" fig-type="equation" position="anchor">
					<graphic id="equation10" xlink:href="jemr-11-01-b-equation-10.png"/>
				</fig>


 
 <p>&#x3B8; is the angle of rotation in the mapped plane. Scaling,
rotation and translation are used in this research because the
distortion caused by the lens of the view camera is
calibrated before the affine transformation, and the scale factor is
common to the x-axis and y-axis. Therefore, the affine
transformation matrix required to detect eye positions is obtained
by <xref ref-type="fig" rid="eq11">Equation 11</xref>.</p>


<fig id="eq11" fig-type="equation" position="anchor">
					<graphic id="equation11" xlink:href="jemr-11-01-b-equation-11.png"/>
				</fig>
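Under these restrictions (a common scale factor, a rotation, and a translation), the mapping of an eye position from view-camera coordinates to screen coordinates can be sketched as follows. The names are illustrative, not the authors' code.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """2x3 affine matrix combining a common scale s, a rotation theta,
    and a translation (tx, ty) -- the restricted affine form used here."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn,  s * c, ty]])

def map_point(A, x, y):
    """Map p = (x, y) to p' = A p + t, i.e. Equation 7 with the matrix above."""
    return A @ np.array([x, y, 1.0])
```

The three matched IR-marker points from step 3, whose screen coordinates are known, give enough correspondences to solve for s, theta, tx and ty.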

          <p><xref ref-type="fig" rid="fig12">Figure 8</xref> illustrates the image of affine transformation
used in our method.</p>
        </sec>
    
	  
	  
	  
	  
    </sec>
    <sec id="s4">
      <title>Verification experiment</title>  
	  <sec id="s4a">
      <title>Implementation of the screen for eye tracking</title>
      <p>Verification experiments were conducted to examine
the proposed method&#x2019;s correlation between the eye
position, as seen through the view camera, and the actual
projected image. Since our method assumes covering the
view camera with a filter, the image from the view
camera alone will not reveal what the subject is looking
at. In order to verify the results, template matching was
conducted on a simulated filtered image, created by
projecting onto the screen an image identical to the one seen
by the view camera through an IR filter. The image projected
on the screen is shown in <xref ref-type="fig" rid="fig13">Figure 9</xref>.</p>


<fig id="fig13" fig-type="figure" position="float">
					<label>Figure 9</label>
					<caption>
						<p>Projected image on the screen for verification experiment.</p>
					</caption>
					<graphic id="graph13" xlink:href="jemr-11-01-b-figure-13.png"/>
				</fig>


 </sec>
      <sec id="s4b">
        <title>Preliminary experiment</title>
        <p>Before conducting template matching on all gaze data,
preliminary experiments were conducted to confirm
template-matching performance. The numbers 1 through 3 were
added to the image seen in <xref ref-type="fig" rid="fig13">Figure 9</xref> and projected as
shown in <xref ref-type="fig" rid="fig14">Figure 10a</xref> and <xref ref-type="fig" rid="fig15">Figure 10b</xref>, and the subjects wearing the
EMR-9 eye tracker were requested to look at the numbers in order.
<xref ref-type="fig" rid="fig16">Figure 11</xref> shows an image clipped from the view-camera
movie during eye-tracking measurements, and <xref ref-type="fig" rid="fig17">Figure 12</xref>
represents six template-matching results with the obtained
gaze data.</p>


<fig id="fig14" fig-type="figure" position="float">
					<label>Figure 10a</label>
					<caption>
						<p>Image source projected on the screen for preliminary
experiment (top; yellow circles added for enhancement) and
screen with image source projected (bottom).</p>
					</caption>
					<graphic id="graph14" xlink:href="jemr-11-01-b-figure-14.png" />
				</fig>
				
				
				<fig id="fig15" fig-type="figure" position="float">
					<label>Figure 10b</label>
					<caption>
						<p>Image source projected on the screen for preliminary
experiment (top; yellow circles added for enhancement) and
screen with image source projected (bottom).</p>
					</caption>
					<graphic id="graph15" xlink:href="jemr-11-01-b-figure-15.png"/>
				</fig>
				
				
				
				
				<fig id="fig16" fig-type="figure" position="float">
					<label>Figure 11</label>
					<caption>
						<p>Captured image of the view camera while the subject
watching the number 1 on the screen.</p>
					</caption>
					<graphic id="graph16" xlink:href="jemr-11-01-b-figure-16.png"/>
				</fig>
				
				
				
				<fig id="fig17" fig-type="figure" position="float">
					<label>Figure 12</label>
					<caption>
						<p>Result of preliminary experiment (enlarged). Eye positions
while watching the number 1 through 3 plotted on the
projected image through the template matching.</p>
					</caption>
					<graphic id="graph17" xlink:href="jemr-11-01-b-figure-17.png"/>
				</fig>
				
				
				
        </sec>
   
      <sec id="s4c">
        <title>Result of gaze plot</title>
        <p>Gaze behaviors of the subjects were measured with the
EMR-9 at 30 fps while they viewed, in a zigzag manner, from the
upper-left marker to the lower-right marker of the image shown in
Figure 9. Subjects could move their heads freely. To verify
template-matching performance, affine transformation
was manually conducted based on the template shown in
the view-camera image, and eye positions were mapped
onto the projected image. <xref ref-type="fig" rid="fig18">Figure 13</xref> represents the template-matching
results, including a comparison with the manually
mapped eye points.</p>





<fig id="fig18" fig-type="figure" position="float">
					<label>Figure 13</label>
					<caption>
						<p>Projected image with eye position mapped (circles: results of template matching, X: results of manual mapping)</p>
					</caption>
					<graphic id="graph18" xlink:href="jemr-11-01-b-figure-18.png"/>
				</fig>
        <p>Approximately 250 eye points were mapped. The
data suggest a very high correlation between the
template-matching and manually conducted mapping results,
although some deviation remains.</p>



        <p>Let &#x2206;d<sub>i</sub> be the distance between the actual eye
position and the corrected eye position, where i denotes the
i-th eye point. The detection rate of template matching was
98.6%, and the averaged &#x2206;d was 15.9 pixels. Note that the
resolution of the projected image was 1920 x 1080 pixels.
Points containing detection errors can be seen in <xref ref-type="fig" rid="fig19">Figure 14</xref>,
and the histogram of &#x2206;d<sub>i</sub> is represented in <xref ref-type="fig" rid="fig20">Figure 15</xref>.
More than 90% of &#x2206;d<sub>i</sub> are within 30 pixels. The main
cause of such errors is view-camera image-capture
failure due to very quick head motions and camera
shake, which blur the image and prevent accurate
template matching. However, a &#x2206;d<sub>i</sub> of 30 pixels, for example,
falls within the range of the rear combination lamp of the car
shown at the top of <xref ref-type="fig" rid="fig21">Figure 17a</xref>, <xref ref-type="fig" rid="fig22">Figure 17b</xref>, <xref ref-type="fig" rid="fig23">Figure 17c</xref>, and <xref ref-type="fig" rid="fig24">Figure 17d</xref> (the white circle at point A
represents 30 pixels). It can therefore be assumed that our
method is practical.</p>
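The error statistics reported above (mean &#x2206;d and the fraction of points within 30 pixels) can be reproduced from paired point lists with a few lines. This is a sketch with our own variable names, not the authors' analysis script.

```python
import numpy as np

def error_stats(actual, corrected, threshold=30.0):
    """Mean distance delta-d between actual and corrected eye positions,
    and the fraction of points within `threshold` pixels (cf. Figure 15)."""
    d = np.linalg.norm(np.asarray(actual, float) - np.asarray(corrected, float), axis=1)
    return d.mean(), (d <= threshold).mean()
```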


<fig id="fig19" fig-type="figure" position="float">
					<label>Figure 14</label>
					<caption>
						<p>Points containing the errors of template matching (light pink: relatively large gap, dark pink: no correspondence)</p>
					</caption>
					<graphic id="graph19" xlink:href="jemr-11-01-b-figure-19.png"/>
				</fig>
				
				<fig id="fig20" fig-type="figure" position="float">
					<label>Figure 15</label>
					<caption>
						<p>Histogram of distance between the actual eye position and the corrected eye position (&#x2206;d<sub>i</sub>).</p>
					</caption>
					<graphic id="graph20" xlink:href="jemr-11-01-b-figure-20.png"/>
				</fig>
				
				<fig id="fig21" fig-type="figure" position="float">
					<label>Figure 17a</label>
					<caption>
						<p>View camera image and the corresponding corrected eye position mapped on the source movie. The white circle at point A represents 30 pixels.</p>
					</caption>
					<graphic id="graph21" xlink:href="jemr-11-01-b-figure-23.png"/>
				</fig>	
				<fig id="fig22" fig-type="figure" position="float">
					<label>Figure 17b</label>
					<caption>
						<p>View camera image and the corresponding corrected eye position mapped on the source movie. The white circle at point A represents 30 pixels.</p>
					</caption>
					<graphic id="graph22" xlink:href="jemr-11-01-b-figure-24.png"/>
				</fig>	
				<fig id="fig23" fig-type="figure" position="float">
					<label>Figure 17c</label>
					<caption>
						<p>View camera image and the corresponding corrected eye position mapped on the source movie. The white circle at point A represents 30 pixels.</p>
					</caption>
					<graphic id="graph23" xlink:href="jemr-11-01-b-figure-25.png"/>
				</fig>	
				<fig id="fig24" fig-type="figure" position="float">
					<label>Figure 17d</label>
					<caption>
						<p>View camera image and the corresponding corrected eye position mapped on the source movie. The white circle at point A represents 30 pixels.</p>
					</caption>
					<graphic id="graph24" xlink:href="jemr-11-01-b-figure-26.png"/>
				</fig>	
				
      </sec>
	  
	  
	  
	  
	  
	  
      <sec id="s4d">
        <title>Gaze plot on the movie</title>
        <p>Our method can also be used for gaze measurement
while watching a movie, automatically outputting the movie
with the eye point mapped on each frame. Here, driving
video footage taken from inside a vehicle was adopted
as the visual stimulus. <xref ref-type="fig" rid="fig25">Figure 16a</xref> and <xref ref-type="fig" rid="fig26">Figure 16b</xref> show
images clipped from the movie, and <xref ref-type="fig" rid="fig21">Figure 17</xref> shows
the view camera images and the corresponding
corrected eye positions mapped on the source movie.</p>
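<p>Per frame, the mapping step amounts to applying a planar projective transform, estimated from the detected marker correspondences, to the measured eye point. Whether the implementation uses a full homography or a simpler transform, the per-point step can be sketched as follows; the matrix values are hypothetical, not estimated from real marker data:</p>

```python
def map_eye_to_source(eye_xy, H):
    """Map an eye position from view-camera coordinates to the
    source-movie frame with a 3x3 homography (row-major lists)."""
    x, y = eye_xy
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # divide out the projective scale

# Hypothetical homography: identity plus a small translation,
# standing in for one estimated from IR-marker correspondences
H = [[1.0, 0.0, 12.0],
     [0.0, 1.0, -5.0],
     [0.0, 0.0, 1.0]]
sx, sy = map_eye_to_source((640, 360), H)  # -> (652.0, 355.0)
```

Running this for every frame, with a freshly estimated transform each time, yields the eye point mapped onto the source movie.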


					<fig id="fig25" fig-type="figure" position="float">
					<label>Figure 16a</label>
					<caption>
						<p>Scene images from a projected movie on the screen.</p>
					</caption>
					<graphic id="graph25" xlink:href="jemr-11-01-b-figure-21.png"/>
				</fig>	
				
				<fig id="fig26" fig-type="figure" position="float">
					<label>Figure 16b</label>
					<caption>
						<p>Scene images from a projected movie on the screen.</p>
					</caption>
					<graphic id="graph26" xlink:href="jemr-11-01-b-figure-22.png"/>
				</fig>
				


      </sec>
    </sec>
    <sec id="s5">
      <title>Conclusions and Remarks</title>
      <p>A new method to compensate for head movements
during eye tracking has been developed, using invisible
markers. It enables higher eye-position detection
accuracy, addressing a problem specific to mobile eye
trackers. However, our method has two limitations. First,
eye tracking is limited to screens with embedded IR
markers; expanding the range of measurement requires
adding new screens and additional markers. Second, the
current apparatus does not allow us to confirm the
correspondence between the projected image and the eye
position in the view camera image, because the view camera
is covered with the IR filter. To resolve this issue, we will
add a view camera without a filter in future work.</p>
      <p>In addition to the issues above, positioning error
remains in some cases due to template-matching error,
leaving room for improvement in eye-position recognition.
Potential solutions to reduce such error include (i) using
a view camera with higher sensitivity and resolution and
a shorter exposure time, and (ii) adopting a more robust
template-matching method. Since (i) is less realistic given
the wide use of commercially available eye trackers with
limited performance, a more effective approach would be
(ii), for example through image preprocessing with edge
detection.</p>
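<p>Approach (ii) can be sketched as follows: compute an edge map of both images first and match on edges, so that illumination changes and mild blur affect the score less than matching on raw intensities. The forward-difference edge operator and the sum-of-squared-differences criterion below are illustrative choices, not the implementation used in this paper:</p>

```python
def edge_map(img):
    """Simple gradient-magnitude edge map (forward differences),
    a stand-in for a real edge detector such as Canny."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def match_template(img, tpl):
    """Exhaustive template matching on edge maps by minimum sum of
    squared differences; returns the best (x, y) offset."""
    img_e, tpl_e = edge_map(img), edge_map(tpl)
    th, tw = len(tpl), len(tpl[0])
    best, best_xy = float("inf"), (0, 0)
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            ssd = sum((img_e[y + j][x + i] - tpl_e[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy

# Hypothetical toy image: a 2x2 bright patch on a dark background
img = [[0] * 10 for _ in range(10)]
for y in (4, 5):
    for x in (4, 5):
        img[y][x] = 5
tpl = [row[2:8] for row in img[2:8]]  # template cropped at offset (2, 2)
offset = match_template(img, tpl)
```

A production version would use an optimized matcher on the edge maps rather than this exhaustive search, but the structure of the computation is the same.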
    </sec>
  </body>  
  <back>
<ref-list>
<ref id="b1"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Ahlstrom</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Victor</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Wege</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Steinmetz</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Processing of eye/head-tracking data in large-scale naturalistic driving data sets</article-title>. <source>IEEE Transactions on Intelligent Transportation Systems</source>, <volume>13</volume>(<issue>2</issue>), <fpage>553</fpage>–<lpage>564</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1109/tits.2011.2174786</pub-id> <pub-id pub-id-type="doi">10.1109/TITS.2011.2174786</pub-id><issn>1524-9050</issn></mixed-citation></ref>
<ref id="b2"><mixed-citation publication-type="other" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Chandon</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Hutchinson</surname>, <given-names>J. W.</given-names></string-name>, <string-name><surname>Bradlow</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Young</surname>, <given-names>S. H.</given-names></string-name></person-group> (<year>2006</year>). Measuring the value of point-of-purchase marketing with commercial eye-tracking data. INSEAD Business School Research Paper, 2007/22/MKT/AC-GRD. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.2139/ssrn.1032162</pub-id></mixed-citation></ref>
<ref id="b3"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Chanijani</surname>, <given-names>S. S. M.</given-names></string-name>, <string-name><surname>Al-Naser</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bukhari</surname>, <given-names>S. S.</given-names></string-name>, <string-name><surname>Borth</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Allen</surname>, <given-names>S. E. M.</given-names></string-name>, &amp; <string-name><surname>Dengel</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2016</year>). <article-title>An eye movement study on scientific papers using wearable eye tracking technology</article-title>, <source>9th International Conference on Mobile Computing and Ubiquitous Networking (ICMU)</source>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/icmu.2016.7742085</pub-id> <pub-id pub-id-type="doi">10.1109/ICMU.2016.7742085</pub-id></mixed-citation></ref>
<ref id="b4"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><string-name><surname>Hammal</surname>, <given-names>Z.</given-names></string-name>, &amp; <string-name><surname>Cohn</surname>, <given-names>J. F.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Intra- and interpersonal functions of head motion in emotion communication</article-title>, <source>Proceedings of the 2014 Workshop on Roadmapping the Future of Multimodal Interaction Research including Business Opportunities and Challenges (RFMIR)</source>, <fpage>19</fpage>-<lpage>22</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.1145/2666253.2666258</pub-id></mixed-citation></ref>
<ref id="b5"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Huang</surname>, <given-names>C. W.</given-names></string-name>, &amp; <string-name><surname>Tan</surname>, <given-names>W. C.</given-names></string-name></person-group> (<year>2016</year>). <article-title>An approach of head movement compensation when using a head mounted eye tracker</article-title>, <source>International Conference of Consumer Electronics-Taiwan (ICCE-TW)</source>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/icce-tw.2016.7520987</pub-id> <pub-id pub-id-type="doi">10.1109/ICCE-TW.2016.7520987</pub-id></mixed-citation></ref>
<ref id="b6"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Hu</surname>, <given-names>M. K.</given-names></string-name></person-group> (<year>1962</year>). <article-title>Visual pattern recognition by moment invariants</article-title>. <source>I.R.E. Transactions on Information Theory</source>, <volume>8</volume>(<issue>2</issue>), <fpage>179</fpage>–<lpage>187</lpage>. <pub-id pub-id-type="doi" specific-use="author">10.1109/tit.1962.1057692</pub-id> <pub-id pub-id-type="doi">10.1109/TIT.1962.1057692</pub-id><issn>0096-1000</issn></mixed-citation></ref>
<ref id="b7"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Jensen</surname>, <given-names>R. R.</given-names></string-name>, <string-name><surname>Stets</surname>, <given-names>J. D.</given-names></string-name>, <string-name><surname>Suurmets</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Clement</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Aanas</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Wearable gaze trackers: Mapping visual attention in 3D</article-title>. <source>Lecture Notes in Computer Science</source>, <volume>10269</volume>, <fpage>66</fpage>–<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-59126-1_6</pub-id><issn>0302-9743</issn></mixed-citation></ref>
<ref id="b8"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Klostermann</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kredel</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name><surname>Hossner</surname>, <given-names>E.-J.</given-names></string-name></person-group> (<year>2014</year>). <article-title>On the interaction of attentional focus and gaze: The quiet eye inhibits focus-related performance decrements.</article-title> <source>Journal of Sport &amp; Exercise Psychology</source>, <volume>36</volume>(<issue>4</issue>), <fpage>392</fpage>–<lpage>400</lpage>. <pub-id pub-id-type="doi">10.1123/jsep.2013-0273</pub-id><issn>0895-2779</issn></mixed-citation></ref>
<ref id="b9"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Kocejko</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Bujnowski</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Ruminski</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bylinska</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Wtorek</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Head movement compensation algorithm in multi-display communication by gaze</article-title>, <source>7th International Conference on Human System Interactions (HSI)</source>, <fpage>88</fpage>-<lpage>94</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi" specific-use="author">10.1109/hsi.2014.6860454</pub-id> <pub-id pub-id-type="doi">10.1109/HSI.2014.6860454</pub-id></mixed-citation></ref>
<ref id="b10"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><string-name><surname>Larsson</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Shwaller</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Holmqvist</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Nystrom</surname>, <given-names>M.</given-names></string-name> &amp; <string-name><surname>Stridh</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Compensation of head movements in mobile eye-tracking data using an inertial measurement unit</article-title>, <source>Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing</source>, <fpage>1161</fpage>-<lpage>1167</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.1145/2638728.2641693</pub-id></mixed-citation></ref>
<ref id="b11"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><string-name><given-names>NAC</given-names> <surname>Image Technology</surname></string-name></person-group>. (<year>2008</year>) EMR-dStream: Retrieved from <ext-link ext-link-type="uri" xlink:href="http://www.eyemark.jp/prod-uct/emr_dstream/">http://www.eyemark.jp/prod-uct/emr_dstream/</ext-link></mixed-citation></ref>
<ref id="b12"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Sun</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Xia</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Falkmer</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name><surname>Lee</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2016</year>). Investigating the spatial pattern of older drivers’ eye fixation behaviour and associations with their visual capacity. Journal of Eye Movement Research, 9(6):2, 1-16. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.16910/jemr.9.6.2</pub-id></mixed-citation></ref>
<ref id="b13"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><string-name><surname>Sohn</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Lee</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Chae</surname>, <given-names>H.</given-names></string-name> &amp; <string-name><surname>Yu</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2007</year>). <article-title>Localization system for mobile robot using wireless communication with IR landmark</article-title>, <source>Proceedings of the 1st International Conference on Robot Communication and Coordination</source>, <fpage>1</fpage>-<lpage>6</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.4108/icst.robocomm2007.2173</pub-id></mixed-citation></ref>
<ref id="b14"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><string-name><surname>Takemura</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Kohashi</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Suenaga</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Takamatsu</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Ogasawara</surname>, <given-names>T.</given-names></string-name></person-group> (<year>2010</year>). <article-title>Estimating 3D point-of-regard and visualizing gaze trajectories under natural head movements</article-title>, <source>Proceedings of 6th ACM Symposium on Eye Tracking Research &amp; Applications (ETRA)</source>, <fpage>157</fpage>-<lpage>160</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.1145/1743666.1743705</pub-id></mixed-citation></ref>
<ref id="b15"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><string-name><surname>Tomi</surname>, <given-names>A. B.</given-names></string-name>, &amp; <string-name><surname>Rambli</surname>, <given-names>D. R. A.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Automated calibration for optical see-through head mounted display using display screen space based eye tracking</article-title>, <source>3rd International Conference on Computer and Information Science (ICCOINS)</source>, <fpage>448</fpage>-<lpage>453</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.1109/iccoins.2016.7783257</pub-id></mixed-citation></ref>
<ref id="b16"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Toyama</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Kieninger</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Shafait</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Dengel</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Gaze guided object recognition using a head-mounted eye tracker</article-title>, <source>Proceedings of 7th ACM Symposium on Eye Tracking Research &amp; Applications (ETRA)</source>, <fpage>91</fpage>-<lpage>98</lpage>. Retrieved from http://dx.doi.org/<pub-id pub-id-type="doi">10.1145/2168556.2168570</pub-id></mixed-citation></ref>
<ref id="b17"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Vickers</surname>, <given-names>J. N.</given-names></string-name></person-group> (<year>1996</year>). <article-title>Visual control when aiming at a far target.</article-title> <source>Journal of Experimental Psychology. Human Perception and Performance</source>, <volume>22</volume>(<issue>2</issue>), <fpage>342</fpage>–<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.22.2.342</pub-id><pub-id pub-id-type="pmid">8934848</pub-id><issn>0096-1523</issn></mixed-citation></ref>
<ref id="b18"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Wedel</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Pieters</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2008</year>). <article-title>Eye tracking for visual marketing</article-title>. <source>Foundations and Trends in Marketing</source>, <volume>1</volume>(<issue>4</issue>), <fpage>231</fpage>–<lpage>320</lpage>. <pub-id pub-id-type="doi">10.1561/1700000011</pub-id></mixed-citation></ref>
</ref-list>
  </back>
</article>
