<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
 <front>
    <journal-meta>
	<journal-id journal-id-type="publisher-id">Jemr</journal-id>
      <journal-title-group>
        <journal-title>Journal of Eye Movement Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1995-8692</issn>
	  <publisher>								
	  <publisher-name>Bern Open Publishing</publisher-name>
	  <publisher-loc>Bern, Switzerland</publisher-loc>
	</publisher>
    </journal-meta>
    <article-meta>
	<article-id pub-id-type="doi">10.16910/jemr.11.3.5</article-id> 
	  <article-categories>								
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
		</article-categories>
      <title-group>
        <article-title>An investigation of the distribution of gaze estimation errors in head mounted gaze trackers using polynomial functions</article-title>
      </title-group>
	   <contrib-group> 
				<contrib contrib-type="author">
					<name>
						<surname>Mardanbegi</surname>
						<given-names>Diako</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Kurauchi</surname>
						<given-names>Andrew T. N.</given-names>
					</name>
					<xref ref-type="aff" rid="aff2"></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Morimoto</surname>
						<given-names>Carlos H.</given-names>
					</name>
					<xref ref-type="aff" rid="aff2"></xref>
				</contrib>				
        <aff id="aff1">
		<institution>Department of Management Engineering, Technical University of Denmark</institution>,   <country>Denmark</country>
        </aff>
        <aff id="aff2">
		<institution>Department of Computer Science (IME), University of Sao Paulo, Sao Paulo</institution>,   <country>Brazil</country>
        </aff>		
		</contrib-group>     
	  <pub-date date-type="pub" publication-format="electronic"> 
		<day>30</day>  
		<month>6</month>
        <year>2018</year>
      </pub-date>
	  <pub-date date-type="collection" publication-format="electronic"> 
	  <year>2018</year>
	</pub-date>
      <volume>11</volume>
      <issue>3</issue>
 <elocation-id>10.16910/jemr.11.3.5</elocation-id>
	<permissions> 
	<copyright-year>2018</copyright-year>
	<copyright-holder>Mardanbegi, D., Kurauchi, A. T. N. &#x26; Morimoto, C. H. </copyright-holder>
	<license license-type="open-access">
  <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License, 
  (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">
    https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
	</permissions>
      <abstract>
<p>Second order polynomials are commonly used for estimating the point-of-gaze in head-mounted eye trackers. Studies on remote (desktop) eye trackers show that although some non-standard 3rd order polynomial models can provide better accuracy, higher-order polynomials do not necessarily provide better results. Unlike remote setups, where gaze is estimated over a relatively narrow field-of-view surface (e.g. less than 30&#xD7;20 degrees on typical computer displays), head-mounted gaze trackers (HMGT) are often expected to cover a considerably wider field of view, to make sure that gaze is detected in the scene image even for extreme eye angles. In this paper we investigate the behavior of the gaze estimation error distribution across the scene camera image when polynomial functions are used. Using simulated scenarios, we describe the effects of four different sources of error: interpolation, extrapolation, parallax, and radial distortion. We show that the use of third order polynomials results in more accurate gaze estimates in HMGT, and that the use of wide angle lenses might be beneficial in terms of error reduction.</p>
      </abstract>
      <kwd-group>
        <kwd>Eye Movement</kwd>
        <kwd>eye tracking</kwd>
        <kwd>saccades</kwd>
        <kwd>microsaccades</kwd>
        <kwd>antisaccades</kwd>
        <kwd>smooth pursuit</kwd>
        <kwd>scanpath</kwd>
        <kwd>convergence</kwd>
        <kwd>attention</kwd>		
      </kwd-group>
    </article-meta>
  </front>	
  <body>

    <sec id="S1">
      <title>Introduction</title>
	  
      <p>Monocular video-based head mounted gaze trackers use at 
least one camera to capture the eye image and another to capture 
the field-of-view (FoV) of the user. Probably due to the simplicity 
of regression-based methods when compared to model-based methods [
        <xref ref-type="bibr" rid="b10">10</xref>
		], 
regression-based methods are commonly used in head-mounted gaze 
trackers (HMGT) to estimate the user&#x2019;s gaze as a point within the 
scene image, despite the fact that such methods do not achieve 
the same accuracy levels as model-based methods.</p>
	  
      <p>In this paper we define and investigate four different
sources of error to help us characterize the low
performance of regression-based methods in HMGT. The first
source of error is the inaccuracy of the gaze mapping
function in interpolating the gaze point (e<sub>int</sub>) within the
calibration box, the second source is the limited ability of the
mapping function to extrapolate the results outside the
calibration box, as required in HMGT (e<sub>ext</sub>), the third is the
misalignment between the scene camera and the eye
known as parallax error (e<sub>par</sub>), and the fourth error source
is the radial distortion in the scene image when using a
wide angle lens (e<sub>dis</sub>).</p>

      <p>Most of these sources of error have been investigated
before independently. Cerrolaza et al. [
        <xref ref-type="bibr" rid="b8">8</xref>
        ] have studied the
performance, based on the interpolation error, of different
polynomial functions using combinations of eye features
in remote eye trackers. Mardanbegi and Hansen [
        <xref ref-type="bibr" rid="b12">12</xref>
        ]
have described the parallax error in HMGTs using
epipolar geometry in a stereo camera setup. They have
investigated how the pattern of the parallax error changes for
different camera configurations and calibration distances.
However, no experimental result was presented in their
work showing the actual error in a HMGT. Barz et al. [
        <xref ref-type="bibr" rid="b2">2</xref>
        ]
have proposed a method for modeling and predicting the
gaze estimation error in HMGT. As part of their study,
they have empirically investigated the effect of
extrapolation and parallax error independently. In this paper, we
describe the nature of the four sources of error introduced
above in more detail, providing a better understanding of
how these different components contribute to the gaze
estimation error in the scene image. The rest of the paper
is organized as follows: The simulation methodology
used in this study is described in the first section and the
next section describes related work regarding the use of
regression-based methods for gaze estimation in HMGT.
We then propose alternative polynomial models and
compare them with the existing models. We also show
how precision and accuracy of different polynomial
models change in different areas of the scene image. Section
Parallax Error describes the parallax error in HMGT, and
the following section investigates the effect of radial
distortion in the scene image on gaze estimation accuracy.
The combination of errors caused by different factors is
discussed in Section Combined Error and we conclude in
Section Conclusion.</p>
    </sec>
	
    <sec id="S2">
      <title>Simulation</title>
	  
      <p>All the results presented in the paper are based on
simulation and the proposed methods are not tested on
real setups. The simulation code for head-mounted gaze
tracking that was used in this paper was developed based
on the eye tracking simulation framework proposed by B&#xF6;hme, Dorr,
Graw, Martinetz, &#x26; Barth [
        <xref ref-type="bibr" rid="b6">6</xref>
        ].</p>
		
      <p>The main four components of a head-mounted eye
tracker (eye globe, eye camera, scene camera and light
source) are modeled in the simulation. After defining the
relationship between these components, points can be
projected from 3D to the camera images, and vice versa.</p>
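<p>This projection step can be illustrated with a simple pinhole-camera sketch (our own minimal helper functions, ignoring the camera rotation, translation and distortion that the simulation also models):</p>

```python
import numpy as np

def project(point_3d, focal_px, principal_point):
    """Project a 3D point (camera coordinates, Z positive) to pixel
    coordinates with an ideal pinhole camera."""
    x, y, z = point_3d
    u = principal_point[0] + focal_px * x / z
    v = principal_point[1] + focal_px * y / z
    return np.array([u, v])

def back_project(pixel, focal_px, principal_point, depth):
    """Back-project a pixel to the 3D point at the given depth
    (the inverse of project)."""
    x = (pixel[0] - principal_point[0]) * depth / focal_px
    y = (pixel[1] - principal_point[1]) * depth / focal_px
    return np.array([x, y, depth])

# Round trip: a point 1 m in front of the camera.
pp = (640.0, 384.0)               # principal point of a 1280x768 image
p3 = np.array([0.1, -0.05, 1.0])  # meters, camera coordinates
px = project(p3, 1000.0, pp)      # assumed focal length in pixels
```

<p>Back-projecting the pixel at depth 1 m recovers the original 3D point, which is how grid points in the scene image are turned into fixation targets later in the paper.</p>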

      <p>Positions of the relevant features in the eye image are
computed directly based on the geometry between the
components (eye, camera and light) and no 3D rendering
algorithms and image analysis are used in the simulation.
Pupil center in the eye image is obtained by projecting
the center of pupil into the image and no ellipse fitting is
used for the tests in this paper. The eyeball can be
oriented in 3D either by defining its rotation angles or by
defining a fixation point in space. Fovea displacement and
light refraction on the surface of the cornea are
considered in the eye model.</p>
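<p>A minimal sketch of orienting the eye by a fixation point (our own simplification: the optical axis is aimed directly at the target, ignoring the fovea offsets &#x3B1;, &#x3B2; and the corneal refraction that the simulation does take into account):</p>

```python
import numpy as np

def pupil_center_3d(eye_center, fixation_point, pupil_distance=10.0):
    """Place the 3D pupil center on the line from the eyeball center toward
    the fixation point. pupil_distance (mm) is an assumed distance from the
    eyeball center to the pupil plane."""
    axis = np.asarray(fixation_point, float) - np.asarray(eye_center, float)
    axis = axis / np.linalg.norm(axis)
    return np.asarray(eye_center, float) + pupil_distance * axis
```

<p>Projecting this 3D point into the eye camera then gives the simulated pupil center in the eye image.</p>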

      <p>The details of the parameters used in the simulation
are described in each subsequent section.</p>
    </sec>
	
    <sec id="S3">
      <title>Regression-based methods in HMGT</title>	

      <p>The pupil center (PC) is a common eye feature used
for gaze estimation [
        <xref ref-type="bibr" rid="b10">10</xref>
        ]. Geometry-based gaze estimation
methods [
        <xref ref-type="bibr" rid="b15 b9">9,15</xref>
        ] mostly rely on calculating the 3D position
of the pupil center as a point along the optical axis of the
eye. Feature-based gaze estimation methods, on the other
hand, directly use the image of the pupil center (its 2D
location in the eye image) as input for their mapping
function.</p>

      <p>Infrared light sources are frequently used to create
corneal reflections, or glints, that are used as reference
points. When combined, the pupil center and glint (first
Purkinje image [
        <xref ref-type="bibr" rid="b13">13</xref>
        ]) form a vector (in the eye image)
that can be used for gaze estimation instead of the
pupil center alone. In remote eye trackers, the use of the
pupil-glint vector (PCR) improves the performance of the gaze
tracker for small head motions [
        <xref ref-type="bibr" rid="b16">16</xref>
        ]. However, eye
movements towards the periphery of the FoV are often
not tolerated when using glints, as the reflections tend to
fall off the corneal surface. For the sake of simplicity, in
the following, we use pupil center instead of PCR as the
eye feature used for gaze mapping.</p>

      <p>Figure 1 illustrates the general setup for a pupil-based
HMGT consisting of 3 components: the eye, the eye
camera, and the scene camera. Gaze estimation essentially
maps the position of the pupil center in the eye image
(p<sub>x</sub>) to a point in the scene image (x) when the eye is
looking at a point (X) in 3D.</p>

<fig id="fig01" fig-type="figure" position="float">
					<label>Figure 1.</label>
					<caption>
						<p>Sagittal view of a HMGT</p>
					</caption>
					<graphic id="graph01" xlink:href="jemr-11-03-e-figure-01.png"/>
				</fig>

      <p>Interpolation-based (regression-based) methods have
been widely used for gaze estimation in both commercial
eye trackers and research prototypes in remote (or
desktop) scenarios [
        <xref ref-type="bibr" rid="b17 b7 b8">7,8,17</xref>
        ]. Compared to geometry-based
methods [
        <xref ref-type="bibr" rid="b10">10</xref>
        ], they are in general more sensitive to head
movements, though they present reasonable accuracy
around the calibration position. On the other hand, they do not require any
calibrated hardware (e.g. camera calibration or a
predefined geometry for the setup), and their software is
simpler to implement. Interpolation-based methods use linear
or non-linear mapping functions (usually a first or second
order polynomial). The unknown coefficients of the
mapping function are fitted by regression based on
correspondence data collected during a calibration procedure.
It is desirable to have a small number of calibration
points to simplify the calibration procedure, so a small
number of unknown coefficients is desirable for the
mapping function.</p>
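<p>As a concrete sketch (our own code, using the common six-term second-order polynomial; real trackers differ in the exact terms), the coefficients can be fitted by ordinary least squares from calibration correspondences:</p>

```python
import numpy as np

def design_matrix_2nd(px, py):
    """Terms of a second-order bivariate polynomial: 1, x, y, xy, x^2, y^2."""
    return np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])

def calibrate(pupil_xy, scene_xy):
    """Fit one coefficient vector per scene coordinate (S_x and S_y) by
    least squares from N calibration correspondences (N at least 6)."""
    A = design_matrix_2nd(pupil_xy[:, 0], pupil_xy[:, 1])
    coeffs, *_ = np.linalg.lstsq(A, scene_xy, rcond=None)
    return coeffs  # shape (6, 2)

def estimate_gaze(coeffs, pupil_xy):
    """Map pupil coordinates to scene-image coordinates."""
    return design_matrix_2nd(pupil_xy[:, 0], pupil_xy[:, 1]) @ coeffs
```

<p>With 9 or more calibration points the six coefficients per coordinate are over-determined, and the regression averages out measurement noise.</p>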

      <p>In a remote gaze tracker (RGT) system, one may
assume that the useful range of gaze directions is limited to
the computer display. Performance of regression-based
methods that map eye features to a point in a computer
display have been well studied for RGT [
        <xref ref-type="bibr" rid="b18 b5 b8">5,8,18</xref>
        ]. Cerrolaza et al. [
        <xref ref-type="bibr" rid="b8">8</xref>
        ] present an extensive study on how
different polynomial functions perform on remote setups. The
maximum range of eye rotation used in their study was
about 16&#xB0; &#xD7; 12&#xB0; (looking at a 17-inch display at a
distance of 58 cm). Blignaut [
        <xref ref-type="bibr" rid="b4">4</xref>
        ] showed that a third order
polynomial model with 8 coefficients for S<sub>x</sub> and 7
coefficients for S<sub>y</sub> provides a good accuracy (about 0.5&#xB0;) on a
remote setup when using 14 or more calibration points.</p>

      <p>However, performance of interpolation-based
methods for HMGT has not yet been thoroughly studied. The
mapping function used in a HMGT maps the eye features
extracted from the eye image to a 2D point in the scene
image that is captured by a front view camera (scene
camera) [
        <xref ref-type="bibr" rid="b11">11</xref>
        ]. For HMGT it is common to use a wide FoV
scene camera (FoV &#x3E; 60&#xB0;) so gaze can be observed over a
considerably larger region than in RGTs. Nonetheless,
HMGTs are often calibrated for only a narrow range of
gaze directions. Because gaze must be estimated over the
whole region covered by the scene camera, the
polynomial function must extrapolate the gaze estimate outside
the bounding box that contains the points used for
calibration (called the calibration box). To study the behavior
of the error inside and outside the calibration box, we will
refer to the error inside the box as interpolation error and
outside as extrapolation error. The use of wide FoV
lenses also increases radial distortions which affect the
quality of the scene image.</p>

      <p>On the other hand, if the gaze tracker is calibrated for
a wide FoV that spans over the whole scene image, it will
increase the risk of poor interpolation. This has to do with
the significant non-linearity that we get in the domain of
the regression function (due to the spherical shape of the
eye) for extreme viewing angles. Besides the
interpolation and extrapolation errors, we should take into account
that the polynomial function is adjusted for a particular
calibration distance while in practice the distance might vary
significantly during the use of the HMGT.</p>
    </sec>
	
    <sec id="S4">
      <title>Derivation of alternative polynomial models</title>

      <p>To find a proper polynomial function for HMGTs and
to see whether the commonly used polynomial model is
suitable for HMGTs, we will use a systematic approach
similar to the one proposed by Blignaut [
        <xref ref-type="bibr" rid="b4">4</xref>
        ] for RGTs.
The systematic approach consists of considering each
dependent variable S<sub>x</sub> and S<sub>y</sub> (horizontal and vertical
components of the gaze position on the scene image)
separately. We first fix the value for the independent
variable P<sub>y</sub> (vertical component of the eye feature in our
case, pupil center or PCR - on the eye image) and vary
the value of P<sub>x</sub> (horizontal component of the eye feature
on the eye image) to find the relationship between S<sub>x</sub> and
P<sub>x</sub>. Then the process is repeated fixing P<sub>x</sub> and varying P<sub>y</sub>
to find the relationship between coefficients of the
polynomial model and P<sub>y</sub>.</p>

      <p>We simulated a HMGT with a scene camera
described in Table 2. A grid of 25&#xD7;25 points in the scene
image (covering the whole image) is back-projected to
fixation points on a plane 1 m away from C<sub>E</sub>, and the
corresponding pupil position is obtained for each point.
We run the simulation for 9 different eyes defined by
combining 3 different values for each of the parameters
shown in Table 1 (3 parameters and &#xB1;25% of their default
values). We extract the samples for two different
conditions, one with pupil center and the second condition with
pupil-glint vector as our independent variable.</p>
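<p>One plausible reading of this sweep (our own sketch; parameter names follow Table 1) varies one parameter at a time over its default and &#xB1;25% values, giving 3 values for each of the 3 parameters:</p>

```python
defaults = {"r_cornea_mm": 7.98, "alpha_deg": 6.0, "beta_deg": 2.0}

def eye_models(scale=0.25):
    """For each of the 3 eye parameters, take its -25%, default and +25%
    values while keeping the other parameters at their defaults."""
    models = []
    for name, default in defaults.items():
        for value in (default * (1 - scale), default, default * (1 + scale)):
            model = dict(defaults)
            model[name] = value
            models.append(model)
    return models

models = eye_models()  # 9 eye models
```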

<table-wrap id="t01" position="float">
					<label>Table 1.</label>
					<caption>
						<p>Default eye measures used in the simulation</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<tbody>					
							<tr>
								<td rowspan="1" colspan="1">r_cornea</td>
								<td rowspan="1" colspan="1">7.98 mm</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">Horizontal fovea offset (&#x3B1;)</td>
								<td rowspan="1" colspan="1">6&#xB0;</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">Vertical fovea offset (&#x3B2;)</td>
								<td rowspan="1" colspan="1">2&#xB0;</td>
							</tr>							
						</tbody>
					</table>
					</table-wrap>
					
<table-wrap id="t02" position="float">
					<label>Table 2.</label>
					<caption>
						<p>Default configuration for the cameras and the light source
used in the simulation. All measures are relative to the world
coordinate system with the origin at the center of the eyeball
(C<sub>E</sub>) (see Figure 1). The symbols R and Tr stand for rotation
and translation, respectively.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<tbody>					
							<tr>
								<td rowspan="1" colspan="1">Scene camera</td>
								<td rowspan="1" colspan="1">FoV = H : 65&#xB0; &#xD7; V : 40&#xB0;, R = (pan, tilt, yaw) = (0, 0, 0), Tr = (10mm, 30mm, 35mm), no radial distortion, res=(1280 &#xD7; 768)</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">Eye camera</td>
								<td rowspan="1" colspan="1">focal length: providing an eye image with W<sub>eye</sub>/W<sub>img</sub>= 90% where W<sub>eye</sub> is the horizontal dimension of the eye area in the image and W<sub>img</sub> is the image width. R: satisfying the assumption of camera being towards eyeball center, Tr = (0mm, -10mm, 60mm), res=(1280 &#xD7; 960)</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">Light source</td>
								<td rowspan="1" colspan="1">Tr = (0, 0, 60mm)</td>
							</tr>							
						</tbody>
					</table>
					</table-wrap>					

      <p>Figure 2 shows a virtual eye socket and the pupil
center coordinates corresponding to 625 (grid of 25&#xD7;25)
target points in the scene image for one eye model. Let the X
and Y axes correspond to the horizontal and vertical axes
of the eye camera, respectively. To express S<sub>x</sub> in terms of
P<sub>x</sub> we need to make sure that the other variable P<sub>y</sub> is kept
constant. However, we have no control over the pupil
center coordinates, and even taking a specific value for S<sub>y</sub> in
the target space (as it was suggested in [
        <xref ref-type="bibr" rid="b4">4</xref>
        ]) will not result
in a constant P<sub>y</sub> value. Thus, we split the sample points
along the Y axis into 7 groups based on their P<sub>y</sub> values by
discretizing the Y axis. Seven groups give us enough samples
in each group, distributed across the X axis. This
grouping makes it possible to select only the samples that
have a (relatively) constant P<sub>y</sub>.</p>
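<p>The grouping step can be sketched as follows (our own helper; it discretizes the observed P<sub>y</sub> range into equal-width intervals):</p>

```python
import numpy as np

def split_by_py(pupil_xy, n_groups=7):
    """Split eye-image samples into n_groups bins of (roughly) constant P_y
    by discretizing the Y axis into equal-width intervals."""
    py = pupil_xy[:, 1]
    edges = np.linspace(py.min(), py.max(), n_groups + 1)
    # np.digitize yields bin indices 1..n_groups; the sample at py.max()
    # falls past the last edge, so clip it into the topmost bin.
    idx = np.clip(np.digitize(py, edges), 1, n_groups) - 1
    return [pupil_xy[idx == g] for g in range(n_groups)]
```

<p>Selecting the middle group then gives the samples with roughly constant P<sub>y</sub> used to relate S<sub>x</sub> to P<sub>x</sub>.</p>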

<fig id="fig02" fig-type="figure" position="float">
					<label>Figure 2.</label>
					<caption>					
<p>Virtual eye socket showing 625 pupil centers.
Each center corresponds to an eye orientation that points the
optical axis of the eye towards a scene target on a plane 1
m from the eye, and each point on the plane corresponds to
a point of an evenly distributed 25 &#xD7; 25 grid in the scene camera image.
Samples were split into 7 groups based on their P<sub>y</sub> values
by discretizing the Y axis. Samples in the middle group are
shown in a different color.</p>
					</caption>
					<graphic id="graph02" xlink:href="jemr-11-03-e-figure-02.png"/>
				</fig>
	  
      <p>By keeping the independent variable P<sub>y</sub> within a
specific range (e.g., from pixel 153 to 170, which roughly
corresponds to the gaze points at the middle of the scene
image), we obtain about 88 sample points that relate S<sub>x</sub> to
P<sub>x</sub>.</p>

      <p>Figure 3 shows this relationship which suggests the
use of a third order polynomial with the following
general form:</p>

<fig id="fig03" fig-type="figure" position="float">
					<label>Figure 3.</label>
					<caption>
						<p>Relationship between the input P<sub>x</sub> (pupil<sub>x</sub>) and
output (S<sub>x</sub>). Different curves show the result for different
parameters in the eye models.</p>
					</caption>
					<graphic id="graph03" xlink:href="jemr-11-03-e-figure-03.png"/>
				</fig>

<fig id="eq01" fig-type="equation" position="anchor">
					<label>(1)</label>
					<graphic id="equation01" xlink:href="jemr-11-03-e-equation-01.png"/>
				</fig>					

      <p>We then look at the effect of changing the independent
variable P<sub>y</sub> on the coefficients a<sub>i</sub>. To keep the distribution
of samples across the X axis uniform when changing the
P<sub>y</sub> level, we skip the first level of P<sub>y</sub> (Figure 2). The
changes of a<sub>i</sub> against 6 levels of P<sub>y</sub> are shown in Figure 4.
From the figure we can see that the relationship between
the coefficients a<sub>i</sub> and the Y coordinate of the pupil center is
best represented by a second order polynomial:</p>

<fig id="fig04" fig-type="figure" position="float">
					<label>Figure 4.</label>
					<caption>
<p>Relationship between the coefficients a<sub>i</sub> of the regression
function S<sub>x</sub> against the Y coordinate of the pupil center.</p>
					</caption>
					<graphic id="graph04" xlink:href="jemr-11-03-e-figure-04.png"/>
				</fig>

<fig id="eq02" fig-type="equation" position="anchor">
					<label>(2)</label>
					<graphic id="equation02" xlink:href="jemr-11-03-e-equation-02.png"/>
				</fig>					

      <p>The general form of the polynomial function for S<sub>x</sub> is
then obtained by substituting these relationships into
(Eq.1) which will be a third order polynomial with 12
terms:</p>

<fig id="eq03" fig-type="equation" position="anchor">
					<label>(3)</label>
					<graphic id="equation03" xlink:href="jemr-11-03-e-equation-03.png"/>
				</fig>					
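<p>Expanding this substitution gives the 12 monomials P<sub>x</sub><sup>j</sup>P<sub>y</sub><sup>k</sup> for j = 0..3 and k = 0..2. A sketch of the corresponding design matrix and least-squares fit (our own code):</p>

```python
import numpy as np

def design_matrix_12(px, py):
    """The 12 monomials obtained by substituting Eq. 2 into Eq. 1:
    px**j * py**k for j = 0..3 and k = 0..2."""
    return np.column_stack([px**j * py**k for j in range(4) for k in range(3)])

def fit_12(pupil_xy, scene_coord):
    """Least-squares fit of the 12 coefficients; needs 12 or more
    calibration points."""
    A = design_matrix_12(pupil_xy[:, 0], pupil_xy[:, 1])
    coeffs, *_ = np.linalg.lstsq(A, scene_coord, rcond=None)
    return coeffs
```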

      <p>We follow a similar approach to obtain the
polynomial function for S<sub>y</sub>. Figure 5a shows the relationship
between S<sub>y</sub> and the independent variable P<sub>y</sub>, from which it
can be inferred that a straight line fits the samples
for 27 different eye conditions. Based on this assumption
we look at the relationship between the two coefficients
of the linear function and P<sub>x</sub>. The result is shown in
Figures 5b and 5c, which suggests that both coefficients
can be approximated by second order polynomials,
so that S<sub>y</sub> is a function with the following terms:</p>

<fig id="fig05" fig-type="figure" position="float">
					<label>Figure 5.</label>
					<caption>
						<p>(5a) Relationship between the regression function
S<sub>y</sub> against the Y coordinate of the pupil center. (5b &#x26; 5c)
Relationship between the coefficients a<sub>i</sub> of S<sub>y</sub> against P<sub>x</sub>.</p>
					</caption>
					<graphic id="graph05" xlink:href="jemr-11-03-e-figure-05.png"/>
				</fig>

<fig id="eq04" fig-type="equation" position="anchor">
					<label>(4)</label>
					<graphic id="equation04" xlink:href="jemr-11-03-e-equation-04.png"/>
				</fig>					

      <p>To determine the coefficients for S<sub>x</sub> at least 12
calibration points are required, while S<sub>y</sub> only requires 6. In
practice the polynomial functions for S<sub>x</sub> and S<sub>y</sub> are determined
from the same data. As at least 12 calibration points will
already be collected for S<sub>x</sub>, a more complex function
could be used for S<sub>y</sub>. In the evaluation section we show
results using the same polynomial function (Eq.3) for
both S<sub>x</sub> and S<sub>y</sub>. However, to better characterize the
simulation results we first introduce the concept of
interpolation and extrapolation regions in the scene image.</p>


      <sec id="S4a">
        <title>Interpolation and extrapolation regions</title>
		
        <p>Gaze mapping calibration is done by taking
corresponding sample points from the range and the domain.
This is usually done by asking the user to look at a set of
co-planar points at a fixed distance (a.k.a. the calibration
plane). For each point, the corresponding position in the
scene image and the pupil position in the eye image are
stored. Any gaze point inside the bounding box of the
calibration pattern (the calibration box) will be
interpolated by the polynomial function. If a gaze point is
outside the calibration box it will be extrapolated. This is
illustrated in Figure 6, where T<sub>c</sub>B<sub>c</sub> is the area in the
calibration plane (&#x3C0;<sub>cal</sub>) that is visible in the scene image. Let
CL&#x2081; and CL&#x2082; be the edges of the calibration pattern. Any
gaze position in &#x3C0;<sub>cal</sub> within the range from T<sub>c</sub> to CL&#x2081; or
from CL&#x2082; to B<sub>c</sub> will be extrapolated by the polynomial
function. These two regions in the calibration plane are marked
in red in the figure. We can therefore divide the scene
image into two regions depending on whether the gaze
point is interpolated (calibration box) or extrapolated (out
of the calibration box).</p>

<fig id="fig06" fig-type="figure" position="float">
					<label>Figure 6.</label>
					<caption>
<p>Sagittal view of a HMGT showing the interpolated (green) and extrapolated (red) regions on the calibration and fixation planes</p>
					</caption>
					<graphic id="graph06" xlink:href="jemr-11-03-e-figure-06.png"/>
				</fig>

        <p>To express the relative coverage of
these two regions on the scene image, we use a measure
similar to the one suggested by Barz et al. [
          <xref ref-type="bibr" rid="b2">2</xref>
          ]. We define S<sub>int</sub> as the
ratio between the interpolation area and the total scene
image area:</p>

<fig id="eq05" fig-type="equation" position="anchor">
					<label>(5)</label>
					<graphic id="equation05" xlink:href="jemr-11-03-e-equation-05.png"/>
				</fig>					

        <p>We also refer to S<sub>int</sub> as the interpolation ratio in the
image.</p>
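<p>For an axis-aligned calibration box in pixel coordinates, Eq. 5 reduces to a simple area ratio (our own helper):</p>

```python
def interpolation_ratio(calib_box, image_size):
    """S_int (Eq. 5): area of the calibration box in the scene image
    divided by the total image area. calib_box = (x0, y0, x1, y1) in
    pixels, image_size = (width, height)."""
    x0, y0, x1, y1 = calib_box
    return abs(x1 - x0) * abs(y1 - y0) / (image_size[0] * image_size[1])
```

<p>For example, a 640&#xD7;384-pixel calibration box in a 1280&#xD7;768 scene image gives S<sub>int</sub> = 0.25.</p>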
    </sec>

      <sec id="S4b">
        <title>Gaze estimation error when changing fixation depth</title>

        <p>From now on, we refer to any fixation point in 3D by
its distance from the eye along the Z axis. Therefore, we
define the fixation plane as the plane that includes the
fixation point and is parallel to the calibration plane. T<sub>f</sub>B<sub>f</sub> in
Figure 6 shows the part of the fixation plane that is
visible in the image. We can see that the interpolated (green)
and extrapolated (red) regions in the scene image would
change when the fixation plane &#x3C0;<sub>fix</sub> diverges from the
calibration plane. Projecting the red segment on the
fixation plane &#x3C0;<sub>fix</sub> into the scene image will define a larger
extrapolated area in the image. Accordingly, the
interpolated region in the image gets smaller when the fixation
plane goes further away. Therefore, the interpolation ratio
that we get for the calibration plane (S<sup>cal</sup><sub>int</sub>
) is not
necessarily equal to the interpolation ratio that we have for
different depths. Not only does the size of the interpolation area
change with the fixation depth, but the
position of the interpolation region in the image changes as well.</p>

        <p>Figure 6 illustrates a significant change in the value of
S<sub>int</sub> for a small variation of the fixation distance, which
happens at very close distances to the eye. We simulate a
HMGT with the simplified eye model (described in Table
1) and a typical scene camera configuration described in
Table 2 to see whether changes of S<sub>int</sub> are significant in
practice. The result is shown in Table 3 for different
fixation distances on a gaze tracker calibrated at distances
0.6 m and 3.0 m. We assume that the calibration pattern
covers about 50% of the image (S<sup>cal</sup><sub>int</sub> = 0.5).</p>

<table-wrap id="t03" position="float">
					<label>Table 3.</label>
					<caption>
						<p>S<sub>int</sub> at different fixation distances for two different calibration distances</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">d<sub>cal</sub>=0.6 m</td>
            <td rowspan="1" colspan="1">d<sub>cal</sub>=3 m</td>
          </tr>
						</thead>
						<tbody>
          <tr>
            <td rowspan="1" colspan="1">0.6 m</td>
            <td rowspan="1" colspan="1">48.8%</td>
            <td rowspan="1" colspan="1">47%</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">1 m</td>
            <td rowspan="1" colspan="1">45.3%</td>
            <td rowspan="1" colspan="1">48.8%</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">3 m</td>
            <td rowspan="1" colspan="1">42%</td>
            <td rowspan="1" colspan="1">49.9%</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">5 m</td>
            <td rowspan="1" colspan="1">41%</td>
            <td rowspan="1" colspan="1">49.3%</td>
          </tr>		  
						</tbody>
					</table>
					</table-wrap>

        <p>The amount of change in the expansion of the
interpolation region depends on the configuration of the camera
and the location of the epipole in the scene image, which is
described by epipolar geometry (see Section Parallax
Error). However, the results show that for an ordinary
camera setup these changes are not significant.</p>
      </sec>
	  
      <sec id="S4c">
        <title>Practical grid size and distance for calibration</title>
		
        <p>There are different ways to carry out calibration in
HMGTs. The common way is to ask the user to look at
different targets located at a certain distance from the eye
(the calibration distance), recording sample points from
the eye and scene images while the user fixates each
target. Target points in the scene image can either be
marked and picked manually by clicking on the image
(direct pointing) or be detected automatically
(indirect pointing) using computer-vision-based methods.
The targets are usually markers printed on paper and
attached to a wall, or displayed on a large screen (or by a
projector) in front of the user during calibration.</p>

        <p>Alternatively, targets could be projected by a laser
diode [
          <xref ref-type="bibr" rid="b1">1</xref>
          ], allowing the calibration pattern to cover a wider
range of the field of view of the scene camera. However,
the practical size (angular expansion) of the calibration
grid is limited to a certain range of the FoV of the eye.
The further the calibration plane is from the subject, the
smaller the angular expansion of the calibration grid will
be. In practice, the calibration distance for HMGTs is usually
less than 3 m and the grid is smaller than 50&#xB0; horizontally
and 30&#xB0; vertically, since it is not convenient for the
user to fixate on targets at larger viewing angles.
The size is also constrained by the hardware
components that clutter the user&#x2019;s view (e.g. the eye camera and
the goggles&#x2019; frame). With these considerations, it is very
unlikely that a calibration pattern covers the entire scene
image; thus S<sup>cal</sup><sub>int</sub> is usually less than 40% when using a
lens with a field of view larger than 70&#xB0;&#xD7;50&#xB0; on the scene
camera. By contrast, the calibration grid usually covers
more than 80% of the computer display in a remote eye
tracking setup.</p>
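The angular-expansion limit discussed above can be made concrete with a small calculation. The sketch below is illustrative only: the rectangle-in-angle-space approximation of coverage and the grid dimensions are our assumptions, not values from the paper.

```python
import math

def angular_span_deg(extent_m, distance_m):
    """Angular extent (degrees) of a centered target of a given physical
    extent viewed from a given distance."""
    return 2.0 * math.degrees(math.atan(extent_m / (2.0 * distance_m)))

def coverage_ratio(grid_w_deg, grid_h_deg, fov_w_deg, fov_h_deg):
    """Rough fraction of the scene image covered by the calibration grid,
    approximating both regions as rectangles in angle space."""
    return (grid_w_deg * grid_h_deg) / (fov_w_deg * fov_h_deg)

# A hypothetical 0.8 m x 0.5 m printed grid viewed from 1 m:
w = angular_span_deg(0.8, 1.0)   # about 44 degrees horizontally
h = angular_span_deg(0.5, 1.0)   # about 28 degrees vertically
ratio = coverage_ratio(w, h, 70.0, 50.0)
```

With a 70&#xB0;&#xD7;50&#xB0; scene lens this grid covers roughly a third of the image, consistent with the sub-40% figure quoted above.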

        <p>The number of calibration points is another important
factor to consider. Manually selecting the calibration
targets in the image slows down the calibration procedure
and may also degrade the calibration result because of
head (and therefore camera) movements during
the calibration. Therefore, to keep calibration
time short without sacrificing accuracy, HMGTs with manual calibration often
use no more than 9 calibration points. Detecting
the targets automatically, however, allows more points to be
collected in the same amount of time, either by having the user
look at a set of target points in the calibration plane or
by having them follow a moving target. The practical number
of calibration points therefore depends on the calibration
method; it might, for example, be worth collecting 12 or 16
points instead of 9 if this improves the accuracy
significantly.</p>
      </sec>
    </sec>
	
    <sec id="S5">
      <title>Evaluation of different polynomial functions</title>
	  
      <p>The performance of the polynomial functions derived
earlier is compared to an extension of the second-order
polynomial model suggested by Mitsugami, Ukita, &#x26; Kidode [
        <xref ref-type="bibr" rid="b14">14</xref>
        ] and to the
models suggested by Blignaut [
        <xref ref-type="bibr" rid="b3">3</xref>
        ] and Blignaut [
        <xref ref-type="bibr" rid="b4">4</xref>
        ]. These models are
summarized in Table 4.</p>

<table-wrap id="t04" position="float">
					<label>Table 4</label>
					<caption>
						<p>Summary of models tested in the simulation. Functions are shown with only their terms, without coefficients.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
							<tr>
								<td rowspan="1" colspan="1">No.</td>
								<td rowspan="1" colspan="1">reference</td>
								<td rowspan="1" colspan="1">S<sub>x</sub></td>
								<td rowspan="1" colspan="1">S<sub>y</sub></td>								
							</tr>
						</thead>
						<tbody>
							<tr>
								<td rowspan="1" colspan="1">1</td>
								<td rowspan="1" colspan="1">Blignaut, 2014</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y&#x00B2;</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y&#x00B2;</td>								
							</tr>
							<tr>
								<td rowspan="1" colspan="1">2</td>
								<td rowspan="1" colspan="1">Blignaut, 2013</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, x&#x00B2;y&#x00B2;, x&#x00B3;, x&#x00B3;y</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y</td>								
							</tr>
							<tr>
								<td rowspan="1" colspan="1">3</td>
								<td rowspan="1" colspan="1">Blignaut, 2014</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y, x&#x00B3;, y&#x00B3;, x&#x00B3;y</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, x&#x00B2;y</td>								
							</tr>
							<tr>
								<td rowspan="1" colspan="1">4</td>
								<td rowspan="1" colspan="1">Derived above</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y, xy&#x00B2;, x&#x00B2;y&#x00B2;, x&#x00B3;, x&#x00B3;y, x&#x00B3;y&#x00B2;</td>
								<td rowspan="1" colspan="1">1, x, y, xy, y&#x00B2;, xy&#x00B2;</td>								
							</tr>
							<tr>
								<td rowspan="1" colspan="1">5</td>
								<td rowspan="1" colspan="1">Derived above</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y, xy&#x00B2;, x&#x00B2;y&#x00B2;, x&#x00B3;, x&#x00B3;y, x&#x00B3;y&#x00B2;</td>
								<td rowspan="1" colspan="1">1, x, y, xy, x&#x00B2;, y&#x00B2;, x&#x00B2;y, xy&#x00B2;, x&#x00B2;y&#x00B2;, x&#x00B3;, x&#x00B3;y, x&#x00B3;y&#x00B2;</td>								
							</tr>							
						</tbody>
					</table>
					</table-wrap>
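Every model in Table 4 is linear in its coefficients, so fitting reduces to ordinary least squares on the calibration samples. A minimal sketch, assuming normalized pupil coordinates (x, y); the term lists transcribe model 4 from the table, while the function names and structure are ours:

```python
import numpy as np

# Terms of model 4 (Table 4), written as callables of pupil coordinates.
SX_TERMS = [lambda x, y: np.ones_like(x), lambda x, y: x, lambda x, y: y,
            lambda x, y: x*y, lambda x, y: x**2, lambda x, y: y**2,
            lambda x, y: x**2*y, lambda x, y: x*y**2, lambda x, y: x**2*y**2,
            lambda x, y: x**3, lambda x, y: x**3*y, lambda x, y: x**3*y**2]
SY_TERMS = [lambda x, y: np.ones_like(x), lambda x, y: x, lambda x, y: y,
            lambda x, y: x*y, lambda x, y: y**2, lambda x, y: x*y**2]

def fit(terms, x, y, target):
    """Least-squares coefficients mapping eye features (x, y) to one
    scene-image coordinate; needs at least len(terms) calibration samples."""
    A = np.column_stack([t(x, y) for t in terms])
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coeffs

def predict(terms, coeffs, x, y):
    """Evaluate a fitted polynomial at eye-feature coordinates (x, y)."""
    return np.column_stack([t(x, y) for t in terms]) @ coeffs
```

A 4&#xD7;4 calibration grid provides 16 samples, enough for the 12-term S<sub>x</sub> model.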

      <p>Model 5 is similar to model 4 except that it uses Eq. 3
for both S<sub>x</sub> and S<sub>y</sub>. The scene camera was configured with
the properties from Table 2. The 4&#xD7;4 calibration grid was
positioned 1 m from the eye and 16&#xD7;16 points uniformly
distributed on the scene image were used for testing.</p>

      <p>We tested the five polynomial models using two
interpolation ratios (20% and 50%). Besides the 4&#xD7;4 calibration
grid, we used a 3&#xD7;3 calibration grid for polynomial model
1.</p>

      <p>The gaze estimation results for these configurations are
shown in Figure 7 for the interpolation and extrapolation
regions. Each boxplot shows the gaze error in a
particular region, measured in degrees. These figures are only
meant to give an idea of how different gaze estimation
functions perform. The results show that there is no
significant difference between models 3 and 4 in the
interpolation area. Increasing the calibration ratio
increases the error in the interpolation region but
gives a better overall accuracy for the whole image. For this test,
no significant difference was observed between
models 3, 4 and 5.</p>
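Errors reported in degrees can be recovered from pixel offsets in the scene image via the focal length. A small sketch, assuming a pinhole camera and an offset near the optical axis (the function name is ours):

```python
import math

def pixel_error_to_degrees(dx_px, dy_px, focal_px):
    """Visual angle (degrees) subtended by a pixel offset (dx, dy) in the
    scene image of a pinhole camera with focal length in pixels.
    Accurate only for offsets near the optical axis."""
    return math.degrees(math.atan(math.hypot(dx_px, dy_px) / focal_px))
```

For example, with a 965-pixel focal length (as in Table 5) an offset of about 17 pixels corresponds to roughly one degree of visual angle.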

<fig id="fig07" fig-type="figure" position="float">
					<label>Figure 7.</label>
					<caption>
						<p>Gaze estimation error obtained from different regression models for interpolation and extrapolation regions of the
scene image. Gaze estimation was based on the Pupil center and no measurement noise was applied to the eye image. Errors
are measured in degrees.</p>
					</caption>
					<graphic id="graph07" xlink:href="jemr-11-03-e-figure-07.png"/>
				</fig>

      <p>A similar test was performed with the
pupil-corneal reflection (PCR) feature instead of the pupil alone. The results for the PCR
condition are shown in Figure 8. They show that
model 5 with PCR outperforms the other models when the
calibration ratio is greater than 20%, even though the
model was derived based on pupil position only.</p>

<fig id="fig08" fig-type="figure" position="float">
					<label>Figure 8.</label>
					<caption>
						<p>Gaze estimation error obtained from different regression models for interpolation and extrapolation regions of the
scene image. Gaze estimation was based on the PCR feature and no measurement noise was applied to the eye image. Errors
are measured in degrees.</p>
					</caption>
					<graphic id="graph08" xlink:href="jemr-11-03-e-figure-08.png"/>
				</fig>

      <p>To allow a more realistic comparison between the
models, in Section Combined Errors we examine the
effect of noise on the gaze estimation result by applying a
measurement error to the eye image.</p>
    </sec>
	
    <sec id="S6">
      <title>Parallax Error</title>
	  
      <p>Assuming that the mapping function returns a precise
gaze point everywhere in the scene image, the estimated gaze
point will still not correspond to the actual gaze point
when the fixated point is not on the calibration plane. We refer to this
error as the parallax error; it is caused by the misalignment
between the eye and the scene camera.</p>

      <p>Figure 9 illustrates a head-mounted gaze tracking setup 
in 2D (sagittal view). It shows the offset between the 
actual gaze point x2 in the image and the estimated 
gaze point x1 when the gaze tracker is calibrated 
for plane &#x3C0;<sub>cal</sub> and the eye fixates on the point X2<sub>fix</sub>. 
The figure is not to scale; for the sake of clarity 
the calibration and fixation planes (&#x3C0;<sub>cal</sub> and &#x3C0;<sub>fix</sub>, respectively) 
are placed very close to the eye. Here, the eye and scene cameras 
can both be considered pinhole cameras forming a 
stereo-vision setup.</p> 

<fig id="fig09" fig-type="figure" position="float">
					<label>Figure 9.</label>
					<caption>
						<p>Sagittal view of a HMGT illustrating the epipolar
geometry of the eye and the scene camera.</p>
					</caption>
					<graphic id="graph09" xlink:href="jemr-11-03-e-figure-09.png"/>
				</fig>

      <p>We define the parallax error as the vector from
the actual gaze point to the estimated gaze point in the
scene image (e<sub>par</sub>(x2) = x2x1), assuming the mapping function
itself is exact.</p>

      <p>When the eye fixates at points along the same gaze
direction, there will be no change in the eye image and
consequently the estimated gaze point in the scene image
remains the same. As a result, when the point of gaze
(X2<sub>fix</sub>) moves along the same gaze direction the origin of
the error vector e<sub>par</sub> moves in the image, while the
endpoint of the vector remains fixed.</p>

      <p>The parallax error e<sub>par</sub> for any point x in the scene
image can be geometrically derived by first back-projecting
the desired point onto the fixation plane (point X<sub>fix</sub>):</p>

<fig id="eq06" fig-type="equation" position="anchor">
					<label>(6)</label>
					<graphic id="equation06" xlink:href="jemr-11-03-e-equation-06.png"/>
				</fig>					

      <p>where P<sup>+</sup> is the pseudo-inverse of the projection
matrix P of the scene camera. Then, intersecting the
gaze vector for X<sub>fix</sub> with &#x3C0;<sub>cal</sub>:</p>

<fig id="eq07" fig-type="equation" position="anchor">
					<label>(7)</label>
					<graphic id="equation07" xlink:href="jemr-11-03-e-equation-07.png"/>
				</fig>					

      <p>where d<sub>c</sub> is the distance from the center of the
eyeball to the calibration plane and d<sub>f</sub> is the distance to the
fixation plane along the Z axis. Finally, projecting the
point X<sub>cal</sub> onto the scene camera gives the end-point of
the vector e<sub>par</sub>, while the initial image point x is
its start-point.</p>
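Under a pinhole model, the construction of Eqs. 6 and 7 can be sketched directly. The sketch below replaces the pseudo-inverse back-projection of Eq. 6 with an explicit ray-plane intersection (equivalent when the depth of the plane is known); the camera convention x_cam = R&#xB7;X + t and the frame choices are our assumptions:

```python
import numpy as np

def back_project(K, R, t, x_img, z_plane):
    """Eq. 6 in spirit: intersect the scene-camera viewing ray of image
    point x_img with the world plane z = z_plane (pinhole model)."""
    C_cam = -R.T @ t                                     # camera center (world)
    d = R.T @ np.linalg.inv(K) @ np.append(x_img, 1.0)   # ray direction (world)
    lam = (z_plane - C_cam[2]) / d[2]
    return C_cam + lam * d

def project(K, R, t, X):
    """Project a world point onto the scene image."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def parallax_error(K, R, t, C_E, x_img, d_fix, d_cal):
    """Parallax error vector at image point x_img when the tracker is
    calibrated at depth d_cal but the eye fixates at depth d_fix
    (both measured from the eye center C_E along the Z axis)."""
    X_fix = back_project(K, R, t, x_img, C_E[2] + d_fix)
    g = X_fix - C_E                       # gaze direction through X_fix
    X_cal = C_E + (d_cal / d_fix) * g     # Eq. 7: hit the calibration plane
    return project(K, R, t, X_cal) - x_img
```

When the fixation plane coincides with the calibration plane the error vanishes, as expected from the geometry.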

      <p>By ignoring the visual axis deviation and taking the
optical axis of the eye as the gaze direction, the epipole e
in the scene image can be defined by projecting the center
of the eyeball C<sub>E</sub> onto the scene image. According to
epipolar geometry this can be described as:</p>

<fig id="eq08" fig-type="equation" position="anchor">
					<label>(8)</label>
					<graphic id="equation08" xlink:href="jemr-11-03-e-equation-08.png"/>
				</fig>					

      <p>where K is the camera matrix of the scene camera, and <sup>E</sup><sub>C</sub>R<sup>T</sup> and <sup>E</sup><sub>C</sub>Tr
are, respectively, the rotation and translation of the scene camera
relative to the center of the eyeball. Mardanbegi and Hansen [
        <xref ref-type="bibr" rid="b12">12</xref>
        ] have shown that taking the visual axis deviation into
account does not make a significant difference in the
location of the epipole in the scene image.</p>
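Under the same pinhole assumptions, Eq. 8 amounts to projecting the eyeball center with the scene camera. A minimal sketch (the frame convention x = K(R&#xB7;X + t) is ours):

```python
import numpy as np

def epipole(K, R, t, C_E):
    """Eq. 8 sketch: the epipole is the projection of the eyeball center
    C_E onto the scene image (pinhole scene camera, intrinsics K, pose R, t)."""
    e = K @ (R @ C_E + t)
    return e[:2] / e[2]
```

In particular, an eyeball center lying on the scene camera's optical axis projects to the principal point.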

      <p>Figure 10 shows an example distribution of the
parallax error in the scene image for d<sub>cal</sub> = 1 m and d<sub>fix</sub> = 3 m
on the setup described in Table 2 when having an ideal
mapping function with zero error for the calibration
distance in the entire image.</p>

<fig id="fig10" fig-type="figure" position="float">
					<label>Figure 10.</label>
					<caption>
						<p>Parallax error in the scene image for a fixation distance
of 3 m when d<sub>cal</sub> = 1 m, on the setup described in Table 2.
This figure assumes an ideal mapping function with
zero interpolation and extrapolation error in the entire image
for the calibration distance d<sub>cal</sub>.</p>
					</caption>
					<graphic id="graph10" xlink:href="jemr-11-03-e-figure-10.png"/>
				</fig>
    </sec>
	
    <sec id="S7">
      <title>Effect of radial lens distortion</title>
	  
      <p>In this section we show how radial distortion in the
scene image, which is more noticeable when using
wide-angle lenses, affects the gaze estimation accuracy in
HMGT.</p>

      <p>Figure 2 shows the location of the pupil centers in the eye
image when the eye fixates at points that are uniformly
distributed in the scene image. These pupil centers are
obtained by back-projecting the corresponding target
point in the scene image onto the calibration plane, and
rotating the optical axis of the eye towards that fixation point in
the scene. When the scene image has no radial distortion,
the back-projection of the scene image onto the
calibration plane is shaped as a quadrilateral (dotted line in
Figure 11).</p>

<fig id="fig11" fig-type="figure" position="float">
					<label>Figure 11.</label>
					<caption>
						<p>Calibration grid (small circles) and working area
(red rectangle) marked in the calibration plane and borders of
the scene image when it is back-projected onto the calibration
plane with (dashed line) and without (dotted line) lens
distortion. This figure was drawn according to the settings
described in Table 5.</p>
					</caption>
					<graphic id="graph11" xlink:href="jemr-11-03-e-figure-11.png"/>
				</fig>

      <p>However, when the scene image is strongly affected
by radial distortion, the back-projection of the scene
image onto the calibration plane is shaped as a
quadrilateral with a pincushion distortion effect (dashed line in
Figure 11). Figure 13 shows the corresponding pupil
positions for these fixation points. Comparing Figure
13 with Figure 2, we can see that the positive radial
distortion in the pattern of fixation targets caused by the lens
will, to some extent, compensate for the
nonlinearity of the pupil positions, adding a positive radial
distortion to the normal eye samples.</p>

<fig id="fig13" fig-type="figure" position="float">
					<label>Figure 13.</label>
					<caption>
						<p>A sample eye image with pupil centers corresponding
to 625 target points in the scene image when having
a lens distortion.</p>
					</caption>
					<graphic id="graph13" xlink:href="jemr-11-03-e-figure-13.png"/>
				</fig>

      <p>To see whether this could potentially improve the
result of the regression, we compared two conditions:
one with and one without lens distortion. We want
to compare the conditions independently of the
camera FoV and focal length. Since adding lens
distortion to the projection algorithm of the simulation may
change the FoV of the camera, we define a &#x201C;working
area&#x201D; corresponding to the region over which gaze is to
be estimated. Also, a fixed calibration grid in
the center of the working area is used for all conditions.
Two different polynomial functions are used for gaze
mapping in both conditions using the pupil center: model
1 with a calibration grid of 3 &#xD7; 3 points, and model 5 with
4 &#xD7; 4 calibration points. The test is done with the
parameters described in Table 5. Lens distortion in the
simulation is modeled with a 6th-order polynomial [
        <xref ref-type="bibr" rid="b19">19</xref>
        ]:</p>
		
<fig id="eq09" fig-type="equation" position="anchor">
					<label>(9)</label>
					<graphic id="equation09" xlink:href="jemr-11-03-e-equation-09.png"/>
				</fig>					

<table-wrap id="t05" position="float">
					<label>Table 5.</label>
					<caption>
						<p>Parameters used in the simulation for testing the effect of lens distortion</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<tbody>					
							<tr>
								<td rowspan="1" colspan="1">wide-angle lens</td>
								<td rowspan="1" colspan="1">FoV = H : 90&#xB0; &#xD7; V : 60&#xB0;, R = (pan, tilt, yaw) = (0, 0, 0), Tr = (10mm, 30mm, 35mm), focal length=965 pixels, distortion coefficients= [-0.42, 0.17, -0.00124, 0.0015, -0.034], res=(1280 &#xD7; 960)</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">calibration</td>
								<td rowspan="1" colspan="1">FoV = H : 30&#xB0; &#xD7; V : 25&#xB0;, calibration distance=1m</td>
							</tr>
						</tbody>							
						<tbody>							
							<tr>
								<td rowspan="1" colspan="1">working area</td>
								<td rowspan="1" colspan="1">FoV = H : 50&#xB0; &#xD7; V : 30&#xB0;</td>
							</tr>							
						</tbody>
					</table>
					</table-wrap>					
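The distortion coefficients in Table 5 have the five-parameter layout commonly ordered as (k1, k2, p1, p2, k3), so the 6th-order radial model of Eq. 9 can be sketched as below. The assumption that the last coefficient is k3 and that the middle pair are tangential terms p1, p2 is ours:

```python
# Distortion coefficients from Table 5, assumed ordered (k1, k2, p1, p2, k3)
# as in the common Brown-Conrady convention.
K1, K2, P1, P2, K3 = -0.42, 0.17, -0.00124, 0.0015, -0.034

def distort(xn, yn):
    """Apply 6th-order radial (Eq. 9) plus tangential distortion to
    normalized image coordinates (x/z, y/z)."""
    r2 = xn * xn + yn * yn
    radial = 1.0 + K1 * r2 + K2 * r2**2 + K3 * r2**3
    xd = xn * radial + 2.0 * P1 * xn * yn + P2 * (r2 + 2.0 * xn * xn)
    yd = yn * radial + P1 * (r2 + 2.0 * yn * yn) + 2.0 * P2 * xn * yn
    return xd, yd
```

With the negative k1 of Table 5, off-axis points move toward the image center, i.e. barrel distortion in the image and the pincushion effect on the back-projected grid described above.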
		
      <p>Figure 12 shows a sample scene image with the
calibration and working areas, conveying the amount
of distortion produced by the lens defined in Table 5.</p>

<fig id="fig12" fig-type="figure" position="float">
					<label>Figure 12.</label>
					<caption>
						<p>A sample image with radial distortion showing the
calibration region (gray) and the working area (red curve).</p>
					</caption>
					<graphic id="graph12" xlink:href="jemr-11-03-e-figure-12.png"/>
				</fig>

      <p>Figure 14 shows a significant improvement in
accuracy with lens distortion present for the second-order
polynomial. However, lens distortion does not have a large
impact on the performance of model 5 (Figure 15),
because this 3rd-order polynomial already
compensates for the nonlinearity of the pupil movements.</p>

<fig id="fig14" fig-type="figure" position="float">
					<label>Figure 14.</label>
					<caption>
						<p>Gaze estimation error in the scene image showing the effect of radial distortion on polynomial function 1 (3 &#xD7; 3
calibration points) (a) with and (b) without lens distortion. The error in the working area for both conditions is shown in (c).</p>
					</caption>
					<graphic id="graph14" xlink:href="jemr-11-03-e-figure-14.png"/>
				</fig>
				
<fig id="fig15" fig-type="figure" position="float">
					<label>Figure 15.</label>
					<caption>
						<p>Gaze estimation error in the scene image showing the effect of radial distortion on polynomial function 5 (4 &#xD7; 4
calibration points) (a) with and (b) without lens distortion. The error in the working area for both conditions is shown in (c).</p>
					</caption>
					<graphic id="graph15" xlink:href="jemr-11-03-e-figure-15.png"/>
				</fig>				

      <p>Besides affecting the gaze mapping result, lens distortion
also distorts the pattern of error vectors in the image. For
example, in a condition with parallax error
but no error from the polynomial function, the
assumption of a single epipole in the image at which all
epipolar lines intersect no longer holds when lens
distortion is present.</p>
    </sec>
	
    <sec id="S8">
      <title>Combined Errors</title>

      <p>In the previous sections we discussed different factors
that contribute to the final vector field of gaze estimation
error in the scene image. These four factors do not affect
the gaze estimation independently, and we cannot
combine their errors by simply adding the vector
fields obtained from each. For instance, when we
have the e<sub>par</sub> and e<sub>int</sub> vector fields, the final error at point x2
in the scene image is not the sum of the two vectors e<sub>par</sub>(x2) and
e<sub>int</sub>(x2). According to Figure 9, the estimated gaze
point is actually Map(p<sub>x1cal</sub>) = x1 + e<sub>int</sub>(x1), which is the
mapping result of the pupil center p<sub>x1cal</sub> corresponding to
the point x1<sub>cal</sub> on &#x3C0;<sub>cal</sub>. Thus, the final error at point x2 will
be:</p>

<fig id="eq10" fig-type="equation" position="anchor">
					<label>(10)</label>
					<graphic id="equation10" xlink:href="jemr-11-03-e-equation-10.png"/>
				</fig>					

      <p>An example error pattern in Figure 16 illustrates how
much the parallax error could be deformed when it is
combined with interpolation and extrapolation errors.</p>

<fig id="fig16" fig-type="figure" position="float">
					<label>Figure 16.</label>
					<caption>
						<p>An example of error pattern in the image when
having mapping error and parallax error combined.</p>
					</caption>
					<graphic id="graph16" xlink:href="jemr-11-03-e-figure-16.png"/>
				</fig>

      <p>The impact of lens distortion is even more
complicated, as it both affects the calibration and causes a
nonlinear distortion of the error field. Although
expressing the error vector field mathematically might be a
complex task, we can still use the simulation software
to generate it. This would be useful in practice
if the direction of the vectors in the field is
fully defined by the geometry of the HMGT setup.
It could help manufacturers know the error
distribution for a specific configuration, which could later
be used in analysis software by weighting different
areas of the image in terms of gaze estimation validity.
It is therefore valuable to investigate whether the
error vector field is consistent and can be defined only
by knowing the geometry of the HMGT.</p>

      <p>The four main factors described in this paper are those
resulting from the geometry of the different components
of a HMGT system. There are other sources of error that
we have not discussed, such as the image resolution of both
cameras, noise (measurement error) in pupil
tracking, the pupil detection method itself, and the position of
the light source when using pupil and corneal reflection. 
We have observed that noise and inaccuracy in
detecting eye features in the eye image have the greatest impact
on the accuracy of gaze estimation. Applying noise in the
eye tracking algorithm of the simulation allows a
more realistic comparison between different gaze
estimation functions, and also shows how much the error
vectors in the scene image are affected by measurement
inaccuracy, both in magnitude and in
direction. We repeated the comparison between the
models from the evaluation section, this
time with two levels of noise with a Gaussian distribution
(mean = 0, standard deviation = 0.5 and 1.0 pixels).</p>
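The noisy conditions can be reproduced by perturbing the detected pupil centers before fitting. A minimal sketch (the function name and the fixed generator seed are our choices):

```python
import numpy as np

def add_measurement_noise(pupil_centers, sigma_px, rng=None):
    """Return pupil centers (an (N, 2) array, in pixels) perturbed by
    zero-mean Gaussian noise with standard deviation sigma_px."""
    if rng is None:
        rng = np.random.default_rng(0)   # fixed seed for reproducibility
    return pupil_centers + rng.normal(0.0, sigma_px, size=pupil_centers.shape)
```

Applying the same perturbation during and after calibration mirrors the test conditions described here.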

      <p>Figure 18 shows how much the pupil detection in the
image (1280 &#xD7; 960) is affected by a noise level of 0.5 in the
measurement. Pupil centers in the eye image
corresponding to a grid of 16 &#xD7; 16 fixation points on the calibration
plane are shown in red for the condition with noise and in
blue for the condition without noise.</p>

<fig id="fig18" fig-type="figure" position="float">
					<label>Figure 18.</label>
					<caption>
						<p>Pupil centers in the eye image corresponding to
a grid of 16 &#xD7; 16 fixation points on the calibration plane,
shown in red for the noisy condition and in blue for the condition without noise.
Image resolution is 1280 &#xD7; 960.</p>
					</caption>
					<graphic id="graph18" xlink:href="jemr-11-03-e-figure-18.png"/>
				</fig>

      <p>Figure 17 shows the gaze estimation results for noise
level 0.5 with the PCR method. No radial distortion was included, and
the noise was added both during and after the calibration.
The results show how the overall error decreases when the
calibration ratio is increased from 20% to 50%.</p>

<fig id="fig17" fig-type="figure" position="float">
					<label>Figure 17.</label>
					<caption>
						<p>Gaze estimation error obtained from different polynomial models. Gaze estimation was based on the PCR feature.
Resolution of the eye image is set to 1280 &#xD7; 960 and a noise level of 0.5 is applied. Errors are measured in degrees of visual
angle.</p>
					</caption>
					<graphic id="graph17" xlink:href="jemr-11-03-e-figure-17.png"/>
				</fig>

      <p>To see the impact of noise on the direction of the vectors
in the image, a cosine similarity measure is used to
compare the two vector fields (each containing 16 &#xD7; 16
vectors). We compare the vector fields obtained from two
noise levels (0.5 and 1.5) with the vector field
obtained from the condition with no noise. For this test, the
calibration and fixation distances are set
to 0.7 m and 3 m, respectively. Parallax error is added so that the vector
field in the no-noise condition is meaningful. In this
comparison we ignore differences in the magnitude of the
error and only compare the direction of the vectors in the
image. Figure 19 shows how much the direction of the vectors
deviates under measurement noise for
models 1 and 5.</p>
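The direction-only comparison can be written as a per-vector cosine similarity between the two fields. A sketch (names and array layout are our assumptions):

```python
import numpy as np

def angular_deviation_deg(V_ref, V_noisy, eps=1e-12):
    """Per-vector angular deviation (degrees) between two error-vector
    fields of shape (N, 2), ignoring magnitudes (cosine similarity)."""
    dot = np.sum(V_ref * V_noisy, axis=1)
    norms = np.linalg.norm(V_ref, axis=1) * np.linalg.norm(V_noisy, axis=1)
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```

For a 16 &#xD7; 16 grid, the fields are flattened to (256, 2) arrays and the resulting deviations can be summarized as in Figure 19.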

<fig id="fig19" fig-type="figure" position="float">
					<label>Figure 19.</label>
					<caption>
						<p>This figure shows how much different levels of
measurement noise (in models 1 and 5) affect the direction
of the error vectors in the presence of parallax error. The vertical axis
represents the angular deviation.</p>
					</caption>
					<graphic id="graph19" xlink:href="jemr-11-03-e-figure-19.png"/>
				</fig>

      <p>Based on the results shown in Figure 17, we can
conclude that we get almost the same gaze estimation error in
the interpolation region for all the polynomial functions.
High noise levels have a great impact on the
magnitude of the error vectors in the extrapolation region, and
the effect is even greater for the 3rd-order polynomial
models. Figure 19 indicates that, despite the changes in
the magnitude of the vectors under noise, the
direction of the vectors does not change significantly. 
This means that the vector field obtained from
the geometry could be used as a reference for predicting
in which parts of the scene image the error is larger
(relative to the other parts) and what the overall pattern of
error would be. However, this needs to be validated
empirically on a real HMGT.</p>


      <p>Another test was conducted to
check the performance of higher-order polynomial
models. The test was done with a calibration ratio of 20% and
noise level 0.5 using a pupil-only method. A 4 &#xD7; 4
calibration grid was used for models 1, 2 and 5, and a 5 &#xD7; 5
grid for the 4th- and 5th-order standard polynomial models.
The gaze estimation results of this comparison are shown in
Figure 20, confirming that performance does not improve
with higher-order polynomial models even with more
calibration points.</p>

<fig id="fig20" fig-type="figure" position="float">
					<label>Figure 20.</label>
					<caption>
						<p>Comparing the performance of higher-order polynomials (4th and 5th order) with 25 calibration points against 3rd-order
polynomial models using 16 calibration points.</p>
					</caption>
					<graphic id="graph20" xlink:href="jemr-11-03-e-figure-20.png"/>
				</fig>
    </sec>
	
    <sec id="S9">
      <title>Conclusion</title>
	  
      <p>In this paper we have investigated the error
distribution of polynomial functions used for gaze estimation in
head-mounted gaze trackers (HMGT). To describe the
performance of the functions we have characterized four
different sources of error. The interpolation error is
measured within the bounding box defined by the
calibration points as seen by the scene camera. The
extrapolation error is measured in the remaining area of the scene
image, outside the calibration bounding box. The other
two types of error are due to the parallax between the
scene camera and the eye, and to the radial distortion of the
lens used in the scene camera. Our simulation results
show that third-order polynomials provide better
overall performance than second-order and even higher-order
polynomial models.</p>

      <p>We did not find any significant
improvement of model 5 over model 4, especially when
noise is present in the input (compare Figures 17
and 8). This means that it is not necessary to use higher-order
polynomials for S<sub>y</sub>.</p>

      <p>Furthermore, we have shown that using wide-angle
lenses on the scene camera actually reduces the error caused by the
nonlinearity of the eye features used for gaze estimation
in HMGT. This can improve the results of the second-order
polynomial models significantly, as these models
suffer more from the nonlinearity of the input. Although
the 3rd-order polynomials provide more robust results
with and without lens distortion, the 2nd-order models
have the advantage of requiring fewer calibration points.
We replicated the analysis used to derive
model 4, but with the effect of radial distortion in the
scene image. We found linear relationships between S<sub>x</sub>
and P<sub>x</sub> and also between S<sub>y</sub> and P<sub>y</sub>. The relationship
between S and the coefficients was also linear, suggesting
the following model for both S<sub>x</sub> and S<sub>y</sub>:</p>

<fig id="eq11" fig-type="equation" position="anchor">
					<label>(11)</label>
					<graphic id="equation11" xlink:href="jemr-11-03-e-equation-11.png"/>
				</fig>					

      <p>As future work we would like to compare the
performance of the models discussed in this paper on a real
head-mounted eye tracking setup and see whether the results
obtained from the simulation can be verified. It would
also be interesting to compare the performance of a
model based on Eq. 11 on a wide-angle lens with model 4 on a
non-distorted image. The simulation shows that the gaze
estimation accuracy obtained from a model based on
Eq. 11 with 4 calibration points on a distorted image is as
good as the accuracy obtained from model 4 with 16
points on a non-distorted image. This, however, needs to
be verified on a real eye tracker.</p>

      <p>Though an analytical model describing the behavior
of the errors might be feasible, the simulation software
developed for this investigation may help other
researchers and manufacturers gain a better
understanding of how the accuracy and precision of the gaze
estimates vary over the scene image for different
configuration scenarios, and help them define configurations
(e.g. different cameras, lenses, mapping functions, etc.)
that are more suitable for their purposes.</p>
    </sec>
  </body>
<back>
<ref-list>
<ref id="b1"><label>1</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Babcock</surname>, <given-names>J. S.</given-names></name>, &#x26; <name><surname>Pelz</surname>, <given-names>J. B.</given-names></name></person-group> (<year>2004</year>). <article-title>Building a lightweight eye tracking headgear.</article-title> In <source>Proceedings of the 2004 acm symposium on eye tracking research and applications</source> (pp. <fpage>109</fpage>– <lpage>114</lpage>). <pub-id pub-id-type="doi">10.1145/968363.968386</pub-id></mixed-citation></ref>
<ref id="b2"><label>2</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Barz</surname>, <given-names>M.</given-names></name>, <name><surname>Daiber</surname>, <given-names>F.</given-names></name>, &#x26; <name><surname>Bulling</surname>, <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Prediction of gaze estimation error for error-aware gaze-based interfaces.</article-title> In Proceedings of the ninth biennial acm symposium on eye tracking research &#x26;applications (pp. 275–278). New York, NY, USA: ACM. <pub-id pub-id-type="doi">10.1145/2857491.2857493</pub-id></mixed-citation></ref>
<ref id="b3"><label>3</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Blignaut</surname>, <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>A new mapping function to improve the accuracy of a video-based eye tracker.</article-title> In Proceedings of the south african institute for computer scientists and in- formation technologists conference (pp. 56–59). <pub-id pub-id-type="doi">10.1145/2513456.2513461</pub-id></mixed-citation></ref>
<ref id="b4"><label>4</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Blignaut</surname>, <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>Mapping the pupil-glint vector to gaze coordinates in a simple video-based eye tracker.</article-title> <source>Journal of Eye Movement Research</source>, <volume>7</volume>, <fpage>1</fpage>–<lpage>11</lpage>.<issn>1995-8692</issn></mixed-citation></ref>
<ref id="b5"><label>5</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Blignaut</surname>, <given-names>P.</given-names></name>, &#x26; <name><surname>Wium</surname>, <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>The effect of mapping function on the accuracy of a video-based eye tracker.</article-title> In <source>Proceedings of the 2013 conference on eye tracking south africa</source>(pp. <fpage>39</fpage>–<lpage>46</lpage>). <pub-id pub-id-type="doi">10.1145/2509315.2509321</pub-id></mixed-citation></ref>
<ref id="b6"><label>6</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Böhme</surname>, <given-names>M.</given-names></name>, <name><surname>Dorr</surname>, <given-names>M.</given-names></name>, <name><surname>Graw</surname>, <given-names>M.</given-names></name>, <name><surname>Martinetz</surname>, <given-names>T.</given-names></name>, &#x26; <name><surname>Barth</surname>, <given-names>E.</given-names></name></person-group> (<year>2008</year>). <article-title>A software framework for simulating eye trackers.</article-title> In <source>Proceedings of the 2008 symposium on eye tracking research and applications</source> (pp. <fpage>251</fpage>–<lpage>258</lpage>). <pub-id pub-id-type="doi">10.1145/1344471.1344529</pub-id></mixed-citation></ref>
<ref id="b7"><label>7</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Cerrolaza</surname>, <given-names>J. J.</given-names></name>, <name><surname>Villanueva</surname>, <given-names>A.</given-names></name>, &#x26; <name><surname>Cabeza</surname>, <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Taxonomic study of polynomial regressions applied to the calibration of video-oculographic systems.</article-title> In <source>Proceedings of the 2008 symposium on eye tracking research and applications</source> (pp. <fpage>259</fpage>–<lpage>266</lpage>). <pub-id pub-id-type="doi">10.1145/1344471.1344530</pub-id></mixed-citation></ref>
<ref id="b8"><label>8</label><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Cerrolaza</surname>, <given-names>J. J.</given-names></name>, <name><surname>Villanueva</surname>, <given-names>A.</given-names></name>, &#x26; <name><surname>Cabeza</surname>, <given-names>R.</given-names></name></person-group> (<year>2012</year>, July). Study of polynomial mapping functions in video- oculography eye trackers. ACM Trans. Comput.-Hum. In- teract., 19(2), 10:1–10:25.</mixed-citation></ref>
<ref id="b9"><label>9</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Guestrin</surname>, <given-names>E. D.</given-names></name>, &#x26; <name><surname>Eizenman</surname>, <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>General theory of remote gaze estimation using the pupil center and corneal reflections.</article-title> <source>IEEE Transactions on</source>, <volume>53</volume>(<issue>6</issue>), <fpage>1124</fpage>–<lpage>1133</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2005.863952</pub-id><pub-id pub-id-type="pmid">16761839</pub-id><issn>0018-9294</issn></mixed-citation></ref>
<ref id="b10"><label>10</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Hansen</surname>, <given-names>D. W.</given-names></name>, &#x26; <name><surname>Ji</surname>, <given-names>Q.</given-names></name></person-group> (<year>2010</year>). <article-title>In the eye of the beholder: A survey of models for eyes and gaze. Pattern Analysis and Machine Intelligence</article-title>. <source>IEEE Transactions on</source>, <volume>32</volume>(<issue>3</issue>), <fpage>478</fpage>–<lpage>500</lpage>.</mixed-citation></ref>
<ref id="b11"><label>11</label><mixed-citation publication-type="book-chapter" specific-use="restruct"><person-group person-group-type="author"><name><surname>Majaranta</surname>, <given-names>P.</given-names></name>, &#x26; <name><surname>Bulling</surname>, <given-names>A.</given-names></name></person-group> (<year>2014</year>). <chapter-title>Eye tracking and eye- based human–computer interaction</chapter-title>. In <source>Advances in physio- logical computing</source> (pp. <fpage>39</fpage>–<lpage>65</lpage>). <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-1-4471-6392-3_3</pub-id></mixed-citation></ref>
<ref id="b12"><label>12</label><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><name><surname>Mardanbegi</surname>, <given-names>D.</given-names></name>, &#x26; <name><surname>Hansen</surname>, <given-names>D. W.</given-names></name></person-group> (<year>2012</year>). <article-title>Parallax error in the monocular head-mounted eye trackers.</article-title> In <source>Proceedings of the 2012 acm conference on ubiquitous computing</source> (pp. <fpage>689</fpage>–<lpage>694</lpage>). <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>ACM</publisher-name>. <pub-id pub-id-type="doi">10.1145/2370216.2370366</pub-id></mixed-citation></ref>
<ref id="b13"><label>13</label><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Merchant</surname>, <given-names>J.</given-names></name>, <name><surname>Morrissette</surname>, <given-names>R.</given-names></name>, &#x26; <name><surname>Porterfield</surname>, <given-names>J. L.</given-names></name></person-group> (<year>1974</year>). Remote measurement of eye direction allowing subject mo- tion over one cubic foot of space. Biomedical Engineering, IEEE Transactions on(4), 309–317.</mixed-citation></ref>
<ref id="b14"><label>14</label><mixed-citation publication-type="conference" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Mitsugami</surname>, <given-names>I.</given-names></name>, <name><surname>Ukita</surname>, <given-names>N.</given-names></name>, &#x26; <name><surname>Kidode</surname>, <given-names>M.</given-names></name></person-group> (<year>2003</year>). Estimation of 3d gazed position using view lines. In Image analysis and processing, 2003. proceedings. 12th international conference on (pp. 466–471).</mixed-citation></ref>
<ref id="b15"><label>15</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Model</surname>, <given-names>D.</given-names></name>, &#x26; <name><surname>Eizenman</surname>, <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>An automatic personal calibration procedure for advanced gaze estimation systems.</article-title> <source>IEEE Transactions on Biomedical Engineering</source>, <volume>57</volume>(<issue>5</issue>), <fpage>1031</fpage>–<lpage>1039</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2009.2039351</pub-id><pub-id pub-id-type="pmid">20172802</pub-id><issn>0018-9294</issn></mixed-citation></ref>
<ref id="b16"><label>16</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Morimoto</surname>, <given-names>C. H.</given-names></name>, &#x26; <name><surname>Mimica</surname>, <given-names>M. R.</given-names></name></person-group> (<year>2005</year>). <article-title>Eye gaze tracking techniques for interactive applications.</article-title> <source>Computer Vision and Image Understanding</source>, <volume>98</volume>(<issue>1</issue>), <fpage>4</fpage>–<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.cviu.2004.07.010</pub-id><issn>1077-3142</issn></mixed-citation></ref>
<ref id="b17"><label>17</label><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Ramanauskas</surname>, <given-names>N.</given-names></name>, <name><surname>Daunys</surname>, <given-names>G.</given-names></name>, &#x26; <name><surname>Dervinis</surname>, <given-names>D.</given-names></name></person-group> (<year>2008</year>). <source>Investigation of calibration techniques in video based eye tracking system</source>. <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-540-70540-6_182</pub-id></mixed-citation></ref>
<ref id="b18"><label>18</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Sesma-Sanchez</surname>, <given-names>L.</given-names></name>, <name><surname>Villanueva</surname>, <given-names>A.</given-names></name>, &#x26; <name><surname>Cabeza</surname>, <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>Gaze estimation interpolation methods based on binocu- lar data. <italic>Biomedical Engineering</italic></article-title>. <source>IEEE Transactions on</source>, <volume>59</volume>(<issue>8</issue>), <fpage>2235</fpage>–<lpage>2243</lpage>.<pub-id pub-id-type="pmid">22665501</pub-id></mixed-citation></ref>
<ref id="b19"><label>19</label><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Weng</surname>, <given-names>J.</given-names></name>, <name><surname>Cohen</surname>, <given-names>P.</given-names></name>, &#x26; <name><surname>Herniou</surname>, <given-names>M.</given-names></name></person-group> (<year>1992</year>). <article-title>Camera calibration with distortion models and accuracy evaluation.</article-title> <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>14</volume>(<issue>10</issue>), <fpage>965</fpage>–<lpage>980</lpage>. <pub-id pub-id-type="doi">10.1109/34.159901</pub-id><issn>0162-8828</issn></mixed-citation></ref>
</ref-list>
</back>
</article>
