The mean point of vergence is biased under projection

The point of interest in three-dimensional space in eye tracking is often computed based on intersecting the lines of sight with geometry, or finding the point closest to the two lines of sight. We first start by theoretical analysis with synthetic simulations. We show that the mean point of vergence is generally biased for centrally symmetric errors and that the bias depends on the horizontal vs. vertical noise distribution of the tracked eye positions. Our analysis continues with an evaluation on real experimental data. The estimated mean vergence points seem to contain different errors among individuals but they generally show the same bias towards the observer. And it tends to be larger with an increased viewing distance. We also provided a recipe to minimize the bias, which applies to general computations of gaze estimation under projection. These findings not only have implications for choosing the calibration method in eye tracking experiments and interpreting the observed eye movements data; but also suggest to us that we shall consider the mathematical models of calibration as part of the experiment.


Introduction
Humans tend to direct both of their eyes at roughly the same point in 3D space. Binocular saccades and smooth pursuit between objects in a 3D scene often exhibit vergence, which means that two eyes move in opposite directions [1] for fixation to coincide with the intended object. In other words, vergence is the movement of both eyes towards or away from each other, depending on the relative change from the previous to the current target. It is often assumed that the fixation points of the two eyes are perfectly aligned but it has been shown that the eyes first diverge before they converge at the gaze point during fixations [2,3]. Studies on binocular coordination of eye movements during reading show that fixation points of two eyes vary during reading and disparity in both horizontal and vertical directions were observed [4,5]. There is considerable variation among participants ability to fixate the same point in depth, depending on their eye dominance and squinting, or even strabismus (when the weak eye is off-target). In addition, measurements of pupil positions with video-based eye-trackers are very sensitive to variations in pupil dilation [6], which leads to uncertainty over measurements of vergence.
Many approaches have been proposed to estimate the direction of gaze for each eye in physical space, based on recorded pupil positions by eye-tracking devices. Using these gaze vectors, it is possible to reconstruct the gaze point on real three-dimensional stimuli by intersecting one or both rays with the fixated object in space, assuming its geometry is known [7,8,9]. Alternatively, we can attempt to find the point where the two vectors intersect with each other in space [10,11,12,13,14], but in 3D space two gaze vectors typically do not intersect.

The mean point of vergence is biased under projection
However, even if the observer experiences looking at a point in space with both eyes, the eye rays provided by the eye-tracker contain error, for a variety of reasons:  data from eye-trackers have an inaccuracy (systematic error) and introduce imprecision (variable error, or noise) onto the signal [15,16,17].  the inaccuracy is not constant, but varies with pupil size [18] and quantization of the CR in the eye camera [19].  ideally, human gaze direction is controlled to bring the object into the fovea centrals [20], which has a non-negligible extent of 1.5−2°.  it is well-known that in binocular vision, many observers have a dominant eye which is more accurately directed towards the target (in about 70 % of the cases, the right eye) and a weaker eye, which may be considerably off-target [2,3,21], which is called binocular disparity (strabismus in extreme cases).  the resulting unknown and likely non-linear function mapping from tracked pupil and CR centers in the eye video to lines of sight is approximated using low order polynomials [15,22,23].
For these reasons, the two projected eye rays produced by eye-trackers are generally skewed and have no common intersection point in space. In order to calculate a point that approximates the expected intersection of the rays, the most natural and commonly employed solution is to compute the point that has the smallest distance to both rays in 3D. Here we call this point the vergence point. We derive the necessary equations for this computation next and then use it for simulating the reconstruction of vergence points in the presence of systematic (accuracy) and variable (precision) errors. We then develop the mathematical description of inaccuracy and imprecision of gaze vectors, that are used to simulate the effect on estimated intersection point of inaccuracy (offsets) and imprecision (noise). [6] present human data of vergence error as an effect of inaccuracy from changes in pupil dilation. In this work, we focus on the effect of imprecision and present recorded human vergence data that validate the simulation on noise. Finally, we present a method to better estimate the intersection of eye rays and reconstruct the position of the fixated object in 3D, given noisy vergence data. Note that the whole analysis applies to eye tracking not only in space but also on flat surface, as long as the underling projective mapping is used in the model.

Part 1: Mathematical model and simulation
When recording binocular gaze in 3D, the two gaze vectors can be thought of as originating in the centers of the two eyes of the observer. Two vectors in threedimensional space are generally skew, i.e. they have no common (intersection) point. For the reasons mentioned in the introduction, the two gaze vectors are commonly far from intersecting. In order to assign a point of interest given two gaze vectors that have no intersection point, the dominant strategy is to compute the point in 3D space that has the smallest sum of squared distances to the two gaze lines. Here we show how to compute distances of a point to a line by using the formulation of projector, and then how to find the point of interest. and are the centers of the left and right eyes, and and are gaze vectors of unit length, i.e. the eye rays. We want to calculate the point z with the minimal distance to both rays. Choose the coordinate system such that centers of the eyes are displaced symmetrically from the origin along the first coordinate direction. The up direction defines the second coordinate direction and target is placed on the third direction pointing away from the observer. This means the centers of the left and right eyes are The mean point of vergence is biased under projection Here, and in the following, boldface lower-case characters denote column vectors in Euclidean space. Define an up direction as 0, 1,0 . We refer to the first coordinate axis as horizontal and the second one (i.e. along the up direction) as vertical. Objects of interest located at ∈ ℝ are displaced from the eyes mostly along the third coordinate axis, i.e. 0, 0, . Then the normalized (unit-length) vectors from eye to interest point are where the subscripts and refer to the left and right eye. Normalization ensures that the vectors have unit length: 1  e l,r  e l,r 2  e l,r , e l,r  e T l,r e l,r . (3) Note that latter notation for the inner product follows from / being a column vector and the usual conventions for matrix multiplication; we use this notation in what follows.
Given only the positions of the eyes and unit gaze vectors, we want to compute the point closest to both eye rays. One way to do so is measuring the squared distances to the rays and finding the point that minimizes them. Let's first consider a ray through the origin. We define it by specifying a unit direction vector ∈ ℝ , ‖ ‖ . Then the points on the ray in the direction are given by , where ∈ ℝ is a scalar parameterizing the ray.
Consider the (symmetric) matrix . Here denotes the 3 3 identity matrix (we generally use uppercase bold-face letter for matrices) and is an outer product, following directly from the common rules for matrix multiplication: Multiplication of this matrix with any point on the ray yields 0, while multiplying with a vector ∈ ℝ of arbitrary length but orthogonal to , i.e. , yields .
With this way of measuring the distance to a ray, the sum of the squared distances to the eye rays for any point in space can be written as: .
To find the point in space that minimizes this sum of squared distances compute the gradient of this function (with respect to ) and set it to zero: .
In this way the point of interest is defined as the solution of a 3 3 linear system. The system has a unique solution as long as the sum is non-singular. Each of the two matrices , has a one-dimensional kernel: the ray direction , is an eigenvector with zero eigenvalue. If the two gaze vectors are parallel, then the projectors are identical and 2 is singular. This is quite intuitive, as there is no unique point with smallest distance to two parallel lines.
If the eye rays are not parallel, however, the sum is non-singular. This is also geometrically intuitive as there is a unique point minimizing the squared distances to the two lines; and this fact can be proven rigorously ([24] Corollary 2.5). Thus, the point of interest is defined as Eye ray errors Firstly, we introduce variable error (imprecision, noise) , into the eye rays, with separate horizontal and vertical noise as well as separate noise for left and right eyes: ( 1 2 ) While errors are usually represented in terms of angular deviation (i.e. radians), for small enough values the linear approximation is very good and adding the error vectors to the eye ray vectors has the same effect as rotating the eye rays. Including renormalization this yields:  e l  e l  l e l  l ,  e r  e r  r e r  r . (13) With this model we simulate how noise affects the computation of the vergence point. Following the results in [25] (which show that noise in eye trackers are mostly Gaussian distributed), we use a zero-mean Gaussian distribution, i.e.
with the horizontal and vertical deviations being uncorrelated (meaning noise distribution in each direction varies independently) In this setup, error vectors can be generated by simply drawing the components , , , independently from univariate normal (Gaussian) distributions with zero mean and standard deviation A point of interest defines the unbiased eye rays , , which align with the lines of sight. To generate points of vergence in the presence of precision error, we draw error vectors , (i.e. the added noise), modify the eye rays accordingly and reconstruct the vergence point using the linear system above. The python script given in the appendix does exactly this. It outputs the mean vergence and offers graphical output, such as the one shown in Figure 3. We chose three standard deviations of 0.16°, 0.5° and 1.5° for the Gaussian distributions of noise (which are motivated by experimental data described later). 0.5° is the commonly used threshold of the average calibration accuracy of each marker, 0.16° corresponds to the average precision of fixations and 1.5° corresponds to difficult eye tracking situations for example when observers wear glasses. The simulation shows a large error in direction of depth -this is quite intuitive given the short base line relative to the distance to the object.
More severely, the mean vergence point appears to be biased systematically towards or away from the observer. This is clearly visible when the noise distribution has different variance in horizontal and vertical directions as shown in Figure 3: if variance is larger in the horizontal direction, the mean vergence points shifts away from the viewer; if variance is larger in the vertical direction, the mean vergence point shifts towards the viewer. Table 1 shows detailed simulation results when the distance between observer and the fixation target is considered as another variable. We perform the simulation when the fixation target is placed at three different distances, namely 50 cm, 70 cm and 110 cm. The further away the target is, the larger the bias of the mean vergence point contains. Note that this bias is non-linear due to the underlining projective relation. Within each condition, the Figure 3 Vergence points from eye rays distorted with zeromean normally distributed errors (i.e. noise). Distance between two eyes is 6 cm and the object is at the distance of 50cm. Top row: the noise distribution has standard deviation of 0.5 degrees in both horizontal and vertical direction. Despite the relatively small error, the variation in depth is quite large. Middle row: standard deviation in vertical direction is only 0.16, but in horizontal direction it is 1.5. The noise in horizontal direction gets larger and the mean estimated point of vergence is shifted towards larger depths. Bottom row: standard deviation is 1.5 degrees in vertical direction and 0.16 in horizontal direction. Horizontal noise is small, but the mean estimated point of vergence shifts significantly towards the viewer. Same number of samples is drawn in each simulation. 10,000 points are current shown in the figure.
bias is mainly in depth but large variation in horizontal direction leads to large bias while the same amount of variation in vertical direction corresponds to smaller bias.
Secondly, although it is not the focus of this paper, our noise formulation could also be used to investigate the effect of systematic errors (inaccuracy, offset) by introducing constant offset for the eye positions in Equation 12. As shown in the simulation results in Figure 4, unsurprisingly, the resulting errors, i.e. the distances between the estimated mean vergence point and the true target point, are larger when inaccuracy is introduced compared to when only noise is added to data. Furthermore, systematic offsets in the horizontal direction shift the mean of the estimation no matter whether the offsets in the two eyes converge or diverge. Meanwhile, vertical systematic offsets in the same direction lead to larger estimation errors without shifting the distribution mean much. Again, the estimated mean is biased towards the observer when systematic offsets in opposite vertical directions are introduced, similar to the bias we observed when only noise was introduced where the estimation of Table 1. Simulation errors with target placed at three different distance 50 cm, 70 cm and 110 cm. σ x and σ y represents the standard deviation in horizontal and vertical directions measured in degrees. p is the mean position of estimated vergence points in depth. Averaged distance errors and standard deviations are reported in cm with x represents the horizontal direction, y the vertical direction, and z in depth. Notice that the small baseline of distance between the eyes leads to large bias when the target point is further away. On average, the errors of the estimations are 27% (50 cm), 35% (70 cm), and 49% (110 cm) in percentage of the corresponding distances.  the mean vergence point is always closer to the observer than the true target point.

Qualitative analytical analysis of bias
In the section above (and examples shown in Figure  3), we use a simple numerical simulation to visualize the distribution of computed points of interest given an assumption on the probability distribution of the noise. This simulation suggested that the mean vergence point is biased, shifted away from the true target point, depending on the standard deviation of the noise distribution in horizontal vs. vertical directions. We wish to analyze this behavior analytically. For this we attempt to compute the mean (or expected) vergence point for a given probability distribution of noise. As we will see, the general problem is difficult to approach. Yet, by assuming the geometric situation exhibits symmetry (see Appendix A for details) we are able to show that the trend we have observed in the numerical simulation holds qualitatively for wide classes of practically relevant scenarios.
The expected value for a discrete noise distribution would be the sum of values multiplied with the respective probabilities for the input parameters. In the continuous case, this sum turns into an integral over the product of the computed value and the probability distribution for the input parameters.

The projectors
, , are generated from the noisy eye rays , . Assuming a probability distribution , , the expected intersection point is: ( 1 6 ) This integral, in general, cannot be treated analytically. However, note that the resulting mean is a linear function of the individual points of vergence. This means we can generate the mean of several instances of the distribution and then integrate over these local means. We will make use of this property in the following.
From the perspective of applications, we are mostly interested in understanding the bias with respect to different distances in depth. This allows us to concentrate on geometric configurations that exhibit symmetry. First, we assume the fixated object is on the symmetry line and at unit distance, i.e. 0, 0, 1 . Note that by setting the interocular distance 2 this still considers arbitrary distances because geometrically it makes no difference if we change the distance of the target or the distance between the eyes. Second, we assume the probability distributions for left and right eyes are identical. We denote this by dropping the subscript: . Third, we may additionally assume the probability distributions are symmetric around the origin. Point symmetry of the distribution means the probabilities of the variable errors and are the same: . This would allow us to consider pairs of points based on the error vectors , and , , which are symmetric around the line 0, 0, 1 (i.e. one error vector is the reflection of the other by the line). Since their probabilities are the same, their mean is on this line. Pairing all instances of variable errors in this way shows that the mean would be on the symmetry line, for all probability distributions with point symmetry.
Consider the pair of error vectors , and the reversed pair , . By our assumptions, these instances have the same probability. The two pairs give rise to two points of interest. Figure 5 illustrates the case of horizontal error only. In this case, one of the two intersection points is closer to the observer then the target, while the other one further away. Their mean will never be closer then the real target, indicating that the mean over all instances with horizontal noise only will be biased to be further away than the target -as observed in the numerical simulations. We now prove that this intuitive reasoning is correct.
To compute the expected depth value, we start by analytically solving the least squares estimation (LSE) in Eq. 10 for the last component. This can be done by elementary computations. In this way we can express the depth for a certain pair of error vectors , as well as for the reversed pair , . Then we take the mean of the two depth values. All computations can be carried out by a computer algebra system such as Maple or Mathematica. The resulting expression is lengthy and unsuitable for direct inspection. However, considering the case of horizontal noise only leads to the following simple expression: As defined in Equation 1 and 12, 2 equals to the interocular distance and / represents the horizontal component of variable error / . It makes sense to assume that the distance between the displacements / is smaller than the interocular distance 2 (otherwise the intersection of the eye rays would be behind the observer). In this case the denominator is positive but never larger than the numerator, so the depth is larger. Because this is true for any pair of instances (see more in Appendix A), this is also true for the mean resulting from an arbitrary probability distribution (as long as they are they same for both eyes). This confirms our intuitive geometric conclusion from Figure 5.
Next we consider only vertical noise / . In this case it is convenient to consider the pair of error vectors , and , . Note that compared to the former case we now also reverse the signs, which is admissible if the probability distribution is symmetric. Using computer algebra as before, we find that the relative mean depth for the two intersection points is ( 1 8 ) Comparing numerator and denominator reveals that ′ with equality only for regardless of the probability distribution and without any restrictions on the vertical noise.
Together, these two results confirm our observations that noise in horizontal direction biases the mean depth towards larger values, while noise in vertical direction biases it towards the observer. This result holds for all probability distributions, as long as they are identical for both eyes (and exhibit symmetry in the vertical direction).
In any realistic scenario, however, noise errors have horizontal and vertical components. The resulting bias will depend on the relative magnitude of these errors. If the magnitude of horizontal and vertical noise error is equal, then the bias is toward smaller depth and the magnitude of this effect is dominated by the squared difference . In general, however, the variance along the horizontal versus vertical axes in the probability distribution determines the bias in depth. If the variance is higher in horizontal direction, the depth will be biased towards greater values; if the variance is higher in vertical direction, depth will be biased closer to the viewer. As before, this result holds for probability distributions as long as they are identical for both eyes and the point closest to any two eye rays in the distributions is in front of the viewer (not behind).

Part 2: Human data
The numerical simulations above show that variable and systematic errors in eye rays have a significant influence on the estimated mean point of vergence under projection. In this section, we collect human vergence data to study the real noise distributions when using a videobased eye-tracker. Eye-trackers primarily output mapped gaze positions, which in 2D are points on a screen, described in pixel coordinates. This mapping is established using a calibration routine, where participants are asked to look at several targets on the screen while features of their eyes are tracked in the camera image. These precalibrated features -the pupil centre and the centre of the reflection in the cornea -are mapped to gaze positions on the monitor during calibration. With an established mapping, any eye positions can be mapped into the target space, which corresponds to the gaze point.
Several sources of errors are known to interfere in this procedure, in particular when both eyes are being calibrated. Despite existing research on vergence eye movements [2,3,21], there is no established consensus on how to calibrate binocularly. [5] have brought up the question whether binocular calibration (i.e. calibrating both eyes at the same time) or monocular calibration (i.e. calibrate one eye at one time while the view of the other eye is occluded) is better suited for binocular eye movements study. Besides the intrinsic properties of eye movements in binocular viewing, the estimated mapping function in calibration also introduces errors in the estimated point of interest. The estimation of parameters in the mapping function is a minimization procedure. In practice, low order polynomials are often usedto map pupil and corneal reflection positions onto screen coordinates, and the mapping parameters are approximated through an optimization procedure (e.g. least squares), which inevitably contains modeling errors.
Therefore, we designed a data collection including both binocular and monocular viewing conditions and analyzed the detected raw pupil and corneal reflection positions without mapping them to the target space. Two depth variations were included and we used symmetric presentation of stimuli to counterbalance any potential behavioral differences due to spatial dependency. Participants 25 participants from TU Berlin (students and staff) joined our experiment (mean age = 27, SD = 7, 4 females). They all had normal or corrected to normal vision and provided informed consent. Participants with glasses were excluded from the experiment due to concerns over eye tracking accuracy. Five of them had previous experience with eye tracking experiments. Their time was compensated. Additionally we performed a monocular eye examination for each participants and the averaged acuity is 0.93 (SD=0.26) measured in the decimal system. We also measured the eye dominance of observers following the standard sighting eye dominance test, as our experiment does not involve any interocular conflict. Observers were asked to look at a distant point though a small hole placed at arm length, and then to close their eye one after another. Only the dominant eye supposes to see the point while viewing monocularly. Among all 25 observers, only 7 of them has a left dominant eye while 18 of them reported to have a right dominant eye. We discuss on the eye dominance test in the discussion section.

Apparatus and recording setup
The data collection was conducted in a quiet and dark room. We used an EyeLink 1000 desktop mount system in the experiment and binocular eye movements were tracked at a sampling rate of 1000 Hz. A chinforehead rest was used to stabilize observers' head positions. A 24-inch display (0.52 m ×0.32 m, 1920 × 1200 pixels) was placed at two distances of 0.7 m and 1.1 m from the eyes. Stimulus presentation was controlled using PyLink provided by SR Research. Note that any eye trackers can be used for the recordings as our analysis is based on detected eye positions in the camera frames.

Design and procedure
In order to familiarize observers with fixational eye movements, and to verify the setup that observers' eyes were clearly tracked, we ran a default 9-point calibration routine before the data collection started. A calibration is followed by a validation to verify its accuracy. Experiment was continued only when the achieved validation accuracy is on average smaller than 1.0° of visual angle. Otherwise we either repeated the calibration and validation routine immediately or after a short break. However, the resulting gaze data are not used. All further analyses are purely based on the pre-calibrated pupil minus CR (corneal reflection) data, to minimize influence from the calibration mathematics of the EyeLink.
The data collection consisted of two distances and each distance had five repetitions. In each repetition, we presented three viewing conditions, namely monocular viewing with left eye, monocular viewing with right eye, and binocular viewing. Eye covers were rotated by the instructor to occlude/reveal the eyes in different viewing conditions. Meanwhile, the EyeLink was continuously tracking in binocular mode throughout.
For each distance, repetition and condition, the participant was asked to fixate each one out of 12 targets in the form of a ring with an addition central point ( Figure 6). Participants were instructed to follow the marker and fixate the white dot at the center as accurately as possible. Black circles with their center marked by a white dot were used as the target. The radius of the ring corresponded to 7° of visual angle. Each marker was presented for 1.5 s and a beep sound was used to signal the start. Between each fixation trial, participants refixated the center point. In total, there were 12 targets fixations on the periphery targets and 12 at the center target.
Each block of five repetitions for a distance took about 20 minutes and participants were asked to take a break before the second block. In total it lasted about one hour including a simple trial run at the beginning.

Analysis Methods and Results
Our analysis was based on raw samples of detected eye positions, i.e., the pupil minus CR positions, represented as pixel coordinates in the eye camera. The only use of calibrated gaze was our use of the built-in velocitybased algorithm of the EyeLink software to detect fixations. In order to find fixations on targets, we filtered out all short fixations that are less than 100ms in a preprocessing step as short fixations very often are the result of undershoots ( [26], p.222-223) and are quickly followed by a correction saccade onto targets. Conservatively, among fixations on the same target, we considered those that are two standard deviations away from the cluster center as outliers. The remaining fixations were from then on only processed as uncalibrated pupil minus CR.
For each dataset of one observer, we aggregated all fixations (pupil-CR) in the five repetitions for each target in the same viewing condition. On average there are 56 fixations in each such block (SD = 20), and each target has 2.3 fixations as we have 24 fixation targets in each repetition. Here fixations from individual eye are considered independently. The averaged duration of fixations is 821ms (SD = 494ms). There is no significant difference among different viewing conditions at two distances.

Comparing sets of covariance matrices
In the previous simulation (see Section "Ray Errors"), fixation positions have independent zero-mean Gaussian distributions for variable errors in horizontal and vertical directions, which can be represented by a 2 2 diagonal covariance matrix . Diagonal elements in describe the variances in horizontal and vertical directions separately and off-diagonal elements show the correlation between them. Therefore, is a positive semi-definite symmetric matrix, and it can be visualized as an error ellipse with its axes pointing into the directions of its eigenvectors. The lengths of the semi-axes are proportional to the square roots of the corresponding eigenvalues . We used √5.991 as the semi-axis length (derived from a Chi-Square distribution), which gave us the ellipse that covers a 95% confidence interval. We continued using covariance matrices to analyze the error distributions and visualized them as ellipses in the following. Essentially covariance matrix measures the precision in two dimensions, as we are interested in its special distribution.
In our data collection, we had a set of markers on screen and a covariance matrix represented fixation distribution at each marker position. To model the variability among different covariance matrices and to compare the differences among sets of covariance matrices, we computed distances between all pairs of covariance matrices in a set and then compared the resulting distributions. Despite the raising amount of applications of analyzing the variance among covariance matrices, there is still no consensus on how to analyze the covariance structures [27]. We settled on a logarithm-based distance estimator [28], a Riemannian metric, defined as ( 1 9 ) where the logarithm is given by and , , can be factorized from singular-value decomposition (SVD) as . is a diagonal matrix of singular values of . ∥ ∥ is the Euclidean norm (also called Frobenius norm) and it can be computed by the matrix trace ∥ ∥ . In case of covariance matrix, the 2 ) , Figure 6 An exemplary trial. The center was always presented at the beginning of each trial. A randomly selected marker on the ring was then presented for 1.5s followed by the presentation of the center marker for another 1.5s. Then another random selected marker on the ring was presented. The presentation continues in such way until all markers on the ring were presented once. Gray markers are drawn here for illustration purpose and were invisible during experiment.
trace measures the total variation in each dimension without considering the correlations among variables. As shown in [28], this distance measure is symmetric and non-negative, and it is invariant under affine transformation and inversion.
Intuitively speaking, by multiplying with Σ in bilinear form, we transform into a new basis Σ Σ is a perfect circle. The distance measures the relative ratio of eigenvalues in the new basis and the largest eigenvalue of Σ Σ corresponds to the ratio of maximum variance between two groups [27]. To aggregate a set of covariance matrices, we used the arithmetic average as the mean covariance matrix. [29] compared many covariance distance measures especially in the context of shape interpolation; however, it is not clear which one suits best in our case. Here we only want to be able to compare sets of covariance matrices.
In the next step, the distance distributions of two sets of covariance matrices were compared using the Kolmogorov-Smirnov test (KS-test) [30] and a p-value was computed to determine whether the two distributions differ significantly. The KS-test computes the vertical distance between two cumulative fraction functions that are used to represent two distributions and takes the largest distance as the statistic. Therefore, it is robust with respect to variance types of distributions.

Results
Below, we examined whether the bias was present also in the human data. We first compared the distributions of variable errors among all individual observers. Then we aggregated all datasets to examine whether the distributions has any spatial dependency. In the last step, we used the averaged variable error distribution to sample eye positions following the procedure in the previous simulation (see Section Ray Error) to investigate whether there is a bias in the direction of the observer in human data as there was in simulated data.

Eye dominance and acuity do not seem to matter
Each dataset (of one observer) had five repetitions of each viewing condition. Each eye had two viewing conditions, namely monocular viewing and binocular viewing. Targets were presented at two different distances. We first aligned the five repetitions in the same condition and computed the covariance matrix of fixations at each marker position. Following Equation 18, each covariance   Visualization of mean covariance matrices at each marker position. First row shows error distributions of data collected at a distance of 70 cm and second row shows data collected at a distance of 110 cm. In each viewing condition at one distance, data from all observers were aggregated together and covariances at each marker position are visualized as ellipses. For instance, a left-tilted ellipse means that vertical noise is larger than horizontal noise. The size of the ellipse thus corresponds to the total variation while orientation indicates the correlation between horizontal and vertical directions.
matrix was compared to the other eleven covariance matrices. Note that we discarded all fixations at the center marker. One exemplar dataset is shown in Figure 7. Covariance matrices are visualized as ellipses and distributions of all pair-wise distances are plotted as histograms.
Using the KS-test, we compared histograms of the distance distributions of each individual dataset to the others and results are shown in Figure 8. Red line marks the significance level of 0.05. Note that any statistic value larger than the level (i.e. above the line) corresponds to a significantly different distribution. Considering each eye separately, we had four viewing conditions in one dataset, namely monocular viewing of left eye (ML), monocular viewing of right eye (MR), binocular viewing of left eye (BL) and binocular viewing of right eye (BR). At the distance of 70 cm, 13 out of 25 datasets have significantly different fixation distributions in ML, 4 in MR, 5 in BL and 1 in BR. According to the categorization of eye dominance, we have 8 significantly different distributions of dominant eye in monocular viewing and 12 significantly different distributions of nondominant eye. In binocular viewing condition, 2 distributions of dominant eye significantly differ from the others and 6 of nondominant eye. At the distance of 110 cm, the numbers of datasets, which are significantly different from the others, are 10 (ML), 1 (MR), 6 (BL), and 7 (BR). In monocular viewing, 3 and 9 datasets are significantly different for distributions of dominant eye and nondominant eye respectively. In binocular viewing condition, the number of different datasets is 9 for dominant eye and 6 for nondominant eye. Note that this counting is based on the mean statistic value. And we do observer large variances in each dataset as shown in Figure 8.
In conclusion, the differences between dominant and non-dominant eye are so small and varied that we cannot conclude that eye dominance matters. Neither did we observe any correlation between the distribution differences and eye acuity, which indicates that eye acuity does not contribute to the differences in distributions. It is likely that other factors, such as eye colour, may explain part of the noise.  Table 2 for more detailed statistics.

Noise does not vary depending on where participants look
It is known that noise varies across the measurement space ( [26], p.182). To test whether there is any such spatial dependency of noise distributions of fixations, we computed the distance distribution at each marker position by comparing each pair of covariance matrices from 25 datasets. Similarly, we applied the KS-test to compare the distance distributions. Only one distribution from monocular viewing of right eye when screen was at a distance of 70 cm is significantly different. To visualize the variance at each marker position, we computed the mean of 25 covariance matrices for each marker in one viewing condition and plotted corresponding ellipses in Figure 9. Note that comparing two distances, covariance ellipses are similar in both sizes and orientations. But ellipses in binocular viewing condition have smaller sizes but still similar orientations. In conclusion, the amount of noise does not seem to differ between the conditions, nor between positions. There is however a tendency that vertical noise is larger than horizontal noise.

But there is bias in the human data
After we have established that noise does not vary across eye dominance, acuity and fixation position, we can now aggregate the covariance matrices at all marker positions, which gives us an overall representation of noise distribution of fixations in each condition. Using the distributions -a covariance matrix for each of the 12 fixation point in each condition -rather than the data themselves introduces no bias, but is computationally easier, since it allows us to calculate the mean easily without being biased by unevenly taken data (i.e. uneven contribution of individual observers because of fixations of differing lengths).
We used the arithmetic mean of the 12 covariance matrices (of 12 targets) to represent the fixation distribution in each viewing condition and obtained two covariance matrices, one for each eye in either monocular or binocular viewing condition at one distance. As shown inFigure 10, an ellipse is used to represent a covariance matrix. Errors seem to be larger when targets move from 70 cm to 110 cm with increased sizes of ellipses. Noise in binocular viewing condition is smaller comparing to that in monocular viewing condition, evidenced by smaller ellipses in most positions in all conditions. The radius of the ring corresponds to 7° of visual angle and opposite sample positions span a visual angle of 14°. Following this ratio, we converted sample units (measured in pixel) into degrees of visual angle and applied the analysis framework used in the simulation above. Variances of error distributions shown in Figure 10 approximate to 1° of visual angle, which is commonly used as a calibration threshold in 2D eye tracking experiment. Additionally we also experiment with 2° of visual angle, which is equivalent to the start-of-art eye tracking accuracy in 3D space [9,10,13]. Simulated results based on sampling from real error distributions are plotted in Figure 10. On the left side we have spatial distributions of vergence points when variance was converted to 1° of visual angle and on the right side we see the results after converting to 2° of visual angle. Detailed statistic results are given in Table  2. Although the error in estimated mean vergence point is acceptable when everything is perfect within 1° of visual angle, however, achieving such accuracy in 3D is rather challenging. Even with an acceptable accuracy of 2°, there exists a strong bias towards the viewer in the estimated mean point of vergence.

Minimizing the uncertainty in vergence point estimation
When researching vergence using a video-based pupil and corneal reflection eye-tracker such as the Eyelink, what can you do to minimize biases and errors? We assume that the human participant has a negligible difference in gaze directions between left and right eye, that luminance conditions are fixed and no other effects on pupil dilation are present, and that the only remaining issue is to minimize the bias from the noise in the signal.
As this bias is an effect of the projective mapping, the non-linear mapping that is commonly used to estimate the vergence point in space, there is not much you can do if you use the calibration routine shipped with the eyetracker. The single fixation per calibration point will have an inaccuracy and noise that increases the bias. However, it is possible to instead record multiple fixations on each point in your own set of calibration targets. Then take the average of the data samples in the several fixations per fixation target and use that average to calibrate the eye-tracker. The key is to use many fixations per calibration point. The same principle applies to real experiment after calibration: collect many fixations if possible. For each point of interest, taking the average of fixation samples in the camera space leads to better estimation in space. In practice, this could mean a repetition of the same experimental condition, for example, where observers are asked to refixate on the same targets in a vergence study.
The reason this method works is that for geometrical reasons, it is better to average in the calibration data than in the depth data of the intersection points. Assume that noise in the eye samples are Gaussian distributed, taking the average in the calibration data leads to better approximation to the noise-free eye samples. However, due to the non-linearity of the mapping, averaging in the depth data only leads to a bias as we see before. Figure  11 shows for simulated data how the offset in depth decreases with an increasing number of fixations per calibration point.
This solution works on the DPI, the EyeLink and the SMI eye-trackers, which all provide access to the center point of the corneal reflection and the pupil (for EyeLink  Table 2. Vergence errors sampled from real noise distributions. σ represents converted standard deviation in degree of visual angle. p represents the mean point of vergence in depth measured in cm. Averaged errors and standard deviations are reported in cm with x represents the horizontal direction, y the vertical direction, and z in depth. On average, the errors in percentage of the corresponding distance are 30% σ 1 and 46% 2 when target distance is 70 cm, and 41% 1 and 50% 2 when target distance is 110 cm. and SMI) or the 4th Purkinje (for the DPI) in the data. Note that the recipe works with raw positions of the eyes and does not depend on specific calibration model given by any eye trackers.

Discussion
The major finding in this paper is that vergence data from eye-trackers exhibit a bias, depending on the noise level in horizontal vs. vertical directions. This applies to the estimation of the point of vergence in threedimensional space, as well as to the vergence estimation on planar surface in two-dimensional space. As long as a projective model is used in the mapping, the estimated mean vergence point will be biased.

Reading research and calibration algorithms
The bias we have found is in line with [5], who reported that the fixations during reading were almost always crossed, peaking 2.6 cm in front of the plane of text, which is very much in line with the error reported in Table 2. However, instead of resulting in a discussion about potential biases from the measurement technology itself, the subsequent papers instead investigated whether monocular vs. binocular calibration could cause the crossing of fixations.
Liversedge et al. [31] had found that "When the points of fixation were disparate, the lines of gaze were generally diverged (uncrossed) relative to the text (93% of fixations),..., but occasionally converged (crossed) (7% of fixations)." using a monocular calibration. Later study by Kirkby et al. [32] replicated the finding that -after monocular calibration -the majority of fixations are uncrossed, both with the EyeLink and the DPI.
Our results using the collected human data show a significant bias towards observers for both monocular and binocular calibration, corresponding to crossed fixations in reading. This bias must have existed also for the studies by Liversedge et al. [31] and Kirkby et al. [32], so it is surprising that they found crossing result in the opposite direction of the bias.
Švede et al. [33] argued that monocular calibration is the only physiologically correct form in the sense that it preserves the difference in gaze direction between the two eyes. It could be the case that the average gaze differences between left and right eye after monocular cali-bration are so large that they consume the whole bias and nevertheless can remain uncrossed.
Despite the fact that the bias in the estimated gaze positions in binocular viewing is smaller than the bias in monocular viewing condition, the ratio in depth between target position and the estimated mean vergence point is around 1.2.

Eye dominance, interpupillary distance and fixation disparity
The collected human data show a large variance among individual observers. However, neither eye dominance nor acuity leads to significantly different noise distribution and where participants look also doesn't seem to matter. Nevertheless we should be aware that eye dominance information is based on observer's self-report. Moreover, the eye dominance is determined by the socalled sighting eye dominance test. A recent study [34] brought up the question whether one type of eye dominance exists and their results indicate disagreements among different eye dominance test methods, especially the difference between sighting eye dominance test and binocular rivalry based test. Even though our experiment does not involve any interocular conflict, the actual dominant eye during the experiment might still be different from the measured one, which might further explain the observed no-impact findings.
Interpupillary distances may also influence the bias, as it changes the geometry, i.e. the short edge of the triangle formed by the two eyes and a target. How the bias is correlated to the interpupillary distance is not studied in this work.
It is not clear so far how fixation disparity may contribute to our observations. [4] reported that fixation disparity decreased over time during reading, and it is tightly linked to vergence eye movements [35,36,37] as well as binocular vision [38]. Additionally, vision training may improve stereo perception and eliminate fixation disparity [39]. We neither performed any fixation disparity test nor measured the binocular version in our study, although how to accurately measure fixation disparity seems to be an on-going effort [40]. Future studies should include measurements such as the near point of convergence, positive and negative fusional range, dissociated phoria at near and far, stereopsis and amplitude of accommodation. It would also be interesting to test whether there exists a correlation between fixation disparity and the bias in individual's dataset.

Limitations of our study
Independent Gaussian distributions were used for each dimension. It would be interesting to study the dependency between noise in horizontal and vertical directions and to see how a multivariate distribution influences the bias.
In our experiment targets were displayed on a flat plane without the need of focus change. But noise distribution might be different due to the dynamic changes of focus in 3D. For example we might need to take into account the changes of pupil size. How to collect enough fixation data in 3D while maintaining the same precision is another practical but challenging problem.
To minimize the bias from the signal noise, we suggest to collect many fixations per calibration target, and then use the mean fixation point to calculate the mapping. Future work should validate this proposal with real human data. Note that this proposed recipe does not account for fixation disparity, i.e. the alignment difference between the dominant eye and the weak eye.

Suggestion for future experiments
There seems to be no good reason to prefer one viewing condition to the other in calibration. In monocular calibration, each individual eye is forced to fixate on targets without the interference of binocular fusion and eye dominance. However, visual acuity is also limited in monocular viewing and possibly decreases over distances. Precision of eye movements in binocular viewing condition seems to be higher.
We believe it is important to be aware that the propagation of noise may lead to a bias in the estimated mean vergence point in space. It is also very important for future eye movement studies, especially for vergence studies, to provide the validation error of calibration not just as a single scalar, but also in the form of a spatial distribution (i.e. an error ellipse). It provides a confidence level of the observed data and puts their interpretation into perspective. It is not even clear if the error distribution would be roughly similar for all types of tracking devices and experiments, or if the distribution changed with the type of experimental task. If so, resulting mean vergence depth would vary with experimental setup.
Moreover, our results suggest that, for researchers using eye-tracking devices, it is good to think about the procedure, instead of being only concerned about the data after calibration. The mathematical models behind calibration might provide additional information as being part of the experiment.

Conclusions
The estimated point of interest from intersection of eye rays has large error in depth. The mean of the linearly estimated points of vergence is biased and depends on the horizontal vs. vertical noise distribution of the fixation positions. It is generally hard to interpret results for depth from binocular vision and our suggestion is to take the average of fixations of the same target to minimize the uncertainty, in both calibration and experiment phase.

Ethics and Conflict of Interest
The authors declare that the contents of the article are in agreement with the ethics described in http://biblio.unibe.ch/portale/elibrary/BOP/jemr/ethics.ht ml and that there is no conflict of interest regarding the publication of this paper.