Robust Head Mounted Wearable Eye Tracking System for Dynamical Calibration

In this work, a new head mounted eye tracking system is presented. Based on computer vision techniques, the system integrates eye images and head movement in real time, performing robust gaze point tracking. Nystagmus movements due to the vestibulo-ocular reflex are monitored and integrated. The system proposed here is a strongly improved version of a previous platform, called HATCAM, which was robust against changes in illumination conditions. The new version, called HAT-Move, is equipped with an accurate inertial motion unit to detect head movement, enabling eye gaze estimation even in dynamical conditions. HAT-Move performance was investigated in a group of healthy subjects in both static and dynamic conditions, i.e. with the head kept still or free to move. Evaluation was performed in terms of the amplitude of the angular error between the real coordinates of the fixated points and those computed by the system in two experimental setups, specifically in laboratory settings and in a 3D virtual reality (VR) scenario. The achieved results showed that HAT-Move is able to achieve an eye gaze angular error of about 1 degree along both the horizontal and vertical directions.


Introduction
An Eye Gaze Tracking (EGT) system includes a device able to continuously acquire and follow eye position over time and to compute gaze point coordinates in the environment around the subject through an analytical relationship. In the current literature, EGTs have been exploited in different fields, ranging from the medical field (detecting the relationship between oculomotor characteristics and cognition and/or mental states) to user interaction (helping people with disabilities or detecting attention level), and from multimedia to product design (C. Morimoto & Mimica, 2005). Currently, most EGTs are based on the Video-OculoGraphy (VOG) technique, which is a method for tracking eye movements through computer vision techniques applied to eye images (Van der Geest & Frens, 2002). Through VOG, pupil position and iris landmarks are detected by means of image processing algorithms and used to calculate both the eye rotation angles and the eye center. VOG-based EGTs can be classified into two main categories identified by the position, with respect to the user, of the camera dedicated to acquiring eye images. In particular, if the camera is placed on a fixed support in front of the subject the systems are named remote EGTs; when placed on the head of the subject they are named Head Mounted EGTs (HMEGTs), i.e. portable EGTs. Of course, the choice between the two types of systems poses different technical and methodological challenges. Generally, remote EGTs are employed in laboratory environments. They allow a quite precise measure, but impose limitations on the kind of information that can be retrieved. Specifically, remote EGTs can make use either of a chin support to block the user's head, which is unsuitable for long-term acquisitions but yields good accuracy, or of intelligent algorithms such as the Active Shape Model (ASM) (T.F.
Cootes, Taylor, Cooper, & Graham, 1995) to detect the user's eye, allowing for limited head movements. They often require very expensive high definition cameras, which makes remote EGTs suitable for investigating oculomotor strategy in neurological studies (Meyer, Böhme, Martinetz, & Barth, 2006). On the contrary, HMEGTs have opened new scenarios for research and for markets where the user is free to move the head, on which the equipment is directly mounted (Zhu & Ji, 2007). HMEGTs make it possible to investigate eye movement during natural tasks in uncontrolled environments (M. Land & Lee, 1994) and in real life scenarios. Although in real life scenarios some uncontrollable external factors, such as illumination changes, could make the experimental setting time-consuming and not easily repeatable, HMEGTs at the same time allow for dynamical monitoring even in the case of time-variant experimental paradigms (Hayhoe & Ballard, 2005). One of the most significant approaches to prevent luminosity change issues is based on infrared (IR) illumination. In particular, the spectral (reflective) properties of the pupil under near-IR illumination are exploited to maximize the image contrast (C. Morimoto, Amir, & Flickner, 2002; C.
Morimoto & Mimica, 2005). Nevertheless, in order to combine the advantages of supervised laboratory settings with real-life-like environments, one of the most interesting approaches is based on immersive virtual reality spaces (Fahrenberg, Myrtek, Pawlik, & Perrez, 2007). Virtual Reality (VR) offers an excellent compromise between the laboratory and the natural world, allowing for systematic control of the stimuli and the variables involved. Given the advantages of immersive VR spaces, HMEGTs have opened up new directions for investigating human interaction over time (Henderson, 2003; Jacob & Karn, 2003), focussing on understanding and coding naturalistic behaviour (Hayhoe & Ballard, 2005). However, even though many studies have exploited HMEGTs with good results for studying user attention, the perception of surrounding objects, and user interest, as well as eye patterns during affective stimulation (Lanatà, Valenza, & Scilingo, 2013; de Lemos, Sadeghnia, Ólafsdóttir, & Jensen, 2008; Partala & Surakka, 2003; Lanata, Armato, Valenza, & Scilingo, 2011), the performance of these systems drastically decays when the user's head is free to move, for example in the investigation of VR space navigation (C. Morimoto & Mimica, 2005). In this context, this work aims at developing a new HMEGT (named HAT-Move) to be used either in real life or in completely immersive 3D virtual reality worlds.

Head Movement Issue
The possibility of freely moving the head with HMEGTs requires a robust and reliable identification and tracking of the pupil center and gaze point. Generally, all of the eye tracking systems developed for both market and research purposes make use of an uneasy-to-perform calibration procedure, which should be very accurate. As a matter of fact, the better the calibration, the better the outcome of the EGT. In particular, given a certain number of points (i.e., calibration points) in the real world, with their coordinates fixed on the acquired image, the calibration is an analytical relationship that maps the coordinates of the eye gaze computed by the system into the coordinates of the calibration points (Hartley & Zisserman, 2000). This procedure can be differentiated by both the number of calibration points and the kind of mathematical model used to generate this relationship (Ramanauskas, Daunys, & Dervinis, 2008). In the literature many efforts have been spent to improve gaze estimation in terms of increasing tolerance to head movements and, as a consequence, to improve and simplify the calibration process (Cerrolaza, Villanueva, & Cabeza, 2008; Johnson, Liu, Thomas, & Spencer, 2007; Evans, Jacobs, Tarduno, & Pelz, 2012). In fact, in spite of the several attempts to enable head movements that can be found in the literature, head movement remains an open issue. Babcock et al. introduced a projected grid of 9 points in front of the person (Babcock, Pelz, & Peak, 2003) as a reference for calibrating the eye position with the camera scene image. Even though this system is designed to be used in natural tasks, it requires many calibration steps during the experiments. Moreover, since a measure of the movement of the head is missing, the recalibration process relies on human-operator expertise. Rothkopf et al.
tried to use head movements to disambiguate the different types of eye movements when subjects move head and body (Rothkopf & Pelz, 2004). However, the performance of the algorithm declines for large motions in roll, and sometimes the algorithm fails completely. Data analysis showed that the recordings suffered from a significant noise level compared to experimental conditions in which the subject moves neither the body nor the head, highlighting that much more complex patterns of eye and head movements had to be considered. Johnson et al. allowed relatively unrestricted head and/or body movements (Johnson et al., 2007), tracking them with a visual motion tracker, but large errors were found. Moreover, two relevant drawbacks have been reported: the eye tracker became less reliable at larger eye rotations, and errors arose if there was a shift in the relative position of the 3D reference on the eye-tracker's visor and the participant's head. Against this background, it is worth noting that augmenting the capacity of eye trackers to differentiate between the movements of the head and the eyes could improve their ability to perform an effective and robust gaze detection. In this view, HAT-Move aims at overcoming the state of the art by proposing a novel method for integrating the head movement contribution into free-to-move eye gaze detection.

Eye Movement
One of the crucial points of the proposed system is its ability to detect and monitor eye movement. Specifically, eye movements can be classified as pursuit and smooth pursuit, saccades, and nystagmus movements. Pursuit movements, or smooth pursuit, are eye movements used for tracking a moving object: the moving image has to remain constrained to the fovea to achieve a stable image as seen by the user. The fovea is a small area of the retina with a very high visual acuity, covering about 2 degrees of visual angle. Saccades are rapid movements of the eyes for scanning a visual scene. They are also present when the subject is fixating one point (Findlay, 2009). Nystagmus indicates involuntary eye movements. More specifically, when the head rotates about any axis, distant visual images are sustained by rotating the eyes in the opposite direction about the respective axis. Physiologically, nystagmus is a form of involuntary eye movement that is part of the vestibulo-ocular reflex (VOR), characterized by alternating smooth pursuit in one direction and saccadic movement in the other direction (Westheimer & McKee, 1975). As a matter of fact, the brain must turn the eyes so that the image of the fixated object falls on the fovea. Of note, the pursuit system keeps up with the moving object in order to allow the brain to move the eyes in the direction opposite to the head motion; otherwise, the image slips on the retina and a blurred image is produced.
In this work, we propose an innovative version of a previously and partially presented wearable and wireless eye tracking system (Armato, Lanatà, & Scilingo, 2013). The new system, HAT-Move, is comprised of only one light camera able to capture simultaneously the scene and both eyes of the subject through a mirror. The system calculates the gaze point in real time, both indoors and outdoors. Moreover, a precise and fast Inertial Motion Unit (IMU) was added to render it independent from involuntary head movements during the calibration phase, and robust to the head movements needed to focus on objects out of the visual space. Specifically, in this study we evaluated the contribution, in terms of angular error, of the head movement integration with respect to standard eye tracking. Moreover, the HAT-Move system inherits from the previous version its robustness against illumination variation, implementing a normalization of the illumination through a Discrete Cosine Transform.

Materials and Methods
The HAT-Move eye tracking system was developed in two configurations: the "baseball" hat (see fig. 1) and the head band (see fig. 2). They are technically and functionally equivalent, although the former can be considered aesthetically more pleasant. The system is comprised of a wireless camera that is light and small, with an Audio/Video (A/V) transmitter with a range of up to 30 m. The camera has a resolution of 628 x 586 pixels with an F2.0, D45 optic, and 25 frames per second (f.p.s.). In addition, the InfraRed (IR) filter, normally present in every camera, was removed and a wide-angle lens was added, enlarging the view angle and acquiring the natural infrared components, thereby increasing both the image resolution and the contrast between pupil and iris. This system is able to simultaneously record the visual scene in front of the subject and the eye position. This is achieved through a mirror (5 x 0.6 cm) placed in front of the user's head (see fig. 2). The system is completely customizable on the user's forehead (see fig. 2). In addition, a wireless Inertial Motion Unit is placed atop the head, close to the azimuth rotation center (see fig. 3).

Figure 3. Representation of the head along with the IMU. In the figure, X_e, Y_e are the axes in the reference frame of the eye; X_cam, Y_cam, Z_cam are the axes in the reference frame of the camera; X_h, Y_h, Z_h are the axes in the reference frame of the center of the head, where the IMU is placed; ψ_h, θ_h, φ_h are the Euler angles of the head rotation, while ψ_e, θ_e are the Euler angles of the eye rotation.

The IMU allows for the acquisition of head movements and rotations during natural activities, allowing for the correction of the eye gaze estimation taking into account both the movements during the calibration phase and the "Vestibulo-Ocular Reflex" (VOR) contributions. The adopted IMU provides the three rotation Euler angles of the head (see fig. 3) with a sampling frequency of 100 Hz. The system is intended to be wearable, minimally invasive, capable of eye tracking, estimating pupil size, lightweight, and equipped for wireless communication. Moreover, the system is designed to be attractive and aesthetic (in a baseball-like version), and able to process the eye-gaze pattern in real time. The HAT-Move system uses a passive approach, capturing the ambient light reflected by the eye (VOG). The customized image acquisition system is able to acquire natural light along with its IR components, which are already present in the natural light bandwidth. Therefore, the system retains the advantages of IR lighting, increasing the pupil-iris contrast while avoiding any possible eye injury due to artificial IR illuminators. The block diagram in Figure 4 shows the methodology used to process the acquired image, in which both eyes and scene are present. The whole processing chain is comprised of a series of algorithms for the detection of the eye center, for the integration of the head rotation angles, and for the correction of involuntary head movements. Specifically, the eye center detection is achieved through the following steps: an eye region extraction algorithm, a photometric normalization algorithm for the illumination, extraction of the pupil contour, and an ellipse fitting algorithm. Afterwards, once the center of the eye is detected, the Euler head angles together with the pupil center are integrated into the mapping function to map the eye center and movements into the image plane. The processing chain is fully described in the following sections.

Extraction of the eye region
The region containing the eye must be extracted from the image in which both scene and eyes are simultaneously acquired (see fig. 5). It is obtained through an automatic detection of the rectangular area including the eye, named the Region Of Interest (ROI) (see figs. 5 and 6). For this purpose a modified version of the Active Shape Model (ASM) was used. The ASM is an artificial intelligence algorithm generally used for the detection of objects by means of their shape (T.F. Cootes et al., 1995); in particular, several modified versions of this algorithm have been implemented over time for face detection applications. More specifically, after defining the "landmarks", i.e. distinguishable points present in every image, e.g. the eye corner locations, the shape is represented by an x-y coordinate vector of those landmarks which characterize the specific shape. It is worth noting that the shape of an object does not change when it is moved, rotated or scaled, and the average Euclidean distance between the shape points is minimized through a similarity transformation in order to align one shape with another. Indeed, ASM is a recursive algorithm that starts from an initial tentative shape and adjusts the locations of the shape points by means of a template matching of the image texture around each point, aimed at adapting the initial shape to a global shape model. The entire search is repeated at each level of an image pyramid, from coarse to fine resolution; specific details can be found in (T. Cootes & Taylor, n.d.). In our study, after the HAT-Move is worn, the first thirty seconds are used to train the ASM and then to detect the ROI. Since the system is mounted on the head, the extracted ROI does not change throughout the experiment. In addition, only the red image component is converted to gray scale and used as input to the other processing blocks (see fig. 6). This image component, indeed, is especially helpful in enhancing the contrast between pupil and background.
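The crop-and-red-channel step can be sketched as follows (a minimal numpy illustration; the ROI rectangle is assumed to be supplied by the ASM stage, and the function name is ours, not the paper's):

```python
import numpy as np

def extract_eye_roi(frame, roi):
    """Crop the eye Region Of Interest from an RGB frame and keep only the
    red channel, which enhances the pupil/background contrast.
    `roi` = (top, left, height, width), e.g. as located by the ASM step."""
    top, left, h, w = roi
    return frame[top:top + h, left:left + w, 0]
```

The returned single-channel patch is what the illumination normalization and pupil tracking blocks would operate on.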

Illumination normalization
Illumination normalization relies on an algorithmic strategy to keep illumination conditions stable throughout the captured images. More specifically, environmental illumination changes are reflected in the acquired images as a variation of the eye representation in terms of intensity, thereby strongly reducing the contrast between eyes and landmarks. The standard approach is based on the Retinex theory (E.H. Land & McCann, 1971), whereby the effect of a non-uniform illumination is eliminated, completely independently of any a-priori knowledge of the surface reflectance and light source composition. According to this theory, the image intensity I(x, y) can be simplified and formulated as follows:

I(x, y) = R(x, y) · L(x, y)  (1)

where R(x, y) is the reflectance and L(x, y) is the illuminance at each point (x, y). The luminance L is assumed to contain the low frequency components of the image, while the reflectance R mainly includes the high frequency components. The technique adopted here removes the illuminance contribution by discarding the low-frequency coefficients of the Discrete Cosine Transform (DCT) of the image in the logarithmic domain. Figure 7 shows the output of the DCT algorithm applied to the gray scale image reported in figure 6.
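A minimal numpy sketch of this kind of normalization, under the assumption that the slowly varying illuminance is removed by zeroing low-frequency DCT coefficients of the log-image; the cutoff of 3 coefficients is an illustrative choice, not the paper's tuned value:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def normalize_illumination(img, cutoff=3):
    """Retinex-style normalization: log(I) = log(R) + log(L); the slowly
    varying illuminance log(L) lives in the low-frequency DCT coefficients,
    which are zeroed (the DC term is kept to preserve mean brightness)."""
    a = np.log1p(img.astype(float))
    mr, mc = dct_matrix(a.shape[0]), dct_matrix(a.shape[1])
    coeffs = mr @ a @ mc.T                # forward 2D DCT (separable)
    dc = coeffs[0, 0]
    coeffs[:cutoff, :cutoff] = 0.0        # discard low-frequency illuminance
    coeffs[0, 0] = dc
    back = mr.T @ coeffs @ mc             # inverse 2D DCT
    return np.expm1(back)
```

With `cutoff=0` the transform round-trips exactly, which is a convenient sanity check on the hand-rolled DCT.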

Pupil tracking and ellipse fitting
This section deals with the method used to extract the pupil contours. The method is comprised of several blocks: the acquired eye image is first binarized in order to separate the pupil from the background, using a threshold on the image histogram; then a geometrical method is used to reconstruct the pupil contours and to remove outliers belonging to the background (details of this algorithm can be found in (Armato et al., 2013)). Following the geometrical detection of the points belonging to the pupil, an ellipse fitting algorithm is applied for pupil contour reconstruction and for detecting the center of the eye. In the literature, the ellipse is considered to be the best geometrical figure for representing the eye, since the eye image captured by the camera is a projection of the eye in the mirror. Over the last decade many ellipse fitting algorithms have been proposed (Forsyth & Ponce, 2002; Bennett, Burridge, & Saito, 2002), although most work offline. In our system we used the Least Squares (LS) technique, which is based on finding the set of parameters that minimizes the distance between the data points and the ellipse (Fitzgibbon, Pilu, & Fisher, 2002). According to the literature, this technique fulfills the real-time requirement (Duchowski, 2007). Specifically, we follow the algorithm proposed by Fitzgibbon et al., known as B2AC, which is a direct computational method based on the algebraic distance with a quadratic constraint and the solution of a quadratic polynomial, in which Gaussian noise is added for algorithm stabilization (Maini, 2005). Afterwards, the center of the eye is computed as the center of the fitted ellipse. A detailed description of the methods can be found in (Armato et al., 2013). Figure 8 shows the result of the pupil tracking and of the ellipse fitting algorithm for reconstructing the pupil contours.
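The direct least-squares fit can be sketched as follows (a numpy illustration of a Fitzgibbon-style quadratic-constraint fit, not the paper's exact implementation; as noted above, a small amount of noise helps stabilize the otherwise singular scatter matrix):

```python
import numpy as np

def fit_ellipse(x, y):
    """Direct least-squares ellipse fit: minimize the algebraic distance
    ||D a|| subject to the ellipse constraint 4AC - B^2 = 1.
    Returns conic coefficients a = (A, B, C, D, E, F)."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D                       # scatter matrix
    C = np.zeros((6, 6))              # constraint matrix for 4AC - B^2 = 1
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    _, vecs = np.linalg.eig(np.linalg.solve(S, C))
    best, best_err = None, np.inf
    for i in range(6):                # keep the elliptical eigenvector
        a = np.real(vecs[:, i])
        k = 4.0 * a[0] * a[2] - a[1] ** 2
        if k <= 0:                    # hyperbola/parabola branch: skip
            continue
        a = a / np.sqrt(k)            # enforce 4AC - B^2 = 1
        err = np.linalg.norm(D @ a)
        if err < best_err:
            best, best_err = a, err
    if best is None:
        raise ValueError("no elliptical solution found")
    return best

def ellipse_center(a):
    """Center of the conic A x^2 + B xy + C y^2 + D x + E y + F = 0,
    used here as the estimate of the eye center."""
    A, B, C, Dc, E, _ = a
    den = 4.0 * A * C - B ** 2
    return (B * E - 2.0 * C * Dc) / den, (B * Dc - 2.0 * A * E) / den
```

Fitting mildly noisy points sampled on a known ellipse recovers its center to well within a pixel, which is the quantity the gaze pipeline actually consumes.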

Mapping of the position of the eye
The mapping procedure aims at associating the instantaneous position of the center of the eye with a point of the scene. This point is named the gaze point. This procedure is mainly based on a mathematical function, named the mapping function, which is an equation system constituted of two second order polynomial functions (C.H. Morimoto, Koons, Amir, & Flickner, 2000) defined as:

x_si = a_11 + a_12 x_ei + a_13 y_ei + a_14 x_ei y_ei + a_15 x_ei^2 + a_16 y_ei^2  (3)

y_si = a_21 + a_22 x_ei + a_23 y_ei + a_24 x_ei y_ei + a_25 x_ei^2 + a_26 y_ei^2  (4)

where x_si, y_si are the coordinates of a point on the image plane (i.e. the coordinates of the point on the screen mapped into the image plane captured by the camera), and x_ei, y_ei are the coordinates of the center of the eye coming from the ellipse fitting block, also referred to the image plane. The procedure is intended to solve the equation system by means of a calibration process.
Once the system is positioned on the subject's head in such a manner that eyes and scene are simultaneously present in the image captured by the camera, the user is asked to look at some specific points on the screen (calibration process). These points are identified by coordinates s_i = (x_si, y_si) referred to the image plane (i.e. the image captured by the camera) (see fig. 5). Since the coordinates of the calibration points are known, to solve the equation system we have to compute the coefficients a_11 to a_16 and a_21 to a_26, which are unknown. This is achievable because each calibration point defines 2 equations; considering a 9-point calibration process, the system is over-constrained, with 12 unknowns and 18 equations, and can be solved using the Least Squares Method (LSM). Head movements mainly affect the calibration process, resulting in movement artifacts that degrade the eye estimation and consequently the point of gaze as well. Two different problems related to these movements arise. The first consists of the modification of the image plane position, which follows the head rotations since the camera is attached to the forehead, while the second is due to involuntary movements.
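The calibration step above amounts to an ordinary least-squares problem, one per screen coordinate. A minimal numpy sketch (function and variable names are illustrative, not the paper's):

```python
import numpy as np

def design_row(xe, ye):
    # second-order polynomial terms of the mapping function (eqs. 3-4)
    return [1.0, xe, ye, xe * ye, xe * xe, ye * ye]

def calibrate(eye_pts, screen_pts):
    """Solve for the 12 mapping coefficients from the 9 calibration pairs.
    Each pair gives 2 equations (one per screen coordinate), so the system
    has 18 equations and 12 unknowns, solved in the least-squares sense."""
    A = np.array([design_row(xe, ye) for xe, ye in eye_pts])
    sx, sy = np.array(screen_pts, dtype=float).T
    ax, *_ = np.linalg.lstsq(A, sx, rcond=None)   # a_11 ... a_16
    ay, *_ = np.linalg.lstsq(A, sy, rcond=None)   # a_21 ... a_26
    return ax, ay

def map_gaze(ax, ay, xe, ye):
    """Apply eqs. 3-4: eye center (pixels) -> gaze point on the image plane."""
    r = np.array(design_row(xe, ye))
    return float(r @ ax), float(r @ ay)
```

A quick self-check is to generate the 9 screen points from known coefficients and verify the fit recovers them exactly.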
In this work an innovative integration process is implemented, as described in the next paragraph.

Movement Integration Process
This integration process concerns the adjustment of the eye gaze as a consequence of changes in the calibration plane orientation with respect to the camera, and the compensating eye rotations against head rotations. These issues are mainly due to the user's inability to hold the head still for the whole duration of the calibration process. Consequently, these movements reduce the system accuracy. This process is based on data gathered from the IMU. First of all, according to figure 10, we define: O_h X_h Y_h Z_h, the cartesian system in the reference frame of the IMU; O X_cam Y_cam Z_cam, the cartesian system in the reference frame of the camera; O X_e Y_e, a cartesian system in the reference frame of the center of the eye; O_0 x_i y_i, the cartesian system on the image plane; f, the focal distance; c(x_c, y_c), the projection of the central point of the calibration plane onto the image plane; and s_i(x_si, y_si), the projection of P(x, y, z), which is a generic calibration point, onto the image plane. Moreover, we also define the following rotations: θ_h, ψ_h and φ_h, the head rotation angles around the Y_h, X_h, and Z_h axes, respectively; and θ_e and ψ_e, the eye rotation angles around Y_e and X_e, respectively. The Movement Integration (MI) is performed during the acquisition of the 9 target points s_i = (x_si, y_si), the 9 points related to the eye positions e_i = (x_ei, y_ei), and, synchronously, the Euler angles of the head (θ_h, φ_h and ψ_h, see fig. 10).

Figure 10. Representation of the head, IMU, image plane and real plane, respectively. In the figure, X_e, Y_e are the axes in the reference frame of the eye; X_cam, Y_cam, Z_cam are the axes in the reference frame of the camera; x_i, y_i are the axes in the reference frame of the image plane; X_h, Y_h, Z_h are the axes in the reference frame of the center of the head, where the IMU is placed; ψ_h, θ_h, φ_h are the Euler angles of the head rotation, while ψ_e, θ_e are the Euler angles of the eye rotation; c(x_c, y_c) is the projection of the central point of the calibration plane on the image plane; s_i(x_si, y_si) is the projection of P(x, y, z), which is a generic calibration point, on the image plane.

In particular, the MI process performs both the realignment of the eye center position on the image plane when the VOR occurs, and the remapping of the calibrated space onto the image plane when the head of the user is rotating. Hence, at the end of the process, the mapping function will use the adjusted coordinates of the eye center, x_ei, y_ei, and the corrected coordinates of the calibration points, s_i = (x_si, y_si), both referred to the image plane. Regarding the eye rotation angles, they were estimated taking advantage of the VOR contributions by means of vestibulo-ocular calibration curves. These curves are estimated to quantify eye rotations by a mathematical model transforming eye rotations, expressed in degrees, into movements of the eye center along the vertical and horizontal axes, expressed in pixels (Crawford & Vilis, 1991). Here, the vestibulo-ocular curves are two curves (one for the rotation ψ around the x axis and the other for the rotation θ around the y axis) computed by asking the user to rotate the head around the horizontal and vertical axes while fixating a point "C" in front of him, while eye tracking, gaze, head movements and the scene are acquired over time. A linear fitting is applied to both rotations to extract gain and offset, as expressed by the formulas:

θ̃_e = G_θe P_x + O_θe  (5)

ψ̃_e = G_ψe P_y + O_ψe  (6)

Specifically, the adjusted eye rotations are computed according to the following equations¹:

θ'_e ≜ θ̃_e + θ_h + Δ(θ_h)  (7)

ψ'_e ≜ ψ̃_e + ψ_h + Δ(ψ_h)  (8)

where θ'_e and ψ'_e are the corrected eye angles, and θ̃_e and ψ̃_e are the eye angles of the specific subject wearing the system, computed as explained in eqs. 5 and 6.
Δ(θ_h) and Δ(ψ_h) are obtained as a decomposition of φ_h, which is the head rotation around the z axis. More specifically, when a head rotation around z occurs, the IMU provides a value of φ different from 0 (φ ≠ 0); taking into account P_x and P_y, which are continuously provided by HAT-Move and are related to this variation of φ, the values of Δθ_e and Δψ_e are obtained by means of equations 5 and 6. Afterwards, the corrected angles are calculated by using equations 7 and 8; then, starting from the new eye angles and applying equations 5 and 6, the corrected coordinates of the eye center are obtained. The correction of the projection of the calibration points on the rotated image plane is carried out by means of geometrical considerations. Figure 11 shows a transverse view of the subject, the image plane and the calibration plane (or real plane) in the initial conditions. In this case, the subject is aligned with the central point of the calibration plane (point number five, named C, see fig. 14). Let us define a generic calibration point P.

Figure 13. Representation of the error after a positive θ_h rotation and its projection onto the image plane π.

During a positive head rotation of θ_h (see fig. 12), which is a reasonable head movement during the calibration process, the projection of P on the image plane results in p' instead of p (see fig. 13). This means that an error is made in the calibration process, and it will consequently be propagated into the gaze estimation. Therefore, by taking into account the acquired head rotations, it is possible to correct the projection of P into the exact position by means of an algorithm based on the geometrical considerations reported in figure 13. In the case shown in fig. 13, x'_p results to be less than x_c, but other cases can be identified for θ_h > 0:

• x'_p > x_c and (θ_0 + θ_h) < 90°;
• x'_p > x_c and (θ_0 + θ_h) > 90°;

and for θ_h < 0:

• x'_p > x_c and (θ_0 − θ_h) < 90°;
• x'_p < x_c and (θ_0 − θ_h) < 90°;
• x'_p < x_c and (θ_0 − θ_h) > 90°.

Considering all of the cases mentioned above, the correction along the x axis can be summarized by a set of equations distinguishing these cases. The y_p corrections are the same as the x_p corrections, using the angle ψ_h instead of θ_h; however, the relationships are inverted with respect to the sign of the angle, distinguishing the cases y'_p < y_c and otherwise. The correction by φ_h gives a contribution to both the x and y coordinates. All of the cases, for both φ_h < 0 and φ_h > 0, can be summarized by equations distinguishing whether (x'_p < x_c and y'_p < y_c) or (x'_p > x_c and y'_p > y_c). The corrected coordinates of both the eye center and the calibration points are then used in the mapping function system (eqs. 3, 4) to detect the new gaze point.

¹ The symbol ≜ indicates that the equation is newly formulated.
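In code form, equations 5-8 reduce to simple linear arithmetic. A sketch (function names are ours; the sign conventions of the Δ terms are assumptions, since the paper derives them geometrically):

```python
def eye_angle_from_pixels(p, gain, offset):
    # vestibulo-ocular calibration curve (eqs. 5-6):
    # pixel coordinate of the eye center -> eye rotation angle in degrees
    return gain * p + offset

def corrected_eye_angle(eye_deg, head_deg, delta_deg):
    # eqs. 7-8: corrected angle = measured eye angle + head rotation about
    # the same axis + the contribution decomposed from the z-axis rotation phi
    return eye_deg + head_deg + delta_deg
```

For example, with a gain of 1.5 deg/pixel and an offset of 0.5 deg, a 2-pixel displacement maps to 3.5 deg, and a 10 deg head rotation plus a 1 deg φ-decomposed contribution yields a corrected angle of 14.5 deg.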

Experimental setup
This section deals with the two experimental setups. The first aims at defining a protocol to validate the accuracy of HAT-Move and the relevance of the correction process (hereinafter named Accuracy Estimation) in laboratory settings, while the second is mainly a proof of concept concerning the use of the HAT-Move system in a 3D VR scenario (hereinafter named VR Estimation). To this extent, we evaluated the applicability of HAT-Move with the correction process in a quasi-naturalistic environment where the subject is free to move his head and to walk around in the VR space.

Accuracy Estimation. The first experiment was performed by a group of 11 subjects who did not present any ocular pathologies. All of the subjects were asked to sit on a comfortable chair placed in front of a wall (3 x 3 m²) at a distance of 2 meters, while wearing the system. The wall surface was constituted of a set of black squares (2 cm per side) immersed in a white background. This controlled environment made it possible to verify the system functionality during the whole experimental session. More specifically, the experiments were divided into two main blocks; the first was related to the VOR curve computation, and the second to the estimation of the eye gaze tracking. The 11 recruited subjects were of both genders and had different eye colors: 8 subjects had dark eyes and 3 had bright eyes. The average age was 27.8 years. In the first session, the VOR curves were computed on the whole group of subjects by asking them to rotate their head first around the x axis (ψ angle) and then around the y axis (θ angle) while fixating a point C placed in front of them. Possible involuntary movements of the head around the azimuth z axis (φ angle) were taken into account through their contributions along both ψ and θ. These calibration curves are intended to be used for solving the equation system 5
and 6 for each subject. Specifically, the system can be solved by imposing some constraints. The first constraint regards the initial condition: when the user is looking at the central calibration point C, before moving the head, the head angles are forced to be null, and P_x and P_y are considered equal to the starting eye coordinates extracted from the ROI. During the required movements around the axes, with the gaze fixed while the head is rotating, the IMU values exactly correspond to the eye rotation angles but in the opposite direction (these angles were captured at a sampling frequency of 100 Hz). Therefore, by a linear fitting applied to both rotations, the gains were extracted for each subject. Afterwards, by using the average gains (G_θe, G_ψe) in the equation system 5 and 6, each subject-specific offset was computed using the initial condition, where the eye angles are null. At the end of the process, given G_θe, G_ψe, O_θe, and O_ψe, as well as P_x and P_y in the image plane, all the corresponding eye angles (θ̃_e, ψ̃_e) are determined. Moreover, data both with and without the VOR contribution were collected and compared. The second session was organized into two phases. During the first phase the subjects were invited to look at the central point of the calibration plane (point C in fig. 14), the initial condition, and then at the other calibration points, indicated by numbers, in an arbitrary order (fig. 14). Simultaneously, the experimenter marked the corresponding point seen on the image plane. In the second phase, the subjects were invited to follow with their eyes the squares indicated by letters. This second phase was performed in two configurations. The first configuration was carried out with a chin support, so the head was completely still. The second configuration was conducted without any support, so the head was free to move. The results of the two configurations were statistically compared.
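The per-subject fitting described above can be sketched as follows (a numpy illustration; the eye-counter-rotates-the-head sign convention is stated in the text, while the function names are ours):

```python
import numpy as np

def fit_vor_curve(eye_px, head_deg):
    """Fit the vestibulo-ocular calibration curve in the form of eqs. 5-6:
    eye angle = G * pixel + O. During pure VOR the eye rotation equals the
    head rotation with opposite sign, so -head_deg serves as the eye angle."""
    gain, offset = np.polyfit(eye_px, [-a for a in head_deg], 1)
    return gain, offset

def subject_offset(avg_gain, p0):
    """Initial condition: looking at C the eye angle is null, so
    0 = G * p0 + O, hence O = -G * p0."""
    return -avg_gain * p0
```

Fitting a synthetic recording where the (negated) head angle is exactly 2·pixel + 1 recovers gain 2 and offset 1.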
VR Estimation. Eleven subjects were enrolled for the 3D VR scenario experiment. Our VR application consists of an immersive room equipped with a number of sensors and effectors, including three projection screens, a sound system, and a tracking system (fig. 15). The subjects were asked to wear the system and, after the calibration phase, to walk freely around the room looking at the three screens. A total of 10 circles were shown, one by one, in random order and at unknown positions (see fig. 15). The whole experiment lasted 10 minutes for each subject. During the calibration phase, the head movement correction was performed. The accuracy of the system was calculated as the median of the angular errors between the circle positions and the estimated gaze positions across all subjects. Moreover, a statistical comparison was performed between the accuracy results in the laboratory and VR conditions.

Experimental results
In this section we report on the achieved results for both experimental setups.

Accuracy estimation
Here we report the computation of the VOR curves as well as the accuracy of the system along the x and y axes, expressed in terms of angular error. In particular, by means of the angular error we evaluated the effect of the correction process, comparing the gaze point in three different "modalities". Specifically, the first was computed with the head completely still, making use of a chin support (hereinafter called Stationary Mode); the second and third were performed without any support. More specifically, the second was computed applying only the integration of the calibration plane orientation changes (hereinafter called Screen Point), and the third was obtained applying both the integration of the plane orientation changes and the VOR contribution (hereinafter called Screen Point + VOR).
Table 1 reports the average gain and standard deviation of the VOR for both the vertical (θ) and horizontal (ψ) axes.

Table 1 Average Gain VOR
G_θe = 52.44 ± 2.83, G_ψe = 61.36 ± 3.46

These values were then used as gain corrections to estimate the specific offset of each subject. The accuracy was computed in terms of angular error, where d (expressed in pixels) represents the distance between the subject and the calibration plane. Tables 2 - 3 show the median and median absolute dispersion of the errors per subject, expressed in degrees, in the stationary head condition for the x and y axes (the same information is shown in figures 16 - 17). In particular, the first column refers to the values without any correction; the column Screen Point refers to the error values, in the image plane, with the correction of the calibration plane orientation only, actuated by the IMU; and the column Screen Point + VOR refers to the values, in the image plane, with the MI of both the calibration plane orientation and the VOR together. The corrections are reported for the three head rotation angles θ, φ, and ψ.

Tables 4 - 5 show the median and median absolute dispersion of the errors per subject, expressed in degrees, in the free head movement condition for the x and y axes (the same information can be seen in figures 18 - 19). In these Tables the columns report only the values for Screen Point, which refers to the error values obtained applying the correction of the calibration plane orientation, and Screen Point + VOR, which refers to the values obtained applying the correction of both the calibration plane orientation and the VOR together. The corrections are again reported for the three head rotation angles θ, φ, and ψ. The head movements had an average rotation amplitude of 20 degrees around the three axes. Tables 4 - 5 do not report the column "Without MI", unlike Tables 2 - 3, because the calibration process cannot be computed in case of head movement. More specifically, such a calibration would produce wrong, essentially random values. As a matter of fact, Figure 20 reports an example of calibration during head movement: it can be noticed that the calibration points are strictly concentrated on a small part of the image plane, making the system completely unusable.

Table 6 shows the results of the Friedman non-parametric test applied to the conditions with and without MI. More specifically, the test returns the probability that the different samples do not belong to the same population. After the rejection of the null hypothesis, a pairwise comparison was performed for every pair of samples, carrying out a Mann-Whitney test with a Bonferroni adjustment. Results show that in Stationary Mode the errors between the actual and computed coordinates of the gaze point estimated with and without MI are statistically equivalent, whereas when the head is moving the errors belong to different populations when MI (Screen + VOR) is not used. On the contrary, after the integration of the head movement, the statistical analysis showed no significant difference among the medians of the stationary head (stationary) and head free to move (movement) conditions.
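The per-target angular error and the median ± MAD summary used throughout the Tables can be sketched as follows. This is a minimal sketch; the exact form of the accuracy equation involving d is assumed here (gaze-target distance and d expressed in the same pixel units).

```python
import numpy as np

def angular_error_deg(gaze_xy, target_xy, d):
    """Angular error between the estimated gaze point and the target on the
    calibration plane (both in plane coordinates); d is the subject-plane
    distance in the same units as the coordinates."""
    e = np.linalg.norm(np.asarray(gaze_xy) - np.asarray(target_xy))
    return np.degrees(np.arctan2(e, d))

def median_mad(errors):
    """Median and median absolute dispersion, as reported in Tables 2-5."""
    errors = np.asarray(errors, dtype=float)
    med = np.median(errors)
    mad = np.median(np.abs(errors - med))
    return med, mad
```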

VR estimation
Results achieved in the VR scenario are reported in Table 7 and are expressed in terms of angular errors along the X and Y axes. The Table shows, for each subject, the median ± the median absolute dispersion of the angular errors computed on the gaze points of the 10 circle targets. The last row represents the inter-subject median angular error, which was equal to 1.45 degrees for the X coordinate and 1.54 degrees for the Y coordinate. Moreover, a statistical comparison between the results achieved in the laboratory and virtual reality conditions was performed by means of a Mann-Whitney test. For both coordinates, no significant differences were found between the two experimental setup conditions (X p-value > 0.05; Y p-value > 0.05), confirming the usability of HAT-Move even in free-movement conditions.

Lanata, Greco, Valenza & Scilingo (2015), Robust Head Mounted Wearable Eye Tracking System for Dynamical Calibration
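The statistical pipeline used in both setups (Friedman test on paired samples, then pairwise Mann-Whitney tests with a Bonferroni adjustment after rejection of the null hypothesis) can be sketched with SciPy; function and variable names are illustrative, not the authors' code.

```python
from itertools import combinations
from scipy import stats

def friedman_with_posthoc(samples, alpha=0.05):
    """Friedman test across k paired samples; if the null hypothesis is
    rejected, run pairwise Mann-Whitney tests at a Bonferroni-adjusted
    significance level. Returns the Friedman p-value and a dict mapping
    each sample pair to (p-value, significant-after-correction)."""
    _, p = stats.friedmanchisquare(*samples)
    posthoc = {}
    if p < alpha:
        pairs = list(combinations(range(len(samples)), 2))
        adj_alpha = alpha / len(pairs)  # Bonferroni correction
        for i, j in pairs:
            _, p_ij = stats.mannwhitneyu(samples[i], samples[j],
                                         alternative="two-sided")
            posthoc[(i, j)] = (p_ij, p_ij < adj_alpha)
    return p, posthoc
```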

Real-time estimation
The execution time of the main tasks and of the entire software integrated into the system was about 34.47 ms, corresponding to a working frequency of about 29 Hz. Since this is greater than the camera sampling frequency (25 Hz), the real-time requirement was fulfilled.
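The real-time check amounts to comparing the average per-frame execution time with the camera period (25 Hz, i.e. 40 ms per frame). A minimal sketch, with a hypothetical process_frame callback standing in for the system's processing pipeline:

```python
import time

def measure_processing_rate(process_frame, frames, camera_hz=25.0):
    """Average per-frame processing time and a real-time flag: processing
    is real-time if it is faster than the camera frame interval."""
    t0 = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    avg = (time.perf_counter() - t0) / len(frames)
    return avg, avg < 1.0 / camera_hz
```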

Discussion and conclusion
Even though many current HMEGT systems are used without any integration of the head movement, either for the image plane orientation or for the Vestibulo-Ocular Reflex, they are used with partial head movements. This study pointed out that HMEGT systems are strongly affected by head movements. More specifically, when the chin support is used the angular errors are acceptable. This conclusion is supported by the literature and confirmed by this study (see figures 16 and 17, and tables 2 and 3). Indeed, no strong divergence between the median values of the errors along x and y is present, as confirmed by the Friedman non-parametric test for paired samples with Bonferroni post-hoc correction for multiple comparisons reported in Table 6, where no significant difference is shown for the stationary mode. However, the same statistical tests highlighted that when the head moves even slightly the errors dramatically increase. It is worthwhile noting that the multiple comparisons reported in the figures showed that during head movements the gaze point diverges with errors of 4 - 5 degrees. This experimental evidence suggests that the proposed corrections are essential to achieve accurate eye gaze tracking in dynamical conditions. In fact, since eye tracking systems are often used in medical environments for detecting pathologies ranging from the behavioral to the neurological field, these errors could lead to misleading interpretations. The effectiveness of the head movement integration has been proven by the statistical comparison along the x and y directions. In fact, it is possible to reduce the angular errors, achieving no statistical difference between stationary mode and head movements and showing that the system keeps the same accuracy in both modalities. The estimated median error with head movement is reduced from 5.32 with a standard deviation of 2.81 (without VOR correction) to 0.85 with a standard deviation of 0.44, and from 4.70 with a standard deviation of 3.94 (without VOR correction) to 1.78 with a standard deviation of 1.05, for the x and y axes respectively. The obtained accuracy results confirm the reliability and robustness of the proposed system. Moreover, the difference between the accuracy along x and y can be due to the angular position of the camera (which is above the eyes), which reduces the accuracy of the vertical positions of the pupil. In addition, the system fulfilled the real-time requirement, the execution time of the algorithm being lower than the time interval between two consecutive video frames.

In order to test the system in conditions in which the subject was completely free to move and walk, we developed an experiment in a virtual reality environment. We asked the subjects to look at random points on the screens of the VR room while moving around. The accuracy was equal to 1.45 degrees for the X coordinate and 1.54 degrees for the Y coordinate, and no significant differences were found with respect to the accuracy in the laboratory conditions. This result confirms the robustness of the proposed system even in scenarios similar to real environments. The main limitation of the system is the low frame rate of the camera, which does not allow the system to acquire fast saccadic movements, known to be in the time range of 30 ms, while it is able to acquire slow saccadic movements of around 100 ms.

The proposed system is equipped with low cost hardware and is extremely lightweight, unobtrusive, and aesthetically attractive, providing good acceptability by the end users. Thanks to these properties and its technological specifications, the HAT-Move system allows continuous investigation of how humans interact with the external environment. This ecological approach (Bronfenbrenner, 1977) could be pursued either at the individual or community level with the aim of analyzing and coding both activities and relationships. More specifically, this kind of information can be really useful in studying non-verbal social behavior in both healthy and impaired people (e.g. people affected by behavioral pathologies such as autistic disorders), as well as in improving the scientific knowledge on human interpersonal relationships. Furthermore, the HAT-Move system has already been shown to be useful for studying eye patterns as a response to emotional stimulation, with good and promising results (Lanatà et al., 2013; Lanata, Armato, et al., 2011). As a matter of fact, eye feature patterns could provide a new point of view in the study of personal and interpersonal aspects of human feelings. In this way, eye information, which we already showed to be informative of emotional response, could be integrated with other sets of physiological data (Betella et al., 2013) such as cardiac information (Lanata, Valenza, Mancuso, & Scilingo, 2011), heart rate variability (Valenza, Allegrini, Lanata, & Scilingo, 2012), respiration activity (Valenza, Lanatá, & Scilingo, 2013), and electrodermal response (Greco et al., 2012) in a multivariate approach (Valenza et al., 2014), in order to create a complete descriptive set of data able to explain the non-verbal phenomenology behind the implicit (autonomic nervous system) and explicit human responses to external stimulation in real or virtual scenarios (Wagner et al., 2013). It could also be really helpful in the investigation of several pathologies, where the unobtrusiveness of the instrumentation could allow monitoring the naturally evoked responses of participants (Lanatà, Valenza, & Scilingo, 2012; Valenza, Lanata, Scilingo, & De Rossi, 2010; Lanatà et al., 2010; Armato et al., 2009; Valenza et al., 2010; Lanata, Valenza, et al., 2011). This applies both to healthy participants and to subjects suffering from pathologies such as autism (Mazzei et al., 2012) or alterations of mood (Valenza, Gentili, Lanata, & Scilingo, 2013), in which social skills have a strong impact on lifestyle, and should improve the scientific knowledge on human interpersonal relationships, emotions, etc. To achieve these aims, further efforts will be devoted to integrating a high-speed, high-resolution camera in order to capture fast saccadic movements and to provide better accuracy. Moreover, an optimization process will be addressed to develop new multithreading algorithms based on the interocular distance in order to obtain a 3D eye tracking system.
This work is supported by the European Union Seventh Framework Programme under grant agreement n. 258749 CEEDS.

Figure 4. Block diagram showing all the algorithmic stages of the processing of the eyes and the outside scene.

Figure 5. Example of a single frame captured by the camera. The rectangular area marked in red represents the ROI.

Figure 6. Red component of the ROI.

Figure 7. Eye image after the application of the illumination normalization algorithm by DCT.

Figure 8. Results of the pupil tracking and ellipse fitting algorithm. a) In blue, the geometrical construction for pupil contour detection; contour points are shown in yellow. b) In red, the fitted ellipse is highlighted.

Figure 9. Block diagram of the mapping function calculation process.

Figure 11. Initial condition of the calibration.

Figure 12. Representation of the positive θ_h rotation.

Figure 15. Virtual Reality environment used for the completely free movement scenario.

Figure 16. Box plot of the Median and Median Absolute Dispersion (MAD) of the errors along the x axis, with and without head movements; head movements are without any VOR correction.

Figure 17. Box plot of the Median and Median Absolute Dispersion (MAD) of the errors along the y axis, with and without head movements; head movements are without any VOR correction.

Figure 18. Box plot of the Median and Median Absolute Dispersion (MAD) of the errors along the x axis, with and without head movements; head movements are with VOR correction.

Figure 19. Box plot of the Median and Median Absolute Dispersion (MAD) of the errors along the y axis, with and without head movements; head movements are with VOR correction.

Figure 20. Calibration of Subject 6. The 9 calibration points are marked in blue; the calibration was performed during head movements. It is an example of the errors produced by calibration with uncorrected head movement.

Table 2
X accuracy with head in stationary mode. Median: 0.85±0.31 (Without MI), 0.84±0.37 (Screen Point), 0.82±0.37 (Screen Point + VOR).

Table 3
Y accuracy with head in stationary mode

Table 4
X accuracy with the head free to move, corrections are reported for three rotation angles

Table 5
Y accuracy with the head free to move, corrections are reported for three rotation angles

Table 6
Results of the Friedman non-parametric test with Bonferroni correction applied to the Stationary Mode and Head Movement conditions, "without MI" and with "Screen Point + VOR", for the x and y axes respectively. In the Table, p-values are shown.

Table 7
Evaluation results of median ± MAD angular error computed for X and Y axes in a 3D VR scenario