Combining EEG and Eye Tracking: Using Fixation-Locked Potentials in Visual Search



Introduction
Visual search consists of finding a target in the midst of distractors, and is a complex task that involves many neural pathways and systems, including the visual system, working memory, and attention, to identify relevant areas of interest within a visual scene (Kastner & Ungerleider, 2000; Petersen & Posner, 2012). Visual search has been described in terms of both exogenous and endogenous components. Exogenous visual search is driven by properties of a visual scene that appear more salient because of how human visual pathways process them, as the central nervous system is structured to respond to certain stimuli preferentially (Albright, 2012). Visual receptors and pathways have evolved to capture key features automatically, such as color, motion, and edges. This automatic, pre-attentive process (Treisman, 2006) is quick (Montagna, Pestilli, & Carrasco, 2009) and requires little conscious effort. Theories of exogenous control assume a saliency map (Koch & Ullman, 1985; Treisman & Gelade, 1980) in which locations of likely relevance are identified. Driven by these maps, attention serves as a control system that biases the filtering of feature and location information to support threat detection and response selection (Müller & Krummenacher, 2006). On the other hand, endogenous, top-down attentional orienting during visual search occurs when attention is consciously and voluntarily directed according to goals and intentions (Mulckhuyse & Theeuwes, 2010). Endogenous attention can be allocated to a location within about 300–500 ms and may be sustained for several seconds (Montagna, et al., 2009).
Visual search is the primary role of Transportation Security Administration (TSA) Transportation Security Officers (TSOs), who are tasked with identifying potential threat items within cluttered carry-on bags at over 7000 security checkpoints in the United States. As bags are screened using X-ray technology, TSOs must determine whether they believe the bag to be free from threats, in which case the bag is "cleared"; whether there is a potential threat, in which case the bag is subjected to further search; or whether a serious threat exists. If threats are highly prevalent, potential or serious threat decisions are more likely, and "clear" responses are slower because such a response would often lead to a mistake (Wolfe, Horowitz, & Kenner, 2005). Baggage screening is a repetitive visual search task that often has a very low probability of encountering a threat, but high consequences if a serious threat is missed. In baggage screening, since threats are of low prevalence, a "cleared" response will often lead to a successful outcome and thus becomes the more frequent decision. Under such circumstances, observers will tend to abandon a search in less than the average time required to find a target (Wolfe & Van Wert, 2010).
Regardless of the decision made by a TSO, there is little quantifiable information available regarding what led to a decision.
The addition of real-time neurophysiological measures could provide a more granular understanding of the decision making process throughout training and performance, and neurophysiological signatures could also be developed to mitigate potential threat misses. Electroencephalography (EEG) is a well-established non-invasive technique for brain monitoring with high temporal resolution and relatively low cost. As such, EEG has proven to be a critical monitoring and diagnostic tool in the clinic (Lagerlund, Cascino, Cicora, & Sharbrough, 1996; Mendez & Brenner, 2006). EEG is also a popular research tool among scientists for evaluating somatosensory responses to stimuli, error detection (Davidson, Jones, & Peiris, 2007), and sleep or fatigue monitoring (Colrain, 2011; Landolt, 2011), among other uses. Various EEG components in the temporal domain have been used to define distinct phases of cortical processing in response to stimulus presentation. Such event-related potentials (ERPs) have been used to noninvasively study visual (Clark, Fan, & Hillyard, 1995; Hillyard, Vogel, & Luck, 1998), auditory (Naatanen & Picton, 1987), and somatosensory processing (Wada, 1999). One component particularly important in visual processing is the P3, a time-locked deflection which appears 300–400 ms after stimulus presentation, first described a half century ago (Sutton, Braren, Zubin, & John, 1965). Training can alter perception and motor learning (Censor, Sagi, & Cohen, 2012), including visual discrimination (Stickgold, Whidbee, Schirmer, Patel, & Hobson, 2000), which is key to TSO screening tasks and can be monitored using ERPs (Song, Ding, Fan, & Chen, 2002). However, the use of such an evaluation outside the laboratory lacks an indication of when a participant visually fixated upon a stimulus of interest.
Eye tracking technology offers the possibility of capturing visual behavior in real time and monitoring the locations of fixations within images (Hansen & Ji, 2010). Recently, eye tracking technology has become more accurate and user friendly, and it has been extended to various areas, leading to a wide range of applications (Duchowski, 2002; Jacob, 1991). The current study was designed to test the utility of using eye-tracking fixation points on targets to parse simultaneously recorded EEG, and to test the feasibility of developing a unique neurophysiological fixation-locked event related potential (FLERP) classifier to monitor performance in visual search tasks.

Stimuli and apparatus
ScreenADAPT software, developed by Design Interactive Inc., is an adaptive software suite designed to reduce the time to criterion baggage screening performance during training, and allows creation of X-ray images of carry-on luggage with customized content. The image generator is fed by X-rayed threat and distractor libraries rendered from individual 3D models obtained from public-domain websites, overlaid onto 'clear' X-ray bag images in various orientations and positions, where clear bags included a variety of non-threat items typical of carry-on luggage (see Figure 1). Threat images included guns and knives, and distractor images were chosen to be intentionally similar in size and shape to threats. A single threat or distractor image was inserted into each existing X-ray baggage image.

Participants
All methods involving participants were approved by an independent Institutional Review Board. Forty novice participants [20 male, 20 female; average age 28 ± 8 (SD) years] completed the study and received payment of $100 USD for their participation. All participants were recruited from the community and met minimum recruitment requirements for TSA officers, including citizenship (US citizen), age (over 18 years), education (high school diploma or equivalent), and vision (20/20 or corrected to 20/20) requirements.
All participants were fully informed about the procedure and purpose of the study, which lasted approximately 2–3 hours.

Apparatus
Baggage images were displayed on a 48 cm flat panel monitor with 1280 x 1024 pixels. A standard mouse and keyboard were used to interact with the system. Participants were seated 40 cm from the monitor, with a remote video eye tracking system (easyGaze, Design Interactive Inc., Oviedo, Florida) situated directly below the monitor at a 30° viewing angle to acquire participants' eye position. The system utilizes near-infrared (NIR) LEDs to illuminate the eyes of the participant and gathers the data via binocular dark pupil tracking at 30 Hz. Calibration was done with a 16-point grid to ensure that the accuracy of both eyes met a minimum of 0.5°, and included horizontal and vertical position of the gaze point, distance from each eye to the camera, and pupil diameter. A dispersion-threshold method was used to identify fixations as groups of consecutive gaze points within a maximum separation of 20 pixels (Salvucci & Goldberg, 2000). Furthermore, a temporal restriction of 100 ms was applied as the minimum fixation duration to mitigate device variability.
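The dispersion-threshold rule described above can be sketched as follows. This is a minimal illustration in Python, not the easyGaze implementation: the function names, the (time, x, y) sample layout, and the use of horizontal range plus vertical range as the dispersion measure (the usual I-DT convention) are assumptions. Only the 20-pixel and 100 ms thresholds come from the text.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]             # (time_ms, x_px, y_px), assumed layout
Fixation = Tuple[float, float, float, float]    # (start_ms, end_ms, centroid_x, centroid_y)


def dispersion(window: List[Sample]) -> float:
    """Horizontal range plus vertical range of the gaze points (assumed measure)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample],
                     max_dispersion_px: float = 20.0,
                     min_duration_ms: float = 100.0) -> List[Fixation]:
    """Group consecutive gaze samples into fixations with a dispersion threshold."""
    fixations: List[Fixation] = []
    n = len(samples)
    start = 0
    while start < n:
        # Grow the initial window until it spans the minimum fixation duration.
        end = start
        while end < n and samples[end][0] - samples[start][0] < min_duration_ms:
            end += 1
        if end >= n:
            break  # not enough data left for a fixation of minimum duration
        if dispersion(samples[start:end + 1]) <= max_dispersion_px:
            # Extend the window while the points stay within the threshold.
            while end + 1 < n and dispersion(samples[start:end + 2]) <= max_dispersion_px:
                end += 1
            window = samples[start:end + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            start = end + 1
        else:
            start += 1  # slide past the first point and try again
    return fixations
```

At 30 Hz a sample arrives roughly every 33 ms, so a 100 ms minimum duration corresponds to a window of about four consecutive samples, which is one reason the low sampling rate limits how precisely fixation onsets can be located.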

EEG
The EEG was recorded throughout the experiment with the Advanced Brain Monitoring (ABM, Carlsbad CA) B-Alert X-10 wireless acquisition system. The system records from 9 Ag-AgCl electrodes according to the International 10-20 system at Fz, F3, F4, Cz, C3, C4, POz, P3 and P4 at 256 Hz. All electrodes were referenced to additional mastoid electrodes and notch filtered at 60 Hz to remove line noise, and impedances were kept below 40 kΩ.
Recorded EEG was decontaminated by removing artifacts from EMG, eye blinks, excursions, saturations and spikes using ABM B-Alert software. Identification of eye blinks in the EEG is achieved by filtering the fast component of the Fz channel with a 7 Hz IIR low-pass filter, applying cross-correlation analysis to the filtered signal using the positive half of a 40 µV 1.33 Hz sine wave as the target shape, and applying thresholds to the outputs from the cross-correlation analysis. Minima and maxima analysis in each direction from the point of maximum correlation is used to identify the data points corresponding to the range between the start and end of each eye blink. Once eye blink ranges have been determined, the 0.5 Hz high-pass filtered EEG signal from each channel is decontaminated by replacing the data points in the eye blink region with the corresponding data after application of the 4 Hz filter (Berka, Levendowski, Cvetinovic, Petrovic, Davis, Lumicao, Zivkovic, Popovic, & Olmstead, 2004).
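The template-matching step of the blink detection can be illustrated with a short Python sketch. It is not the B-Alert algorithm: the function names, the normalized (Pearson) form of the cross-correlation, and the 0.9 threshold are assumptions, and the low-pass filtering and the minima and maxima analysis are omitted. Only the template shape (positive half of a 40 µV, 1.33 Hz sine wave) and the 256 Hz sampling rate come from the text.

```python
import math
from typing import List


def half_sine_template(fs_hz: float = 256.0,
                       freq_hz: float = 1.33,
                       amp_uv: float = 40.0) -> List[float]:
    """Positive half-cycle of a sine wave, sampled at fs_hz."""
    n = round(fs_hz / (2.0 * freq_hz))  # samples in half a period
    return [amp_uv * math.sin(math.pi * i / n) for i in range(n + 1)]


def normalized_xcorr(signal: List[float], template: List[float]) -> List[float]:
    """Pearson correlation between the template and each window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out: List[float] = []
    for lag in range(len(signal) - m + 1):
        window = signal[lag:lag + m]
        w_mean = sum(window) / m
        w_dev = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if w_norm == 0.0 or t_norm == 0.0:
            out.append(0.0)  # flat window: correlation undefined, treat as no match
        else:
            out.append(sum(a * b for a, b in zip(w_dev, t_dev)) / (w_norm * t_norm))
    return out


def blink_onsets(signal: List[float], template: List[float],
                 threshold: float = 0.9) -> List[int]:
    """Lags where the correlation is a local maximum above a (hypothetical) threshold."""
    corr = normalized_xcorr(signal, template)
    return [i for i, c in enumerate(corr)
            if c >= threshold
            and (i == 0 or corr[i - 1] < c)
            and (i == len(corr) - 1 or corr[i + 1] <= c)]
```

Because the correlation is normalized, this sketch responds to the shape of a deflection rather than its amplitude; a detector that should also depend on the 40 µV scale would need an additional amplitude criterion.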

Visual search scenarios
Following the donning of the EEG headset and eye tracker calibration, participants were given written instructions that outlined what constituted a threat for this experiment, as well as instructions on how to operate the software, followed by a practice session. The practice session consisted of four trial images, two of which contained threats. Once participants completed the practice session, the experimental session started, which consisted of a pre-test session and seven additional test sessions interspersed with training sessions. Each test and training session consisted of 64 baggage images with an equal distribution of threat and distractor stimuli. Participants were instructed to scan each bag for threat items, and to complete the task as accurately and quickly as possible. If the participant perceived a threat, they were instructed to click directly on the threat with a computer mouse. If the participant did not perceive a threat, they were instructed to press the space bar to "Clear" the bag and move to the next image. Training sessions between each test session were identical to test sessions except that EEG and eye tracking data were not gathered, and participants were given performance feedback. Hit rate was defined as the number of hits divided by the sum of hits and misses; miss rate was the number of misses divided by the sum of hits and misses. False alarm (FA) rate was defined as the number of FA divided by the sum of FA and correct rejections (CR), and the CR rate was defined as the number of CR divided by the sum of FA and CR. Trials with response times exceeding 2 SD from the mean were discarded.
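The rate definitions above, together with the sensitivity index d' reported with the classification rates, can be written out as a short Python sketch. The function names are hypothetical, d' is computed with the standard formula z(hit rate) minus z(FA rate), which the text does not state explicitly, and the use of the sample standard deviation for the response-time criterion is an assumption.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def classification_rates(hits: int, misses: int, fas: int, crs: int) -> Dict[str, float]:
    """Hit/miss rates over threat trials; FA/CR rates over non-threat trials."""
    threat_trials = hits + misses
    clear_trials = fas + crs
    return {
        "hit": hits / threat_trials,
        "miss": misses / threat_trials,
        "FA": fas / clear_trials,
        "CR": crs / clear_trials,
    }


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: z(hit rate) - z(FA rate).

    Rates of exactly 0 or 1 have no finite z-score and raise an error here;
    a correction for such cases would be a further assumption.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def drop_rt_outliers(rts: List[float], n_sd: float = 2.0) -> List[float]:
    """Keep response times within n_sd sample standard deviations of the mean."""
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]
```

With 32 threat and 32 distractor images per session, a participant with 28 hits and 6 false alarms would have `classification_rates(28, 4, 6, 26)`, and d' then follows from `d_prime(rates["hit"], rates["FA"])`.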

FLERP analysis
EEG and eye-tracking synchronization was accomplished via post-processing in MATLAB (Mathworks, Natick MA). Since both the eye tracker and the EEG were run on the same PC, the reported timestamps for both systems queried the same system clock at the ms level when reporting values. The first fixation on a threat/distractor was used to mark the beginning of an event-related potential (ERP) within the EEG data for each session. Following the participant response, these time points were classified as hit (threat present, participant clicked on threat), miss (threat present but participant indicated no threat; type II error), FA or false positive (no threat present but participant indicated a threat in the image; type I error), and CR (no threat present and participant indicated no threat) (Macmillan & Creelman, 2004; Wickens, Hollands, Banbury, & Parasuraman, 2012), as shown in Figure 2. If threats or distractors were not fixated upon, trials were excluded from FLERP analysis. EEGLAB and ERPLAB packages (Delorme & Makeig, 2004) were used in MATLAB to process the EEG data. After the decontaminated data were opened in EEGLAB, electrode locations and event timing and classifications were saved into an EEG dataset per session and participant.
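The event-marking logic described above can be sketched in Python (the study itself used MATLAB). The function names, the (onset, x, y) fixation layout, and the rectangular area of interest around the inserted threat or distractor are illustrative assumptions; the text does not specify how fixations were matched to items.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[float, float, float]        # (onset_ms, x_px, y_px), assumed layout
Box = Tuple[float, float, float, float]      # (left, top, right, bottom), assumed AOI


def classify_trial(threat_present: bool, threat_reported: bool) -> str:
    """Map stimulus and response onto the four signal-detection outcomes."""
    if threat_present:
        return "hit" if threat_reported else "miss"
    return "FA" if threat_reported else "CR"


def first_fixation_onset(fixations: List[Fixation], aoi: Box) -> Optional[float]:
    """Onset time of the earliest fixation inside the area of interest.

    Returns None when the item was never fixated, in which case the trial
    would be excluded from the FLERP analysis.
    """
    left, top, right, bottom = aoi
    for onset_ms, x, y in sorted(fixations):
        if left <= x <= right and top <= y <= bottom:
            return onset_ms
    return None
```

Because both devices read the same system clock, the returned onset can be used directly as the event time in the EEG record, with the label from `classify_trial` assigned once the response is known.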
Next, in the ERPLAB package, events were assigned into 4 bins, corresponding to the 4 classifications, and were baseline corrected, averaged, and plotted, both as a temporal series and as a spatial series across the scalp for the P3 wave. Baseline correction was used to eliminate any overall voltage offset from the ERP waveforms in each epoch by subtracting the mean prestimulus voltage in the 100 ms immediately preceding the FLERP. Finally, the P3 component amplitude was computed in ERPLAB software using the ERP measurement function, which measures the peak amplitude of the 3rd positive peak in the ERP, at 300 ± 25 ms.
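For readers who want to see the arithmetic behind these steps, the following Python sketch shows baseline correction, averaging, and a windowed peak measurement for a single channel. It does not reproduce ERPLAB's measurement function: the function names, the 600 ms epoch length, the rounding of millisecond windows to whole samples, and the use of the window maximum as the peak are assumptions. The 100 ms baseline, the 300 ± 25 ms window, and the 256 Hz sampling rate come from the text.

```python
from statistics import fmean
from typing import List


def baseline_corrected_epoch(signal: List[float], event_idx: int,
                             fs_hz: float = 256.0,
                             pre_ms: float = 100.0,
                             post_ms: float = 600.0) -> List[float]:
    """Cut an epoch starting at the fixation onset and subtract the mean of
    the pre_ms interval immediately before it."""
    n_pre = round(pre_ms * fs_hz / 1000.0)
    n_post = round(post_ms * fs_hz / 1000.0)
    if event_idx - n_pre < 0 or event_idx + n_post > len(signal):
        raise ValueError("epoch extends beyond the recording")
    baseline = fmean(signal[event_idx - n_pre:event_idx])
    return [v - baseline for v in signal[event_idx:event_idx + n_post]]


def average_epochs(epochs: List[List[float]]) -> List[float]:
    """Point-by-point average of equal-length epochs (one classification bin)."""
    return [fmean(column) for column in zip(*epochs)]


def peak_amplitude(erp: List[float], fs_hz: float = 256.0,
                   center_ms: float = 300.0,
                   half_width_ms: float = 25.0) -> float:
    """Largest value within center_ms +/- half_width_ms after the event."""
    lo = round((center_ms - half_width_ms) * fs_hz / 1000.0)
    hi = round((center_ms + half_width_ms) * fs_hz / 1000.0)
    return max(erp[lo:hi + 1])
```

At 256 Hz the 100 ms baseline spans about 26 samples and the 275 to 325 ms window about 14 samples, so the peak estimate rests on relatively few points per epoch.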

Statistical analysis
Changes in classification rate, mean reaction times (RT), number of fixations on target, and fixation durations were analyzed before and after training with a repeated measures t-test in SPSS 18 software (IBM, Armonk, NY), with significance set at 0.05. Each of the 4 classifications pre-training was tested separately against the same classification post-training, using weighted means to prevent confounding. The ERP waveforms were analyzed using the ERPLAB measurement function, followed by analysis using a repeated measures t-test in SPSS, with significance set at 0.05. The mean amplitude of the P3 component of the ERP was also analyzed across classifications and electrode sites using a one-way ANOVA in SPSS, with significance set at 0.05.
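The repeated measures (paired) t statistic used for the pre-training versus post-training comparisons can be computed as in the Python sketch below. This is only the textbook formula, not the SPSS procedure, and the function name is hypothetical; obtaining the p-value additionally requires the t distribution, which is left out here.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def paired_t(pre: List[float], post: List[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired pre/post measurements.

    t = mean(d) / (sd(d) / sqrt(n)), where d is the per-participant
    difference pre - post and sd is the sample standard deviation.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With all 40 participants contributing a pair of values, the degrees of freedom are 39; a participant with no trials in a given classification contributes no pair, which lowers the degrees of freedom for that test.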

Performance
Following the initial pre-training test session, reaction times and classification rates improved steadily over the first 4 testing sessions, but did not improve over the 5th through 7th testing sessions (see Figure 3). Since the goal of this training exercise is to reach criterion performance, only the pre-training and 4th testing sessions are presented below.

Eye Fixations
The mean number of fixations per threat/distractor is shown in Table 2. Following the training sessions, the average number of fixations per threat/distractor decreased significantly for the hit [t(39) = 8.68, p < .001] and CR [t(39) = 2.38, p = .03] classifications. The interaction effect between training and classification was analyzed via ANOVA and was not significant, F(3,7) = 3.19, p > 0.05. The increased hit rate was also accompanied by a decrease in the mean fixation duration, defined as the time between the first fixation on threat and participant response, only on threats classified as hit [t(39) = 6.61, p < .001], as shown in Figure 5. Significant changes in mean fixation duration were not seen across the other classifications.

Discussion
The current study demonstrates that fixations obtained from an eye tracker can be used to parse the EEG in the time domain and obtain meaningful insight into the decision making process. The resulting FLERPs are expected to be able to function as unique neurophysiological signatures to mitigate errors in visual search training due to the statistical differences found within group averages. The current study focused on a single component of the ERP, the P3 wave, but a combination of additional features is expected to significantly increase the performance of the classifier (Dornhege, Blankertz, Curio, & Müller, 2004). Recently, other groups have reported the utility of combining EEG and eye tracking to assess simple visual search performance (Hale, Fuchs, Axelsson, Baskin, & Jones, 2007; Kamienkowski, Ison, Quiroga, & Sigman, 2012), correct artifacts in the EEG (Plochl, Ossandon, & Konig, 2012), study the process of reading (Dimigen, Sommer, Hohlfeld, Jacobs, & Kliegl, 2011), and support neuromarketing efforts (Khushaba, Wise, Kodagoda, Louviere, Kahn, & Townsend, 2013). The current study has shown that even in cluttered imagery such as X-ray scans of carry-on baggage, unique ERP signatures can be obtained from combining eye tracking and EEG to improve performance in scanning tasks.
In this study, significant changes were observed in scan performance, overall performance, and FLERPs following training. Mean reaction times improved for all classifications following training, although the difference between the response types (mouse click for threat, spacebar press for clear) may have introduced small differences in reaction times. Distractors were chosen such that they mimicked the size and shapes of knife and gun threats. The resulting FLERPs may indicate different neurophysiological processes: in the case of a threat, they may indicate the preparation or execution of a manual response, and in the case of a distractor, they may indicate either the preparation or execution of a key stroke or a decision to continue searching. Additionally, hit rate increased while the mean fixation duration on threats decreased, showing improvements in both accuracy and efficiency in finding and correctly classifying threats due to training. Training sessions also caused a decrease in misses, which is especially critical in baggage screening tasks. Unexpectedly, an increase was seen in FA, which is expected to reduce overall efficiency in the baggage scanning scenario, along with a concomitant decrease in CR. Previous groups studying visual search scenarios have shown that if targets are highly prevalent, "target-present" decisions are more likely and "target-absent" responses are slower since such a response often leads to a mistake (Wolfe, et al., 2005). Such behavior was observed in this study, as threat prevalence was very high, set to 50% throughout the scenarios. As seen in the changes to mean reaction times, ERPs, and fixation times, significant learning occurred following training. Such learning is expected to cause changes to visual search patterns (Sireteanu & Rettenbach, 2000), as well as plasticity in visual search pathways as participants develop expertise (Walsh, Ellison, Ashbridge, & Cowey, 1999).
Eye movements introduce artifacts into the EEG, including corneo-retinal dipole changes due to large ocular movements, saccadic spike potentials, and artifacts due to blinking (Plochl, et al., 2012), which are expected to contribute to the variability within an EEG dataset. Baseline correction was applied using the 100 ms of EEG data prior to the start of a fixation event, which included the saccade and would be prone to artifacts, especially at electrode locations adjacent to the eye. In addition, the changes we observed in this study were in group averages, not specific to single participants. A classifier built upon data averaged from a group is not expected to generalize universally across many subjects due to differences resulting from a wide range of variables such as sensor placement, expertise level, or underlying neurophysiological differences (Del, Mourino, Franze, Cincotti, Varsta, Heikkonen, & Babiloni, 2002). The EEG system and eye tracker were chosen for this study based upon their wireless capability, low profile, fast setup time, and low cost, making them more likely to be deployed in training scenarios. In addition, given the relatively low number of electrodes in our EEG system (Srinivasan & Tucker, 1998) and the relatively low sample rates of the EEG system and eye tracker, higher fidelity classifiers could likely be developed with higher electrode density and sampling rates (Ryynanen, Hyttinen, & Malmivuo, 2006), at the expense of user comfort and classifier speed due to the increased input. The low sampling frequency of the eye tracker also affects the measured durations and latencies, making it difficult to distinguish fixations from other eye movements (Andersson, Nystrom, & Holmqvist, 2010).

Conclusions
In summary, by combining a low cost eye tracker and EEG system, significant differences in neurophysiological markers indicative of user decisions were found within a complex X-ray visual search task. Future work is focused on using a combination of features extracted from the EEG to develop and test a classifier to prevent errors in visual search scenarios.

Figure 1 - Example X-ray images of simulated carry-on baggage used in this study. Images were used both without threats (baggage on left), and with specific threats inserted into the images (baggage on right). Gaze locations and durations are shown in a heat map format overlaid on the images in the bottom panels.

Figure 3 - The average hit rate increased following the pretesting session (PT) through the first 3 testing sessions, then did not change between sessions 4 to 7.

Figure 4 - The mean rates ± SD for all classifications. All classifications were significantly different following training. The hit rate and FA rate significantly increased. The miss rate and CR rate significantly decreased following training; p ≤ 0.001. Pre-training d' = 1.632; post-training d' = 1.364.

The average reaction time, defined as the time between the presentation of an image and user response, for both pre-training and post-training sessions is shown in Table 1. The average reaction times (RT) decreased significantly for the hit [t(39) = 6.66, p < .001], miss [t(36) = 10.47, p < .001], and CR [t(38) = 5.39, p < .001] classifications, but not the FA. The interaction effect between training and classification was analyzed via ANOVA and was not significant, F(3,7) = 2.74, p > 0.05.

Figure 5 - The mean fixation duration on threats/distractors ± SD for all classifications. The mean fixation duration decreased significantly following training for trials classified as hits. **p ≤ 0.001.

Figure 6 - FLERPs at electrodes Pz and Cz, shown as average amplitude (µV) over time (ms). The mean amplitudes of the P3 wave were analyzed using the ERP measurement tool in the ERPLAB package. A significant difference (**p ≤ 0.001) was only found for the miss classification. When the mean amplitude of this component was analyzed, a significant difference [t(55) = 3.48, p < 0.001] was found between the pre-training and post-

Figure 7 - Analysis of variance of the P3 components in the pre-training condition uncovered significant differences between the CR and hit classifications (*p ≤ 0.05) and the CR and miss classifications (**p ≤ 0.001). In the post-training condition, statistically significant differences were found between the hit and miss classifications (**p ≤ 0.001), between the hit and FA classifications (*p ≤ 0.05), and between the miss and CR classifications (*p ≤ 0.05).

Table 1
Mean reaction times in seconds and standard deviations (SD) for the four classifications from all users. The hit, miss, and CR reaction times were significantly different following training. *p ≤ 0.001.

Table 2
Mean number of fixations and standard deviations (SD) for the four classifications from user responses. The mean number of fixations per threat for the hits and CR were significantly different following training.