Visual vs. Spatial Contributions to Microsaccades and Visual-Spatial Working Memory

Microsaccade rates and directions were monitored while observers performed a visual working memory task at varying retinal eccentricities. We show that microsaccades generate no interference in a working memory task, indicating that spatial working memory is at least partially insulated from oculomotor activity. Intervening tasks during the memory interval affected microsaccade patterns; microsaccade frequency was consistently higher during concurrent spatial tapping (no visual component) than during exposure to dynamic visual noise (no task). Average microsaccade rate peaked after appearance of a fixation cross at the start of a trial, and dipped at cue onset and offset, consistent with previous results. Direction of stimuli in choice tasks did not influence microsaccade direction, however.

In a particularly informative study, Klauer & Zhou (2004) in experiments based on an existing paradigm (Tresch et al., 1993) isolated the distinction between visual and spatial STM from a number of alternative explanations. Their spatial task measured memory for a dot location. The visual memory task in contrast had participants select a Chinese ideogram that had been shown previously. Both tasks had 10 sec delays and seven distractor locations or ideograms at test. Three interference conditions were imposed during the delay, requiring either a movement or color discrimination. Movement discrimination created spatial interference by requiring participants to identify which of 12 asterisks remained stationary, whereas visual interference was generated by categorizing 14 successive background colors as blue or red. Relative to conditions without interference, movement discrimination affected dot location memory and color categorization interfered with ideogram memory. This crossover interaction pattern remained consistent in subsequent experiments that ruled out encoding differences, resource trade-offs, and task similarity as well as contributions from verbal WM, LTM, and the central executive.

Defining the Code and Capacity in VSWM
Visual-Spatial Working Memory (VSWM) requires the sustained representation of objects that are not continually visible. In a VSWM change detection paradigm experimenters display an array of objects briefly before a short delay, afterward present an identical or changed array, and ask whether any objects had changed. Luck and Vogel (1997) did this and found no differences in the slopes of accuracy by set size functions whether objects were distinguished only by color or composed of a conjunction of up to four features. The k-index (Pashler, 1988) was computed from hit H and false alarm F data (the latter to compensate for guessing) to render a memory capacity k estimate of about four objects. In particular, capacity k=[set size(H-F)]/(1-F). Similar performance ceilings are found in iconic memory under whole report conditions (Sperling, 1960), and with multiple object tracking (Pylyshyn & Storm, 1988). Vogel, Woodman, and Luck (2001) extended this finding, ruling out the contribution of verbal WM by nesting the VSWM task within the delay in a two-digit recall task and shortening the initial exposure duration to make phonological rehearsal difficult (Frick, 1988). Capacity was still four integrated objects, suggesting that VSWM has a fixed number of "slots" (Cowan, 2001).
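The capacity formula above can be written out as a short computation. This is a minimal sketch: the function name and the example rates are hypothetical and are not data from any of the studies cited.

```python
def pashler_k(set_size, hit_rate, false_alarm_rate):
    """Capacity estimate k = set_size * (H - F) / (1 - F).

    The false alarm rate F corrects the hit rate H for guessing.
    """
    if not 0.0 <= false_alarm_rate < 1.0:
        raise ValueError("false_alarm_rate must lie in [0, 1)")
    return set_size * (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


# Hypothetical rates for an 8-item array: the estimate comes out near 4 objects.
example_k = pashler_k(set_size=8, hit_rate=0.6, false_alarm_rate=0.2)
```

With no false alarms the estimate reduces to set size times hit rate, so perfect performance on a four-item array gives k = 4.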
Rather than capacity being affected by visual complexity, Awh, Barton, and Vogel (2007) note that greater visual search slopes can reflect increasing item similarity, and suggested that can lead to greater comparison errors at test. Using memory sets with stimuli from two different categories (cubes and Chinese characters), they showed that between-category changes of low similarity were detected more accurately than within-category changes of higher similarity. Despite the greater visual complexity of the stimuli, capacity estimates from between-category changes were identical to those from simple color-changing squares (about four items, as with previous paradigms). Such results suggest that capacity does not depend on complexity, and that with high-similarity comparisons resolution of the representation rather than VSWM capacity is the limiting factor. This interpretation is strengthened by strong correlations between capacity estimates only for between-category and color changes (Scolari, Vogel, & Awh, 2008).

Oculomotor Programming and Attention
Spatial attention, or orienting (Posner, 1980), is an interface between cognitive processing resources and the sensory environment, and has a rich tradition outside of VSWM. It has been identified as one of three forms of attention implemented in discrete neural networks (Posner & Rothbart, 2007); its capacity-limited dynamics are determined at least partially by physiological constraints on cortical energy consumption (Lennie, 2003). Single-cell (Andersen, 1989) and brain imaging (Corbetta et al., 1998) studies converge to the conclusion that this form of attention is implemented by a fronto-parietal network including some areas identified as relevant to VSWM (Awh & Jonides, 2001; Corbetta, Kincade & Shulman, 2002).
It has often been found that oculomotor programming and attention or spatial representation are closely related. An early incarnation was Rizzolatti's "premotor theory" of attention which supposes that shifts of spatial attention originate from the same system whether they are overt, as eye movements, or covert; covert shifts involve inhibiting the concomitant eye movement (Rizzolatti et al., 1987). Premotor theory is supported by evidence of activation in the frontal eye fields and superior colliculi, known to be involved in oculomotor control (see Moschovakis, Scudder, & Highstein, 1996 for review), and in attention (Corbetta et al., 1998; Moore & Fallah, 2004; Kustov & Robinson, 1996; Bichot, Rao & Schall, 2001; Hanes, Patterson & Schall, 1998).
If microsaccades indicate covert attention shifts, they might also signal orienting in the service of VSWM. Performing voluntary saccades during retention decreases spatial span to a greater degree than attention shifts, which still affect span more than no secondary task at all (Pearson & Sahraie, 2003; Lawrence, Myerson & Abrams, 2004). Reflexive saccades also interfere with spatial span (Lange, Starzynski, & Engbert, 2012) and participants tend to suppress saccades during spatial but not verbal memory encoding when permitted to move their gaze freely (Lange & Engbert, 2013). Studies reporting those findings, though, did not record microsaccade activity to determine whether they have similar detrimental effects on spatial span.
Several studies substantiate a visual versus spatial dissociation in VSWM (e.g., Zimmer, 2008; Klauer & Zhou, 2004). Introducing several visual or spatial tasks during a VSWM retention interval did not interfere with memory, however (Gaunt and Bridgeman, 2012). It would be wise, for the sake of theoretical compatibility, to explore the behavioral and oculomotor effects of secondary tasks already common to VSWM upon the location task. In our experiment we introduce two events during a VSWM retention period: visual random noise requiring no response, and a tapping task that requires a response but does not involve vision. These secondary tasks have been used repeatedly in the VSWM literature, concurrent spatial tapping (CST) by Farmer, Berman and Fletcher (1986) and dynamic visual noise (DVN) by Quinn & McConnell (1996; 1999), to distinguish between visual and spatial processing in VSWM (Pearson & Sahraie, 2003). DVN should interfere with any visually-based information storage, while CST should interfere with spatially-based storage.

Method
Methods are similar to those used by Gaunt and Bridgeman (2012), except for the tasks during the retention interval.

Participants
Eleven UC Santa Cruz students completed the experiment for course credit. All were right-handed and two were female. All had either normal vision or corrective contact lenses.

Apparatus
Stimuli were presented at a 55 cm viewing distance on a 19" CRT monitor running 1152x864 resolution at 85 Hz. Eye movements were recorded from participants' left eyes with a Bouis Oculometer (Bach, Bouis & Fischer, 1983) sampled at 1 kHz by a National Instruments Data Acquisition (NI-DAQ) PCI card in a PC running Windows XP. Head movements were minimized with a bite bar. Because monocular recording results in noisier microsaccade detection and gives generally weaker directional distributions (Engbert, 2006), our results and conclusions are somewhat conservative.
Randomization, timing, and presentation of stimuli, as well as recording of behavioral responses and control of the NI-DAQ, were handled by custom software written by the first author in Matlab (v7.1) with the Psychophysics Toolbox (v2.0; Brainard & Pelli, …). A dark room minimizes room reflections on the screen surface, and a dark background reduces fatigue and pupil-related oculometer artifacts.
Participants responded with the F and J keys of a standard keyboard. These keys were chosen for their standard home-row positions and the presence of tactile bumps to distinguish them.

Behavioral Task
On each trial, a fixation cross (subtending 2°) appeared in the middle of the screen for 500 ms, followed by a cue stimulus in one of several positions for 400 ms. Three imaginary circles (radii of 4°, 4.8°, and 5.5°) surrounded central fixation, each containing cue positions at every 10° increment to make a total of 108 possible positions. After cue offset a five second retention interval began, concluding in the presentation of a probe stimulus that remained visible until responded to in a two-alternative forced choice (2AFC) (Figure 1). Participants were asked to respond as quickly as possible to the probe while remaining very accurate. They were instructed to press the right key if the probe was in the same location as the cue, or the left key if shifted inward toward fixation. Error tones provided feedback during the inter-trial interval following misses or false alarms. The fixation cross remained visible throughout the trial.
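The cue layout described above can be enumerated directly. The following is a minimal sketch, assuming that angles are measured from the rightward horizontal and that positions are expressed in degrees of visual angle relative to fixation; the starting angle and the function names are illustrative and are not taken from the original software.

```python
import math

RADII_DEG = (4.0, 4.8, 5.5)   # eccentricities of the three imaginary circles
ANGLE_STEP_DEG = 10           # one cue position every 10 degrees of arc


def cue_positions():
    """Return (x, y) offsets from fixation for every possible cue position."""
    positions = []
    for radius in RADII_DEG:
        for angle in range(0, 360, ANGLE_STEP_DEG):
            theta = math.radians(angle)
            positions.append((radius * math.cos(theta), radius * math.sin(theta)))
    return positions


def displace_inward(position, displacement_deg):
    """Shift a position toward fixation along its own radius."""
    x, y = position
    radius = math.hypot(x, y)
    if radius == 0.0 or displacement_deg > radius:
        raise ValueError("displacement must not exceed the cue eccentricity")
    scale = (radius - displacement_deg) / radius
    return (x * scale, y * scale)
```

Three circles with 36 positions each give the 108 positions stated above, and a displaced probe keeps the polar angle of its cue while its eccentricity shrinks.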

Procedure
Practice: Participants were first familiarized with the memory task. They were instructed not to move their eyes away from fixation while doing trials, and were aware that their eye movements would be monitored in the main experiment to ascertain that they were not looking at cue, probe, or choice stimuli. They practiced the memory task alone with no intervening task until they met the accuracy criterion of 70%. Performance was evaluated every 15 trials to determine whether participants met the criterion on those trials, and if not they practiced for another 15 trials until they did.

Equipment Setup: Participants had a dental-wax bite bar molded to their mouths and mounted before the stimulus display and oculometer. After seat adjustments were made to maximize comfort and minimize movement, the experimenter began the calibration routine. Participants fixated points five degrees away from the zero-point in each of four directions (left, right, above, and below) as the experimenter adjusted the X and Y gain to afford a suitable range of voltage values in each channel. Next, participants fixated small targets at the zero-point and one degree away from it in each of four directions as the experimenter recorded the voltages at each target. Voltage differences between targets were used to derive constants to convert the voltage signal to visual degrees. Lastly, the experimenter adjusted the zero-point and gain settings for the X and Y channels on the oscilloscope so that the calibrated region would be magnified over the larger screen, permitting sensitive monitoring of gaze stability. That is, it was plainly visible when the participant's gaze drifted or otherwise moved outside of the 4 deg² calibrated region.
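The conversion from voltage to visual degrees can be illustrated for one channel. This is a sketch under the assumption that the oculometer signal is linear over the calibrated region, which the text does not state explicitly; the function names and the voltages in the usage note are hypothetical.

```python
def fit_axis(v_negative, v_zero, v_positive, step_deg=1.0):
    """Derive conversion constants for one channel (X or Y).

    v_negative and v_positive are the voltages recorded while fixating
    targets step_deg to either side of the zero-point; v_zero is the
    voltage at the zero-point itself.
    """
    span_volts = v_positive - v_negative
    if span_volts == 0.0:
        raise ValueError("calibration targets produced identical voltages")
    deg_per_volt = (2.0 * step_deg) / span_volts
    return deg_per_volt, v_zero


def volts_to_deg(voltage, deg_per_volt, v_zero):
    """Convert a raw voltage sample to degrees relative to the zero-point."""
    return (voltage - v_zero) * deg_per_volt
```

For example, hypothetical readings of -0.5 V, 0.0 V, and 0.5 V at the three horizontal targets give 2° per volt, so a sample of 0.25 V maps to 0.5° from the zero-point.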
Experimental Trials: Each trial was initiated by the experimenter pressing a key. Giving the experimenter this control assured that gaze remained within the calibrated region, and afforded the ability to halt the trial series and make corrective calibrations if necessary. The experimenter rejected the trial if there were saccades outside of the calibrated region. Rejected trials were recycled into the block queue. Participants were invited to take quick breaks between blocks, and calibration of the participant's gaze preceded the beginning of each one.

Gaze Data Processing
Smoothing: To overcome velocity artifacts that could affect the accuracy of the microsaccade parsing algorithm, gaze data were low-pass filtered with a small Gaussian (filter bandwidth at half maximum was 6 samples). Digital step size in the unfiltered record was 8.7 arc sec, which represents the resolution of our system. Noise in the raw traces comes from machine noise as well as nystagmus in the oculomotor system, with the relative contributions between those two factors being indeterminate.
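A smoothing step of this kind can be sketched as follows, assuming that the stated bandwidth refers to the full width at half maximum (FWHM) of a Gaussian kernel applied in the time domain. The truncation at three standard deviations and the repetition of edge samples are illustrative choices, not details of the original software.

```python
import math


def gaussian_kernel(fwhm_samples=6.0, truncate=3.0):
    """Normalized Gaussian weights with the requested FWHM, in samples."""
    sigma = fwhm_samples / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    half_width = math.ceil(truncate * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_trace(trace, fwhm_samples=6.0):
    """Low-pass filter a gaze trace, repeating the edge samples as padding."""
    kernel = gaussian_kernel(fwhm_samples)
    half_width = len(kernel) // 2
    last = len(trace) - 1
    smoothed = []
    for n in range(len(trace)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(n + j - half_width, 0), last)
            acc += weight * trace[idx]
        smoothed.append(acc)
    return smoothed
```

Because the weights sum to one, a steady fixation level passes through unchanged while single-sample spikes are attenuated.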

Microsaccade Detection:
We used the Engbert & Kliegl (2003) algorithm to parse microsaccades from gaze data with only slight modifications to conform with our hardware (Gaunt & Bridgeman, 2012). Velocity outliers that did not exceed 1° amplitude, 100°/s velocity, or 40 ms duration were classified as microsaccades.
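A velocity-threshold parser in the style of Engbert & Kliegl (2003) can be sketched for a single eye as below. The upper bounds on amplitude, velocity, and duration follow the text; the threshold multiplier `lam`, the minimum run length `min_samples`, and the use of start-to-end displacement as amplitude are assumptions for illustration, and the hardware-specific modifications mentioned above are not reproduced.

```python
import math
from statistics import median


def velocity(samples, dt):
    """Five-sample moving-average velocity; two samples at each edge stay 0."""
    v = [0.0] * len(samples)
    for n in range(2, len(samples) - 2):
        v[n] = (samples[n + 2] + samples[n + 1]
                - samples[n - 1] - samples[n - 2]) / (6.0 * dt)
    return v


def median_sd(v):
    """Median-based estimate of the velocity standard deviation."""
    return math.sqrt(max(median([a * a for a in v]) - median(v) ** 2, 0.0))


def detect_microsaccades(x, y, fs=1000.0, lam=6.0, min_samples=6,
                         max_amp=1.0, max_vel=100.0, max_dur=0.040):
    """Return events whose velocity leaves an elliptic threshold (x, y in deg)."""
    dt = 1.0 / fs
    vx, vy = velocity(x, dt), velocity(y, dt)
    eta_x, eta_y = lam * median_sd(vx), lam * median_sd(vy)
    if eta_x == 0.0 or eta_y == 0.0:
        return []  # no velocity noise from which to define a threshold
    above = [(a / eta_x) ** 2 + (b / eta_y) ** 2 > 1.0 for a, b in zip(vx, vy)]
    events, start = [], None
    for n, flag in enumerate(above + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = n
        elif not flag and start is not None:
            end = n - 1
            amplitude = math.hypot(x[end] - x[start], y[end] - y[start])
            peak = max(math.hypot(vx[i], vy[i]) for i in range(start, end + 1))
            duration = (end - start + 1) * dt
            if (end - start + 1 >= min_samples and amplitude <= max_amp
                    and peak <= max_vel and duration <= max_dur):
                events.append({"start": start, "end": end,
                               "amplitude": amplitude, "peak_velocity": peak})
            start = None
    return events
```

The binocular overlap criterion of the original algorithm is omitted here because recording was monocular.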

Behavioral Tasks
Staircasing: Before experimental sessions were begun, the location task was performed in two independent SIAM (single-interval adjustment matrix: Kaernbach, 1990) staircasing runs with different adjustment matrices to acquire each participant's probe displacement thresholds at target performance levels of 25% and 75%. The order of the two runs was randomized across participants, with the second run immediately following the first. All runs began with a displacement of three degrees, a step size of 0.2°, and a displacement-to-match trial ratio of 75%. After the first reversal step size was reduced to 0.1° and the displacement-to-match ratio to 50%.
Step size was further reduced to 0.05° after the second reversal, the smallest displacement increment with our screen resolution. Once twelve reversals had occurred the first two were discarded and an average of the last ten was used as the threshold for that target performance value. Gaze data were not recorded during staircasing, but it was important to expose participants to all elements of the task as they appear in the experimental trials.
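The step-size schedule and the threshold computation described above can be sketched as follows. Only the bookkeeping stated in the text is shown; the SIAM adjustment matrices themselves (Kaernbach, 1990) are not reproduced, and the function names are hypothetical.

```python
def step_size_deg(n_reversals):
    """Step size in degrees given the number of reversals so far."""
    if n_reversals == 0:
        return 0.2
    if n_reversals == 1:
        return 0.1
    return 0.05


def threshold_from_reversals(reversal_levels, n_required=12, n_discard=2):
    """Average the displacement at the last ten of twelve reversals."""
    if len(reversal_levels) < n_required:
        raise ValueError("run has not yet reached the required reversals")
    kept = reversal_levels[n_discard:n_required]
    return sum(kept) / len(kept)
```

Discarding the first two reversals removes the portion of the run in which the larger step sizes were still in use.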
Secondary Tasks: Dynamic visual noise: Adhering to the most common method in the literature (Quinn & McConnell, 1996; 1999; Dean, Dewhurst, & Whittaker, 2008), a matrix of 80 x 80 cells was centered on the fixation cross. At our viewing distance and screen resolution this came to a square 16° per side, with each cell subtending about 0.2° (6 x 6 pixels each), that occluded all possible cue locations. Filling cells with pure black or white, as in previous studies, could make maintaining fixation upon the white fixation cross unnecessarily difficult. To ensure the fixation cross stood out, the bright DVN hue was a shade darker than white and the dark hue was just brighter than black. Half of the cells were dark and the other half bright. Starting 500 ms into the retention interval and ending 500 ms before its conclusion, 229 cells were flipped in each of 56 successive DVN displays, each lasting about 70 ms, to give an appearance of continuous change over four seconds resembling video white noise. Participants were to maintain fixation while passively viewing DVN.
Concurrent spatial tapping: Duplicating methods from the literature that require a custom tapping surface for CST could be problematic for location task performance. Not only would reaching between keyboard and tapping surface take time, but hand placement accuracy would be impaired without visual guidance (unavailable in the dark while maintaining fixation). Keeping hands in position on the keyboard throughout a trial was therefore preferable. Participants heard a series of eight 50 ms clicks at an SOA of 450 ms, and were asked to tap a pattern of keys on the keyboard with their middle fingers such that the taps and clicks occurred together. The pattern was predictably 'i-e-k-d-i-e-k-d', and permitted participants to keep their location task response fingers ready on the F and J keys. If there were not eight taps, participants heard three successive error tones during the inter-trial interval. If the wrong sequence of keys was tapped they heard two tones, and if they tapped more than 150 ms early or late relative to the clicks they heard one error tone.
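The dynamic visual noise sequence described above can be sketched as a series of binary grids. This is an illustration under stated assumptions: the first display is counted among the 56, cells to flip are drawn at random without replacement, and only the first display is exactly half dark and half bright, since the text does not say whether the balance was preserved after flipping. All names are hypothetical.

```python
import random

GRID_CELLS = 80 * 80     # 80 x 80 matrix, stored as a flat list
CELLS_FLIPPED = 229      # cells changed between successive displays
N_DISPLAYS = 56          # displays of about 70 ms each


def initial_display(rng):
    """Random grid with half the cells dark (0) and half bright (1)."""
    cells = [0] * (GRID_CELLS // 2) + [1] * (GRID_CELLS // 2)
    rng.shuffle(cells)
    return cells


def next_display(cells, rng):
    """Copy the grid and flip a random subset of distinct cells."""
    updated = list(cells)
    for idx in rng.sample(range(GRID_CELLS), CELLS_FLIPPED):
        updated[idx] = 1 - updated[idx]
    return updated


def dvn_sequence(seed=None):
    """Return the list of displays for one retention interval."""
    rng = random.Random(seed)
    displays = [initial_display(rng)]
    while len(displays) < N_DISPLAYS:
        displays.append(next_display(displays[-1], rng))
    return displays
```

Each call to `dvn_sequence` yields 56 grids of 6,400 cells, with exactly 229 cells differing between any two successive displays.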
Design: Participants performed the location task in four separate sessions; each was a different condition in a 2x2 factorial design defined by the presence or absence of visual and spatial secondary tasks during the retention interval. Thus, one condition featured no secondary task during retention, two others had either DVN or CST alone, and another combined the two. Each session had four blocks of 36 trials each, for a total of 576 trials per participant across sessions. Match and miss trials occurred with equal probability.
Accuracy data were analyzed with a 2 x 3 x 2 x 4 (Block x Eccentricity x Trial x Condition) repeated-measures ANOVA. This permitted examination of any interaction effects that might emerge between the performance level threshold used across blocks, the eccentricity of cues, whether the probe was in the same location as the cue or displaced, and the presence or absence of the two secondary tasks.

Behavioral Data
All analyses excluded trials with memory task responses faster than 300 ms or slower than 1500 ms, resulting in exclusion of 0.8% of 6,336 total trials. Also, one trial with missing gaze data was excluded. All post-hoc pairwise comparisons were made using Tukey HSD.
SIAM Thresholds: Thresholds produced by the 25% and 75% performance level runs were reliably different (Figure 2). Repeated-measures ANOVA showed that the average displacement threshold from the 25% performance level runs (M = 0.96°, SD = 0.17°) was significantly smaller than that from the 75% runs (M = 1.61°, SD = 0.34°), F(1, 10) = 44.57, p < .0001, ηp² = 0.82.
A block x trial interaction, F(1, 10) = 17.96, p = .002, ηp² = 0.64, showed more specifically that displacement trials in the 25% performance level blocks were less accurate than match trials in either block and miss trials in the 75% block as well, ps < .001. There was also a main effect of eccentricity, F(2, 20) = 14.16, p = .0002, ηp² = 0.59, and examination of an eccentricity x trial interaction, F(2, 20) = 20.94, p < .0001, ηp² = 0.68, shows it being driven by significantly lower accuracy on displacement trials when cues appear at 4.8° eccentricity, p = .004, and 5.5°, p = .0002. Post-hoc tests confirmed no statistical differences among trials of any eccentricity when probe and cue locations matched. There were no other significant interactions.

Microsaccades
Microsaccade Rate: Figure 3 shows a main sequence plot of detected saccades. The smallest microsaccades of about 0.02°, at the lower left of the graph, merge in magnitude and velocity with slow drifts. Figure 4 plots mean microsaccade rates over the trial timeline as a function of secondary task condition. Because there are substantial individual differences in microsaccade rates (Rolfs, 2009), we also plot trials for individual subjects. At the top of the figure are raster plots for individual subjects, with each row representing microsaccade onsets from seven randomly selected trials per participant in the four conditions before averaging. The largest peak in rates occurred just after fixation onset from a blank screen, though some of these may have been small fixational saccades to acquire the target. Microsaccades appear to be inhibited across conditions shortly after memory cue onset and offset, replicating results of Gaunt and Bridgeman (2012).

Figure 3. A log-log main sequence plot, flanked by frequency polygons to impart density information. Light grey lines indicate microsaccade thresholds.
From the time at which secondary tasks would begin, rates in the condition with no secondary task remained steady and then declined slowly until dropping off after probe onset. In the CST condition microsaccade rates were low following cue offset but increased once tapping started and remained at that level until dropping off shortly before probe onset. Beginning about 0.5 s after task or display onset, the rate in the CST condition is higher than in the DVN condition in every sampling interval until task or display offset, a difference significant at p < .001 by a binary sign test. Microsaccades appear to be released when visual attention or stimulation lags during tapping, consistent with early findings that visual concentration depresses rates (Bridgeman & Palca, 1980; Winterson & Collewijn, 1976) but extending those findings to intervals of several seconds.
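The logic of a binary sign test across sampling intervals can be sketched as below. The number of intervals is not reported here, so the counts in the usage note are hypothetical; the sketch assumes a two-sided test against an even split with ties excluded, and the function name is illustrative.

```python
from math import comb


def sign_test_p(n_positive, n_total):
    """Two-sided sign test p-value under a null of equally likely signs."""
    if not 0 <= n_positive <= n_total or n_total == 0:
        raise ValueError("counts must satisfy 0 <= n_positive <= n_total > 0")
    extreme = max(n_positive, n_total - n_positive)
    tail = sum(comb(n_total, i) for i in range(extreme, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)
```

When every interval favors one condition the p-value reduces to 2 × 0.5^n, so under these assumptions at least 11 consistent intervals are needed to fall below .001 (11 intervals give p of about .00098, whereas 10 give about .00195).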
Rates peak briefly in the DVN condition and the condition with both secondary tasks following the onset of DVN. The first peak in the condition with both DVN and CST, though, appears much greater than that with only DVN; moreover, microsaccade rate in that condition also appears lower than in the CST condition immediately following cue offset. These two conditions also showed a small rebound effect in microsaccade rate following the offset of DVN; the rate in the DVN condition declines steadily before peaking a final time just before probe onset. A small bump in the no-task condition is due to a sustained high rate from a single subject (see individual subject records in the figure).
Microsaccades nearly disappear from the records near the time of the memory response, before rebounding after probe onset, again showing a dearth of microsaccades at a time of visual challenge.
Many candidate time windows were detected in which mean rates exceeded the shaded region of a surrogate distribution (Gaunt and Bridgeman, 2012) that defines no significant relationship between saccade direction and direction of stimulus shift, but only one attained significance (from 3096 to 3118 ms into the trial, in the condition with no secondary task). Figure 6 is a polar plot of the 111 microsaccade directions during this 22 ms interval. Overall there is a tendency, albeit bimodal, for microsaccades to shift gaze away from the cued location about 700 ms after cue offset when there is no intervening secondary task to prepare for.

Discussion
Location task responses to probe displacements were faster and more accurate in 75% performance level blocks than 25% blocks. The latter blocks had smaller displacements on average. Instituting an adaptive procedure to personalize stimuli for participants was successful in raising the level of performance for small displacements, as well as lowering the larger displacements from ceiling. Although the SIAM procedure was successful in defining displacements at statistically distinct sizes that in turn affected location task data across the two different trial blocks, accuracy is greater than would be anticipated with the associated criterion. The SIAM procedure was the first exposure to the location task for some participants, and apparently their performance improved during the experimental trials as a function of practice. Also, cue eccentricity and position were not controlled for in the SIAM runs. Cue eccentricity effects are also present, lowering performance on displacement trials with more peripheral cues.
Contrary to what we expected from previous work with DVN and CST (Farmer, Berman and Fletcher, 1986; Quinn & McConnell, 1996; Pearson & Sahraie, 2003; Zimmer, 2008), these secondary tasks generated no significant interference. Had the location task relied solely on spatial WM, as previously assumed, we should have found impaired accuracy in the conditions with CST, but not in the condition with DVN only or with no secondary task. Were there an additional visual aspect, we should have found interference in the DVN only condition and in the condition with both secondary tasks. Thus our result casts doubt upon the likelihood that current and previously reported location task data are readily interpretable within familiar VSWM theoretical frameworks that dissociate visual and spatial processing.
Microsaccade rate functions are similar to those computed in earlier experiments (Gaunt and Bridgeman, 2012; Siegenthaler et al., 2013). There is a decrease in rate followed by an increase after the two most informative stimuli in our protocol, cue onset and probe onset. This pattern has been modeled by a dynamic activation model of the superior colliculus (Engbert, 2012), where microsaccades are simply activations of an anterior region of the collicular map near the fixation center. The self-avoiding walk of the model is an instantiation of lateral inhibition. The major source of novel observations comes from inclusion of CST and DVN. The most important new observations occur during the retention interval (Figure 4). CST induced consistently higher microsaccade rates during most of the retention interval, indicating that shifting attention away from vision allowed more microsaccades, whereas an intense visual stimulation suppressed them.
Microsaccade rates in the two conditions with CST were lower following cue offset and more stable throughout most of the later part of the retention interval, whereas the two conditions without CST had higher microsaccade rates following cue offset and showed gradual declines during retention. The two conditions with DVN showed peaks in microsaccade rate following the onset and offset of the DVN array, with a higher peak in the condition with both CST and DVN. Betta & Turatto (2006) have shown that microsaccade rates in a simple RT task are lower just before responses and greater just afterward. Such a finding is compatible with lower microsaccade rates in conditions with CST prior to secondary task onset, as well as a greater peak in microsaccade rate following onset in the condition with both CST and DVN versus the condition with DVN only in which no tapping was required. The peaks found only in the two conditions with DVN following DVN onset and offset are consistent with the common observation of inhibition following stimulus display changes (Rolfs, Engbert, & Kliegl, 2008). Recording gaze during the extended time periods permitted us to see microsaccade rate decline during fixation and that rates do not return to the pre-fixation levels immediately after probe responses. Microsaccade directional rates attained significance only briefly during the condition without either secondary task. This is likely an artifact of the sparseness of the data set at the point of occurrence, which leads to the standard deviation of the surrogate distribution shrinking to virtual nonexistence and makes it easy for the mean directional microsaccade rate to exceed the surrogate distribution. On the polar plot of normalized microsaccade trajectories during that brief time window in Figure 6 it appears that although there is a general bias away from the cue it is not as focused as found previously by Turatto et al. (2007) with participants using focused attention to perform difficult perceptual discriminations.
There is a poverty of behavioral tasks that cleanly dissociate visual and spatial elements of VSWM. Corsi blocks, for instance, are sensitive to visual display features such as symmetry of the block array (Rossi-Arnaud, Pieroni, & Baddeley, 2006), and also rely on central executive resources (Klauer & Stegmaier, 1997). The location task paradigm also seems bedeviled by an inability to be neatly categorized as either a simple visual or spatial STM task. One approach to resolving such conflicts is a continuity model of WM (Cornoldi & Vecchi, 2003) where a continuum is defined by amodal central processing (WM manipulation) on one extreme and parallel modality-specific processing (STM rehearsal) on the other. The form of modality-specific information (verbal, visual, spatial, and others) makes up another continuum, such that any task could be assumed to rely on a blend of those resources rather than one exclusively. Tasks using STM and WM could theoretically be placed at different points on these continua depending on the degree to which they relied on varying modality-specific STM stores and the degree of central processing involvement, although presently it is unclear how to quantify such point values. The larger number of free parameters in a model will also of course improve its fit to any data.
The experiments reported here showed no difference between the mean number of microsaccades that occur on correct and incorrect location task trials. Future studies should first ascertain that larger, voluntary saccades interfere with location task performance. Theoretical positions that implicate the oculomotor system in VSWM rehearsal (Belopolsky & Theeuwes, 2009) must distinguish activity related to microsaccades from any other emerging computations. One could wager that the attention shift associated with a voluntary saccade disrupts VSWM, and not the physical saccade itself, to thereby exclude microsaccades with no attentional component from the category of events that disrupts VSWM.