<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
 <front>
    <journal-meta>
	<journal-id journal-id-type="publisher-id">Jemr</journal-id>
      <journal-title-group>
        <journal-title>Journal of Eye Movement Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1995-8692</issn>
	  <publisher>								
	  <publisher-name>Bern Open Publishing</publisher-name>
	  <publisher-loc>Bern, Switzerland</publisher-loc>
	</publisher>
    </journal-meta>
    <article-meta>
	<article-id pub-id-type="doi">10.16910/jemr.11.2.5</article-id> 
	  <article-categories>								
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
		</article-categories>
      <title-group>
        <article-title>Synchronizing eye tracking and optical motion capture: How to bring them together</article-title>
      </title-group>
	   <contrib-group> 
				<contrib contrib-type="author">
					<name>
						<surname>Burger</surname>
						<given-names>Birgitta</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Puupponen</surname>
						<given-names>Anna</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Jantunen</surname>
						<given-names>Tommi</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"></xref>
				</contrib>				
        <aff id="aff1">
		<institution>University of Jyv&#xE4;skyl&#xE4;</institution>,   <country>Finland</country>
        </aff>
		</contrib-group>     
	  <pub-date date-type="pub" publication-format="electronic"> 
		<day>7</day>  
		<month>5</month>
        <year>2018</year>
      </pub-date>
	  <pub-date date-type="collection" publication-format="electronic"> 
	  <year>2018</year>
	</pub-date>
      <volume>11</volume>
      <issue>2</issue>
 <elocation-id>10.16910/jemr.11.2.5</elocation-id>
	<permissions> 
	<copyright-year>2018</copyright-year>
	<copyright-holder>Burger, Puupponen and Jantunen</copyright-holder>
	<license license-type="open-access">
  <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License, 
  (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">
    https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
	</permissions>
      <abstract>
        <p>Both eye tracking and motion capture technologies are nowadays frequently used in human sciences, although they are usually used separately. However, measuring both eye and body movements simultaneously would offer great potential for investigating crossmodal interaction in human (e.g. music- and language-related) behavior. Here we combined an Ergoneers Dikablis head-mounted eye tracker with a Qualisys Oqus optical motion capture system. In order to synchronize the recordings of both devices, we developed a generalizable solution that does not rely on any (cost-intensive) ready-made / company-provided synchronization solution. At the beginning of each recording, the participant nods quickly while fixating a target with the eyes open &#x2013; a motion yielding a sharp vertical displacement in both mocap and eye data. This displacement can be reliably detected with a peak-picking algorithm and used to accurately align the mocap and eye data. This method produces accurate synchronization results for clean data and therefore provides an attractive alternative to costly plug-ins, as well as a solution in case ready-made synchronization options are unavailable.</p>
      </abstract>
      <kwd-group>
        <kwd>Eye movement</kwd>
        <kwd>eye tracking</kwd>
        <kwd>new media</kwd>
        <kwd>intermodal processing</kwd>
        <kwd>motion capture</kwd>
        <kwd>technology</kwd>
        <kwd>synchronization</kwd>
        <kwd>methodology</kwd>
      </kwd-group>
    </article-meta>
  </front>
<body>

    <sec id="S1">
      <title>Introduction</title>
	  
      <p>Both eye tracking and motion capture technologies are
nowadays widely used in human sciences (e.g., music
research or sign language linguistics), although they are
usually used separately. However, measuring both eye and body
movements simultaneously would offer great potential for investigating
action-perception links and cross-modal interaction in
human behavior in general, and in musical behavior and sign
language in particular. Especially in communicative and
joint actions, such as making music or dancing together,
combining different data acquisition tools like motion
capture and eye tracking would provide new and innovative
possibilities for conducting research.</p>

      <p>Possible research questions of interest could include
whether performers in a musical ensemble coordinate eye
and body movements to create successful joint
performances or whether gaze directions reflect participants&#x2019;
movements and interactive behaviors when dancing with
another person. In the field of sign language research &#x2013; in
which eye behavior, together with the activity of the hands
and other parts of the body, has been argued to be an
important means to organize linguistic structure &#x2013; possible
research questions could include how exactly signers
coordinate eye gaze and eye movements with manually
produced linguistic units, and how the temporal alignment of
eye and hand behaviors differs, for example, between native
signers and sign language learners.</p>

      <p>However, the biggest challenge in combining separate
data acquisition technologies, such as motion capture and
eye tracking, is reliably synchronizing the devices so that
the data can either be recorded at the same time or be
precisely aligned afterwards. Accurate synchronization of the
different data streams is crucial for time-critical analysis
of the data and for relating the different data streams to
each other in order to answer the research questions at
hand.</p>

      <sec id="S1a">
        <title>Research using motion capture and eye tracking</title>

      <p>Both technologies have been used separately in various
research areas such as psychology, biomechanics,
education, sports, linguistics, and music. Since the authors are
mainly familiar with research in music and sign language,
the following literature review will focus on these research
fields.</p>

        <sec id="S1aa">
          <title>Music and motion capture</title>

        <p>In the music field, motion capture has been used, for
example, to study gestures during music performance or spontaneous
movement responses to music. In terms of
performers&#x2019; gestures, work by, for instance, Thompson
and Luck (
          <xref ref-type="bibr" rid="b32">32</xref>
		  ) investigated expressivity during piano
performances, finding increased movement in structurally
important parts when playing expressively compared to
playing without expression. Van Zijl and Luck (
          <xref ref-type="bibr" rid="b37">37</xref>
		  )
addressed the role of experienced emotions on movement
characteristics during music performance, finding
increased movement when playing with a sad expression
compared to playing while feeling sad.
Glowinski et al. (
          <xref ref-type="bibr" rid="b12">12</xref>
		  )
          studied the movements of a string
quartet during performance, obtaining different head
movement patterns in joint versus solo performances.</p>

        <p>In music-induced movement, Burger, Thompson,
Luck, Saarikallio, and Toiviainen (
          <xref ref-type="bibr" rid="b3">3</xref>
		  ) explored
relationships between spontaneous full body movement and
musical characteristics such as pulse clarity and spectral
content, finding that clearer pulses and stronger spectral
content in low and high frequencies encouraged
participants to move more.
Van Dyck et al. (
          <xref ref-type="bibr" rid="b36">36</xref>
		  ) showed that
participants&#x2019; spontaneous movements increased with the
presence of the bass drum. Carlson, Burger, London,
Thompson, and Toiviainen (
          <xref ref-type="bibr" rid="b6">6</xref>
		  ) focused on personality
characteristics in relation to music-induced movement,
finding that participants with higher conscientiousness and
lower extraversion show greater responsiveness to tempo
changes. Haugen (
          <xref ref-type="bibr" rid="b15">15</xref>
		  ) studied music-dance
relationships both in Brazilian Samba and Norwegian
Telespringar, while Naveda and Leman (
          <xref ref-type="bibr" rid="b26">26</xref>
		  ) investigated
spatiotemporal representations in dance gestures of Samba
and the Charleston.</p>

        <p>Movement has also been studied from the perspective
of perception. Vuoskoski, Thompson, Clarke, and Spence (
          <xref ref-type="bibr" rid="b38">38</xref>
		  ) showed stick-figure animations to participants and
studied the perception of expressivity in musical
performances, showing that the influence of the visual
component seems stronger in the communication of expressivity
compared to the auditory. Burger, Thompson, Saarikallio,
Luck, and Toiviainen (
          <xref ref-type="bibr" rid="b4">4</xref>
		  ) investigated the attribution
of emotions to music-induced movement by showing
participants stick-figure animations of spontaneous dance
movement, showing that the dance was perceived as conveying
positive rather than negative emotions. Su and Keller (
          <xref ref-type="bibr" rid="b31">31</xref>
		  )
          studied synchronization when perceiving stick-figure
videos of dance movements of oneself and others, finding that
participants, especially musicians, synchronized more
accurately with others than with their own movements.</p>
      </sec>
	  
        <sec id="S1ab">
          <title>Sign language linguistics and motion capture</title>	  

        <p>In sign language linguistics, motion capture has been
used in a few works to investigate various linguistically
relevant phenomena from an articulatory perspective.
Concerning early work, Wilbur (
          <xref ref-type="bibr" rid="b41">41</xref>
		  ) showed that there
is a link between stressed sign production and certain
kinematic variables such as displacement, velocity, and
acceleration. Wilcox (
          <xref ref-type="bibr" rid="b42">42</xref>
		  )
          , in turn, looked at the production of
consecutive hand alphabets (i.e. fingerspelling) and
showed, for instance, that the velocity peaks of the finger
movements to target alphabets are a significant feature in
the organization of fingerspelling.</p>

        <p>More recently, Tyrone and Mauk (
          <xref ref-type="bibr" rid="b35">35</xref>
		  ) examined
sign lowering (i.e. producing the sign lower than its
citation form) in American Sign Language and found that it is
affected in predictable ways by production rate, phonetic
context, and position within an utterance (see also Mauk
&#x26; Tyrone, 
          <xref ref-type="bibr" rid="b23">23</xref>
		  ).
Jantunen (
          <xref ref-type="bibr" rid="b18">18</xref>
		  ), in turn, investigated
whether the signed syllable &#x2013; a sequential movement of the
articulator(s) &#x2013; could be empirically defined with the help
of a single acceleration peak. He found that this was not
the case, as the number of acceleration peaks in syllables
could vary from zero to three and acceleration peaks could
also be found outside the syllable domain. In another
study, Jantunen (
          <xref ref-type="bibr" rid="b19">19</xref>
		  )
          compared sign strokes ("signs")
with non-strokes ("transitions") and established that there
is a kinematic difference between them.</p>

        <p>In a more recent work, Puupponen, Wainio, Burger,
and Jantunen (
          <xref ref-type="bibr" rid="b29">29</xref>
		  ) analyzed the kinematic characteristics
and functional properties of different head movements in
Finnish Sign Language and showed that there is no perfect
correspondence between their forms and functions, unlike
results reported in some earlier studies.</p>
      </sec>
	  
        <sec id="S1ac">
          <title>Music and eye tracking</title>	  

        <p>Eye tracking has been frequently used to study music
(sight-) reading. When looking at amateur musicians
Penttinen, Huovinen, and Ylitalo (
          <xref ref-type="bibr" rid="b27">27</xref>
		  ) found that more
experienced musicians used shorter fixation times and
more linear scanning of the notated music. Focusing on
adult music students, Penttinen, Huovinen, and Ylitalo (
          <xref ref-type="bibr" rid="b28">28</xref>
		  ) found that performance majors showed shorter
fixation durations and larger eye-hand spans. Professional
performers had more efficient fixations that helped them
anticipate difficulties and potential problems compared to
non-musicians (
          <xref ref-type="bibr" rid="b27">27</xref>
		  ).</p>
		  
        <p>Hadley, Sturt, Eerola, and Pickering (
          <xref ref-type="bibr" rid="b14">14</xref>
		  ) found that
harmonically incongruent melodies caused rapid
disruption in eye movements and pupil dilation.
Gruhn et al. (
          <xref ref-type="bibr" rid="b13">13</xref>
		  )
          investigated differences between saccadic eye
movements in musicians and non-musicians, finding that
musicians had more express saccades, stronger voluntary
eye control, and more stability in their fixations than
non-musicians.</p>

        <p>Laeng, Eidet, Sulutvedt, and Panksepp (
          <xref ref-type="bibr" rid="b21">21</xref>
		  ) found
relationships between pupil dilation and musical chills, in
that the pupil size increased around the moment of
experiencing the chill. Gingras, Marin, Puig-Waldm&#xFC;ller, and
Fitch (
          <xref ref-type="bibr" rid="b11">11</xref>
		  ) could predict pupillary responses from
music-induced arousal and individual differences &#x2013; pupils dilated
more for arousing or tense excerpts, in particular when the
excerpts were liked less.</p>

        <p>Fink, Geng, Hurley, and Janata (
          <xref ref-type="bibr" rid="b10">10</xref>
		  ) investigated the
role of attention during music listening on pupil dilation,
finding pupil dilations in deviants of complex musical
rhythms. Woolhouse and Lai (
          <xref ref-type="bibr" rid="b43">43</xref>
		  )
          studied participants&#x2019;
eye movements while observing dance movements,
finding more fixations on the upper rather than the lower body,
as well as greater dwell times for the head than for torso,
legs, or feet.</p>
      </sec>
	  
        <sec id="S1ad">
          <title>Sign language linguistics and eye tracking</title>		  

        <p>In sign language linguistics, the use of eye tracking has
been very rare. Concerning perception studies, Muir and Richardson (
          <xref ref-type="bibr" rid="b25">25</xref>
		  )
          found that native signers tend to fixate
on the upper face of the addressee, especially if the
addressee is close by. Emmorey, Thompson, and Colvin (
          <xref ref-type="bibr" rid="b9">9</xref>
		  ) showed that this tends not to be the case for signing
beginners who prefer to look at the mouth area. Wehrmeyer (
          <xref ref-type="bibr" rid="b39">39</xref>
		  )
          showed that the viewing habits of deaf
and hearing adults are also different in other contexts, for
example, in watching sign language interpreted news
broadcasts.</p>

        <p>Concerning production studies, Thompson, Emmorey,
and Kluender (
          <xref ref-type="bibr" rid="b33">33</xref>
		  ) found that signers&#x2019; gaze behavior is
different depending on the type of the verb sign and how it
is modified in the signing space. In a follow-up study (
          <xref ref-type="bibr" rid="b34">34</xref>
		  ), they also showed that this gaze behavior is affected
by signing skill. A recent study by Hosemann (
          <xref ref-type="bibr" rid="b17">17</xref>
		  ),
however, suggested that the pattern found by Thompson et al. (
          <xref ref-type="bibr" rid="b33">33</xref>
		  ) may not be so systematic.</p>
      </sec>
	  
        <sec id="S1ae">
          <title>Combining motion capture and eye tracking</title>	  

        <p>Within the music field, there have only been very few
studies so far that tried to combine motion capture and eye
tracking, while in sign language research, motion capture
and eye tracking have not been used together before. In
music-related research, Bishop and Goebl (
          <xref ref-type="bibr" rid="b1">1</xref>
		  ) studied
visual attention during duet performances, expecting that
visual attention would decline with repetition of the piece due to
getting to know each other&#x2019;s intentions. Marandola (
          <xref ref-type="bibr" rid="b22">22</xref>
		  )
          investigated hand-eye synchronization in xylophone
performance, suggesting that western musicians prepare for
the hits to be performed with their gaze, while
Cameroonian musicians tend to look away from the instrument.</p>
      </sec>
      </sec>
	  
      <sec id="S1b">
        <title>What is motion capture?</title>

          <p>Different systems for recording motion capture are
available (
            <xref ref-type="bibr" rid="b2">2</xref>
            ). Inertial systems track the
acceleration and orientation of sensors attached to
participants/objects in three dimensions, while magnetic systems
measure the three-dimensional position and orientation of
objects in a magnetic field. Of more importance for this
paper are camera-based systems, in particular infrared-based
optical motion capture systems. In such systems,
cameras send out infrared light that is reflected by
(passive, wireless) markers attached to participants and/or
objects, so that these reflections can be recorded by the
cameras. These systems are composed of an array of
several cameras connected in series, which together represent the data in
three-dimensional space. Using a method called direct
linear transformation, the system acquires the exact position
and orientation of each camera, with respect to the others
and the floor, to be able to create the three-dimensional
representation of the capture space and triangulate the
marker positions (
            <xref ref-type="bibr" rid="b30">30</xref>
            ).</p>
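The triangulation idea can be illustrated with a toy two-dimensional example (our simplification, not the Qualisys implementation; real systems apply direct linear transformation across many calibrated cameras in 3D, and the function below is hypothetical):

```python
# Toy 2D illustration: each calibrated camera defines a ray from its
# position through the observed marker direction; the marker lies at
# the intersection of the rays.

def triangulate_2d(cam_a, dir_a, cam_b, dir_b):
    """Solve cam_a + t*dir_a = cam_b + s*dir_b for the marker position."""
    # Rearranged as a 2x2 linear system in (t, s):
    #   t*dir_a - s*dir_b = cam_b - cam_a
    det = dir_a[0] * (-dir_b[1]) - (-dir_b[0]) * dir_a[1]
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel")
    rx, ry = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]
    t = (rx * (-dir_b[1]) - (-dir_b[0]) * ry) / det
    return (cam_a[0] + t * dir_a[0], cam_a[1] + t * dir_a[1])

# Two cameras at (0, 0) and (4, 0), both seeing a marker at (2, 2):
print(triangulate_2d((0, 0), (1, 1), (4, 0), (-1, 1)))  # -> (2.0, 2.0)
```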
			
          <p>Since optical motion capture systems work with
reflections (i.e., passive reflective markers) only, these markers
need to be labeled to identify which body part or object
each marker represents. Two main approaches for data
labeling exist. Some systems, such as the ones manufactured
by Vicon or OptiTrack, let the user define the locations of
the markers and create a body model prior to the recording
that is applied during the recording or post-processing. If
the model works correctly, the data is labelled
automatically. However, if the model fails (due to, for instance,
marker loss), manual labeling is required. In the Qualisys
system, the user first records the raw markers without any
body model. Afterwards, one recording is labelled
manually, from which a body model is created that is then
applied to the other recordings to label them automatically.
Also here, manual labeling is required, if the model fails.
However, the model can be improved by updating it after
each recording. The main challenge of optical systems is
that occlusions of markers during the recording cause
marker loss and gaps in the data. Thus, such occlusions
should be prevented by careful marker placement and
camera positioning before and during the recording.</p>
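The resulting gaps are typically repaired by interpolation; the following sketch shows a basic linear gap-fill on one marker coordinate (our assumption for illustration; QTM and comparable software provide more sophisticated gap-filling tools):

```python
# Hypothetical sketch: filling short occlusion gaps (None entries) in one
# marker coordinate by linear interpolation between valid samples.

def fill_gaps(track):
    """Linearly interpolate None entries bounded by valid samples."""
    filled = list(track)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i - 1                      # last valid sample before gap
            j = i
            while j < len(filled) and filled[j] is None:
                j += 1                         # first valid sample after gap
            if start >= 0 and j < len(filled):
                step = (filled[j] - filled[start]) / (j - start)
                for k in range(i, j):
                    filled[k] = filled[start] + step * (k - start)
            i = j
        else:
            i += 1
    return filled

print(fill_gaps([1.0, None, None, 4.0]))  # -> [1.0, 2.0, 3.0, 4.0]
```

Gaps at the very beginning or end of a recording have no bounding samples and are left untouched here.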

          <p>Optical motion capture systems have high temporal
and spatial resolutions, as recent systems track up to
10,000 frames per second and have a resolution of less than
one millimeter. Normally in music- and sign
language-related applications, standard capture speeds range from 60
to 240 Hz (most often 100-120 Hz), which is sufficient for
capturing most relevant activities, such as playing
instruments or dancing (
            <xref ref-type="bibr" rid="b2">2</xref>
            ).</p>
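Because the two systems sample at different rates (e.g., 120 Hz mocap versus 50 Hz eye tracking), jointly analyzing the streams typically requires resampling one onto the other's time base. A minimal sketch, assuming plain linear interpolation (function and data are illustrative, not part of any vendor software):

```python
# Sketch: resample a uniformly sampled signal (e.g., a 50 Hz eye-tracker
# channel) onto a new rate (e.g., the 120 Hz mocap grid) by linear
# interpolation.

def resample(samples, rate_in, rate_out):
    """Linearly interpolate a uniformly sampled signal to a new rate."""
    duration = (len(samples) - 1) / rate_in
    n_out = int(duration * rate_out) + 1
    out = []
    for i in range(n_out):
        t = i / rate_out                     # target time in seconds
        pos = t * rate_in                    # fractional source index
        lo = min(int(pos), len(samples) - 2)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out

eye_50hz = [0.0, 1.0, 2.0, 3.0]          # 3/50 s of fake data
print(len(resample(eye_50hz, 50, 120)))  # -> 8
```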
        </sec>
		
      <sec id="S1c">
        <title>What is eye tracking?</title>		

          <p>In the case of eye tracking, camera-based trackers are
most widely used nowadays, with an infrared light source
detecting the pupil by using the so called corneal
reflections, resulting in a variety of different measures including
the position or dilation of the pupil (
            <xref ref-type="bibr" rid="b16">16</xref>
            ). Screen-based or stationary eye trackers are attached
to the object being viewed, usually a screen, with the
participant placed in a stationary position in front of the screen
and the tracking system. Mobile eye trackers, on the other
hand, are head-mounted eye trackers worn like glasses so
the participant can move in space while the tracker
captures the eye movement and the scene being observed.
Therefore, mobile eye trackers have two kinds of cameras,
one (infrared-based) to record the eye/pupil and the other
(regular pixel-based, fish-eye lensed) for the field or the
scene, representing what the participant sees.</p>

          <p>Eye trackers also require calibration, usually by
providing four fixed points in space that the participant is
asked to focus on one after another while keeping the head
still (i.e., by only moving the pupils). With these four
points, the system is able to combine the eye positions with
the field video and display the focus of the gaze as a cross
hair in the field video. Most mobile eye trackers track at
rates of 50 or 60 Hz. Both mocap and eye tracking systems
result in numerical data representations of the body and
eye movement respectively that can be processed
computationally.</p>
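As a simplified illustration of what such a calibration computes, the fixation points let the software map raw pupil coordinates into field-video coordinates. The sketch below assumes a plain rectangle-to-rectangle mapping (real trackers fit a more general model from the corneal reflections; the function and coordinates are our assumptions):

```python
# Simplified illustration: map a pupil position, normalized within the
# rectangle spanned by the calibration fixations, onto field-video pixels.

def pupil_to_field(pupil, pupil_rect, field_rect):
    """Bilinear mapping between two axis-aligned rectangles."""
    (px0, py0), (px1, py1) = pupil_rect
    (fx0, fy0), (fx1, fy1) = field_rect
    u = (pupil[0] - px0) / (px1 - px0)   # normalized horizontal position
    v = (pupil[1] - py0) / (py1 - py0)   # normalized vertical position
    return (fx0 + u * (fx1 - fx0), fy0 + v * (fy1 - fy0))

# Pupil halfway through its calibration range -> center of a 640x480 field:
print(pupil_to_field((50, 30), ((40, 20), (60, 40)), ((0, 0), (640, 480))))
# -> (320.0, 240.0)
```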
        </sec>
		
      <sec id="S1d">
        <title>Synchronization of motion capture and eye tracking</title>

          <p>Reliable and accurate synchronization between the
motion capture system and eye tracker is crucial for relating
both data streams to each other and time-critically
analyzing the data. Different approaches have been developed;
the two studies mentioned above employed different methods.
One possibility is to use (i.e., purchase) solutions
offered by the manufacturers (e.g., using sync boxes or
plug-ins like Bishop &#x26; Goebl, 
          <xref ref-type="bibr" rid="b1">1</xref>
		  ) or alternatively use
(analog) claps like Marandola (
          <xref ref-type="bibr" rid="b22">22</xref>
		  ) equipped with
mocap markers recorded by the eye tracking glasses&#x2019; field
camera. However, manual claps would require the
researcher to manually synchronize the data, which is a
rather time-consuming effort. Moreover, since the video (of
the eye tracker field camera) recording the clap is based on
(changing) pixels, the possibility of finding the exact
frame to which the mocap data should be synchronized
might be more challenging compared to working with
digital representations of time series motion capture and eye
movement data. Another potential challenge for
synchronization might arise from differences in the starting times
of the recordings of both eye tracking cameras. This would
mean that the delay between the start of the eye camera
and the field camera has to be additionally quantified for
each recording, resulting in possible inconsistencies.</p>
		  
          <p>Ready-made solutions offered by the manufacturers
are available for several motion capture system and eye
tracker combinations, although not for all available eye
systems. Furthermore, such a plug-in is relatively
cost-intensive and usually requires a complicated technical setup
using two computers (one for running the motion capture
recording software, the other for running the eye tracker
software &#x2013; at least in the case of the Qualisys motion capture
system) that are linked via a wireless network connection,
which might cause computer/system security issues or
delays/lags in the processing. Other solutions (e.g., from
Natural Point OptiTrack) work via a sync box connecting the
different devices, for instance via a TTL signal and/or
SMPTE timecode (see below), which is also cost-intensive,
possibly requiring engineering knowledge as cables might
need customized connectors to fit into the available in- and
outputs of the devices and computers.</p>

          <p>Synchronizing different devices is a technically
challenging problem. It is not only a challenge to ensure that
recordings start at the same time, but also that they would
not drift apart in time from each other during the recording
(so one recording would be longer or have fewer frames
recorded than the other). Another possibility could be that the
sampling points of the different systems are locally
misaligned (due to an unstable sampling rate) which is referred
to as jitter. While high quality motion capture systems,
such as the Qualisys system used in this case, exhibit close
to zero drift and jitter (being one part per million according
to the Qualisys customer support), eye trackers are said to
exhibit some drift and jitter (
          <xref ref-type="bibr" rid="b16">16</xref>
            ).</p>
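Both quantities can be estimated from recorded sample timestamps; a minimal sketch (the function and the ideal-clock example are illustrative, not a vendor diagnostic):

```python
# Hypothetical sketch: quantify drift and jitter from the sample
# timestamps of a nominally 50 Hz device.

def drift_and_jitter(timestamps, nominal_rate):
    """Return (total drift in s, jitter as stdev of sample intervals in s)."""
    expected_last = (len(timestamps) - 1) / nominal_rate
    drift = timestamps[-1] - timestamps[0] - expected_last
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(intervals) / len(intervals)
    jitter = (sum((d - mean) ** 2 for d in intervals) / len(intervals)) ** 0.5
    return drift, jitter

# A perfect 50 Hz clock shows (near-)zero drift and jitter:
ts = [i / 50 for i in range(100)]
print(drift_and_jitter(ts, 50))
```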
			
          <p>Different ways to synchronize different devices have
been developed and are used in industrial and research
applications. One way is to send TTL (Transistor&#x2013;transistor
logic) triggers indicating the start and stop of a recording.
Other developments include timecode and genlock/sync (
          <xref ref-type="bibr" rid="b24">24</xref>
            ). Timecode, such as the SMPTE
timecode, developed by the Society of Motion Picture and
Television Engineers, is a standard in the film industry to link
cameras or video and audio material. The SMPTE
timecode indexes each recorded frame (or every second, third,
etc. depending on the frame rate of the devices) with a time
stamp, to offer sync points for post-processing. However,
such time codes can still cause jitter as well as drift if they
are not strictly kept together by, for instance, using a
central clock or a reference signal genlocking the devices.
However, such devices are relatively expensive and
require some engineering knowledge to set up correctly.
Often, they also require a cable connection between the
device and the recording computer. With this being less of a
problem for the motion capture system (since the pulses
would only be sent to the cameras), the (wireless) eye
tracker would lose its mobility. Some systems offer
wireless synchronization via WLAN; however, this is likely to
introduce delays, inconsistencies, and data loss due to
unreliability and loss of the signal.</p>
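The indexing idea behind (non-drop-frame) SMPTE timecode can be sketched as follows; this illustrates only the HH:MM:SS:FF format, not the full standard:

```python
# Sketch: derive a non-drop-frame SMPTE timecode stamp (HH:MM:SS:FF)
# from a frame number and the device frame rate.

def smpte(frame, fps):
    """Format a frame number as a non-drop-frame SMPTE timecode string."""
    ff = frame % fps                      # frame within the current second
    total_seconds = frame // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

print(smpte(0, 25))     # -> 00:00:00:00
print(smpte(1501, 25))  # -> 00:01:00:01
```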

          <p>Another option that has been developed to synchronize
different devices is the lab streaming layer (LSL). The
LSL is a system for the synchronized collection of various
time series data over a network. However, it requires
programming and computer knowledge, especially if the motion
capture and eye tracker systems at hand are not among the
already supported devices. Thus, it might not be suitable
and easy to use for everyone.</p>
        </sec>
		
      <sec id="S1e">
        <title>Aim of this paper</title>		

          <p>In order to overcome such device-specific, hard- and/or
software-based solutions, we aimed for a device-free,
behavior-based approach to reliably synchronize the two
systems that can be used with any combination of motion
capture and eye tracking systems. This approach should be
easy to perform for the participant and automatically
processable by a computational algorithm to avoid manual
synchronization of each separate recording. Such a
solution has low demands on technical knowledge and could
be used with any combination of eye tracker and motion
capture system at no extra cost. Furthermore, the
synchronization would be purely based on the numerical
representations of both mocap and eye tracker data, so possible
differences in recording beginnings of the different (eye
tracker) cameras would not affect the synchronization
accuracy.</p>

          <p>This computational synchronization solution was
developed in a pilot phase, and a refined version of it was
subsequently tested in a second, larger data collection.
This paper describes this development as well as the
evaluation of the accuracy in comparison to manual
synchronization of the recordings.</p>
        </sec>
      </sec>

    <sec id="S2">
      <title>Pilot phase &#x2013; Methods and results</title>
	  
      <p>In order to develop a computational method to
synchronize motion capture and eye tracker data, pilot data
from six participants within a sign language experiment
were collected. Data were simultaneously recorded using
the motion capture system, the eye tracker, and an external
(regular) video camera. An actual experiment setting was
chosen so that we were able to collect the data in an
authentic scientific scenario. During the experiment, each
participant signed five different short stories, resulting in
five recordings per participant.</p>

      <sec id="S2a">
        <title>Equipment</title>
		
        <p>We used a Qualisys Oqus 5+ infrared optical motion
capture system (8 cameras mounted to the ceiling of the
room) tracking at a frame rate of 120 Hz as well as an
Ergoneers Dikablis Essential head mounted eye tracker
(glasses) tracking at a frequency of 50 Hz (both eye and
field camera). 120 Hz is the usual frequency at which we
track motion capture data, being sufficient for whole body
movement (
          <xref ref-type="bibr" rid="b2">2</xref>
          ). 50 Hz is the only frequency at
which this eye tracker operates.</p>
      </sec>
	  
      <sec id="S2b">
        <title>Procedure</title>
		
        <p>At the beginning of the recording session, participants
were equipped with motion capture markers (25 in this
case) and the eye tracker glasses, which were calibrated to
the participant&#x2019;s left eye (see Fig. 1).</p>

<fig id="fig01" fig-type="figure" position="float">
					<label>Figure 1.</label>
					<caption>
						<p>a) Mocap markers as a schematic representation and b) mocap markers and eye tracker attached to participant.</p>
					</caption>
					<graphic id="graph01" xlink:href="jemr-11-02-e-figure-01.png"/>
				</fig>

        <p>In order to synchronize the eye tracker and motion
capture recordings, the participants were instructed to look
straight ahead with an upright body posture, fixating a point in space
(e.g., a target on the wall, in our case the head of another
person standing in front of a wall opposite the participant),
and then nod (i.e., move the chin towards the chest, change
direction about halfway between the straight
position and the chest) very quickly at the beginning of each
recording, keeping the eyes open and fixating the target
while nodding (see mocap trace illustration in Fig. 2). The
nod should be performed as one movement (i.e., not
stopping in the lowest point of the motion, but just changing
directions and moving upwards immediately). The nod
resulted in a sharp vertical displacement of both the pupil
data and the mocap data of the head markers (see Fig. 3)
at the same time. The time point of maximal displacement
is used to align mocap and eye tracker data afterwards. To
simplify the procedure, the mocap recording was always
started first, followed by the recording of the eye tracker,
with about a second delay. The start of each recording was
announced to the participants. Thus, sufficient time was
given for the participants to adjust to the target before
and/or right at the beginning of the recording.</p>
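In outline, the alignment step works as follows (a minimal Python illustration of the idea; the actual analysis was implemented in Matlab with the MoCap Toolbox, and the traces below are fabricated for demonstration):

```python
# Sketch of the alignment idea: find the time of maximal downward
# displacement of the nod in each stream, then shift the eye data by
# the difference between the two times.

def nod_time(signal, rate):
    """Time (s) of the minimum value, i.e. the lowest point of the nod."""
    idx = min(range(len(signal)), key=lambda i: signal[i])
    return idx / rate

# Fake vertical traces: head marker at 120 Hz, pupil at 50 Hz,
# each dipping once during the nod.
mocap_y = [0.0] * 240 + [-1.0] + [0.0] * 240   # dip at 2.0 s
eye_y = [0.0] * 75 + [-1.0] + [0.0] * 75       # dip at 1.5 s

offset = nod_time(mocap_y, 120) - nod_time(eye_y, 50)
print(offset)  # -> 0.5  (eye recording started 0.5 s after mocap)
```

In practice the displacement extremum would be located with a peak-picking routine on cleaned data rather than a bare minimum, but the offset computation is the same.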

<fig id="fig02" fig-type="figure" position="float">
					<label>Figure 2.</label>
					<caption>
						<p>Trace illustration of the nod of the four head and the chin markers. The gray stick representation depicts the starting point, the black stick representation the end position, and the blue lines display the motion trajectory of the nod.</p>
					</caption>
					<graphic id="graph02" xlink:href="jemr-11-02-e-figure-02.png"/>
				</fig>
				
<fig id="fig03" fig-type="figure" position="float">
					<label>Figure 3.</label>
					<caption>
						<p>Vertical displacement of pupil and (front left) head marker during the nod (indicated by the red arrow).</p>
					</caption>
					<graphic id="graph03" xlink:href="jemr-11-02-e-figure-03.png"/>
				</fig>				
      </sec>
	  
      <sec id="S2c">
        <title>Analysis</title>
		
        <p>The following workflow describes our approach to
computationally synchronize eye tracker and mocap data.
Several steps are required to prepare the data, so that
automatic synching is possible.</p>

        <p>The first step for the pupil data was performed in the
Ergoneers recording software D-Lab, whereas all the
remaining steps were performed in Matlab using the MoCap
Toolbox (
          <xref ref-type="bibr" rid="b5">5</xref>
          ), a Matlab toolbox
for analyzing and visualizing motion capture data, and
other Matlab functions. The first step in D-Lab included a
pupil detection check to ensure that the pupil was
successfully recognized during the nod. For participants
whose pupil moved out of the automatic tracking range
at the maximum of the nod, the pupil position was manually
marked using the &#x201C;Pupil adjustment&#x201D; function in D-Lab.
Afterwards, the numerical data were exported into a text file.</p>
		
        <p>After labeling the mocap markers in the respective
recording software Qualisys Track Manager (QTM) and
exporting the labeled data into a text file, both pupil and
mocap data were imported into Matlab using the MoCap
Toolbox. In order to process the eye tracker data, we wrote
an extension that would read in the numerical output from
the eye tracker software and parse it into a MoCap Toolbox
compatible data structure.</p>

        <p>For this analysis, the vertical displacement of the pupil
data and the vertical displacement of the left front head
marker were used (Fig. 4a). In the first step, the pupil data
was linearly gap-filled to remove possible blinks
happening before and after the nod (see Fig. 4b). This was done
to make the data smoother (i.e., more continuous) for
further computation. Gaps of blinks were short (max. 10
frames) and since these actual data were not relevant for
the computation, the gaps could be filled irrespective of
their length without necessitating the use of a threshold.
The mocap data were gap-filled just in case, although it
was rather unlikely that the head marker was occluded in
the beginning of a recording.</p>
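<p>The gap-filling step can be illustrated with the following minimal NumPy sketch (the actual pipeline used the MoCap Toolbox in Matlab; here, blinks are assumed to appear as NaN runs in the exported pupil data, and the function name is hypothetical):</p>

```python
import numpy as np

def gap_fill_linear(signal):
    """Linearly interpolate NaN gaps (e.g., blinks) in a 1-D signal.
    Leading or trailing NaNs are held at the nearest valid value."""
    signal = np.asarray(signal, dtype=float)
    valid = ~np.isnan(signal)
    idx = np.arange(len(signal))
    # np.interp interpolates linearly between the valid samples
    return np.interp(idx, idx[valid], signal[valid])

# A short simulated "blink" gap of three samples
y = np.array([1.0, 1.0, np.nan, np.nan, np.nan, 2.0, 2.0])
print(gap_fill_linear(y))  # the gap is bridged by a straight line
```

<p>Because the gap-filled samples are never themselves used as sync points, the exact values inside a gap are irrelevant; the interpolation only has to keep the subsequent velocity computation well defined.</p>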

<fig id="fig04" fig-type="figure" position="float">
					<label>Figure 4.</label>
					<caption>
						<p>Workflow of the synchronization procedure. a) vertical displacement of pupil and left front head marker; b) linearly gap-filled data; c) vertical velocity data; d) first 1.5 sec of mocap and first sec of pupil data set to 0; e) z-scored data plus indicating peak-picking of velocity minima of the nod (green arrows) and subsequent zero-crossing (red arrows) used to align eye and mocap data.</p>
					</caption>
					<graphic id="graph04" xlink:href="jemr-11-02-e-figure-04.png"/>
				</fig>

        <p>Next, the instantaneous velocity was calculated for
both eye and mocap data using numerical differentiation
and a Butterworth smoothing filter (a second-order
zero-phase digital filter). This centered the data around 0,
removing potential differences in the height of the
participants as well as compensating for movement drifts in the
beginning (in case the participant was not yet looking at
the target), while also resulting in a sharper, more focused
minimum of the curve (representing the nod) compared to
the position data (see Fig. 4c).</p>
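<p>A SciPy sketch of this step is given below. The zero-phase property matters because the sync point is a time location, and a causal filter would shift it. The 10 Hz cutoff is an assumed value for illustration, not taken from our processing chain:</p>

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smoothed_velocity(pos, fs, cutoff_hz=10.0):
    """Numerical differentiation followed by a second-order Butterworth
    low-pass filter applied with filtfilt (zero-phase: no time shift).
    The 10 Hz cutoff is an assumption, not a value from the paper."""
    vel = np.gradient(pos) * fs              # central differences, units/s
    b, a = butter(2, cutoff_hz / (fs / 2.0)) # cutoff normalized to Nyquist
    return filtfilt(b, a, vel)

# A constant positional drift yields a flat velocity curve around its mean
pos = np.arange(200) / 200.0   # 1 unit/s sampled at fs = 200 Hz
print(smoothed_velocity(pos, fs=200).round(3)[:5])
```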

        <p>In order to remove artefacts in the beginning of the
recording (i.e., the participant was not focusing the target
yet), the first second of the pupil data and the first 1.5
seconds of the mocap data were set to 0 (see Fig. 4d). This
could be done safely because the recording devices were
started consecutively at a relatively slow pace (the mocap
system was started first, hence the slightly longer interval).</p>

        <p>Subsequently, both data streams were z-scored (a way
to standardize scores to have an overall mean of 0 and a
standard deviation of 1) to adjust the scaling of the values,
since some participants would nod faster than others (see
Fig. 4e).</p>
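<p>The z-scoring itself is a one-line standardization; a minimal sketch:</p>

```python
import numpy as np

def zscore(x):
    """Standardize a signal to mean 0 and standard deviation 1 so that
    slow and fast nodders produce comparable velocity magnitudes."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```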

        <p>Following this, the first local minimum of each data
stream was computationally determined by using a
(self-implemented) peak-picking algorithm with a threshold of
-2 (see green arrows in Fig. 4e). This value was found
suitable as a threshold in the given data set.</p>

        <p>The maximal dislocation of position data (i.e., at the
point of change in direction from moving the head
downwards to moving the head upwards again) results in a
velocity value of zero, so the zero-crossing of the velocity
curve following the first local minimum was determined
(see red arrows in Fig. 4e) and taken as the synchronization
point. Since the velocity value would never be exactly
zero, the frame before the zero-crossing was used as the
synchronization point.</p>
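<p>The peak-picking and zero-crossing steps can be combined into a single routine. The following is a hypothetical NumPy re-implementation of the self-implemented Matlab routine described above, not the original code:</p>

```python
import numpy as np

def find_sync_frame(vel_z, threshold=-2.0):
    """Return the sync frame: the last frame before the zero-crossing
    that follows the first local minimum below `threshold` in the
    z-scored vertical velocity."""
    below = np.flatnonzero(vel_z < threshold)
    if below.size == 0:
        raise ValueError("no nod found below threshold")
    i = below[0]
    # walk down to the local minimum of the nod ...
    while i + 1 < len(vel_z) and vel_z[i + 1] < vel_z[i]:
        i += 1
    # ... then forward to the zero-crossing (velocity turns non-negative)
    while i + 1 < len(vel_z) and vel_z[i + 1] < 0:
        i += 1
    return i  # the frame before the crossing, as described in the text

vel = np.array([0.0, -0.5, -1.5, -2.5, -3.0, -2.0, -0.8, -0.1, 0.3, 0.5])
print(find_sync_frame(vel))  # 7: the last negative sample before the crossing
```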

        <p>In the last step, the (temporal) difference between the
occurrences of both zero-crossings was calculated and
subsequently used to trim the beginning of the mocap data,
so that the nod would be aligned in both data streams and
the data therefore synchronized (see Fig. 5).</p>
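<p>A sketch of the trimming step, assuming (as in our setup) that the mocap recording was started first, so its sync point occurs later within its own stream (function name and interface are hypothetical):</p>

```python
import numpy as np

def trim_to_sync(mocap, sync_mocap, fs_mocap, sync_eye, fs_eye):
    """Trim the beginning of the mocap stream so that both sync points
    fall on the same time stamp. Sync points are frame indices in each
    stream's own sampling rate."""
    offset_s = sync_mocap / fs_mocap - sync_eye / fs_eye
    offset_frames = int(round(offset_s * fs_mocap))
    return mocap[offset_frames:]

mocap = np.arange(1000)           # dummy mocap frames at 200 Hz
trimmed = trim_to_sync(mocap, sync_mocap=400, fs_mocap=200,
                       sync_eye=50, fs_eye=50)
print(trimmed[0], len(trimmed))   # 200 800: one second (200 frames) removed
```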

<fig id="fig05" fig-type="figure" position="float">
					<label>Figure 5.</label>
					<caption>
						<p>Illustration of the trimming. The dashed red lines indicate the sync points for each data stream. The red arrow indicates the temporal difference between both sync points. The green arrow (same size as the red arrow) indicates the part to be trimmed from the beginning of the mocap data.</p>
					</caption>
					<graphic id="graph05" xlink:href="jemr-11-02-e-figure-05.png"/>
				</fig>
      </sec>
	  
      <sec id="S2d">
        <title>Results and evaluation</title>
		
        <p>In order to test whether the computational extraction could
locate the synch point correctly, &#x201C;ground truth data&#x201D; from
both the mocap and the eye tracker were assessed for
comparison purposes. The motion capture &#x201C;ground truth
data&#x201D; were assessed within QTM by determining the
minimal point of the vertical dislocation of the head during
the nod in each recording by plotting the time series of the
left front head marker. For the eye tracker data, this was
done in D-Lab. The frame that displayed the most
downwards displacement of the field camera was taken as
the reference for the maximal vertical dislocation of the
eye during the nod (for an example see Figure 6). For this,
the playback timeline was manually reset so that the
recording would start from 0:00 (this is because D-Lab does not
use a global time code to record the different cameras,
so starting times vary between the eye and field cameras).</p>

<fig id="fig06" fig-type="figure" position="float">
					<label>Figure 6.</label>
					<caption>
						<p>Field camera sequence of the head nod. During the nod, the head (i.e., the field camera) moves downwards until the point of most downward displacement (middle picture) while fixating the target (the face of the other person in this case). The corresponding time stamp of the middle picture frame constitutes the &#x201C;ground truth data&#x201D;.</p>
					</caption>
					<graphic id="graph06" xlink:href="jemr-11-02-e-figure-06.png"/>
				</fig>

        <p>Subsequently, the sync points of the computational
extraction were subtracted from the &#x201C;ground truth data&#x201D; in
order to determine the accuracy of the computational
solution. The data, as well as their respective differences from
two of the six participants, are presented in Table 1.</p>

<table-wrap id="t01" position="float">
					<label>Table 1.</label>
					<caption>
						<p>&#x201C;Ground truth&#x201D; sync points (&#x201C;QTM&#x201D; and &#x201C;D-Lab&#x201D;) and computationally derived sync points (&#x201C;Matlab&#x201D;) as well as their respective difference (&#x201C;Diff&#x201D;). All values are in seconds.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="3">Mocap data</td>
            <td rowspan="1" colspan="3">Pupil data</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Trial</td>
            <td rowspan="1" colspan="1">QTM</td>
            <td rowspan="1" colspan="1">Matlab</td>
            <td rowspan="1" colspan="1">Diff</td>
            <td rowspan="1" colspan="1">D-Lab</td>
            <td rowspan="1" colspan="1">Matlab</td>
            <td rowspan="1" colspan="1">Diff</td>
          </tr>
						</thead>		  
						<tbody>		  
  <tr>
    <td rowspan="1" colspan="1">1</td>
    <td rowspan="1" colspan="1">4.7500</td>
    <td rowspan="1" colspan="1">4.7500</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">3.48</td>
    <td rowspan="1" colspan="1">3.46</td>
    <td rowspan="1" colspan="1">-0.02</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">2</td>
    <td rowspan="1" colspan="1">4.9000</td>
    <td rowspan="1" colspan="1">4.9000</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">3.00</td>
    <td rowspan="1" colspan="1">3.02</td>
    <td rowspan="1" colspan="1">0.02</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">3</td>
    <td rowspan="1" colspan="1">4.7830</td>
    <td rowspan="1" colspan="1">4.7830</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">3.16</td>
    <td rowspan="1" colspan="1">3.14</td>
    <td rowspan="1" colspan="1">-0.02</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">4</td>
    <td rowspan="1" colspan="1">4.7500</td>
    <td rowspan="1" colspan="1">4.7583</td>
    <td rowspan="1" colspan="1">0.0083</td>
    <td rowspan="1" colspan="1">3.50</td>
    <td rowspan="1" colspan="1">3.50</td>
    <td rowspan="1" colspan="1">0</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">5</td>
    <td rowspan="1" colspan="1">3.8830</td>
    <td rowspan="1" colspan="1">3.8830</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">2.70</td>
    <td rowspan="1" colspan="1">2.70</td>
    <td rowspan="1" colspan="1">0</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">6</td>
    <td rowspan="1" colspan="1">4.1083</td>
    <td rowspan="1" colspan="1">4.1083</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">2.76</td>
    <td rowspan="1" colspan="1">2.76</td>
    <td rowspan="1" colspan="1">0.02</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">7</td>
    <td rowspan="1" colspan="1">4.6916</td>
    <td rowspan="1" colspan="1">4.6916</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">2.88</td>
    <td rowspan="1" colspan="1">2.88</td>
    <td rowspan="1" colspan="1">0</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">8</td>
    <td rowspan="1" colspan="1">3.7500</td>
    <td rowspan="1" colspan="1">3.7500</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">2.78</td>
    <td rowspan="1" colspan="1">2.78</td>
    <td rowspan="1" colspan="1">0</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">9</td>
    <td rowspan="1" colspan="1">3.5916</td>
    <td rowspan="1" colspan="1">3.5916</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">2.54</td>
    <td rowspan="1" colspan="1">2.52</td>
    <td rowspan="1" colspan="1">-0.02</td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">10</td>
    <td rowspan="1" colspan="1">4.5916</td>
    <td rowspan="1" colspan="1">4.5916</td>
    <td rowspan="1" colspan="1">0</td>
    <td rowspan="1" colspan="1">3.02</td>
    <td rowspan="1" colspan="1">3.02</td>
    <td rowspan="1" colspan="1">0</td>
  </tr>
						</tbody>
					</table>
					</table-wrap>	

        <p>For the mocap data, the &#x201C;ground truth data&#x201D; equaled the
computationally-derived sync points in all trials but one,
in which a difference of one frame (8.3 ms) was found. For
the pupil data, the &#x201C;ground truth data&#x201D; conformed to the
computationally derived sync points in five trials, whereas
there was a difference of one frame (20 ms) in the other
five. In three of these five trials, the difference was
negative, meaning the automatic sync solution located the peak
of the nod one frame before the &#x201C;ground truth data&#x201D;,
whereas, in the other two trials, the automatic solution
located the peak nod after the &#x201C;ground truth data&#x201D;. This
suggests that the differences were due to rounding errors in
the calculation rather than to one measure consistently
lagging behind the other.</p>
      </sec>

      <sec id="S2e">
        <title>Discussion</title>

        <p>The results of the comparison between &#x201C;ground truth
data&#x201D; and computational synchronization show that the
computational solution is able to correctly and accurately
identify the nod. However, several issues arose that led to
refinements of the approach. Only the data of two of the
six participants could be synced this way. The main reason
the remaining participants failed was that they
blinked during the nod instead of keeping the eyes open.
A technical weakness in our approach was that we only
aligned the recordings at the beginning, leaving it
unknown whether any drift or jitter would occur between the
two data streams that would result in inaccurate
synchronization. Furthermore, the sample was relatively small, so
more data were needed to test and validate the method.
Thus, several improvements to the approach were made,
which are outlined in the following sections.</p>
      </sec>
    </sec>
	
    <sec id="S3">
      <title>Appraisal phase &#x2013; Methods</title>
	  
      <p>This section outlines the second data collection,
used to test and validate the synchronization approach,
alongside the changes made to improve the
method.</p>

      <p>We again chose to collect data within an actual
experiment setting. Ten participants signed four different
sentences to another person standing opposite them,
resulting in four recordings per person and 40 recordings
altogether.</p>

      <sec id="S3a">
        <title>Equipment</title>
		
        <p>Data were recorded with the same eight-camera Oqus
5+ motion capture system, this time tracking at a
rate of 200 Hz. This was done to increase the temporal
accuracy and to reduce rounding errors by keeping the mocap
sampling frequency in an integer ratio (4:1) to that of
the eye tracker. The same Ergoneers head-mounted eye tracker
with a sampling rate of 50 Hz was used.</p>
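<p>Because 200 Hz is an exact integer multiple of 50 Hz, every eye tracker frame coincides with a mocap frame and frame-index conversion involves no rounding. A trivial sketch (names hypothetical):</p>

```python
FS_MOCAP = 200  # Hz, second data collection
FS_EYE = 50     # Hz, Ergoneers eye tracker
RATIO = FS_MOCAP // FS_EYE  # exactly 4 mocap frames per eye tracker frame

def eye_to_mocap_frame(eye_frame):
    """With an integer sampling ratio, each eye tracker frame maps
    exactly onto a mocap frame, so no rounding error is introduced."""
    return eye_frame * RATIO

print(eye_to_mocap_frame(50))  # 200: 1 s of eye data = mocap frame 200
```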
      </sec>
	  
      <sec id="S3b">
        <title>Procedure</title>
		
        <p>Two features were added to the procedure. In order to
familiarize the participants with the environment and the
nodding task, several practice nods were included at the
beginning of the recording procedure, thus ensuring that
the participant would understand the task and its
requirements and perform it correctly, in particular keeping the
eyes open during the nod. Furthermore, a nod at the end
was added to test the accuracy of the synchronization and
whether there was any drift in the system (owing to technical
challenges in using time codes with the eye tracker). Since
the recordings of the four sentences were rather short
(recordings lasting 10 to 15 seconds), we recorded five longer
conversations of about 1 minute each with a subset of five
of the ten participants. As was done before, the mocap
recording was started before the eye tracker.</p>
      </sec>
	  
      <sec id="S3c">
        <title>Subjective experience of participants</title>
		
        <p>In order to investigate the subjective experience of the
nodding procedure, participants were asked to fill in a
short questionnaire after the data collection. Questions
related to how easy it was to keep the eyes open during the
nodding, how comfortable participants felt during the nod,
how clear it was when to produce the nod, and how much
the participants felt that the nod disturbed the performance
of the main task of the data collection. The items were
rated on 7-step scales ranging from &#x201C;not at all&#x201D; to &#x201C;very
much&#x201D;.</p>
      </sec>
	  
      <sec id="S3d">
        <title>Analysis</title>
		
        <p>The sync points were computed in the same way as
described above. To locate the nod at the end of the
recording, the algorithm was run in reverse, starting the
peak detection from the end of the recording. When the
first local minimum below the threshold of -2 was reached,
the algorithm stepped back to the zero-crossing it had just
passed (i.e., the zero-crossing following the minimum in
forward time) and chose the frame before the zero-crossing
as the sync point. In this case, the recording was not
trimmed, so that the sync point could be expressed relative to
the beginning of the recording. An example is
shown in Figure 7.</p>
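<p>In the same hypothetical NumPy style as before, the reversed search amounts to taking the last sub-threshold frame instead of the first (again a sketch, not the original Matlab code):</p>

```python
import numpy as np

def find_end_sync_frame(vel_z, threshold=-2.0):
    """Locate the end nod by searching from the end of the recording:
    take the last frame below `threshold`, then move forward to the
    zero-crossing that follows it. The returned index is relative to
    the start of the recording."""
    below = np.flatnonzero(vel_z < threshold)
    if below.size == 0:
        raise ValueError("no nod found below threshold")
    i = below[-1]  # last sub-threshold frame, i.e., inside the end nod
    while i + 1 < len(vel_z) and vel_z[i + 1] < 0:
        i += 1
    return i  # frame before the zero-crossing

# Two nods: one at the start, one at the end of a toy recording
vel = np.array([0.1, -2.5, -0.5, 0.2, 0.0, -0.3, -2.6, -1.0, -0.2, 0.4])
print(find_end_sync_frame(vel))  # 8: picks the end nod, not the start nod
```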

<fig id="fig07" fig-type="figure" position="float">
					<label>Figure 7.</label>
					<caption>
						<p>Illustration of the end nod with the z-scored velocity data. The green arrow indicates the local minimum below the threshold of -2 corresponding to the end nod. The red arrow indicates the subsequent zero-crossing that was used as the sync point.</p>
					</caption>
					<graphic id="graph07" xlink:href="jemr-11-02-e-figure-07.png"/>
				</fig>

        <p>Moreover, the &#x201C;ground truth data&#x201D; of mocap and eye
tracker were gathered in the same way as previously.
However, it was now performed for both the nod in the
beginning and the nod in the end.</p>
      </sec>
    </sec>
    <sec id="S4">
      <title>Results and evaluation</title>
	  
      <p>We will first present the results regarding the
alignment of computationally extracted sync points and the
manually acquired &#x201C;ground truth data&#x201D; to evaluate the
accuracy of the sync point extraction. Table 2 displays the
differences between the temporal locations from the
computational synchronization approach and the manual
&#x201C;ground truth data&#x201D; of both mocap and eye tracker for each
of the 40 recordings. The differences are given in frames.</p>

<table-wrap id="t02" position="float">
					<label>Table 2.</label>
					<caption>
						<p>Differences in frames between the computational sync points and the &#x201C;ground truth data&#x201D; of both mocap and eye tracker for all 40 recordings. One mocap frame equals 5 ms, while one eye tracker frame equals 20 ms.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="2" style="text-align: center;">mocap</td>
            <td rowspan="1" colspan="2" style="text-align: center;">eye tracker</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">start nod</td>
            <td rowspan="1" colspan="1">end nod</td>
            <td rowspan="1" colspan="1">start nod</td>
            <td rowspan="1" colspan="1">end nod</td>
          </tr>
						</thead>
						<tbody>
          <tr>
            <td rowspan="1" colspan="1">P1</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P2</td>
            <td rowspan="1" colspan="1">0 | 1 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | -1</td>
            <td rowspan="1" colspan="1">1 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P3</td>
            <td rowspan="1" colspan="1">0 | 0 | 1 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">1 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P4</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">1 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P5</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">1 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P6</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 1 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P7</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 1 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P8</td>
            <td rowspan="1" colspan="1">1 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 1 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 1</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P9</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 1</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P10</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 1</td>
            <td rowspan="1" colspan="1">0 | 0 | 0 | 0</td>
            <td rowspan="1" colspan="1">0 | 1 | 0 | 1</td>
            <td rowspan="1" colspan="1">0 | 0 | 1 | 0</td>
          </tr>
						</tbody>
					</table>
					</table-wrap>	

      <p>For the mocap data, the &#x201C;ground truth data&#x201D; equaled the
computationally derived sync points in all but five trials
for the nod in the beginning and three trials for the nod in
the end. In all cases, the difference was one frame (5 ms).
For the pupil data, the &#x201C;ground truth data&#x201D; conformed with
the computationally derived sync points in all but four
trials for the nod at the beginning and five trials for the nod
at the end. Each difference was also one frame (20 ms) in
these instances. In all cases but one (P2, end nod mocap),
the sync point was one frame after the &#x201C;ground truth data&#x201D;.</p>

      <p>In order to further evaluate the accuracy of the
synchronization solution regarding synchronization over time of
both systems (i.e., drift), the durations in between the nod
at the beginning and the end for the mocap system and the
eye tracker were compared per trial. The results of the
short recordings are shown in Table 3, while the five
longer recordings are presented in Table 4.</p>

<table-wrap id="t03" position="float">
					<label>Table 3.</label>
					<caption>
						<p>Differences of recording durations in seconds between the mocap system and the eye tracker, measured between the beginning and end nods, shown per trial together with the mean absolute difference per participant.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="4">Duration differences per trial in seconds</td>
            <td rowspan="1" colspan="1"/>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">T1</td>
            <td rowspan="1" colspan="1">T2</td>
            <td rowspan="1" colspan="1">T3</td>
            <td rowspan="1" colspan="1">T4</td>
            <td rowspan="1" colspan="1">Mean</td>
          </tr>
						</thead>
						<tbody>
          <tr>
            <td rowspan="1" colspan="1">P1</td>
            <td rowspan="1" colspan="1">0.010</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">0.0050</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P2</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">-0.030</td>
            <td rowspan="1" colspan="1">0.0088</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P3</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">-0.010</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0.015</td>
            <td rowspan="1" colspan="1">0.0075</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P4</td>
            <td rowspan="1" colspan="1">-0.020</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">-0.010</td>
            <td rowspan="1" colspan="1">-0.020</td>
            <td rowspan="1" colspan="1">0.0125</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P5</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0.0038</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P6</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P7</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">0.010</td>
            <td rowspan="1" colspan="1">0.020</td>
            <td rowspan="1" colspan="1">0.010</td>
            <td rowspan="1" colspan="1">0.0112</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P8</td>
            <td rowspan="1" colspan="1">-0.010</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">0.0063</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P9</td>
            <td rowspan="1" colspan="1">0.005</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">0.0025</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">P10</td>
            <td rowspan="1" colspan="1">-0.010</td>
            <td rowspan="1" colspan="1">-0.005</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0.0037</td>
          </tr>
						</tbody>
					</table>
					</table-wrap>
					
<table-wrap id="t04" position="float">
					<label>Table 4.</label>
					<caption>
						<p>Differences of recording durations in seconds of mocap and eye tracker for long recordings.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>		  
            <td rowspan="1" colspan="3">Durations in seconds</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Mocap</td>
            <td rowspan="1" colspan="1">Eye </td>
            <td rowspan="1" colspan="1">Difference</td>
          </tr>
						</thead>
						<tbody>
          <tr>
            <td rowspan="1" colspan="1">R1</td>
            <td rowspan="1" colspan="1">42.700</td>
            <td rowspan="1" colspan="1">42.700</td>
            <td rowspan="1" colspan="1">0</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">R2</td>
            <td rowspan="1" colspan="1">43.010</td>
            <td rowspan="1" colspan="1">43.000</td>
            <td rowspan="1" colspan="1">-0.010</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">R3</td>
            <td rowspan="1" colspan="1">48.895</td>
            <td rowspan="1" colspan="1">48.880</td>
            <td rowspan="1" colspan="1">-0.015</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">R4</td>
            <td rowspan="1" colspan="1">83.895</td>
            <td rowspan="1" colspan="1">83.860</td>
            <td rowspan="1" colspan="1">-0.035</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">R5</td>
            <td rowspan="1" colspan="1">65.815</td>
            <td rowspan="1" colspan="1">65.800</td>
            <td rowspan="1" colspan="1">-0.015</td>
          </tr>
						</tbody>
					</table>
					</table-wrap>						

      <p>For the short recordings, the duration differences are
on average below the sampling interval of the eye
tracker (50 Hz, i.e., 20 ms); in eight out of ten cases, the average
difference is below half the eye tracker's sampling
interval. In 35 of the 40 recordings, the difference is
smaller than or equal to half the eye tracker's sampling interval (10 ms).
In 14 cases of the short recordings, the eye tracker and the
mocap recordings were of exactly the same length,
whereas in 13, the eye tracker recording was shorter than
the mocap, and in the remaining 13, the eye tracker
recording was longer than the mocap.</p>

      <p>In the five longer recordings, the differences ranged
from 0 to 35 ms, with four of the five recordings being
below the eye tracker's sampling interval of 20 ms. In the
recordings with a nonzero difference, the eye tracker
recordings were shorter than those of the mocap system.</p>
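<p>The drift check above amounts to comparing the nod-to-nod durations measured by the two systems; a minimal sketch (function name and interface hypothetical):</p>

```python
def drift_seconds(start_mocap, end_mocap, fs_mocap,
                  start_eye, end_eye, fs_eye):
    """Difference between the nod-to-nod durations of the two systems;
    a nonzero value indicates drift and/or sampling rounding. Sync
    points are frame indices in each system's own sampling rate."""
    dur_mocap = (end_mocap - start_mocap) / fs_mocap
    dur_eye = (end_eye - start_eye) / fs_eye
    return dur_eye - dur_mocap

# Mimicking recording R3: mocap 48.895 s vs. eye tracker 48.880 s
print(drift_seconds(0, 9779, 200, 0, 2444, 50))  # about -0.015 s
```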

      <sec id="S4a">
        <title>Subjective experiences</title>
		
        <p>Participants were asked to rate four questions regarding
their experiences about the nod after the data collection on
a 7-point scale. The detailed overview of the ratings is
found in Table 5.</p>

<table-wrap id="t05" position="float">
					<label>Table 5.</label>
					<caption>
						<p>Rating results of participants&#x2019; subjective experiences. A 7-step scale (1=not at all &#x2013; 7=very much, reversed for last question) was used.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"><break/>Mean</td>
            <td rowspan="1" colspan="1">Standard deviation</td>
          </tr>
						</thead>
						<tbody>
          <tr>
            <td rowspan="1" colspan="1">How easy was it to keep the eyes open during the nods?</td>
            <td rowspan="1" colspan="1">5.6</td>
            <td rowspan="1" colspan="1">1.65</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">How comfortable did you feel during the nod?</td>
            <td rowspan="1" colspan="1">5.0</td>
            <td rowspan="1" colspan="1">1.89</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">How clear was it when to produce the nod?</td>
            <td rowspan="1" colspan="1">6.4</td>
            <td rowspan="1" colspan="1">0.97</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">How disturbing was it to perform the nod?</td>
            <td rowspan="1" colspan="1">2.3</td>
            <td rowspan="1" colspan="1">1.83</td>
          </tr>
						</tbody>
					</table>
					</table-wrap>	

        <p>Participants were overall positive about the task. They
found it easy to keep the eyes open and felt rather
comfortable performing the nod. It was very clear to them when to
produce the nod, and they did not perceive it as
disturbing.</p>
      </sec>
    </sec>
	
    <sec id="S5">
      <title>Discussion</title>
	  
      <p>In this paper, we described the development of a
computational approach to automatically synchronize the
recordings of a motion capture system and an eye tracker. The
aim of the paper was to present a solution that is reliable
and does not depend on a ready-made plug-in from the
manufacturer, but is instead device-free and intrinsic to the
recorded data.</p>

      <p>The measured accuracy of the motion capture data is
very high; 90% (nine out of ten) of the pilot recordings at
a frame rate of 120 Hz, and 90% (72 out of 80) of the
second data collection at 200 Hz could be optimally aligned
between the &#x201C;ground truth data&#x201D; and the computational
solution, while the remaining ones showed a one-frame
difference. The difference of one frame (8.3 ms at 120 Hz and 5
ms at 200 Hz) could be due to the smoothing of the data
after the time differentiation or to rounding during the
calculation. Small inconsistencies could also have emerged
from a slower or smoother movement during the
nod. However, the time difference is so small that it can be
considered negligible.</p>

      <p>The accuracy of the eye tracker data was lower than that
of the mocap data in the pilot recordings, though it increased
during the second data collection. In the pilot, five out of the
ten recordings (50%) could be optimally aligned, whereas
the remaining five differed by one frame each. In the
second data collection, 71 out of 80 sync points equaled the
&#x201C;ground truth data&#x201D; (88.75%). These values suggest that
the procedure can be regarded as reliable and accurate for the
required purpose of time-critically synchronizing both
systems. Our analysis showed a maximum difference of one
frame in each system, suggesting a maximum difference
(&#x201C;worst-case scenario&#x201D;) of 25 ms between mocap and eye
tracker, while the actual differences were mostly much
smaller. These values should be sufficient for most
research questions related to eye movement, unless very fast
saccades and microsaccades are of interest (
          <xref ref-type="bibr" rid="b16 b40">16, 40</xref>
        ). However, if
higher temporal accuracy of the eye movements is needed,
the sampling frequency of the eye tracker should be
(much) higher than 50 Hz.</p>
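The worst-case figure quoted above follows directly from the frame durations of the two systems; as a minimal illustration of the arithmetic (rates taken from the setups described in this paper):

```python
# Worst-case misalignment when each system is off by at most one frame:
# the two one-frame errors can add up in the same direction.
def frame_ms(rate_hz: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / rate_hz

eye_tracker = frame_ms(50)   # 20 ms per eye tracker frame
mocap_200 = frame_ms(200)    # 5 ms per mocap frame (second data collection)
mocap_120 = frame_ms(120)    # ~8.33 ms per mocap frame (pilot)

worst_case_200 = eye_tracker + mocap_200  # 25 ms, as reported above
worst_case_120 = eye_tracker + mocap_120  # ~28.33 ms for the pilot setup
print(worst_case_200, round(worst_case_120, 2))  # → 25.0 28.33
```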

      <p>We increased the sampling frequency of the mocap
system from 120 Hz to 200 Hz to match the eye tracker
sampling frequency in an integer ratio. Despite
recording more data points per unit of time, this did not increase the
accuracy of locating the sync points (the peaks of the nods), as
we obtained the same percentage of correctly located
sync points. However, it might still have reduced
rounding errors when combining the data with the eye tracker,
and thus increased data accuracy when trimming the data,
due to less noisy rounding and interpolation between the
two systems.</p>
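The advantage of the integer ratio (200 Hz / 50 Hz = 4) is that every eye tracker sample time coincides exactly with a mocap frame time, whereas at 120 Hz (a ratio of 2.4) most do not, forcing rounding or interpolation. A small sketch with hypothetical one-second streams (not the toolbox implementation):

```python
# Count how many eye tracker sample times fall exactly on mocap frame times.
def sample_times(rate_hz: int, seconds: int = 1):
    return [i / rate_hz for i in range(rate_hz * seconds)]

eye = sample_times(50)               # 50 samples in 1 s
mocap_200 = set(sample_times(200))   # integer ratio: 200 / 50 = 4
mocap_120 = set(sample_times(120))   # non-integer ratio: 120 / 50 = 2.4

aligned_200 = sum(t in mocap_200 for t in eye)
aligned_120 = sum(t in mocap_120 for t in eye)
print(aligned_200, aligned_120)  # → 50 10 (only every 5th sample aligns at 120 Hz)
```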

      <p>The less accurate synchronization result for the pupil
data, especially in the pilot recordings, could be related to
the &#x201C;ground truth data&#x201D; being based on a video signal
rather than on a time-series representation like the mocap data.
Local minima of a curve might be more clearly detectable
than the change between frames of the eye tracker video data,
so this kind of &#x201C;ground truth data&#x201D; could be slightly less
reliable. Issues in pupil detection (i.e., when the pupil had to be
adjusted manually) could also have influenced the
accuracy. Manual adjustment might have resulted in lower
precision or larger differences between consecutive frames
than automatic tracking, so the resulting velocity curve
could have contained more noise. Furthermore, slight
inconsistencies could have emerged due to different starting
points of the field and eye cameras: D-Lab does not use
a global clock for its recordings, but records the
devices &#x201C;as they are detected&#x201D;, so there was a variable
delay (ranging from 14 ms to 85 ms) between the start of
the two cameras.</p>

      <p>When the data were trimmed and the resulting
lengths of both recordings compared, the recordings were very
similar in length. In most cases, the differences were below the
sampling interval of the eye tracker (often even below half the
interval), so the accuracy should be sufficient
for most applications, as mentioned above. The small
differences in the lengths of the recordings could be related
to rounding errors when deriving the sync points, or could
suggest some drift in the alignment of the two
data streams. Since the differences in the shorter
recordings are both positive and negative (i.e., for some
recordings the eye tracker recording is shorter, whereas in other
cases the mocap recording is shorter), these are more likely
due to rounding errors, whereas in the long recordings, the
eye tracker recordings were all shorter than the mocap recordings,
suggesting a trend that the eye tracker was &#x201C;faster&#x201D;. A
more extensive investigation of drift, as well as of
possible jitter, would require appropriate hardware that
synchronizes the recordings on a frame-to-frame basis using
genlocking.</p>
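The length comparison reported here reduces to converting each trimmed stream's sample count into a duration and inspecting the sign of the difference; a minimal sketch (the function name and example counts are ours, not from the toolbox):

```python
# Signed duration difference between two trimmed recordings, in ms.
# Positive: the eye tracker stream is longer; negative: the mocap stream is.
def duration_diff_ms(n_eye: int, rate_eye: float,
                     n_mocap: int, rate_mocap: float) -> float:
    return 1000.0 * (n_eye / rate_eye - n_mocap / rate_mocap)

# e.g. 500 eye tracker samples at 50 Hz vs. 2001 mocap frames at 200 Hz:
diff = duration_diff_ms(500, 50, 2001, 200)
print(round(diff, 3))  # → -5.0 (the eye tracker stream is 5 ms shorter)
```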

      <p>Moreover, our longer recordings of about one minute
were still relatively short. In order to further investigate
drift and jitter between the motion capture system and eye
tracker, longer recordings (e.g., about 10 minutes) should
be made. However, since recordings in our studies are
usually not longer than one to two minutes, we refrained from
making longer recordings at this stage.</p>

      <p>In the pilot data collection, only two out of six
participants could be reliably synchronized using this approach.
The other four could not, for different
reasons. In two cases, the eye tracker could not reliably track
the participants&#x2019; pupils due to technical difficulties. In the
other two cases, the participants were blinking at the
moment of the nod. The eyes were closed in the
middle of the nod, so it was impossible to manually adjust
(or add/estimate) the pupil position, which would otherwise
have made the computational synchronization possible. In the
second data collection, we provided the participants with
more thorough and clearer instructions and asked
them to perform practice nods prior to the recording to
familiarize them with the procedure. This clearly seemed
to help, as none of the participants blinked
during the nod in the second data collection. This finding
strongly indicates the importance of instructing the
participants clearly, explaining the procedure to them, and
ensuring that they understand the underlying rationale.</p>

      <p>The assessment of participants&#x2019; subjective experiences
related to the nod showed that it was not perceived as
disturbing or difficult to perform. It seemed to be well
integrated into the task, and it was clear when and how to
produce it. Having a defined start and end of each recording
might also have helped participants concentrate on
the task. To prompt the participants even further to
perform the nod, a metronome beat could be presented (in the
case of hearing participants), so that the participant could
synchronize the nod to, for instance, the fifth beat.</p>

      <p>To check that the nod was performed
successfully, a real-time or near-real-time check could be
included. If it were possible, for instance, to display the
vertical displacement of the eye movement as a time series
directly during the recording, the success of the nod
(especially whether or not a blink occurred) could be checked
immediately after it was performed.</p>

      <p>The question remains whether the synchronization
plug-in provided by Qualisys or a sync-box solution would
have yielded more accurate results. The technical
setup of the plug-in, which involves two wirelessly connected
computers, suggests that such connections could
potentially introduce lags. However, to answer this
question properly, the setup would have to be tested with
both the plug-in and the nod and the results compared
afterwards.</p>

      <p>We also considered other motion sequences for the
synchronization in order to potentially improve our
approach and make the synchronization easier. We piloted
two alternatives: 1) several consecutive nods, and
2) a passive application of force, with someone else exerting
a sudden strike to the participant&#x2019;s head. However, our
volunteers found both approaches more uncomfortable than
the single nod. The consecutive nods felt very unnatural,
and either the first or the last nod was less pronounced,
making it difficult for the automatic detection to choose one.
The sudden knock felt uncomfortable, since, despite the
participants knowing about it, it still felt somewhat unexpected.
Additionally, volunteers involuntarily blinked during the knock,
probably due to the sudden and unexpected exertion of
force. It seems, therefore, that a single nod is the
best approach for this method.</p>

      <p>The approach described here will be integrated into the
MoCap Toolbox to make it accessible to everyone. For
now, it is available for free on the toolbox website (MoCap
Toolbox website: 
<ext-link ext-link-type="uri" xlink:href="https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/materials/mocaptoolbox" xlink:show="new">https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/materials/mocaptoolbox</ext-link>) 
and will be included in the toolbox with the next release.
The function set includes a function to read the Dikablis
eye tracker data into Matlab and convert it into a MoCap
Toolbox-compatible data structure, and a function to
automatically sync the eye tracker recording with the
corresponding mocap recording. No Matlab expertise beyond
a basic understanding of how to use the MoCap Toolbox,
and no external devices, are needed to apply
this syncing method. Furthermore, since the MoCap
Toolbox stores the eye tracker data in the same way as
mocap data, the same functions and procedures can be
used to analyze the eye tracker data.</p>

      <p>The MoCap Toolbox function also offers the possibility
to adapt the thresholds for detecting the nod in both the
mocap and the pupil data. In our data sets, a threshold of -2
reliably detected the nod in both the (z-scored) mocap
and pupil data, though this might not be the case for other
recordings. Adjustable thresholds that can accommodate
participants performing the nod at different speeds and
extents therefore make the function more flexible.</p>
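The detection logic described above can be sketched as follows: the velocity signal is z-scored, and the sync point is taken at the most extreme sample below the (adjustable) threshold, i.e., at the downward velocity peak of the nod. This is an illustrative Python sketch, not the MoCap Toolbox implementation; the signal and all names are hypothetical:

```python
# Threshold-based nod detection on a z-scored velocity signal.
def zscore(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]

def find_nod(velocity, threshold=-2.0):
    """Index of the most extreme sub-threshold sample (the downward
    velocity peak of the nod), or None if no sample crosses it."""
    z = zscore(velocity)
    below = [i for i, v in enumerate(z) if v < threshold]
    return min(below, key=lambda i: z[i]) if below else None

# Small noise plus one sharp downward spike (the nod) at index 5:
vel = [0.1, -0.2, 0.0, 0.3, -0.1, -8.0, -0.2, 0.1, 0.0, 0.2]
print(find_nod(vel))  # → 5
```

Making the threshold a parameter, as the toolbox function does, lets slower or smaller nods still be detected by relaxing it.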

      <p>Furthermore, the nod was useful for manually
synchronizing the different data streams used in the experiment.
The motion capture, eye tracker, and regular video data
could be accurately synchronized by using the nod as a
reference when importing the data into ELAN (see the
screenshot in Fig. 8), the freely available audio and video
annotation and transcription software developed at the Max
Planck Institute for Psycholinguistics, The Language
Archive, Nijmegen, The Netherlands. Given the different
sampling frequencies of the systems, it would have
been much more difficult to synchronize the recordings
without the clear displacement that the nod provided.</p>

<fig id="fig08" fig-type="figure" position="float">
					<label>Figure 8.</label>
					<caption>
						<p>Screenshot of the ELAN multimedia annotation software. The upper part of the screen shows video data from the external video camera as well as from the pupil and field cameras. The descriptors in the middle visualize the pupil height data (the uppermost panel) and the three-dimensional marker location data derived from the front left head marker (the three bottom panels). The markings on the three tiers at the bottom of the screen are annotation cells time-aligned with the video data.</p>
					</caption>
					<graphic id="graph08" xlink:href="jemr-11-02-e-figure-08.png"/>
				</fig>
    </sec>
	
    <sec id="S6">
      <title>Conclusions</title>
	  
      <p>This paper presented a generic, device-free approach to
accurately synchronize eye tracking and motion capture
systems computationally. Since it is a behavior-based
approach, it is expected to work with any motion capture
and (mobile) eye tracking system. The method has so far
only been tested with one motion capture system and one
eye tracker, so it should be tested with a wider range of
systems in the future. Careful instruction of the participants is
crucial in this approach, so that they are aware of what they
are supposed to do. With participants
performing in the desired manner, the approach offers an easy
and device-free way to accurately synchronize both
devices. This method can be especially useful when
plug-in solutions are not available, are technically too
demanding, or are too cost-intensive. Furthermore, when external
devices such as regular video cameras need to be
synchronized as well, this method has proven beneficial.</p>

      <sec id="S6a"  sec-type="COI-statement">
        <title>Ethics and Conflict of Interest</title>
		
        <p>The authors declare that the contents of the article are
in agreement with the ethics described in
<ext-link ext-link-type="uri" xlink:href="http://biblio.unibe.ch/portale/elibrary/BOP/jemr/ethics.html" xlink:show="new">http://biblio.unibe.ch/portale/elibrary/BOP/jemr/ethics.html</ext-link> 
and that there is no conflict of interest regarding the
publication of this paper.</p>
      </sec>
	  
      <sec id="S6b">
        <title>Acknowledgements</title>
		
        <p>This study was supported by the Academy of Finland
(projects 299067, 269089, and 304034). We wish to thank
Emma Allingham for help with the eye tracker and Elsa
Campbell for proofreading the manuscript.</p>
      </sec>
    </sec>
  </body>
<back>
<ref-list>
<ref id="b1"><mixed-citation publication-type="book" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Bishop</surname>, <given-names>L.</given-names></name>, &#x26; <name><surname>Goebl</surname>, <given-names>W.</given-names></name></person-group> (<year>2017</year>). <source>Mapping visual attention of ensemble musicians during performance of “temporally-ambiguous” music.</source> Conference of Music &#x26; Eye-Tracking, Frankfurt, Germany.</mixed-citation></ref>
<ref id="b2"><mixed-citation publication-type="thesis" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Burger</surname>, <given-names>B.</given-names></name></person-group> (<year>2013</year>). Move the way you feel: Effects of musical features, perceived emotions, and personality on music-induced movement (Doctoral dissertation). University of Jyväskylä, Jyväskylä, Finland.</mixed-citation></ref>
<ref id="b3"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Burger</surname>, <given-names>B.</given-names></name>, <name><surname>Thompson</surname>, <given-names>M. R.</given-names></name>, <name><surname>Luck</surname>, <given-names>G.</given-names></name>, <name><surname>Saarikallio</surname>, <given-names>S.</given-names></name>, &#x26; <name><surname>Toiviainen</surname>, <given-names>P.</given-names></name></person-group> (<year>2013</year>a). <article-title>Influences of rhythm- and timbre-related musical features on characteristics of music-induced movement.</article-title> <source>Frontiers in Psychology</source>, <volume>4</volume>, <fpage>183</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00183</pub-id><pub-id pub-id-type="pmid">23641220</pub-id><issn>1664-1078</issn></mixed-citation></ref>
<ref id="b4"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><name><surname>Burger</surname>, <given-names>B.</given-names></name>, <name><surname>Thompson</surname>, <given-names>M. R.</given-names></name>, <name><surname>Saarikallio</surname>, <given-names>S.</given-names></name>, <name><surname>Luck</surname>, <given-names>G.</given-names></name>, &#x26; <name><surname>Toiviainen</surname>, <given-names>P.</given-names></name></person-group> (<year>2013</year>b). <article-title>Oh happy dance: Emotion recognition in dance movement.</article-title> In <person-group person-group-type="editor"><name><given-names>G.</given-names> <surname>Luck</surname></name> &#x26; <name><given-names>O.</given-names> <surname>Brabant</surname></name> <role>(Eds.)</role></person-group>, <source>Proceedings of the 3rd International Conference on Music and Emotion</source>. <publisher-loc>Jyväskylä, Finland</publisher-loc>: <publisher-name>University of Jyväskylä</publisher-name>.</mixed-citation></ref>
<ref id="b5"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Burger</surname>, <given-names>B.</given-names></name>, &#x26; <name><surname>Toiviainen</surname>, <given-names>P.</given-names></name></person-group> (<year>2013</year>) <article-title>MoCap Toolbox – A Matlab toolbox for computational analysis of movement data.</article-title><source>Bresin (Ed.), Proceedings of the 10th Sound and Music Computing Conference. Stockholm, Sweden.</source><fpage>172</fpage><lpage>178</lpage></mixed-citation></ref>
<ref id="b6"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Carlson</surname>, <given-names>E.</given-names></name>, <name><surname>Burger</surname>, <given-names>B.</given-names></name>, <name><surname>London</surname>, <given-names>J.</given-names></name>, <name><surname>Thompson</surname>, <given-names>M. R.</given-names></name>, &#x26; <name><surname>Toiviainen</surname>, <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Conscientiousness and Extraversion relate to responsiveness to tempo in dance</article-title>. <source>Human Movement Science</source>, <volume>49</volume>, <fpage>315</fpage>–<lpage>325</lpage>. <pub-id pub-id-type="doi">10.1016/j.humov.2016.08.006</pub-id><issn>0167-9457</issn></mixed-citation></ref>
<ref id="b7"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Drai-Zerbib</surname>, <given-names>V.</given-names></name>, <name><surname>Baccino</surname>, <given-names>T.</given-names></name>, &#x26; <name><surname>Bigand</surname>, <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>Sightreading expertise: Cross-modality integration investigated using eye tracking.</article-title> <source>Psychology of Music</source>, <volume>40</volume>(<issue>2</issue>), <fpage>216</fpage>–<lpage>235</lpage>. <pub-id pub-id-type="doi">10.1177/0305735610394710</pub-id><issn>0305-7356</issn></mixed-citation></ref>
<ref id="b8"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><collab>ELAN</collab></person-group>. Version 5.1, published December 22, <year>2017</year>. [computer software]. Nijmegen (The Netherlands): Max Planck Institute for Psycholinguistics [<date-in-citation content-type="access-date">accessed January 14, 2018</date-in-citation>]. Retrieved from: <ext-link ext-link-type="uri" xlink:href="https://tla.mpi.nl/tools/tla-tools/elan/" xlink:show="new">https://tla.mpi.nl/tools/tla-tools/elan/</ext-link></mixed-citation></ref>
<ref id="b9"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Emmorey</surname>, <given-names>K.</given-names></name>, <name><surname>Thompson</surname>, <given-names>R.</given-names></name>, &#x26; <name><surname>Colvin</surname>, <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Eye gaze during comprehension of American Sign Language by native and beginning signers.</article-title> <source>Journal of Deaf Studies and Deaf Education</source>, <volume>14</volume>(<issue>2</issue>), <fpage>237</fpage>–<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1093/deafed/enn037</pub-id><pub-id pub-id-type="pmid">18832075</pub-id><issn>1081-4159</issn></mixed-citation></ref>
<ref id="b10"><mixed-citation publication-type="conference" specific-use="parsed"><person-group person-group-type="author"><name><surname>Fink</surname>, <given-names>L. K.</given-names></name>, <name><surname>Geng</surname>, <given-names>J. J.</given-names></name>, <name><surname>Hurley</surname>, <given-names>B. K.</given-names></name>, &#x26; <name><surname>Janata</surname>, <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>Predicting attention to auditory rhythms using a linear oscillator model and pupillometry.</article-title> <source>European Conference on Eye Movement</source>, <conf-loc>Wuppertal, Germany</conf-loc>.</mixed-citation></ref>
<ref id="b11"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Gingras</surname>, <given-names>B.</given-names></name>, <name><surname>Marin</surname>, <given-names>M. M.</given-names></name>, <name><surname>Puig-Waldmüller</surname>, <given-names>E.</given-names></name>, &#x26; <name><surname>Fitch</surname>, <given-names>W. T.</given-names></name></person-group> (<year>2015</year>). <article-title>The eye is listening: Music-induced arousal and individual differences predict pupillary responses.</article-title> <source>Frontiers in Human Neuroscience</source>, <volume>9</volume>, <fpage>619</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2015.00619</pub-id><pub-id pub-id-type="pmid">26617511</pub-id><issn>1662-5161</issn></mixed-citation></ref>
<ref id="b12"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Glowinski</surname>, <given-names>D.</given-names></name>, <name><surname>Mancini</surname>, <given-names>M.</given-names></name>, <name><surname>Cowie</surname>, <given-names>R.</given-names></name>, <name><surname>Camurri</surname>, <given-names>A.</given-names></name>, <name><surname>Chiorri</surname>, <given-names>C.</given-names></name>, &#x26; <name><surname>Doherty</surname>, <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>The movements made by performers in a skilled quartet: A distinctive pattern, and the function that it serves.</article-title> <source>Frontiers in Psychology</source>, <volume>4</volume>, <fpage>841</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00841</pub-id><pub-id pub-id-type="pmid">24312065</pub-id><issn>1664-1078</issn></mixed-citation></ref>
<ref id="b13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gruhn</surname><given-names>W</given-names></name><name><surname>Litt</surname><given-names>F</given-names></name><name><surname>Scherer</surname><given-names>A</given-names></name><name><surname>Schumann</surname><given-names>T</given-names></name><name><surname>Weiss</surname><given-names>EM</given-names></name><name><surname>Gebhardt</surname><given-names>C</given-names></name></person-group><article-title>Suppressing reflexive behaviour:Saccadic eye movements in musicians and non-musicians.</article-title><source>Musicae Scientiae</source><year>2006</year><volume>10</volume><issue>1</issue><fpage>19</fpage><lpage>32</lpage></element-citation></ref>
<ref id="b14"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Hadley</surname>, <given-names>L. V.</given-names></name>, <name><surname>Sturt</surname>, <given-names>P.</given-names></name>, <name><surname>Eerola</surname>, <given-names>T.</given-names></name>, &#x26; <name><surname>Pickering</surname>, <given-names>M. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Incremental comprehension of pitch relationships in written music: Evidence from eye movements.</article-title> <source>Quarterly Journal of Experimental Psychology</source>, <fpage>1</fpage>–<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1080/17470218.2017.1307861</pub-id><pub-id pub-id-type="pmid">28303743</pub-id><issn>1747-0226</issn></mixed-citation></ref>
<ref id="b15"><mixed-citation publication-type="thesis" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Haugen</surname>, <given-names>M. R.</given-names></name></person-group> (<year>2016</year>). Music-Dance. Investigating rhythm structures in Brazilian Samba and Norwegian Telespringar performance (Doctoral dissertation). University of Oslo, Oslo, Norway.</mixed-citation></ref>
<ref id="b16"><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Holmqvist</surname>, <given-names>K.</given-names></name>, <name><surname>Nyström</surname>, <given-names>N.</given-names></name>, <name><surname>Andersson</surname>, <given-names>R.</given-names></name>, <name><surname>Dewhurst</surname>, <given-names>R.</given-names></name>, <name><surname>Jarodzka</surname>, <given-names>H.</given-names></name>, &#x26; <name><surname>Van de Weijer</surname>, <given-names>J.</given-names></name></person-group> (<year>2011</year>). <source>Eye tracking: A comprehensive guide to methods and measures</source>. <publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</mixed-citation></ref>
<ref id="b17"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Hosemann</surname>, <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Eye gaze and verb agreement in German Sign Language: A first glance.</article-title> <source>Sign Language and Linguistics</source>, <volume>14</volume>(<issue>1</issue>), <fpage>76</fpage>–<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1075/sll.14.1.05hos</pub-id><issn>1387-9316</issn></mixed-citation></ref>
<ref id="b18"><mixed-citation publication-type="book-chapter" specific-use="restruct"><person-group person-group-type="author"><name><surname>Jantunen</surname>, <given-names>T.</given-names></name></person-group> (<year>2012</year>). <chapter-title>Acceleration peaks and sonority in Finnish Sign Language syllables</chapter-title>. In <person-group person-group-type="editor"><name><given-names>S.</given-names> <surname>Parker</surname></name> (<role>Ed.</role>),</person-group> <source>The Sonority Controversy. Phonetics and Phonology 18</source> (pp. <fpage>347</fpage>–<lpage>381</lpage>). <publisher-loc>Berlin</publisher-loc>: <publisher-name>Mouton De Gruyter</publisher-name>. <pub-id pub-id-type="doi">10.1515/9783110261523.347</pub-id></mixed-citation></ref>
<ref id="b19"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Jantunen</surname>, <given-names>T.</given-names></name></person-group> (<year>2013</year>). <article-title>Signs and transitions: Do they differ phonetically and does it matter?</article-title> <source>Sign Language Studies</source>, <volume>13</volume>(<issue>2</issue>), <fpage>211</fpage>–<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1353/sls.2013.0004</pub-id><issn>0302-1475</issn></mixed-citation></ref>
<ref id="b20"><mixed-citation publication-type="web-page" specific-use="unparsed">Lab streaming layer (LSL) [accessed 2017, September 19]. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://github.com/sccn/labstreaminglayer" xlink:show="new">https://github.com/sccn/labstreaminglayer</ext-link></mixed-citation></ref>
<ref id="b21"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Laeng</surname>, <given-names>B.</given-names></name>, <name><surname>Eidet</surname>, <given-names>L. M.</given-names></name>, <name><surname>Sulutvedt</surname>, <given-names>U.</given-names></name>, &#x26; <name><surname>Panksepp</surname>, <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Music chills: The eye pupil as a mirror to music’s soul.</article-title> <source>Consciousness and Cognition</source>, <volume>44</volume>, <fpage>161</fpage>–<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/j.concog.2016.07.009</pub-id><pub-id pub-id-type="pmid">27500655</pub-id><issn>1053-8100</issn></mixed-citation></ref>
<ref id="b22"><mixed-citation publication-type="book" specific-use="unparsed"><person-group person-group-type="author"><name><surname>Marandola</surname>, <given-names>F.</given-names></name></person-group> (<year>2017</year>). <source>Eye-Hand synchronization and interpersonal interaction in xylophone performance: A comparison between African and Western percussionists.</source> Conference of Music &#x26; Eye-Tracking, Frankfurt, Germany.</mixed-citation></ref>
<ref id="b23"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Mauk</surname>, <given-names>C. E.</given-names></name>, &#x26; <name><surname>Tyrone</surname>, <given-names>M. E.</given-names></name></person-group> (<year>2012</year>). <article-title>Location in ASL: Insights from phonetic variation.</article-title> <source>Sign Language and Linguistics</source>, <volume>15</volume>(<issue>1</issue>), <fpage>128</fpage>–<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1075/sll.15.1.06mau</pub-id><pub-id pub-id-type="pmid">26478715</pub-id><issn>1387-9316</issn></mixed-citation></ref>
<ref id="b24"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><collab>McDougal</collab></person-group>, T. / Peter Ward_1 (<year>2015</year>). Timecode versus sync: how they differ and why it matters [accessed February 10, 2018]. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://www.bhphotovideo.com/explora/video/tipsand-solutions/timecode-versus-sync-how-they-differand-why-it-matters" xlink:show="new">https://www.bhphotovideo.com/explora/video/tipsand-solutions/timecode-versus-sync-how-they-differand-why-it-matters</ext-link></mixed-citation></ref>
<ref id="b25"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Muir</surname>, <given-names>L. J.</given-names></name>, &#x26; <name><surname>Richardson</surname>, <given-names>I. E. G.</given-names></name></person-group> (<year>2005</year>). <article-title>Perception of sign language and its application to visual communications for deaf people.</article-title> <source>Journal of Deaf Studies and Deaf Education</source>, <volume>10</volume>(<issue>4</issue>), <fpage>390</fpage>–<lpage>401</lpage>. <pub-id pub-id-type="doi">10.1093/deafed/eni037</pub-id><pub-id pub-id-type="pmid">16000689</pub-id><issn>1081-4159</issn></mixed-citation></ref>
<ref id="b26"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Naveda</surname>, <given-names>L.</given-names></name>, &#x26; <name><surname>Leman</surname>, <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>The spatiotemporal representation of dance and music gestures using topological gesture analysis (TGA).</article-title> <source>Music Perception</source>, <volume>28</volume>(<issue>1</issue>), <fpage>93</fpage>–<lpage>112</lpage>. <pub-id pub-id-type="doi">10.1525/mp.2010.28.1.93</pub-id><issn>0730-7829</issn></mixed-citation></ref>
<ref id="b27"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Penttinen</surname>, <given-names>M.</given-names></name>, <name><surname>Huovinen</surname>, <given-names>E.</given-names></name>, &#x26; <name><surname>Ylitalo</surname>, <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Silent music reading: Amateur musicians’ visual processing and descriptive skill.</article-title> <source>Musicae Scientiae</source>, <volume>17</volume>(<issue>2</issue>), <fpage>198</fpage>–<lpage>216</lpage>. <pub-id pub-id-type="doi">10.1177/1029864912474288</pub-id><issn>1029-8649</issn></mixed-citation></ref>
<ref id="b28"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Penttinen</surname>, <given-names>M.</given-names></name>, <name><surname>Huovinen</surname>, <given-names>E.</given-names></name>, &#x26; <name><surname>Ylitalo</surname>, <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Reading ahead: Adult music students’ eye movements in temporally controlled performances of a children’s song.</article-title> <source>International Journal of Music Education: Research</source>, <volume>33</volume>(<issue>1</issue>), <fpage>36</fpage>–<lpage>50</lpage>. <pub-id pub-id-type="doi">10.1177/0255761413515813</pub-id></mixed-citation></ref>
<ref id="b29"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Puupponen</surname>, <given-names>A.</given-names></name>, <name><surname>Wainio</surname>, <given-names>T.</given-names></name>, <name><surname>Burger</surname>, <given-names>B.</given-names></name>, &#x26; <name><surname>Jantunen</surname>, <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>Head movements in Finnish Sign Language on the basis of motion capture data: A study of the form and function of nods, nodding, head thrusts, and head pulls.</article-title> <source>Sign Language and Linguistics</source>, <volume>18</volume>(<issue>1</issue>), <fpage>41</fpage>–<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1075/sll.18.1.02puu</pub-id><issn>1387-9316</issn></mixed-citation></ref>
<ref id="b30"><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Robertson</surname>, <given-names>D. G. E.</given-names></name>, <name><surname>Caldwell</surname>, <given-names>G. E.</given-names></name>, <name><surname>Hamill</surname>, <given-names>J.</given-names></name>, <name><surname>Kamen</surname>, <given-names>G.</given-names></name>, &#x26; <name><surname>Whittlesey</surname>, <given-names>S. N.</given-names></name></person-group> (<year>2004</year>). <source>Research methods in biomechanics</source>. <publisher-loc>Champaign, IL/Leeds, UK</publisher-loc>: <publisher-name>Human Kinetics</publisher-name>.</mixed-citation></ref>
<ref id="b31"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Su</surname>, <given-names>Y. H.</given-names></name>, &#x26; <name><surname>Keller</surname>, <given-names>P. E.</given-names></name></person-group> (<year>2018</year>). <article-title>Your move or mine? Music training and kinematic compatibility modulate synchronization with self- versus other-generated dance movement.</article-title> <source>Psychological Research</source>. <pub-id pub-id-type="doi">10.1007/s00426-018-0987-6</pub-id><pub-id pub-id-type="pmid">29380047</pub-id><issn>0340-0727</issn></mixed-citation></ref>
<ref id="b32"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Thompson</surname>, <given-names>M. R.</given-names></name>, &#x26; <name><surname>Luck</surname>, <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Exploring relationships between pianists’ body movements, their expressive intentions, and structural elements of the music.</article-title> <source>Musicae Scientiae</source>, <volume>16</volume>(<issue>1</issue>), <fpage>19</fpage>–<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1177/1029864911423457</pub-id><issn>1029-8649</issn></mixed-citation></ref>
<ref id="b33"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Thompson</surname>, <given-names>R.</given-names></name>, <name><surname>Emmorey</surname>, <given-names>K.</given-names></name>, &#x26; <name><surname>Kluender</surname>, <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>The relationship between eye gaze and verb agreement in American Sign Language: An eye-tracking study.</article-title> <source>Natural Language and Linguistic Theory</source>, <volume>24</volume>(<issue>2</issue>), <fpage>571</fpage>–<lpage>604</lpage>. <pub-id pub-id-type="doi">10.1007/s11049-005-1829-y</pub-id><issn>0167-806X</issn></mixed-citation></ref>
<ref id="b34"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Thompson</surname>, <given-names>R.</given-names></name>, <name><surname>Emmorey</surname>, <given-names>K.</given-names></name>, &#x26; <name><surname>Kluender</surname>, <given-names>R.</given-names></name></person-group> (<year>2009</year>). <article-title>Learning to look: The acquisition of eye-gaze agreement during the production of ASL verbs.</article-title> <source>Bilingualism: Language and Cognition</source>, <volume>12</volume>(<issue>4</issue>), <fpage>393</fpage>–<lpage>409</lpage>. <pub-id pub-id-type="doi">10.1017/S1366728909990277</pub-id></mixed-citation></ref>
<ref id="b35"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Tyrone</surname>, <given-names>M. E.</given-names></name>, &#x26; <name><surname>Mauk</surname>, <given-names>C. E.</given-names></name></person-group> (<year>2010</year>). <article-title>Sign lowering and phonetic reduction in American Sign Language.</article-title> <source>Journal of Phonetics</source>, <volume>38</volume>(<issue>2</issue>), <fpage>317</fpage>–<lpage>328</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2010.02.003</pub-id><pub-id pub-id-type="pmid">20607146</pub-id><issn>0095-4470</issn></mixed-citation></ref>
<ref id="b36"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Van Dyck</surname>, <given-names>E.</given-names></name>, <name><surname>Moelants</surname>, <given-names>D.</given-names></name>, <name><surname>Demey</surname>, <given-names>M.</given-names></name>, <name><surname>Deweppe</surname>, <given-names>A.</given-names></name>, <name><surname>Coussement</surname>, <given-names>P.</given-names></name>, &#x26; <name><surname>Leman</surname>, <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>The impact of the bass drum on human dance movement.</article-title> <source>Music Perception</source>, <volume>30</volume>(<issue>4</issue>), <fpage>349</fpage>–<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1525/mp.2013.30.4.349</pub-id><issn>0730-7829</issn></mixed-citation></ref>
<ref id="b37"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Van Zijl</surname>, <given-names>A. G. W.</given-names></name>, &#x26; <name><surname>Luck</surname>, <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Moved through music: The effect of experienced emotions on performers’ movement characteristics.</article-title> <source>Psychology of Music</source>, <volume>41</volume>(<issue>2</issue>), <fpage>175</fpage>–<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1177/0305735612458334</pub-id><issn>0305-7356</issn></mixed-citation></ref>
<ref id="b38"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Vuoskoski</surname>, <given-names>J. K.</given-names></name>, <name><surname>Thompson</surname>, <given-names>M. R.</given-names></name>, <name><surname>Clarke</surname>, <given-names>E. F.</given-names></name>, &#x26; <name><surname>Spence</surname>, <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Crossmodal interactions in the perception of expressivity in musical performance.</article-title> <source>Attention, Perception &#x26; Psychophysics</source>, <volume>76</volume>(<issue>2</issue>), <fpage>591</fpage>–<lpage>604</lpage>. <pub-id pub-id-type="doi">10.3758/s13414-013-0582-2</pub-id><pub-id pub-id-type="pmid">24233641</pub-id><issn>1943-3921</issn></mixed-citation></ref>
<ref id="b39"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Wehrmeyer</surname>, <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Eye-tracking deaf and hearing viewing of sign language interpreted news broadcasts.</article-title> <source>Journal of Eye Movement Research</source>, <volume>7</volume>(<issue>1</issue>):3, <fpage>1</fpage>–<lpage>16</lpage>.</mixed-citation></ref>
<ref id="b40"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Wierts</surname>, <given-names>R.</given-names></name>, <name><surname>Janssen</surname>, <given-names>M. J. A.</given-names></name>, &#x26; <name><surname>Kingma</surname>, <given-names>H.</given-names></name></person-group> (<year>2008</year>). <article-title>Measuring saccade peak velocity using a low-frequency sampling rate of 50 Hz.</article-title> <source>IEEE Transactions on Biomedical Engineering</source>, <volume>55</volume>(<issue>12</issue>), <fpage>2840</fpage>–<lpage>2842</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2008.925290</pub-id><pub-id pub-id-type="pmid">19126467</pub-id><issn>0018-9294</issn></mixed-citation></ref>
<ref id="b41"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Wilbur</surname>, <given-names>R. B.</given-names></name></person-group> (<year>1990</year>). <article-title>An experimental investigation of stressed sign production.</article-title> <source>International Journal of Sign Linguistics</source>, <volume>1</volume>(<issue>1</issue>), <fpage>41</fpage>–<lpage>60</lpage>.</mixed-citation></ref>
<ref id="b42"><mixed-citation publication-type="book" specific-use="restruct"><person-group person-group-type="author"><name><surname>Wilcox</surname>, <given-names>S.</given-names></name></person-group> (<year>1992</year>). <source>The phonetics of fingerspelling</source>. <publisher-loc>Amsterdam, NL</publisher-loc>: <publisher-name>John Benjamins Publishing</publisher-name>. <pub-id pub-id-type="doi">10.1075/sspcl.4</pub-id></mixed-citation></ref>
<ref id="b43"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><name><surname>Woolhouse</surname>, <given-names>M. H.</given-names></name>, &#x26; <name><surname>Lai</surname>, <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Traces across the body: Influence of music-dance synchrony on the observation of dance.</article-title> <source>Frontiers in Human Neuroscience</source>, <volume>8</volume>, <fpage>965</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2014.00965</pub-id><pub-id pub-id-type="pmid">25520641</pub-id><issn>1662-5161</issn></mixed-citation></ref>
</ref-list>
</back>
</article>
