<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
 <front>
    <journal-meta>
	<journal-id journal-id-type="publisher-id">Jemr</journal-id>
      <journal-title-group>
        <journal-title>Journal of Eye Movement Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1995-8692</issn>
	  <publisher>								
	  <publisher-name>Bern Open Publishing</publisher-name>
	  <publisher-loc>Bern, Switzerland</publisher-loc>
	</publisher>
    </journal-meta>
    <article-meta>
	<article-id pub-id-type="doi">10.16910/jemr.10.5.4</article-id> 
	  <article-categories>								
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
		</article-categories>
      <title-group>
        <article-title>Visual Analytics of Gaze Data with Standard Multimedia Players</article-title>
      </title-group>
	   <contrib-group> 
				<contrib contrib-type="author">
					<name>
						<surname>Sch&#xF6;ning</surname>
						<given-names>Julius</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Gundler</surname>
						<given-names>Christopher</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Heidemann</surname>
						<given-names>Gunther</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>K&#xF6;nig</surname>
						<given-names>Peter</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Krumnack</surname>
						<given-names>Ulf</given-names>
					</name>
					<xref ref-type="aff" rid="aff1">1</xref>
				</contrib>				
        <aff id="aff1">
		<institution>Institute of Cognitive Science, Osnabr&#xFC;ck University</institution>,   <country>Germany</country>
        </aff>
		</contrib-group>
     
	  <pub-date date-type="pub" publication-format="electronic"> 
		<day>20</day>  
		<month>11</month>
        <year>2017</year>
      </pub-date>
	  <pub-date date-type="collection" publication-format="electronic"> 
	  <year>2017</year>
	</pub-date>
      <volume>10</volume>
      <issue>5</issue>
	<elocation-id>10.16910/jemr.10.5.4</elocation-id>
	<permissions> 
	<copyright-year>2017</copyright-year>
	<copyright-holder>Sch&#xF6;ning, Gundler, Heidemann, K&#xF6;nig and Krumnack</copyright-holder>
	<license license-type="open-access">
  <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License, 
  (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">
    https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p>
</license>
	</permissions>
      <abstract>
        <p>With the increasing number of studies in which participants' eye movements are tracked while watching videos, the volume of recorded gaze data is growing tremendously. Unfortunately, in most cases such data are collected in separate files in custom-made or proprietary data formats. They are difficult to access even for experts and effectively inaccessible for non-experts, and their analysis normally requires expensive or custom-made software. We address this problem by using existing multimedia container formats for distributing and archiving eye-tracking and gaze data bundled with the stimuli data. We define an exchange format that can be interpreted by standard multimedia players and streamed via the Internet. We convert several gaze data sets into our format, demonstrating the feasibility of our approach and making it possible to visualize these data with standard multimedia players. We also introduce two VLC player add-ons that allow for further visual analytics. We discuss the benefits of gaze data in a multimedia container and explain possible visual analytics approaches based on our implementations, converted data sets, and first user interviews.</p>
      </abstract>
      <kwd-group>
        <kwd>eye-tracking</kwd>
        <kwd>gaze data</kwd>
        <kwd>visualization</kwd>
        <kwd>visual analytics</kwd>
        <kwd>multimedia container</kwd>
      </kwd-group>
    </article-meta>
  </front>
       
  <body>

    <sec id="S1">
      <title>Introduction</title>
	  
      <p>It is still common practice to store gaze information
belonging to video stimuli next to the video file in custom
file formats. In addition to proprietary formats, where the
data structure of the gaze information is unknown, the
data structures are strongly customized&#x2014;sometimes
unique&#x2014;and stored in a diversity of formats, e.g., plain
text, XML, Matlab MAT format, or even binary. As a
consequence, special tools for accessing, visualizing, and
analyzing these data and stimuli are needed. For the
general audience, specialized software tools that are
expensive or require compilation are a major obstacle and
often prevent access to the data.</p>

      <p>An open and standardized exchange format would
overcome this problem and form the basis for
visualization and visual analytics (VA) software. Thus, why not
encapsulate gaze data alongside the stimuli material in a
joint container? For storing text plus metadata, this has
become common practice&#x2014;a well-known example is the
PDF container. For the encapsulation of multimedia
content comprising video, audio and metadata, such as
subtitles, audio comments, and interactive features,
multimedia container formats were specified. Examples are the
open container formats (
        <xref ref-type="bibr" rid="b30">30</xref>
        ) (OGG), MPEG-4 (
        <xref ref-type="bibr" rid="b10">10</xref>
        ), or the
Matro&#x161;ka container (
        <xref ref-type="bibr" rid="b12">12</xref>
        ) (MKV). A single file housing multimedia
content can be easily archived, guarantees fixed and
validated synchronization of its different data streams,
can be played by standard multimedia players,
and can be streamed via the Internet. If gaze data or other
participant-related data, such as EEG traces, are
encapsulated in such container formats, the accessibility will be
improved for both experts and the general audience, in
scientific publications, or in popular science video
demonstrations<xref ref-type="fn" rid="FN1">1</xref>.</p>


      <p>We present concepts and first software tools for
creating, analyzing, and decomposing multimedia containers
of eye-tracking research data. Our aim is that video
eye-tracking data can be stored in common multimedia
containers, carrying (i) the video stimuli, (ii) the gaze
trajectories of multiple participants, and (iii) other
video-related data. With a strong focus on gaze data, we
evaluate current multimedia containers that support a variety of
video, audio, and subtitle data formats. We want to find
out whether such containers can provide instantaneous
visualization and VA support in standard media players. By
contributing our code extensions to official development
repositories, our long-term aim is to establish a standard
format. Based on that format, research data could be
encoded uniformly and would benefit various fields, ranging
from training neural networks on human sensory data,
through highlighting objects in movies for visually
impaired people, to creating auditory displays for blind
people. Further, VA on gaze data may experience a boost
once a standard format exists that allows sharing data and
results, and combining gaze data with other metadata.</p>

      <p>Building on previous work on merging video and
eye-tracking data (
        <xref ref-type="bibr" rid="b19">19</xref>
        ), we focus on the following aspects in the
present paper: i) a more detailed discussion of feasible
data formats for metadata description, timed data
representation, and multimedia encapsulation (cf. Section Data
Formats and Suitable Data Formats for Gaze Data), ii)
the description and implementation of software tools for
creating and decomposing multimedia containers with
gaze data as well as our first two add-ons for VA on gaze
data using the VLC media player (cf. Section Software
Tools for Gaze Data in Multimedia Containers), and iii) a
discussion about benefits of our approaches highlighted
in user interviews, and a list of available data sets (cf.
Section User Study). Based on a comparison of available
container formats suitable for housing sensory and other
relevant data, we conclude by discussing possible future
work in the field of VA and computer vision, where gaze
data in multimedia containers have a strong impact.</p>
    </sec>
	
    <sec id="S2">
      <title>Data Formats</title>
	  
      <p>A list of available video eye-tracking data sets (
        <xref ref-type="bibr" rid="b27">27</xref>
		, cf. Section 2.2) shows the impressive diversity of formats
they are stored in, e.g., plain text, XML, Matlab MAT
format, or even binary. None of these formats provides
instantaneous visualization, nor can the stimulus sequence be
streamed together with metadata, like gaze trajectories
and object annotations. In contrast, the domains of DVD,
Blu-ray, and video compression provide formats with
efficient mechanisms for visualization and streamability
via the Internet. Therefore, it is reasonable to evaluate the
formats used in these domains and analyze whether one
of them fits the requirements for storing research data.
These requirements include, e.g., storing the stimuli as
well as all relevant metadata, providing opportunities for
visualization, and, if possible, VA within a standard
multimedia player.</p>

      <sec id="S2a">
        <title>Metadata Formats</title>
		
        <p>Computer vision algorithms still have great
difficulties gaining semantic content from video sequences.
Hence, VA of gaze trajectories of subjects performing,
e.g., a visual search task, may provide insights for
designing better algorithms. Gaze trajectories might also serve
as training data for deep learning approaches if the data
representations of several data sets are compatible.
However, combining, extending, exchanging, and analyzing
gaze data from different repositories is tedious,
time-consuming, and inefficient, due to the lack of a
standardized format.</p>

        <p>With the rise of the semantic web, standards for
metadata were introduced, which have become quite
popular. The most general standard for expressing
metadata is the resource description framework (RDF)
(
          <xref ref-type="bibr" rid="b25">25</xref>
          ). Its well-defined formal syntax and distributed
representation allow statements about resources, i.e., virtually
everything that can be uniquely described. The sparse
predefined vocabulary of RDF requires extension if RDF
is used in a new domain. By now, several vocabulary
schemes for various domains exist, but there is, to our
knowledge, no scheme for video content. Further, the
temporal and spatial structure of video sequences is not
covered by the majority of vocabularies. Therefore, a new
vocabulary scheme would have to be defined to fit
eye-tracking data sets.</p>

        <p>For annotating time-continuous data with different
types of metadata, the Continuous Media Markup
Language (CMML) (
          <xref ref-type="bibr" rid="b16">16</xref>
          ) was developed. Similar to
HTML, it provides markup facilities for timed
objects, which are ideal for the integration of multimedia
content on the Internet. In a predefined vocabulary, CMML
provides textual descriptions, hyperlinks, images (e.g.,
keyframes), and&#x2014;in the form of value pairs&#x2014;other
unspecified metadata. By using this vocabulary, temporal
metadata of video and audio sequences can be
represented. Unfortunately, the spatial structure, i.e., reference to
pixels, regions, and areas is not provided by CMML.</p>

        <p>Probably the most advanced metadata framework for
multimedia content is MPEG-7, which has been developed
by the Moving Picture Experts Group. This Multimedia
content description interface, defined in the ISO/IEC
15938 standard, specifies a set of mechanisms for
describing as many types of multimedia information as
possible. However, this specification has been criticized
and has lost support because of its lack of formal
semantics. This causes ambiguity, leading to interoperability
problems, and hinders a widespread implementation and
a standard interpreter (
          <xref ref-type="bibr" rid="b14 b2">14, 2</xref>
          ).</p>
		  
        <p>Nevertheless, MPEG-7 can describe multimedia
content at different degrees of abstraction and is designed as
an extensible format. Using its own &#x201C;Description
Definition Language&#x201D; (DDL &#x2013; an extended form of XML
Schema), it defines connected vocabularies on all levels of
abstraction. The basic structure of the vocabulary is
focused on GRID LAYOUT, TIME SERIES, MULTIPLE VIEW,
SPATIAL 2D COORDINATES, and TEMPORAL
INTERPOLATION. For object labeling, as well as continuous gaze
trajectories, the feature TEMPORAL INTERPOLATION could
be useful. Using connected polynomials, it allows
temporal interpolation of metadata over time and forms the
basis of the SPATIO-TEMPORAL LOCATOR which can be
used for describing, e.g., the bounding box of an object in
the complete video sequence.</p>
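        <p>The idea behind TEMPORAL INTERPOLATION can be illustrated with a short sketch: metadata values are stored only at key time points and interpolated in between. The following Python snippet is our own illustration, not MPEG-7 syntax, and uses the simplest case of piecewise-linear (degree-1 polynomial) interpolation of a gaze position; MPEG-7 also permits higher-degree polynomials.</p>

```python
# Illustration of temporal interpolation of metadata: positions are
# stored only at key time points and computed in between. Piecewise
# linear here; MPEG-7's TEMPORAL INTERPOLATION allows higher degrees.

def interpolate(key_points, t):
    """key_points: sorted list of (time, (x, y)); returns position at t."""
    t0, p0 = key_points[0]
    for t1, p1 in key_points[1:]:
        if t <= t1:
            a = (t - t0) / (t1 - t0)
            return (p0[0] + a * (p1[0] - p0[0]),
                    p0[1] + a * (p1[1] - p0[1]))
        t0, p0 = t1, p1
    return key_points[-1][1]  # clamp after the last key point

print(interpolate([(0.0, (0, 0)), (1.0, (100, 50))], 0.5))  # (50.0, 25.0)
```

        <p>Storing only key points plus an interpolation rule keeps trajectory metadata compact compared to one entry per video frame.</p>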
      </sec>
	  
      <sec id="S2b">
        <title>Timed Text</title>
		
        <p>Like a video transcript, timed text (
          <xref ref-type="bibr" rid="b26">26</xref>
          ) assigns text
labels to certain time intervals based on time-stamps and is
typically used for subtitling movies, as well as
captioning for the hearing impaired or for people lacking audio devices.
The simplest grammar of timed text consists of a time
interval (start and end time) and the text to be displayed.
The popular SubRip format (SRT) sticks to that grammar
only. Enhancing this temporal grammar with spatial
features, the second edition of the Timed Text Markup
Language 1 (TTML1) (
          <xref ref-type="bibr" rid="b24">24</xref>
          ) introduces a vocabulary for
regions. Nowadays, the amount of variation within timed
text formats makes them hardly interoperable. Therefore,
a large set of decoders and rendering frameworks has
emerged to maintain compatibility across media
players.</p>
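        <p>As a minimal illustration of the timed-text grammar described above, the following Python sketch writes gaze-event labels in the SRT grammar, i.e., an index, a time interval, and the text to be displayed. The labels and function names are our own.</p>

```python
# Sketch: writing gaze-event labels in the SRT grammar (index,
# "start --> end" interval, text, blank line). Labels are illustrative.

def ms_to_srt(ms):
    """Format milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, milli = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"

def fixations_to_srt(fixations):
    """fixations: list of (start_ms, end_ms, label) tuples."""
    entries = []
    for i, (start, end, label) in enumerate(fixations, 1):
        entries.append(f"{i}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n{label}\n")
    return "\n".join(entries)

print(fixations_to_srt([(0, 480, "fixation at (512, 384)"),
                        (480, 730, "saccade")]))
```

        <p>Such a track can only label time intervals with text; spatial information requires the region or drawing features of the formats discussed in the following sections.</p>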
      </sec>
	  
      <sec id="S2c">
        <title>Subtitle and Caption Formats</title>
		
        <p>With the objective of providing a modern format for
encoding subtitles, the open universal subtitle format
(USF) was specified (
          <xref ref-type="bibr" rid="b15">15</xref>
          ). To be comprehensive, it tries
to form the superset of features from different existing
subtitle formats. However, in contrast to existing subtitle
formats, it provides an XML-based representation that is
flexible, human-readable, portable, Unicode-compatible,
and hierarchically structured, and it further allows easy
management of entries. Designed to be rendered by
multimedia players, it makes it possible to adapt the display
style (color, size, position, etc.) to fit the needs of the
viewer. In contrast to pre-rendered subtitles, like the
VobSub format, this gives the opportunity for adjusting
the visualization of the metadata according to the analysis
task. However, not every multimedia player supports the
rendering of subtitles during playback.</p>

        <p>Regrettably, the open source USF project has become
private and its community has lost all influence on the
development. In addition, the documentation of the latest
version 1.1 has been temporarily removed from the
Internet repository. In this working version 1.1, some parts,
including the draw commands, were marked as under
development, but in addition to visualization of metadata
on top of the video sequence, USF provides comment tags
to allow storing additional information that cannot be
visualized.</p>

        <p>Together with a subtitle editor, the Sub Station Alpha
(SSA) format was introduced and is widely used by
fansubbers. Because of its widespread use, the support of
SSA and its extended version V4.00+ (
          <xref ref-type="bibr" rid="b22">22</xref>
          ), known as
Advanced SSA (ASS), is integrated in almost every
multimedia player. In addition to SSA, ASS includes simple
drawing commands like straight lines, 3<sup>rd</sup> degree B&#xE9;zier
curves, and 3<sup>rd</sup> degree uniform B-splines, which are
probably sufficient to visualize gaze points, object
annotations, etc. on a certain frame. The latest versions of today&#x2019;s
video players support the ASS drawing commands.</p>
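        <p>To make this concrete, the following Python sketch emits a single ASS &#x201C;Dialogue&#x201D; event that marks a gaze point using the drawing commands: the override tag \p1 switches the line into drawing mode, and the m/l commands trace a small square around the gaze position. The style name and coordinates are illustrative.</p>

```python
# Sketch of an ASS "Dialogue" event marking a gaze point with drawing
# commands. The style name "GazeStyle" and the coordinates are made up;
# {\p1} enters drawing mode, "m ... l ..." traces a square, {\p0} exits.

def gaze_point_event(start, end, x, y, size=8):
    half = size // 2
    square = (f"m {x-half} {y-half} l {x+half} {y-half} "
              f"{x+half} {y+half} {x-half} {y+half}")
    return (f"Dialogue: 0,{start},{end},GazeStyle,,0,0,0,,"
            f"{{\\p1}}{square}{{\\p0}}")

print(gaze_point_event("0:00:01.00", "0:00:01.04", 512, 384))
```

        <p>One such event per gaze sample and per subject track already yields an instantaneous gaze visualization in any player that renders ASS.</p>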
      </sec>
	  
      <sec id="S2d">
        <title>Multimedia Container</title>
		
        <p>Multimedia containers are capable of housing any
multimedia content in a single container, which then can
be used for storing, distributing, broadcasting, and
archiving. Usually, multimedia containers have multiple
parallel tracks such as video, audio, and subtitle tracks
encoded by their format standards. Unlike classical data
archives such as ZIP or TAR, multimedia containers take the
temporal nature of the tracks&#x2014;their payload&#x2014;into
account. As illustrated in Figure 1, the temporal payload is
synchronized to support streaming and playback without
decomposition of the container. To implement these
features, the muxer, which creates the multimedia
container from the data, must be able to understand the data,
as well as encode it into the temporal structure of the
container. Thus, there is a strong dependency between
video, audio, subtitle, etc. data formats and container
formats. In summary, not every payload is suited for
every container format.</p>

        <p>Common multimedia containers used for DVDs are
the VOB and EVO formats, which are based on the
MPEG-PS standard. Specified in MPEG-4 (ISO/IEC 14496, Part
14) (
          <xref ref-type="bibr" rid="b10">10</xref>
          ), the more modern MP4 container format was
designed to hold video, audio, and timed text data. In
accordance with the other MPEG standards, MP4
encapsulates only video, audio, and subtitle formats
specified by MPEG.</p>

        <p>Originally put forward by the non-profit Xiph.Org
Foundation (
          <xref ref-type="bibr" rid="b30">30</xref>
          ), the free OGG container format is designed for
streaming Vorbis encoded audio files and is nowadays
supported by many portable devices. Through the launch
of Theora and Dirac video formats, OGG has become a
more and more popular format for streaming web
multimedia content.</p>

        <p>With the aim to allow an easy inclusion of different
types of payloads, the open Matro&#x161;ka container format
(MKV) (
          <xref ref-type="bibr" rid="b12">12</xref>
          ) is a flexible and widely used container. Also, it
serves as a basis for the WEBM format, which is being
pushed forward to establish a standard for multimedia
content on the web.</p>

        <p>The inclusion of timed metadata that goes beyond
subtitle formats into multimedia containers seems to be
discussed only sporadically. Embedding CMML
descriptions in an OGG container is the only approach discussed
(
          <xref ref-type="bibr" rid="b29">29</xref>
          ). However, CMML&#x2019;s lack of support for spatial
features like region markup and object boundaries rules
out its use in present standard multimedia players.</p>

        <p>For metadata and stimuli, probably the most matured
specified approach is the embedding of MPEG-7 (
          <xref ref-type="bibr" rid="b9">9</xref>
          ) data
into MP4 (
          <xref ref-type="bibr" rid="b10">10</xref>
          ) containers. Nonetheless, there are to our
knowledge still no multimedia players with MPEG-7
support.</p>
      </sec>
      </sec>	  

    <sec id="S3">
      <title>Suitable Data Formats for Gaze Data</title>

        <p>What kind of data format would be best suited for
gaze trajectories of several subjects, the video stimulus
itself, and optional metadata like object annotations? In
the previous section, we gave a brief overview of the
diversity of formats of metadata and multimedia
containers. Fostered by the advance of the semantic web
and other digital technologies, the diversity keeps
growing. While general formats, like RDF (
          <xref ref-type="bibr" rid="b25">25</xref>
          ), are well
established, they are not geared towards the description
of temporal content like video sequences, and hence lack
the required vocabulary. To avoid the introduction of new
formats, we suggest three approaches to a standard
format for gaze data on video stimuli. The first approach
is the extension of a general metadata formalism like
RDF with a video, audio, eye-tracking, and gaze
data vocabulary. The second approach is to use a
well-defined standard for metadata like MPEG-7 along with the
implementation of proper multimedia player extensions.
Third, we present an ad hoc approach that utilizes
existing technologies, e.g., for subtitles or captions, to
store gaze data and object annotations.</p>

        <p>The first approach, i.e., the development of a
specialized eye- and gaze-tracking format on the basis of a
well-established metadata framework like RDF, yields the
obvious advantage that it can build on the vast amount of
software libraries and tools that support storage,
management, exchange, and a certain type of reasoning
over such data. The main drawback is that these formats
lack any form of support for the video, as well as the
audio domain, and do not even provide basic spatial or
temporal concepts. The development of a specialized
vocabulary to describe gaze tracking scenarios would
mean a huge effort. Additionally, it would have to
include the implementation of tools for visualization.
Desirable features, like streamability, do not seem to fit
the original static nature of these data formats well.
Furthermore, given the specificity of the field of
eye-tracking, wide support by common multimedia players
seems unlikely.</p>

        <p>Using a specialized, detailed standard for video
metadata like MPEG-7 is the second possible approach.
MPEG-7 provides a well-defined and established
vocabulary for various annotations, but lacks a vocabulary for
gaze data. Nevertheless, MPEG-7 supports the description
of points or regions of interest in space and time, and
hence, seems well suited to store the essential
information of eye-tracking data. Unfortunately, no standard
media player (like VLC, MediaPlayer, and Windows
Media Player) currently seems to support the
visualization of MPEG-7 video annotations. Hence, one would have
to extend these players to visualize embedded
eyetracking data&#x2014;fortunately, one can build on existing
MPEG-7 libraries (
          <xref ref-type="bibr" rid="b3">3</xref>
          ). We think that, when implementing
such multimedia player extensions, one should aim at a
generic solution that can also be used to visualize other
MPEG-7 annotations, as this would foster development,
distribution, and support.</p>

<fig id="fig01" fig-type="figure" position="float">
					<label>Figure 1</label>
					<caption>
						<p>In general, the structure of a multimedia container is divided into two parts based on the temporal dependencies of the content. Content without temporal structure, like the header, general metadata, as well as attachments are stored before temporal related content such as videos, audios, subtitles, and other metadata. As illustrated, the subtitle tracks can be used for the visualization of gaze data, object annotations, etc. and the audio tracks for sonification of, e.g., EEG data. For watching these multimedia containers via the Internet, only the non-temporal content has to be transmitted before playing, while the temporal content is transmitted during playing.</p>
					</caption>
					<graphic id="graph01" xlink:href="jemr-10-05-d-figure-01.png"/>
				</fig>

        <p>Even though the first two approaches seem better
suited in the long run, they appear neither realizable with
manageable effort nor on a short time scale. Hence, in the
remainder of this paper, we focus on the third approach,
which allows for the quick development of a prototype to
demonstrate the idea, concepts, possibilities, and to gain
experience in its application. The idea is to adopt existing
technologies, like subtitles, captions, audio tracks, and
online links (
          <xref ref-type="bibr" rid="b4">4</xref>
          ), already supported by existing media
players. Although these technologies are not yet geared
towards the storage and presentation of gaze data, all
important features can be realized based on these formats.
Reusing or &#x201C;hijacking&#x201D; formats has the benefit that they
will be widely supported by current multimedia players.
Even though there appears to be no general drawing
framework, some subtitle formats include drawing
commands that allow highlighting regions in a scene. These
commands can also be used for visualizing gaze data, while
audio tracks can be reused for the auditory display of, e.g.,
EEG data. Using a player&#x2019;s
standard methods for displaying subtitles and switching
between different languages, one can then display gaze data
and switch between data of different subjects. Using
player add-ons, simple VA tasks are possible by reusing
tracks of the multimedia container.</p>

        <p>When putting metadata into a reused format, one must
be careful that no information is lost. Additionally, one
should bear in mind that the original format was designed
for a different purpose, so it may not have out-of-the-box
support for desirable features, e.g., simultaneous display
of gaze points of multiple subjects. We chose to use USF
for three main reasons: First, its specification considers
possible methods for drawing shapes, a prerequisite for
instantaneous visualization with a normal multimedia
player. Second, it allows for storing additional data, a
necessity for carrying all eye-tracking data so that expert
visualization with specialized tools is possible from the
same single file. Finally, and most importantly, USF is&#x2014;
like the preferable MPEG-7&#x2014;an XML-based format and
thereby capable of holding complex data structures.
However, although basic USF is supported by some
existing media players, the drawing commands are not
implemented. We provide an extension for the VLC media
player that implements some drawing commands. Also,
we provide a converter to the ASS format, which is widely
supported, including drawing commands, due to the free
ASS library, thereby allowing out-of-the-box visualization
of eye-tracking data that works with many current
multimedia players. However, its plain text format is too
restricted to hold all desirable information. Both
approaches will be discussed in more detail in the next section.</p>
      </sec>

    <sec id="S4">
      <title>Software Tools for Gaze Data in Multimedia Containers</title>
	  
      <p>In addition to two prototypes of multimedia containers
in which we embedded gaze data and object annotations
using two different subtitle formats, we present in this
section our new tool for converting metadata along with
the stimulus video sequence into an MKV container.
After a brief introduction of a tool for decomposing a
multimedia container, we present add-ons to VLC that
allow simple VA on multimodal data&#x2014;here gaze data&#x2014;
using a standard media player.</p>

      <sec id="S4a">
        <title>Container Prototypes</title>
		
        <p>For the encapsulation of metadata, we implemented two
prototypes in which the subtitle track is reused as a
carrier of metadata. Our first prototype, based on USF,
encapsulates the complete gaze metadata without loss and
can be visualized in a modified version of the VLC media
player. The second one, based on the ASS format, can only
carry selected metadata for visualization, but
these visualizations can be displayed by current media
players, as well as standalone DVD players.</p>

        <p><bold>Metadata as USF.</bold> In order to use USF for
encapsulating eye-tracking data, we analyzed which features of USF
are available in the latest releases of common multimedia
players. One of the most common media players is the
VLC media player. Its current developer version 3.0.0
already supports a variety of USF attributes, namely
text, image, karaoke, and comment. The latest USF
specification introduces an additional attribute shape that is
still marked as under development, although this
specification is already quite old. Since gaze data is commonly
visualized with simple geometric shapes, like circles and
rectangles, the use of the shape attributes for
instantaneous gaze visualization of subjects seems to be quite
appropriate.</p>

        <p>Since the exact specification of the shape attribute is,
as mentioned above, not complete, we restricted it to
rectangles, polygons, and points, as illustrated in Listing
1. These simple geometric shapes were taken as first
components in order to visualize a multitude of different
types of elements. Point-like visualizations are useful to
describe locations without area information, e.g., for gaze
point analysis in eye-tracking studies. Rectangles are
most commonly used as bounding boxes for object-of-interest
annotations, whereas polygons provide a more
specific, but complex way of describing the contour of an
object. Further, as can be seen in Figure 9, one can use
different geometric shapes to differentiate between, e.g.,
saccades marked by circles and fixations marked by
rectangles.</p>
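        <p>The exact element and attribute names are given in Listing 1; the following Python sketch only illustrates the general structure of such data, emitting gaze samples as point-like shape entries inside USF-style subtitle elements. All names here are hypothetical placeholders, not the exact identifiers of the specification.</p>

```python
# Illustrative sketch of gaze points as USF-style shape subtitles.
# Element and attribute names ("point", "posx", "posy") are made up;
# the actual names follow the extended specification in Listing 1.
import xml.etree.ElementTree as ET

def gaze_to_usf(samples):
    """samples: list of (start, stop, x, y) with USF-style timestamps."""
    usf = ET.Element("USFSubtitles", version="1.1")
    subtitles = ET.SubElement(usf, "subtitles")
    for start, stop, x, y in samples:
        sub = ET.SubElement(subtitles, "subtitle", start=start, stop=stop)
        ET.SubElement(sub, "point",            # hypothetical shape element
                      posx=str(x), posy=str(y))
    return ET.tostring(usf, encoding="unicode")

print(gaze_to_usf([("00:00:00.000", "00:00:00.040", 512, 384)]))
```

        <p>Because the representation is XML, additional non-visualizable metadata can be attached to the same entries without disturbing players that ignore unknown elements.</p>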

<fig id="fig10" fig-type="figure" position="float">
					<label>Listing 1:</label>
					<caption>
						<p>Section of the USF specification (<xref ref-type="bibr" rid="b15">15</xref>), * marked attributes are added to the specification and implemented in our altered VLC player.</p>
					</caption>
					<graphic id="graph10" xlink:href="jemr-10-05-d-figure-10.png"/>
				</fig>

		
        <p>The so-called subsusf codec module of the VLC
player handles the visualization of all USF content. In detail,
this codec module receives streams of the subtitle data for
the current frame from the demuxer of VLC and renders a
frame overlay at runtime. We extended this module with
additional parsing capabilities for our specified shape
data, which is then drawn into the so-called subpictures
and passed on to the actual renderer of VLC. Since the
thread will be called for every frame, the implementation
is time-critical, and we decided to use the fast
rasterization algorithms of Bresenham (
          <xref ref-type="bibr" rid="b5">5</xref>
          ). Additionally, we
added an option to fill the shapes, which is implemented
with the scan line algorithm (
          <xref ref-type="bibr" rid="b28">28</xref>
          ). In order to ensure that
our enhancements of the subsusf module will be available
in the next VLC release, we submitted our changes to the
official VLC 3.0.0 developer repository and suggested an
ongoing development of the USF. In Figures 2, 7, and 9,
instantaneous visualization of gaze data by the VLC
player can be seen.</p>
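        <p>Our VLC extension itself is written in C; the rasterization idea, however, is easy to convey. The following Python sketch implements the integer-only Bresenham line algorithm used for drawing shape outlines, one pixel per step, without floating-point arithmetic.</p>

```python
# Sketch of Bresenham line rasterization: integer-only error tracking
# decides, per step, whether to advance in x, in y, or in both.

def bresenham(x0, y0, x1, y1):
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:          # step in x
            err += dy
            x0 += sx
        if e2 <= dx:          # step in y
            err += dx
            y0 += sy
    return pixels

print(bresenham(0, 0, 3, 2))  # [(0, 0), (1, 1), (2, 1), (3, 2)]
```

        <p>Since the overlay is redrawn for every frame, avoiding floating-point work per pixel is what makes this choice suitable for the time-critical rendering path.</p>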

<fig id="fig02" fig-type="figure" position="float">
					<label>Figure 2.</label>
					<caption>
						<p>Instantaneous visualization of gaze data, here shown as red dots: (a) visualization of gaze data on a certain frame with VLC player—(b) like normal subtitles one can easily change between the gaze data of subjects—(c) exemplary gaze data on another video sequence.</p>
					</caption>
					<graphic id="graph02" xlink:href="jemr-10-05-d-figure-02.png"/>
				</fig>

        <p>For testing all features of our implementation, we
created an MKV container containing a dummy video file, as
well as USF files with all supported attributes. VLC can
open these containers and yields the desired visualization
of geometric object annotations (
          <xref ref-type="bibr" rid="b20">20</xref>
          ). This proves our
concept that the incorporation of metadata into USF is
possible and that streaming, using MKV as a container
format, is also possible, since both content and metadata
are integrated as temporal payload. Opening the container
in a released version of VLC (2.2.4) without the
additional adjustments in the subsusf module does not conflict
with normal video playback, but does not visualize the
incorporated annotations, either.</p>

        <p><bold>Metadata as ASS.</bold> Since the USF-based prototype
requires a modified version of the VLC media player, a
broad audience is still excluded from watching the gaze
data set without that player. Therefore, we provide a
second prototype based on ASS, a subtitle format with
drawing commands that is widely supported thanks to
the free ASS library and is also supported by many
standalone DVD players. In contrast to USF, the ASS
subtitle format cannot carry all of the desired metadata, as it is
not capable of representing complex data structures. Due
to its design, it can only embed content that
can be visualized; thus, any other content is lost in
the conversion from, e.g., gaze data to ASS.</p>
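<p>To make the lossy mapping concrete, the following Python sketch emits one ASS Dialogue event per gaze sample, using the standard ASS vector drawing mode ({\p1}…{\p0}) and \pos override tag; the helper name, the style name "GazeStyle", and the diamond shape are illustrative assumptions, not taken from our actual converter.</p>

```python
def gaze_to_ass_dialogue(t_start, t_end, x, y, r=8):
    """Emit one ASS Dialogue event drawing a small diamond at (x, y).

    Only the drawable geometry survives the USF->ASS conversion; any
    non-visual metadata carried by USF is dropped at this point.
    """
    def ts(sec):
        # ASS timestamps use centisecond precision: H:MM:SS.cc
        cs = int(round(sec * 100))
        h, rem = divmod(cs, 360000)
        m, rem = divmod(rem, 6000)
        s, c = divmod(rem, 100)
        return f"{h}:{m:02d}:{s:02d}.{c:02d}"

    # 'm' moves the pen, 'l' draws lines: a diamond around the origin
    shape = f"m 0 -{r} l {r} 0 l 0 {r} l -{r} 0"
    return (f"Dialogue: 0,{ts(t_start)},{ts(t_end)},GazeStyle,,0,0,0,,"
            f"{{\\pos({x},{y})}}{{\\p1}}{shape}{{\\p0}}")
```

<p>One such event is generated per gaze sample and frame interval, which is why the resulting ASS track is readable by any player with libass support.</p>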

        <p>For generating the lossy ASS files from lossless USF
files, a simple translation stylesheet using XSLT
(Extensible Stylesheet Language Transformations) has been
written. After the conversion, an MKV container is created
including the video and one ASS track for each subject.
The resulting container makes metadata accessible to a
broad audience, as the ASS visualization can be displayed
by many video players without modification.</p>
      </sec>
	  
      <sec id="S4b">
        <title>Muxing Multimedia Containers</title>
		
        <p>At an early stage of this project, we developed an
open source command line tool that converts CSV eye
tracking and gaze data files of several
subjects to USF files, which are then converted to ASS and encapsulated
together with the original video stimuli in a single MKV
file. Using this scriptable command line tool, we have
already converted 319 gaze tracking data sets (
          <xref ref-type="bibr" rid="b17 b11 b6 b1">17, 11, 6, 1</xref>
          )
for testing, as well as for highlighting the potential of our
approach<xref ref-type="fn" rid="FN1">1</xref>.</p>
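<p>The core CSV-to-USF step can be sketched as follows; since USF is an XML subtitle format, the conversion is essentially XML generation. The column names ("time", "x", "y"), the per-frame duration, and the shape element with its attributes are assumptions for illustration — the real tool must be configured per gaze-data format, and the shape attributes correspond to those we added to the USF specification.</p>

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_usf(csv_text, fps=25.0):
    """Sketch of the CSV->USF conversion performed by the tool.

    Each gaze sample becomes one <subtitle> element holding a
    hypothetical <shape> child with the gaze position.
    """
    root = ET.Element("USFSubtitles", version="1.0")
    subtitles = ET.SubElement(root, "subtitles")
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = float(row["time"])  # seconds since stimulus onset (assumed)
        sub = ET.SubElement(subtitles, "subtitle",
                            start=f"{t:.3f}",
                            stop=f"{t + 1.0 / fps:.3f}")
        # hypothetical shape element carrying one gaze point
        ET.SubElement(sub, "shape", type="circle",
                      x=row["x"], y=row["y"], radius="8")
    return ET.tostring(root, encoding="unicode")
```

<p>A scriptable function of this kind is what allowed batch conversion of the 319 data sets mentioned above.</p>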

        <p>Unfortunately, the diversity of available gaze data
formats requires a huge number of options to be
set for running this command line tool, so that adapting it
to another gaze data format takes at least one hour.
Therefore, we started to develop a graphical tool for
converting gaze data to USF and ASS. The
converted files and the stimuli are then muxed into MKV
containers. As seen in Figure 3, the user interface (UI),
implemented in Qt, is divided into four functional blocks:
the input file selection, the selection of the rows one wants to
visualize, the video (stimulus) selection, and the output
file naming dialog.</p>

<fig id="fig03" fig-type="figure" position="float">
					<label>Figure 3.</label>
					<caption>
						<p>Demuxing an MKV file, an option for experts to extract the raw data for research. In the track selection, one can see the tracks for instantaneous visualization (video and subtitle tracks), sonification (audio tracks), and the attachment (general metadata) carrying the raw data.</p>
					</caption>
					<graphic id="graph03" xlink:href="jemr-10-05-d-figure-03.png"/>
				</fig>

        <p>After pressing the button &#x201C;Input Files&#x201D;, a file dialog
appears in which the user can select one or several gaze
files in the same format. All selected files will be listed
next to the button. In the next step, the user has to
indicate how many header lines precede the gaze
data. If the correct number of header lines is selected, the
UI displays the data in two tables&#x2014;one to specify
the values of the X coordinate and one for the Y coordinate. These X
and Y coordinates are the center of the circle which
visualizes the gaze points on top of the video stimulus.
Next, the user defines the stimulus video. After defining
an output location, the &#x201C;Conversion&#x201D; button becomes
active, and the user can start the conversion. After a
successful conversion, two MKV containers per gaze data
set are created, one based on USF and one based on ASS.</p>

        <p>Currently, this tool only supports CSV files and is not
able to synchronize gaze data to certain frames. This
issue is caused by the various time-stamp formats, which
differ depending on the eye-tracking hardware used. A
further drawback of our first graphical UI is that
additional features, like the distinct visualization of saccades
and fixations seen in Figure 9(b), are not yet
implemented.</p>
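<p>The missing synchronization step amounts to mapping tracker timestamps onto video frame indices; a minimal sketch follows, in which the default microsecond unit and the parameter names are assumptions, since the actual unit and recording start differ per eye-tracking hardware.</p>

```python
def timestamp_to_frame(t_raw, t0_raw, fps, unit=1e-6):
    """Map an eye-tracker timestamp to the video frame shown then.

    t_raw / t0_raw: raw device timestamps (sample and recording start);
    unit: seconds per raw tick (microseconds by default, an assumption).
    """
    seconds = (t_raw - t0_raw) * unit
    return int(seconds * fps)  # index of the frame on screen
```

<p>Once unit and offset are known for a given device, this mapping would let the converter attach each gaze sample to the correct frame interval.</p>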
      </sec>
	  
      <sec id="S4c">
        <title>Demuxing Multimedia Containers</title>
		
        <p>For expert use, extraction of all data in the MKV
container is necessary. For this purpose, the off-the-shelf tool
mkvextract (
          <xref ref-type="bibr" rid="b13">13</xref>
          ) can be used. As illustrated in Figure 4,
this standard tool for MKV container handling comes with
a graphical UI and a command line UI. Both were
developed and are maintained by the MKV community.</p>
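<p>For readers who prefer scripting the extraction, the mkvextract "tracks" mode with its TID:filename syntax can be driven programmatically; the following sketch only builds the command line (the track ids must first be looked up, e.g. with mkvmerge's identify mode, and the output file extension is the user's choice — ".ass" here is an assumption).</p>

```python
def mkvextract_cmd(container, track_ids, out_prefix="track"):
    """Build an mkvextract invocation for demuxing selected tracks.

    Follows the `mkvextract tracks <file> TID:dest` syntax from the
    MKVToolNix manual; this sketch does not execute anything.
    """
    args = ["mkvextract", "tracks", container]
    args += [f"{tid}:{out_prefix}{tid}.ass" for tid in track_ids]
    return args
```

<p>The returned list could then be passed to a process runner such as Python's subprocess module.</p>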

<fig id="fig04" fig-type="figure" position="float">
					<label>Figure 4.</label>
					<caption>
						<p>Prototype of the graphical UI for converting a gaze data set into an MKV container which provides instantaneous visualization.</p>
					</caption>
					<graphic id="graph04" xlink:href="jemr-10-05-d-figure-04.png"/>
				</fig>
      </sec>
	  
      <sec id="S4d">
        <title>VA Using Standard Multimedia Players</title>
		
        <p>The use of gaze data presented in multimedia
containers is quite intuitive, as the user interface builds on
metaphors known from entertainment content. Hence, the user
can change the visualizations like subtitles, and different
types of auditory display can be selected like audio
languages (
          <xref ref-type="bibr" rid="b21">21</xref>
          ), as shown in Figure 2(b). The nature of the
general subtitle system in VLC only allows for one active
subtitle track at a time, which unfortunately limits the
range of possibilities for VA tasks significantly. Since it
is often important to visualize gaze-object relations, e.g.,
to compare the reactions of subjects to a given stimulus,
we developed two VLC add-ons for the simultaneous and
synchronized playing of different subtitle tracks.</p>

        <p>Since such a feature is rarely needed in general
contexts, the VLC player does not provide
the required abilities. To improve its
usability as a scientific tool, we extended it with this
operation commonly needed in the domain of visual
analytics. At this point, two VA solutions for providing
additional functionalities are discussed.</p>

        <p>On the one hand, one might directly extend the source
code of the software chosen for the extension if the
proposed USF subtitle format is used. Even though the required
changes are quite complex and low-level, they pay
off in high performance and in full access to the
inner software parts of the application.
However, the increase in complexity, with the resulting
consequences for performance and security, is hardly
acceptable if just a small fraction of users will need it.
Moreover, such a custom patch would require custom
compilation of the software until it is sufficiently tested
and has found its way into the binary releases of the
player. The procedure of translating VLC&#x2019;s human-readable
code into binary executables and linking it to the more
than 50 required dependencies on the machine of the user
requires advanced knowledge of the system and its
infrastructure, let alone that compilation may require root
privileges.</p>

        <p>On the other hand, one might use runtime
processing to avoid these drawbacks. By default, the VLC
multimedia player supports custom add-ons, which are
executed on demand and do not need any code
compilation. Therefore, it is possible to have pieces of
code that are easy to install, usable, and removable
without any problems or residues, like applications on
current smartphones. Technically, these add-ons are small
text files with the extension .lua stored in
scope-dependent directories. These scripts contain code written
in a slightly outdated version of the cross-platform
scripting language Lua, which was designed to provide a fast
interface to extend and customize complex software
projects like video games after their release. Therefore, Lua
extensions match our needs perfectly. We think that the
advantages in usage outweigh the poorer performance of
scripting solutions in comparison to binary code.
Therefore, we built two analysis tools using the dynamic
approach. Even if both add-ons may fulfill the
same goals of interactive visualization and analysis of
different data set items at the same time, like the object of interest
annotation and the gaze points of participant P1A, they
still differ. They differ not only in
their mode of presentation&#x2014;one stimulus window versus
several stimulus windows&#x2014;but also in the advantages
and disadvantages of their graphical representation
depending on the context of the VA task and the content of
the data set.</p>

        <p>Our first extension (SimSub), shown in Figure 5(a),
allows visualizing different eye-tracking datasets in
multiple windows. It may be used to provide a rough
overview of the experimental results of multiple participants.
A simple click on the requested subtitle tracks creates
another instance which behaves exactly like the previous
one. The parallel video streams are synchronized within a
certain hierarchy. The VLC player started by the
researcher works as a controller: Changes in its playback
are transmitted in real-time to the other windows,
guaranteeing an exact comparison on a frame basis. The
other instances have neither control elements nor
timelines, but purely depend on the &#x201C;master&#x201D; window.
Technically, the extension uses the fact that the VLC
multimedia player provides an interface for remote
access. If an eye-tracking dataset is to be visualized in
parallel, it starts another instance in the background, sets
the proper subtitle track, and synchronizes it with the
main window. Any modification of the current state, e.g.
pausing or stopping, is immediately sent via interprocess
communication to all the other instances. This
approach allows using all the embedded features of the
VLC player regarding its many supported
subtitle formats. Moreover, the isolated instances cannot
influence each other in exceptional circumstances such as
parsing errors, and they allow a straightforward VA workflow
by a fully customizable order of the visualizations, with
the ability to show and hide them on demand. However,
the multiple instances may have a performance impact if
used excessively. This restriction arises due to
the limitation of the provided Lua interface, which makes it
necessary to handle some commands in a very
cumbersome manner. Hence, our current implementation
sometimes lacks correct synchronization of the video and the
subtitle between all open views, especially when working
with three or more views.</p>
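<p>The master-slave synchronization can be sketched as follows. VLC's plain-text remote-control ("rc") interface accepts newline-terminated commands such as "pause" and "seek &lt;seconds&gt;" over a socket; the class and method names below are hypothetical, and the actual add-on is written in Lua inside VLC rather than in Python.</p>

```python
import socket

class SlaveInstance:
    """Hypothetical sketch of SimSub-style remote control of a slave
    VLC window via its rc interface."""

    def __init__(self, host="localhost", port=4212):
        self.addr = (host, port)
        self.sock = None  # connected lazily, so the sketch is testable

    def command(self, verb, *args):
        # rc commands are newline-terminated plain-text lines
        return (" ".join([verb, *map(str, args)]) + "\n").encode("ascii")

    def send(self, verb, *args):
        if self.sock is None:
            self.sock = socket.create_connection(self.addr)
        self.sock.sendall(self.command(verb, *args))

# a master window would broadcast each state change, e.g.:
# for slave in slaves: slave.send("seek", 42)
```

<p>Broadcasting every playback change of the master to all slaves is what guarantees the frame-based comparison described above, up to the interface limitations we noted.</p>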

<fig id="fig05" fig-type="figure" position="float">
					<label>Figure 5.</label>
					<caption>
						<p>Different areas of application for the extensions: (a) visualization of gaze data in multiple windows for finding correlations between subjects&#x2014;(b) visualization of multiple gaze data within a single window for detailed analysis.</p>
					</caption>
					<graphic id="graph05" xlink:href="jemr-10-05-d-figure-05.png"/>
				</fig>

        <p>Our second extension (MergeSub) allows visualizing
eye-tracking data from different participants using a
single instance of the VLC player, which also avoids the
synchronization problems (cf. Figure 5(b)). It is designed
to find tiny differences in the experimental results of
multiple participants, which might become obvious only
if played directly side by side. The researcher selects an
arbitrary number of subtitle tracks, which are then
displayed together on the video sequence. Technically, this
ability is provided by an automatic merge of the selected
subtitle tracks into a single track, saved in a temporary
file (we do not influence the renderer directly). As it is
imported afterwards just like any other subtitle, the
researcher has full access to all capabilities and advanced
functionalities of the VLC player. However, this
approach limits the usable subtitle formats dramatically,
because it requires a custom merging routine for every
format. So far, our extension covers just the formats
which are sophisticated enough to visualize eye-tracking
data. As a further improvement for this add-on, we plan
to assign different colors to the subtitle tracks, which will
drastically improve the usability.</p>
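<p>For the ASS case, the merging routine essentially interleaves the Dialogue events of all selected tracks into one script; a simplified Python sketch follows (the real add-on is a Lua script inside VLC, and this sketch assumes all tracks share the same header and styles).</p>

```python
def merge_ass_tracks(tracks):
    """Merge several ASS subtitle tracks into one, MergeSub-style.

    Keeps the header of the first track and interleaves all Dialogue
    events by start time; the real add-on writes the result to a
    temporary file that VLC then loads like any other subtitle.
    """
    header_lines, events = [], []
    for i, track in enumerate(tracks):
        for line in track.splitlines():
            if line.startswith("Dialogue:"):
                # the start time is the second comma-separated field;
                # lexicographic sort works for equal-width H:MM:SS.cc stamps
                start = line.split(",", 2)[1]
                events.append((start, line))
            elif i == 0:  # keep the header of the first track only
                header_lines.append(line)
    events.sort(key=lambda e: e[0])
    return "\n".join(header_lines + [line for _, line in events])
```

<p>Because the merged result is an ordinary ASS script, no change to the renderer is needed — which is exactly the design choice described above.</p>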

        <p>With a combination of both techniques, a simple
workflow for a basic VA task, including reasoning, is
possible. The dynamic extensibility of the VLC
multimedia player would allow further developments regarding
the needs of researchers. Thus, an out-of-the-box
multimedia player becomes a powerful VA software. With
their support for easily available and usable custom
extensions, today&#x2019;s multimedia players are suitable for a
variety of VA add-ons, which can be used in parallel or one by
one.</p>
      </sec>
    </sec>
	
    <sec id="S5">
      <title>Benefit of Gaze Data in Multimedia Containers</title>
	  
      <p>Storing gaze data or other scientific data in
multimedia containers will make them accessible to two major
groups&#x2014;the general audience, as well as professionals.
Empowered by our approach, almost everyone can
visualize and sonify multimodal metadata. This makes
fields like gaze movement research more accessible,
understandable, and exciting. Thus, it will become easier
to acquire non-student participants for experiments, and it
will reduce the reluctance against expensive test series.
For researchers, interdisciplinary projects will become
less complex, and more ideas for cooperation can arise
if the widespread use of multimedia containers for
storing metadata becomes common practice. For
example, instead of downloading gigabytes of zipped files and
compiling some software only to get an impression
of what data is available, it is simple to explore the data set
online. This type of visualization may even be included in
scientific and popular science publications to demonstrate
unprocessed data. Moreover, using a common metadata
standard will terminate the tedious process of researchers
having to adjust their software for every different gaze
data set. Finally, if a significant amount of data
from the same task is available in the same format, machine
learning techniques like deep learning can be applied to,
e.g., subject gaze trajectories to learn a human-like object
tracking behavior. Additionally, consumers may profit
from advanced features such as highlighting
objects everyone was looking at, and hearing-impaired
persons may obtain a computer-generated and
understandable auditory description of the scene based on
gazes.</p>

      <p>In order to emphasize these benefits, we will now
summarize a first user study, followed by a short
overview of the already converted data sets and some ideas on how
auditory display of metadata might extend visualized
data.</p>

      <sec id="S5a">
        <title>User Study</title>
		
        <p>Initial findings indicate that the proposed multimedia
containers are suitable for explorative multimodal data
analyses with standard multimedia players. Thus, experts,
as well as non-experts, can now easily access the data
(
          <xref ref-type="bibr" rid="b21">21</xref>
          ). Both user groups underline that the beauty of our
approach is to get a first impression of the whole data set
with standard software. In their opinion, the proposed
format serves as a helpful means of communication, so
that, e.g., the exploratory investigation of human
sensorimotor interaction in natural environments (
          <xref ref-type="bibr" rid="b7 b8">7, 8</xref>
          )
will be improved.</p>

        <p>By extending these initial user interviews, we
performed a software user study with nine subjects. The
focus of this study was to i) reevaluate the initial findings
gained from the interviews (
          <xref ref-type="bibr" rid="b21">21</xref>
          ), ii) evaluate our first two
custom add-ons as to whether and in what manner
they are used for performing a task, and iii) collect
feedback on which features are missing, which use cases are
not considered, and criticisms regarding their usage.</p>

        <p><bold>Methodology.</bold> We designed the user study
questionnaire in four parts. The first part served for grouping the
users regarding their working and research experiences.
Before starting on the second part of the questionnaire, a
five-minute introduction and demonstration of our
approach was given. We demonstrated downloading,
instantaneous visualization of gaze data, and
demuxing of the multimedia container for obtaining the
stimulus video, the audio, as well as the raw gaze data, on the
fourth video sequence of Coutrot and Guyader (
          <xref ref-type="bibr" rid="b6">6</xref>
          )
showing a moving bell. In addition to gaze data, and to make
the participants think outside the box, the sonification of
EEG data (
          <xref ref-type="bibr" rid="b21">21</xref>
          ) was also illustrated. Based on this
introduction, the participants rated on a Likert scale, in the
questionnaire&#x2019;s second section, how different target
groups&#x2014;here unskilled users, domain experts, and experts
from other domains&#x2014;might benefit from providing
metadata in multimedia containers. The last question of
this section asked the participants to brainstorm and
collect possible use cases of the demonstrated approach.</p>

        <p>The third part of the questionnaire focused on our
two VLC add-ons for explorative gaze data analysis.
Similar to the previous part, the add-ons SimSub and
MergeSub were demonstrated first. The visualization of
multiple object annotations in multiple windows (cf.
Figure 5(a)) was explained on the airplane video
sequence (
          <xref ref-type="bibr" rid="b20">20</xref>
          ) by showing the annotations of the three
airplanes, as well as the river&#x2014;each annotation in its own
playback window. We then introduced the second
add-on, the merge of multiple gaze data on top of the stimulus
video (cf. Figure 5(b)), using the data set ball game (
          <xref ref-type="bibr" rid="b11">11</xref>
          ).
Afterwards, the participants performed three tasks. In the
first task, the participants evaluated whether the data set
car pursuit (
          <xref ref-type="bibr" rid="b11">11</xref>
          ) suffers from measurement errors based
<bold>on the MKV encoded version.</bold> In the second task, the
participants reevaluated their statement of task one <bold>on the
original</bold> data set, i.e., on the raw stimulus video file, the
annotation file, as well as the eye-tracking files from the
25 observers. For this task, the participants were allowed
only the same amount of time as they needed for the first
task. In the third task, the participants made statements
about the general visual attention behavior, as well as the
most focused object, on the second video of Riche et al.
(
          <xref ref-type="bibr" rid="b17">17</xref>
          ), which is encoded in our proposed format (cf. Figure
7).</p>

        <p>The last part of the user study presented four closing
questions. The first two questions compared the data
accessibility of our approach, including our add-ons, with the
data accessibility of the original, non-converted data set.
Then, a question followed on missing or desirable
features for improving our approach. Finally, we asked for
general statements, criticisms, and ideas.</p>

        <p><bold>Results.</bold> Out of all participants, three had already
worked with gaze data and performed research on data
analysis related topics. The other six participants had never
worked with gaze data before and had various kinds of
research interests, like deep learning, computer vision, or
quantum chemistry.</p>

        <p>Figure 6 summarizes the first major assessments of
the participants after being introduced to metadata in
multimedia containers. All of the participants felt able to
indicate the significance of the approach to non-experts
and experts from non-sensory data domains. For the
experts in the sensory domain, only two participants were
not able to indicate their assumptions. Participants
suggested many potential use cases: sharing EEG and
eye-tracking data in a simple and useful way, demonstrating
to the general public what the technology does, gaining
new insights from the data through novel visualizations,
inspecting commercials in customer behavior studies,
giving domain experts a feeling for their data, presenting
data in a dynamic way, e.g. for talks, getting a first impression
of raw data, and improving e-learning.</p>

<fig id="fig06" fig-type="figure" position="float">
					<label>Figure 6.</label>
					<caption>
						<p>Results based on the opinions of the participants about how different audiences (cf. (a)–(c)) might benefit from our approach, storing metadata in multimedia container. n=9 for (a) and (c), n=7 for (b). Mean marked by +, median by x, quantiles by I.</p>
					</caption>
					<graphic id="graph06" xlink:href="jemr-10-05-d-figure-06.png"/>
				</fig>

        <p>Within the practical tasks, seven of nine participants
answered the first task completely correctly and two
partially. On average, the participants solved the task in
two and a half minutes. For example, one participant
answered this task as follows: &#x201C;When the car drives
through the roundabout, the calibration of the eye-tracker
seems to be off. All subjects look to the similar areas
below the car.&#x201D; On reevaluating their statements on the
original, unconverted data set, all participants stated that
this is impossible without scripts and programming. In
the last task, illustrated in Figure 7, eight of nine
answered correctly that, in the beginning, the subjects look
at different objects (non-uniform behavior) and, as soon
as the car entered the scene, nearly all of them
focused on the car and followed it. Note that for coping
with tasks 1 and 3, all participants only used the add-on
MergeSub. The add-on SimSub was never used.</p>

<fig id="fig07" fig-type="figure" position="float">
					<label>Figure 7.</label>
					<caption>
						<p>Fourth video sequence of (<xref ref-type="bibr" rid="b17">17</xref>) with the gaze tracks&#x2014;yellow dots&#x2014;of all ten subjects visualized simultaneously using MergeSub. (a) Non-uniform behavior: subjects look at different objects. (b) Most subjects focused their view on the car.</p>
					</caption>
					<graphic id="graph07" xlink:href="jemr-10-05-d-figure-07.png"/>
				</fig>

        <p>Within the closing section of the questionnaire, the
usability based on the three practical tasks was assessed
by all participants. Figure 8 illustrates the results. As a
possible improvement for the MergeSub add-on, four
participants recommended assigning different colors to
each metadata track. In their opinion, the visualization
would then yield even more useful insights. In addition
to that improvement, one participant suggested the user
could click on, e.g., a gaze point or an object annotation
to quickly extract quantitative information such as
coordinates and time. As criticism, the participants listed the
fact that the VLC player resets the selected subtitle at the
end of the playback. Thus, one had to recreate the setting
used for the analytics task again and again. Further, one
participant argued that the current add-ons are not yet
stable, so that sometimes rendering errors occur. Finally,
the participants remarked that it is a &#x201C;very interesting and
nice product&#x201D; and that it is &#x201C;very useful for researchers
in eye-tracking and human behavior studies.&#x201D;</p>

<fig id="fig08" fig-type="figure" position="float">
					<label>Figure 8.</label>
					<caption>
						<p>Result of the Likert poll about the usability assessed directly after the practical task. For all sub-figures n=9. Mean marked by +, median by x, quantiles by I.</p>
					</caption>
					<graphic id="graph08" xlink:href="jemr-10-05-d-figure-08.png"/>
				</fig>

      </sec>
	  
      <sec id="S5b">
        <title>Available Data Sets</title>
		
        <p>We have already converted the first 319 video
sequences, including their metadata, into MKV containers
to promote their use. The metadata of these 319 data sets are
available in both the USF and the ASS format. Thus,
working with the USF version, there is still access to all data,
while the ASS version can be visualized instantaneously with
almost every currently available video player.</p>

        <p>Currently, five converted data sets with visualizations
can be downloaded<xref ref-type="fn" rid="FN1">1</xref>. The first converted data set (
          <xref ref-type="bibr" rid="b11">11</xref>
          ),
seen in Figure 5, contains eleven video stimuli together
with gaze trajectories of multiple subjects and annotated
objects. The second data set (
          <xref ref-type="bibr" rid="b6">6</xref>
          ) contains 60 video
stimuli including gaze data of 18 participants influenced by
four auditory conditions. The third data set (
          <xref ref-type="bibr" rid="b17">17</xref>
          ) shows four
different classes of content within the video stimuli,
while the gazes of ten subjects were tracked (cf. Figure 7).
To evaluate the scalability of our approach, the 214
stimuli and the corresponding gaze trajectories of a data set
(
          <xref ref-type="bibr" rid="b1">1</xref>
          ) were converted, showing a short video sequence and
a freeze frame from it (example frames are shown in
Figures 2(a), 2(b), and 9). Further, as an example from
outside the gaze tracking domain, we converted seven
sequences of the Berkeley Video Segmentation Dataset
(BVSD) (
          <xref ref-type="bibr" rid="b23">23</xref>
          ) providing bounding-boxes, as well as
pixel-accurate annotation of several objects, illustrated in
Figure 2(c).</p>

<fig id="fig09" fig-type="figure" position="float">
					<label>Figure 9.</label>
					<caption>
						<p>Gaze data visualized on a video by a standard multimedia player via a reused subtitle track. The gaze position on the frames is visualized by a red circle whose diameter depends on the relative pupil size. Green squares visualize fixations. In this example, the sampling rate of the eye-tracker is higher than the frame rate of the video; thus, multiple points and squares are shown.</p>
					</caption>
					<graphic id="graph09" xlink:href="jemr-10-05-d-figure-09.png"/>
				</fig>

      </sec>
	  
      <sec id="S5c">
        <title>Auditory Display of Metadata</title>
		
        <p>The sixth available dataset provides, in addition to the
visualization of gaze points, an auditory display of the
alpha rhythm from the raw EEG data of a participant. For
transferring the resulting alpha power into sound, two
different approaches were taken. For the frequency
modulation, a carrier frequency of
220&#x2009;Hz&#x2014;corresponding to the note a&#x2014;was modulated in its frequency by the
power of the alpha signal. For the volume modulation,
the same carrier frequency was modulated in its power
with respect to the alpha power, with a louder tone
meaning a stronger power. As visualized in Figure 1, the audio
tracks are reused for muxing the resulting audio streams
as WAV files into the MKV container.</p>
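<p>The two mappings can be sketched in a few lines of Python; the sample rate, the normalization of the alpha power to 0..1, and the ±110 Hz frequency-modulation depth are illustrative assumptions, not values taken from our implementation.</p>

```python
import math

def sonify_alpha(alpha_power, sr=8000, carrier=220.0, mode="am"):
    """Sonify an alpha-power envelope with a 220 Hz carrier (note a).

    `alpha_power` holds one normalized power value (0..1) per output
    sample. "am" scales the carrier's amplitude by the power (louder =
    stronger); "fm" shifts the carrier's instantaneous frequency.
    """
    samples, phase = [], 0.0
    for p in alpha_power:
        if mode == "am":
            phase += 2 * math.pi * carrier / sr
            samples.append(p * math.sin(phase))
        else:  # fm: instantaneous frequency follows the alpha power
            phase += 2 * math.pi * (carrier + 110.0 * (2 * p - 1)) / sr
            samples.append(math.sin(phase))
    return samples
```

<p>The resulting sample streams would then be written out as WAV files and muxed into the MKV container's audio tracks, as described above.</p>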
      </sec>
      </sec>	  
	  
      <sec id="S6">
        <title>Conclusion</title>
		
        <p>The importance of gaze data in general, and especially
the number of VA applications, will significantly
increase once gaze data, as well as other metadata, are
provided in multimedia containers. We have
demonstrated instantaneous visualization using these multimedia
containers with common video players. By extending
today&#x2019;s multimedia players with add-ons, such as our
add-ons for multi-subtitle visualization, basic VA is also
possible. Hence, no specific VA software is needed and
no new interaction metaphors must be learned; our
add-ons use familiar metaphors such as
switching subtitles. In the long term, and as a result of the
data format comparison in Table 1, the research
community and the industry should seek to establish a standard
based on MPEG-7 for recording, exchanging, archiving,
streaming, and visualizing gaze as well as other metadata.
Due to its thorough specification, MPEG-7 provides a
platform for consistent implementations in all media players,
but for the same reason, the implementation and
testing will require many resources. By muxing MPEG-7 into
MP4 containers, streaming, visualization, and auditory
display are guaranteed. Unfortunately, we recognize a
lack of MPEG-7 standard libraries and of integration in
current media players. In order to quickly close this gap, we
have presented ad-hoc prototypes allowing us to promote
embedding metadata into multimedia containers, leading
to the immediate use of the metadata. Our approach
reuses the USF subtitle format to encode eye-tracking data,
allowing it to be visualized in a patched version of the
popular VLC media player. For other media players, as
well as for standalone DVD players, we provide the
ASS-based multimedia container. To motivate VA
software designers to also develop VA add-ons based on
existing software, we provide two VLC player add-ons,
which allow reasoning about whether, e.g., the gaze trajectories of
two different subjects are focused on the same object of
interest. For the future, we plan to expand the VA add-ons to
support almost every kind of scientific metadata and to
use the insights gained for bio-inspired, as well as
sensory-improved, computer vision (
          <xref ref-type="bibr" rid="b19">19</xref>
          ).</p>
		  
<table-wrap id="t01" position="float">
					<label>Table 1</label>
					<caption>
						<p>Comparison of the USF-based prototype, the ASS-based prototype, and the MPEG-7 approach. 1 indicates full, &#xBD; partial, and 0 no support.</p>
					</caption>
					<table frame="hsides" rules="groups" cellpadding="3">
						<thead>
							<tr>
								<td align="center" rowspan="1" colspan="1"/>
								<td align="center" rowspan="1" colspan="1">USF prototype</td>
								<td align="center" rowspan="1" colspan="1">ASS prototype</td>
								<td align="center" rowspan="1" colspan="1">MPEG-7 prototype</td>
							</tr>
						</thead>
						<tbody>						
          <tr>
            <td rowspan="1" colspan="1">General metadata representation</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
            <td rowspan="1" colspan="1">1</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Supporting complex metadata</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">1</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Out of the box media player integration</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">0</td>
          </tr>
	          <tr>
            <td rowspan="1" colspan="1">Parallel visualization of different metadata entries</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Easy accessibility of metadata with scientific tools</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
            <td rowspan="1" colspan="1">1</td>
          </tr>
	          <tr>
            <td rowspan="1" colspan="1">Open source development tools available</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">&#xBD;</td>
          </tr>
	          <tr>
            <td rowspan="1" colspan="1">Encapsulation in a single file container</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
          </tr>
	          <tr>
            <td rowspan="1" colspan="1">Streamable container with temporal data</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Formats supported for encapsulation</td>
            <td rowspan="1" colspan="1">MKV</td>
            <td rowspan="1" colspan="1">MKV, MP4, OGG,...</td>
            <td rowspan="1" colspan="1">MP4</td>
          </tr>		  
						</tbody>
					</table>
					</table-wrap>		  
		  
      </sec>
	  
      <sec id="S7">
        <title>Acknowledgements</title>
		
        <p>We acknowledge support by Deutsche
Forschungsgemeinschaft (DFG) and Open Access Publishing
Fund of Osnabr&#xFC;ck University.</p>
        <p>This work was funded by Deutsche
Forschungsgemeinschaft (DFG) as part of the Priority Program
&#x201C;Scalable Visual Analytics&#x201D; (SPP 1335).</p>
      </sec>	  
  </body>
      
  <back>
<fn-group>
  <fn id="FN1">
  <p>Demonstration video, software including all tools, and converted eye-tracking data sets can be downloaded at https://ikw.uos.de/%7Ecv/projects/mm-mkv</p>
  </fn>
  </fn-group> 
<ref-list>
<ref id="b1"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Açik</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bartel</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>König</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Real and implied motion at the center of gaze.</article-title> <source>Journal of Vision (Charlottesville, Va.)</source>, <volume>14</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1167/14.1.2</pub-id><pub-id pub-id-type="pmid">24396045</pub-id><issn>1534-7362</issn></mixed-citation></ref>
<ref id="b2"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Arndt</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Staab</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Troncy</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name><surname>Hardman</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2007</year>). Adding formal semantics to MPEG-7 (Arbeitsberichte des Fachbereichs Informatik No. 04/2007). Universität Koblenz-Landau. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://userpages.uni">https://userpages.uni</ext-link>- <ext-link ext-link-type="uri" xlink:href="koblenz.de/~staab/Research/Publications/2007/arbeitsberichte_4_2007.pdf">koblenz.de/~staab/Research/Publications/2007/arbeitsberichte_4_2007.pdf</ext-link></mixed-citation></ref>
<ref id="b3"><mixed-citation publication-type="book-chapter" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Bailer</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Fürntratt</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Schallauer</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Thallinger</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Haas</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2011</year>). A C++ library for handling MPEG- 7 descriptions. In ACM international conference on multimedia (MM) (pp. 731–734). ACM Press. doi:10. 1145/2072298.2072431</mixed-citation></ref>
<ref id="b4"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Bertellini</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Reich</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2010</year>). <article-title>DVD supplements: A commentary on commentaries.</article-title> <source>Cinema Journal</source>, <volume>49</volume>(<issue>3</issue>), <fpage>103</fpage>–<lpage>105</lpage>. <pub-id pub-id-type="doi">10.1353/cj.0.0215</pub-id><issn>0009-7101</issn></mixed-citation></ref>
<ref id="b5"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Bresenham</surname>, <given-names>J. E.</given-names></string-name></person-group> (<year>1965</year>). <article-title>Algorithm for computer control of a digital plotter.</article-title> <source>IBM Systems Journal</source>, <volume>4</volume>(<issue>1</issue>), <fpage>25</fpage>–<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1147/sj.41.0025</pub-id><issn>0018-8670</issn></mixed-citation></ref>
<ref id="b6"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Coutrot</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Guyader</surname>, <given-names>N.</given-names></string-name></person-group> (<year>2014</year>). <article-title>How saliency, faces, and sound influence gaze in dynamic social scenes.</article-title> <source>Journal of Vision (Charlottesville, Va.)</source>, <volume>14</volume>(<issue>8</issue>), <fpage>5</fpage>–<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1167/14.8.5</pub-id><pub-id pub-id-type="pmid">24993019</pub-id><issn>1534-7362</issn></mixed-citation></ref>
<ref id="b7"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Einhäuser</surname>, <given-names>W.</given-names></string-name> &amp; <string-name><surname>König</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2010</year>). Getting real—sensory processing of natural stimuli. Current Opinion in Neurobiology, 20(3), 389–395. doi:<pub-id pub-id-type="doi">10.1016/j.conb.2010</pub-id>. 03.010</mixed-citation></ref>
<ref id="b8"><mixed-citation publication-type="journal" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Einhäuser</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Schumann</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Bardins</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bartl</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Böning</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Schneider</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>König</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2007</year>). <article-title>Human eye-head co-ordination in natural exploration.</article-title> <source>Network (Bristol, England)</source>, <volume>18</volume>(<issue>3</issue>), <fpage>267</fpage>–<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1080/09548980701671094</pub-id><pub-id pub-id-type="pmid">17926195</pub-id><issn>0954-898X</issn></mixed-citation></ref>
<ref id="b9"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><collab>ISO/IEC</collab></person-group>. (<year>2001</year>). Information technology—multimedia content description interface—Part 3: Visual (ISO/IEC 15938-3:2001).</mixed-citation></ref>
<ref id="b10"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><collab>ISO/IEC</collab></person-group>. (<year>2003</year>). Information technology—coding of audiovisual objects—Part 14: MP4 file format (ISO/IEC 14496-14:2003).</mixed-citation></ref>
<ref id="b11"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Kurzhals</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Bopp</surname>, <given-names>C. F.</given-names></string-name>, <string-name><surname>Bässler</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Ebinger</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Weiskopf</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Benchmark data for evaluating visualization and analysis techniques for eye tracking for video stimuli.</article-title> In <source>Workshop on beyond time and errors novel evaluation methods for visualization (BELIV)</source> (pp. <fpage>54</fpage>–<lpage>60</lpage>). <publisher-name>ACM Press.</publisher-name> doi:<pub-id pub-id-type="doi" specific-use="author">10.1145/2669557</pub-id>. 2669558 <pub-id pub-id-type="doi">10.1145/2669557.2669558</pub-id></mixed-citation></ref>
<ref id="b12"><mixed-citation publication-type="web-page" specific-use="unparsed">Matroska. (<year>2017</year>). Matroska media container. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://www.matroska.org/">https://www.matroska.org/</ext-link></mixed-citation></ref>
<ref id="b13"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><collab>MKVToolNix</collab></person-group>. (<year>2017</year>). mkvmerge. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from https : / / mkvtoolnix . download / doc / mkvmerge.html</mixed-citation></ref>
<ref id="b14"><mixed-citation publication-type="unknown" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Mokhtarian</surname>, <given-names>F.</given-names></string-name> &amp; <string-name><surname>Bober</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2003</year>). Curvature scale space representation: Theory, applications, and MPEG-7 standardization. Springer Netherlands. doi:10 . 1007 / 978-94-017-0343-7</mixed-citation></ref>
<ref id="b15"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Paris</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Vialle</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Hammer</surname>, <given-names>U.</given-names></string-name></person-group> (<year>2010</year>). TitleVision - USF specs. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="http://www.titlevision.dk/usf.htm">http://www.titlevision.dk/usf.htm</ext-link></mixed-citation></ref>
<ref id="b16"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Pfeiffer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Parker</surname>, <given-names>C. D.</given-names></string-name>, &amp; <string-name><surname>Pang</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2006</year>). The continuous media markup language (CMML), version 2.1. Internet Engineering Task Force (IETF). <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://www.ietf.org/archive/id/draft-pfei_er-cmml-03.txt">https://www.ietf.org/archive/id/draft-pfei_er-cmml-03.txt</ext-link></mixed-citation></ref>
<ref id="b17"><mixed-citation publication-type="book-chapter" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Riche</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Mancas</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Culibrk</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Crnojevic</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Gosselin</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Dutoit</surname>, <given-names>T.</given-names></string-name></person-group> (<year>2013</year>). <chapter-title>Dynamic saliency models and human attention: A comparative study on videos.</chapter-title> In Computer vision – ACCV (pp. 586–598). Springer Berlin Heidelberg. doi:10. 1007/978- 3- 642- 37431- 9_45 <pub-id pub-id-type="doi">10.1007/978-3-642-37431-9_45</pub-id></mixed-citation></ref>
<ref id="b18"><mixed-citation publication-type="book-chapter" specific-use="restruct"><person-group person-group-type="author"><string-name><surname>Schöning</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Faion</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Heidemann</surname>, <given-names>G.</given-names></string-name></person-group> (<year>2016</year>). <chapter-title>Interactive feature growing for accurate object detection in megapixel images</chapter-title>. In <source>Computer vision – ECCV workshops</source> (pp. <fpage>546</fpage>–<lpage>556</lpage>). <publisher-name>Springer International Publishing</publisher-name>; <pub-id pub-id-type="doi">10.1007/978-3-319-46604-0_39</pub-id></mixed-citation></ref>
<ref id="b19"><mixed-citation publication-type="conference" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Schöning</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Faion</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Heidemann</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Krumnack</surname>, <given-names>U.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Eye tracking data in multimedia containers for instantaneous visualizations.</article-title> In <source>Workshop on eye tracking and visualization (ETVIS)</source> (pp. <fpage>74</fpage>–<lpage>78</lpage>). Institute of Electrical and Electronics Engineers (IEEE). doi:<pub-id pub-id-type="doi" specific-use="author">10.1109/etvis.2016.7851171</pub-id> <pub-id pub-id-type="doi">10.1109/ETVIS.2016.7851171</pub-id></mixed-citation></ref>
<ref id="b20"><mixed-citation publication-type="book-chapter" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Schöning</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Faion</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Heidemann</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Krumnack</surname>, <given-names>U.</given-names></string-name></person-group> (<year>2017</year>). <chapter-title>Providing video annotations in multimedia containers for visualization and research.</chapter-title> In Winter conference on applications of computer vision (WACV) (pp. 650–659). Institute of Electrical and Electronics Engineers (IEEE). doi:10. 1109 / wacv. 2017.78 <pub-id pub-id-type="doi">10.1109/WACV.2017.78</pub-id></mixed-citation></ref>
<ref id="b21"><mixed-citation publication-type="conference" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Schöning</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Gert</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Açik</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kietzmann</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Heidemann</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>König</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2017</year>). Exploratory multimodal data analysis with standard multimedia player – multimedia containers: A feasible solution to make multimodal research data accessible to the broad audience. In International joint conference on computer vision, imaging and computer graphics theory and applications (VISAPP) (pp. 272–279). SCITEPRESS - Science and Technology Publications. doi:10. 5220 / 0006260202720279</mixed-citation></ref>
<ref id="b22"><mixed-citation publication-type="unknown" specific-use="unparsed">SSA v4.00+. (<year>2012</year>). Sub station alpha v4.00+ script format. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="http://moodub.free.fr/video/ass-specs.doc">http://moodub.free.fr/video/ass-specs.doc</ext-link></mixed-citation></ref>
<ref id="b23"><mixed-citation publication-type="conference" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Sundberg</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Brox</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Maire</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Arbelaez</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Malik</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2011</year>). Occlusion boundary detection and figure / ground assignment from optical flow. In Conference on computer vision and pattern recognition (CVPR) (pp. 2233–2240). Institute of Electrical and Electronics Engineers (IEEE). doi:<pub-id pub-id-type="doi">10.1109/cvpr.2011.5995364</pub-id></mixed-citation></ref>
<ref id="b24"><mixed-citation publication-type="web-page" specific-use="unparsed">W3C. (<year>2013</year>). Timed text markup language 1 (TTML1) (second edition). <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://www.w3.org/TR/ttml1/">https://www.w3.org/TR/ttml1/</ext-link></mixed-citation></ref>
<ref id="b25"><mixed-citation publication-type="web-page" specific-use="unparsed">W3C. (<year>2014</year>). RDF - semantic web standards. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://www.w3.org/RDF/">https://www.w3.org/RDF/</ext-link></mixed-citation></ref>
<ref id="b26"><mixed-citation publication-type="web-page" specific-use="unparsed">W3C. (<year>2017</year>). Timed text working group. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="http://www.w3.org/AudioVideo/TT/">http://www.w3.org/AudioVideo/TT/</ext-link></mixed-citation></ref>
<ref id="b27"><mixed-citation publication-type="conference" specific-use="unparsed"><person-group person-group-type="author"><string-name><surname>Winkler</surname>, <given-names>S.</given-names></string-name>&amp;<string-name><surname>Ramanathan</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Overview of eye tracking datasets.</article-title> In <source>Workshop on quality of multimedia experience (QoMEX)</source> (pp. <fpage>212</fpage>–<lpage>217</lpage>). Institute of Electrical and Electronics Engineers (IEEE). doi:10. 1109 / qomex.2013.6603239</mixed-citation></ref>
<ref id="b28"><mixed-citation publication-type="book-chapter" specific-use="linked"><person-group person-group-type="author"><string-name><surname>Wylie</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Romney</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Evans</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name><surname>Erdahl</surname>, <given-names>A.</given-names></string-name></person-group> (<year>1967</year>). <chapter-title>Half-tone perspective drawings by computer.</chapter-title> In Fall joint computer conference on AFIPS (pp. 49–58). New York, NY, USA: ACM Press. doi:<pub-id pub-id-type="doi" specific-use="author">10.1145/1465611</pub-id>. 1465619 <pub-id pub-id-type="doi">10.1145/1465611.1465619</pub-id></mixed-citation></ref>
<ref id="b29"><mixed-citation publication-type="web-page" specific-use="parsed"><person-group person-group-type="author"><collab>Xiph.org</collab></person-group>. (<year>2013</year>). <article-title>CMML mapping into Ogg.</article-title> <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://wiki.xiph.org/index.php/CMML#CMML_mapping_into_Ogg">https://wiki.xiph.org/index.php/CMML#CMML_mapping_into_Ogg</ext-link></mixed-citation></ref>
<ref id="b30"><mixed-citation publication-type="web-page" specific-use="unparsed"><person-group person-group-type="author"><collab>Xiph.org</collab></person-group>. (<year>2016</year>). Ogg. <date-in-citation content-type="access-date">Retrieved November 15, 2017</date-in-citation>, from <ext-link ext-link-type="uri" xlink:href="https://xiph.org/ogg/">https://xiph.org/ogg/</ext-link></mixed-citation></ref>
</ref-list>
  </back>
  
</article>

