Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus

This is the first attempt at characterizing reading difficulty in Hindi using naturally occurring sentences. We created the Potsdam-Allahabad Hindi Eyetracking Corpus by recording eye-movement data from 30 participants at the University of Allahabad, India. The target stimuli were 153 sentences selected from the beta version of the Hindi-Urdu treebank. We find that word- or low-level predictors (syllable length, unigram and bigram frequency) affect first-pass reading times, regression path duration, total reading time, and outgoing saccade length. An increase in syllable length results in longer fixations, and an increase in word unigram and bigram frequency leads to shorter fixations. Longer syllable length and higher frequency lead to longer outgoing saccades. We also find that two predictors of sentence comprehension difficulty, integration and storage cost, have an effect on reading difficulty. Integration cost (Gibson, 2000) was approximated by calculating the distance (in words) between a dependent and its head; and storage cost (Gibson, 2000), which measures the difficulty of maintaining predictions, was estimated by counting the number of predicted heads at each point in the sentence. We find that integration cost mainly affects outgoing saccade length, and storage cost affects total reading times and outgoing saccade length. Thus, word-level predictors have an effect in both early and late measures of reading time, while predictors of sentence comprehension difficulty tend to affect later measures. This is, to our knowledge, the first demonstration using eye-tracking that both integration and storage cost influence reading difficulty.

Unfortunately, research on eyetracking corpora involving Asian languages is rare (exceptions are Chinese, e.g., Yan, Kliegl, Richter, Nuthmann, & Shu, 2010, and Uighur, Yan et al., 2014). In this paper, we present an analysis of an eyetracking corpus of Hindi that we have developed, the Potsdam-Allahabad Hindi Eyetracking Corpus. Our focus in this paper is on predictors of language processing difficulty as indexed by fixation-based measures.
Hindi is a language spoken primarily in India. It is difficult to estimate the number of speakers worldwide; one estimate is 180-258 million speakers (http://en.wikipedia.org/wiki/Hindi). Hindi belongs to the Indo-European family and is head-final; i.e., the default word order is subject-object-verb. It is characterized by relatively free word order and overt case-marking using postpositions.
The Hindi sentences used in the study have several attractive properties: they are taken from the beta version of the Hindi-Urdu treebank (Bhatt et al., 2009), and are therefore already annotated for syntactic structure and part-of-speech. This allows us to compute several low-level (lexical-level) and high-level (sentence-level) predictors of reading difficulty. The corpus can therefore serve as a basis for investigating theories of eye-movement control and theories of sentence comprehension. It is intended to add to the existing large-scale naturalistic data-sets that are available for investigating theories of reading difficulty.
Our study represents a first attempt at characterizing reading difficulty in Hindi in naturally occurring sentences. We begin by explaining how the Hindi script (Devanagari) is structured; understanding the details of the script is important for the various word-level predictors we discuss. Then, we describe the various predictors of reading difficulty that were computed from the Hindi Treebank. We then provide statistical analyses using various reading time measures and outgoing saccade length as dependent variables. In particular, the effect of the following predictors on reading difficulty is investigated: graphemic complexity, syllable length, unigram and bigram frequency, integration and storage cost.
Devanagari is read from left to right; words and case-marking morphemes are separated by spaces, and there is no upper- and lower-case distinction. Each word-unit that is separated by spaces usually has a horizontal line spanning the characters. Sentence-final full-stops are written as a vertical line, but standard punctuation markers such as commas are also used.
An interesting feature of Hindi orthography is that the linear position of a grapheme in the text does not always correspond to the order in which the graphemes are pronounced. For example, vowels can be written as diacritic symbols below or above a consonant but pronounced after the consonant; and short vowels can occur before a consonant even though they are pronounced after the consonant. In all such instances, when this asymmetry between writing order and pronunciation order occurs, it is possible that the difficulty in reading increases. Vaid and Gupta (2002) have investigated this issue, and found some evidence that mismatches between orthography and pronunciation impact reading of isolated words.

Participants
Thirty graduate and undergraduate students of the University of Allahabad participated in the experiment for payment. All of them had had an Urdu-medium education until at least high school and described themselves as fluent in reading both the Perso-Arabic script used for Urdu and the Devanagari script used for Hindi.
As noted above, the experiment was conducted in an urban (university) setting. Monolingual readers in India are rare (especially in urban areas). Most educated individuals have considerable familiarity with more than one script. For example, English is taught in almost all schools (except in remote areas where illiteracy is also an issue). Therefore familiarity with the Latin script is common. Similarly, all college-going individuals have a good command of the Latin script, as the medium of higher education in India is often English. The speakers who participated in this study could read Hindi (in Devanagari script) and Urdu (in Perso-Arabic script). In the part of India where this experiment was conducted, exposure to the Devanagari script happens quite early in schooling, as Hindi is a compulsory subject from preschool until at least pre-college education. This holds true irrespective of whether the medium of instruction is Hindi, Urdu or English. In addition, individuals often need to know Hindi in order to negotiate their day-to-day activities; this is because road signs, advertisements, shop signs, etc., are often in Hindi. To summarize, in the setting where this experiment was conducted, monolingual readers are rare and finding such individuals is very difficult. We decided to make a virtue out of this difficulty by systematizing the collection of data in the two languages that readers were likely to be familiar with in that particular part of India (Allahabad). We do not report the Urdu data here as it would make the paper too complex. We plan to discuss the Urdu data in a separate paper.

Equipment
The experiment was conducted using the SMI iView X HED eyetracker with a 500 Hz sample rate. The subject was seated 50 cm from the stimulus screen. Sentences were shown at the centre of the screen in a single line. The monitor used for display was an Acer 19" LED with a 1600 × 900 screen resolution. The refresh rate of the monitor was 60 Hz. Hindi text was displayed using the Mangal true-type 17 point font. On average, approximately 1.8 syllables subtend 1° of visual angle in this experimental setup.

Materials
A subset of the Hindi-Urdu treebank data (Bhatt et al., 2009), which has 400,000 words, was used to get the experimental sentences. We transcribed the Hindi data using the Perso-Arabic script to get the Urdu sentences. This gave us identical text in two different scripts for the two languages. This provided us with an opportunity to study reading processes in both languages using our bilingual subject pool. As noted earlier, while Hindi is written in the Devanagari script, Urdu is written in the Perso-Arabic script. Structurally, Hindi and Urdu are almost identical; however, some differences exist in the lexicon. Hindi and Urdu as spoken colloquially have a shared vocabulary as well. Since the Hindi treebank text had some Sanskritized words, which participants may not be familiar with, these were substituted with more colloquial alternatives. We used 153 sentences (2610 words) for each language (Hindi and Urdu), and additionally four sentences were used as practice sentences. We avoided using sentences that had a political bent. The target sentences that were chosen were about topics such as movies, entertainment, and sports. The target sentences chosen were not isolated sentences, but formed short narratives consisting of several sentences. Each sentence from a narrative was presented separately on a screen, and the end of a narrative was signaled by a blank screen.

Procedure
Participants were required to read identical texts in Hindi (Devanagari) script and Urdu (Perso-Arabic) script. Since the content of the two scripts was identical, the experiment was conducted in two blocks over two days. The order of presentation was pseudo-randomized such that participants were exposed to one of eight combinations of these two factors. Table 1 shows all the groups. Each participant was randomly assigned to one of these groups. For example, in Group 1, the first part of the Hindi text (74 sentences) was read in the first block of the first session; then, after an interval of 5 minutes, the second part of the Urdu text (79 sentences) was read in the second block. In order to reduce the effect of familiarity, participants read the remaining sentences of each language after a few days. The reading task on the second day also consisted of two blocks. The average gap between the two sessions was 5.7 days. One concern with such a setup could be that the text read in the second session will be influenced by the text read in the first session. We therefore report the results by using session id as a factor in the final analysis to determine whether this had an effect; all interactions with session were also investigated.
The experiment started with the experimenter orally briefing the participant about the task. This was followed by the subject reading written instructions on the computer screen. Following this, a 13-point calibration was performed. The experiment started with four practice sentences, following which the experimental sentences were presented. A trial started with the presentation of a gaze-correction point on the centre left of the screen. Fixating on this point briefly led to the presentation of the sentence. After reading the sentence, the participant looked at a small dot on the bottom-right of the screen and pressed the left button of a mouse. Recalibration was done after every 15 sentences or if the fixation on the gaze-correction point did not trigger the sentence presentation. A blank screen was presented to signal the end of a narrative. Comprehension questions were not asked due to time constraints. Although participants were instructed to read the sentences carefully so that they understood their meaning, it is quite possible that they did not always do so. However, the results suggest that the participants were indeed attending to the sentences carefully. The evidence comes from results that are consistent with reading patterns in other languages. For example, the effect of word (syllable) length and word frequency is consistent with previous literature. In addition to this, we see a significant effect of sentence-level processing factors such as storage cost and distance cost; this suggests active involvement of the subjects during the reading process.

Computing word and sentence level predictors
We computed several measures of processing difficulty for this corpus. It is well-known in the eye-movement research literature that word length, and unigram and bigram frequency are predictors of reading difficulty (Rayner, 1998; McDonald & Shillcock, 2003; Kliegl et al., 2006). In addition, due to the special properties of Devanagari characters, we also developed a metric for graphemic complexity. We also computed two metrics for sentence comprehension difficulty based on the work of Gibson (2000): distance cost and storage cost. Together, these predictors can be seen as representative of so-called low-level and high-level predictors of processing difficulty (Boston et al., 2008, 2011; Dem-).
A metric for word complexity. In the appendix we present a first attempt at quantifying the complexity cost of Hindi characters. The work by Vaid and Gupta (2002) on the effect of character complexity on reading served as a guide when developing this metric. In essence, our metric defines a linear penalty for mismatches between character order and pronunciation order: (a) if a vowel diacritic appears to the left of the consonant but is pronounced after the consonant, the cost is 1; (b) if a diacritic appears above or below a consonant, the cost is 0.5; (c) if a consonant appears in a consonant cluster, i.e., without its inherent vowel, the cost is 0.5; and (d) ligatures get a cost of 1. The assumption here is that violation of character order (relative to pronunciation order) should get the maximum penalty because that seems to be the cause of greatest complexity (Vaid & Gupta, 2002); diacritics and consonants without vowels do not violate order as such, but they do require more processing effort than the cases where character order matches pronunciation order perfectly. Under this metric, the mean word complexity in the Hindi text was 0.46 (minimum: 0, maximum: 5.5). We also experimented with a metric that penalizes all deviations from the simplest case equally; the results were comparable to those reported using the metric described above.
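The penalty scheme can be illustrated with a minimal sketch. It is not the implementation used for the corpus: the function name `complexity`, the set of diacritics treated as appearing above a consonant, and the omission of the ligature penalty (d) are all assumptions made for illustration. The displacement rule (cost 1 + d for a left-written vowel separated from its consonant by d vowel-less consonants) follows the description in Appendix A.

```python
# Sketch of the word complexity metric; names and diacritic sets are assumptions.
HALANT = "\u094d"        # marks a consonant written without its inherent vowel
LEFT_VOWEL_I = "\u093f"  # short /i/ sign, written to the left of its consonant
# Assumed set of diacritics written above a consonant (Table A1 is not reproduced here).
ABOVE = {"\u0947", "\u0948", "\u0902", "\u0901"}
# Diacritics written below a consonant, as listed in Table A2.
BELOW = {"\u0941", "\u0942", "\u093c"}


def is_consonant(ch):
    """True for the basic Devanagari consonants (U+0915 to U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def complexity(word):
    """Sum the penalties for one word given in Unicode (logical) order."""
    cost = 0.0
    for i, ch in enumerate(word):
        if ch == LEFT_VOWEL_I:
            # Count vowel-less consonants between the written vowel and
            # the consonant it is pronounced after.
            d = 0
            j = i - 1
            while j - 2 >= 0 and word[j - 1] == HALANT and is_consonant(word[j - 2]):
                d += 1
                j -= 2
            cost += 1 + d
        elif ch in ABOVE or ch in BELOW:
            cost += 0.5
        elif ch == HALANT:
            cost += 0.5
        # Ligature penalty (d) is deliberately not modeled in this sketch.
    return cost
```

Under these assumptions, the sketch reproduces the two worked values given in Appendix A: a cost of 1 for दिन and 3.5 for पब्लिसिटी.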
Syllable length. The syllable boundary is used for computing word length; in particular, a consonant-vowel combination is considered a single unit. For example, मिल /mɪl/ has a syllable length of 2 = 1 (मि) + 1 (ल). In the case of ligatures leading to complex forms, or of composite characters, the entire combination is considered a single unit; for example, the syllable length of कार्निवाल /kɑrnɪval/ is 4 = 1 (का) + 1 (र्नि) + 1 (वा) + 1 (ल). Likewise, the syllable length of प्रधानमंत्री /prədʱanməntri/ is 5 = 1 (प्र) + 1 (धा) + 1 (न) + 1 (मं) + 1 (त्री). This criterion for segmentation is also influenced by practical concerns of the eyetracking paradigm. It would be difficult to ascertain the gaze position accurately at the level of the individual character, especially in cases such as those discussed above. The mean syllable length in the experimental items was 2.2 (minimum: 1, maximum: 10).
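The segmentation criterion can be sketched in a few lines. This is an illustrative approximation, not the procedure used to annotate the corpus: the function name `syllable_length` is hypothetical, and the rule assumed here is that every independent vowel or consonant opens a new unit unless it directly follows a halant (in which case it belongs to the preceding cluster).

```python
# Sketch of the syllable-length count; the function name and rule are assumptions.
HALANT = "\u094d"  # joins consonants into a cluster


def syllable_length(word):
    """Count orthographic units in a word given in Unicode (logical) order."""
    count = 0
    prev = ""
    for ch in word:
        # U+0904 to U+0939 covers independent vowels and basic consonants;
        # vowel signs and other diacritics attach to the current unit.
        is_base = "\u0904" <= ch <= "\u0939"
        if is_base and prev != HALANT:
            count += 1
        prev = ch
    return count
```

Applied to the three examples in the text, this sketch yields 2 for मिल, 4 for कार्निवाल and 5 for प्रधानमंत्री.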
Unlike character-based scripts such as Latin, in Devanagari, a consonant or a vowel need not take constant space. In addition, as stated above, the characters combine to form ligatures; they also appear as diacritics above or below another character. Given these properties, we found it reasonable to compute word length based on syllable count. We also computed word length using the standard definition, i.e., counting the number of consonants and vowels in a word. Word length computed using this standard definition is correlated (.60) with graphemic complexity. The results obtained using this definition were similar to those obtained using the syllable-based word length.

Frequency (unigram and bigram). The unigram and the bigram frequencies were computed using the beta version of the Hindi-Urdu treebank data (Bhatt et al., 2009), which has 400,000 words.
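As an illustration of how such counts can be obtained from a tokenized treebank, consider the following minimal sketch. The function name `ngram_counts` and the input format (a list of token lists, one per sentence) are assumptions; the sketch also assumes that bigrams do not span sentence boundaries, and it returns raw counts, leaving any normalization or log transformation to the analysis.

```python
from collections import Counter


def ngram_counts(sentences):
    """Return (unigram, bigram) counters for a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Pair each token with its successor within the same sentence.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams
```

The bigram frequency of a word in context is then looked up as `bigrams[(previous_word, word)]`.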
Distance cost. Integration cost is a processing metric proposed by Gibson (2000) as part of a more general Dependency Locality Theory (DLT). It is intended to capture the retrieval cost of a dependent at its integration site (also see Lewis & Vasishth, 2005); in other words, the integration cost metric aims to characterize the online processing cost of completing the dependency link between an already seen/heard word and the co-dependent currently being processed. Some examples are subject-verb dependencies and antecedent-reflexive dependencies. We computed an approximation of integration cost: the distance in words between two co-dependents. For example, in (1), the distance cost at 'narrated' would be 8 = 5 (for दीपिका: ne, Abhay, ko, ek, kahaanii) + 3 (for अभय: ko, ek, kahaanii). The distance cost was calculated manually; we did not compute dependencies using a dependency grammar representation because we wanted to ensure that there was no loss of accuracy.
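The arithmetic of the example can be made explicit with a short sketch. In the corpus the values were assigned by hand, so the code below is only an illustration: the function name `distance_cost`, the representation of dependencies as (dependent index, head index) pairs, and the romanized token list (reconstructed from the gloss in the text, with 'narrated' standing in for the verb) are all assumptions.

```python
def distance_cost(arcs, head):
    """Sum, over all dependents of `head`, the number of intervening words."""
    return sum(abs(h - d) - 1 for d, h in arcs if h == head)


# Token positions for example (1), romanized; 'narrated' stands in for the verb.
tokens = ["Deepika", "ne", "Abhay", "ko", "ek", "kahaanii", "narrated"]
# (dependent index, head index) pairs for the three verbal arguments.
arcs = [(0, 6), (2, 6), (5, 6)]

cost_at_verb = distance_cost(arcs, head=6)  # 5 + 3 + 0
```

The adjacent argument (kahaanii) contributes 0, so the total at the verb is 8, as in the text.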

Storage cost. While integration cost is intended to characterize the cost of completing a dependency, storage cost was proposed by Gibson (2000) to characterize the processing load incurred as a result of maintaining predictions of upcoming heads. In example (1), the storage cost at the verbal arguments (दीपिका, अभय and कहानी) would be 1, while the storage cost at the verb is 0. The mean storage cost was 1.01 (minimum: 0, maximum: 3). Storage cost was also computed by hand.
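One simple way to read "the number of predicted heads at each point" is sketched below. This is an assumption-laden approximation and not the hand annotation procedure used for the corpus: the function name `storage_cost` is hypothetical, and a head is treated as predicted at a position if at least one of its dependents has already been read while the head itself still lies to the right.

```python
def storage_cost(arcs, position):
    """Count distinct upcoming heads that already have a dependent at or before `position`."""
    pending_heads = {h for d, h in arcs if d <= position < h}
    return len(pending_heads)


# (dependent index, head index) pairs for the arguments of the verb in example (1):
# positions 0, 2 and 5 are the arguments, position 6 is the verb.
arcs = [(0, 6), (2, 6), (5, 6)]
costs = [storage_cost(arcs, i) for i in (0, 2, 5, 6)]
```

With only the argument-verb dependencies supplied, the sketch gives a cost of 1 at each verbal argument and 0 at the verb, matching the values stated for example (1). How dependencies headed by nouns or other categories should contribute is not specified in the text and is left open here.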
The correlations between the predictors are shown in Table 3. As expected, syllable length and frequency are negatively correlated (−0.63); word frequency and bigram frequency have a correlation of 0.36. Distance cost

Statistical analyses
All analyses for fixation measures were carried out with Bayesian linear mixed models using Stan, version 2.5 (Stan Development Team, 2014). We fit full variance-covariance matrices for the subject- and item-level main effects and interactions, including correlation estimates (i.e., we fit two 14 × 14 variance-covariance matrices for subject and item effects, respectively). One of the advantages of using Bayesian hierarchical models rather than frequentist ones is that we can directly compute the posterior probability of the coefficient of a particular effect being positive or negative given the data; unlike the frequentist approach, there is no need to indirectly draw inferences about the effect by appealing to the questionable procedure of rejecting a null hypothesis and computing a p-value (see, for example, Gelman (2013)). Another advantage is that we can fit a statistical model that takes into account all possibly relevant variance components. This currently cannot be done with the frequentist tools available, because of convergence or estimation failures. Bayesian hierarchical models do not suffer from this problem because mildly informative priors are defined over all parameter estimates; if there is insufficient data to estimate the parameters, the prior will dominate in determining the posterior distribution, and will ensure that the posterior mean is near 0.
The Bayesian model-fitting procedure is discussed in detail in Sorensen and Vasishth (2014) and in the R package RePsychLing, available on github.
The source code for the models fit in the present paper is available from https://github.com/vasishth/StanJAGSexamples. The Stan analyses are summarized in the tables below using means and 95% posterior credible intervals for each coefficient. Credible intervals present the bounds within which we can be 95% certain that the true value of the parameter lies (given our particular data). We assume that an effect is present if the 0 value is not within the 95% credible interval.
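The two summaries used in the results (the posterior probability of a coefficient being negative, and the 95% credible interval) can be computed directly from posterior samples. The following is a minimal sketch in Python rather than the R and Stan code used for the paper; the function names are hypothetical, and the input is assumed to be a plain list of posterior draws for a single coefficient.

```python
import statistics


def prob_negative(samples):
    """Proportion of posterior draws below zero, i.e. an estimate of P(beta < 0 | data)."""
    return sum(s < 0 for s in samples) / len(samples)


def credible_interval_95(samples):
    """Equal-tailed 95% interval from the 2.5% and 97.5% sample quantiles."""
    cuts = statistics.quantiles(samples, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def effect_present(samples):
    """Decision rule from the text: an effect is assumed if 0 lies outside the interval."""
    lower, upper = credible_interval_95(samples)
    return not (lower <= 0 <= upper)
```

The probability of a coefficient being positive is obtained analogously by counting draws above zero.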
All predictors were scaled; each predictor vector (centered around its mean) was divided by its standard deviation. Saccade and fixation detection was done using the saccades package developed by von der Malsburg (https://github.com/tmalsburg/saccades). Fixation measures were computed using the R package em2 (Logačev & Vasishth, 2014) (downloadable from http://cran.r-project.org/src/contrib/Archive/em2/). We present analyses for one representative first-pass measure, first-pass reading time, and two representative measures that often show the effects of sentence comprehension difficulty, regression-path duration and total reading time (Clifton, Staub, & Rayner, 2007; Vasishth, von der Malsburg, & Engelmann, 2012). First-pass reading time on a word refers to the sum of the fixation durations on the word after it has been fixated after an incoming saccade from the left, until the word is exited to the right. Regression path duration on a word refers to the sum of the first-pass reading times and all fixations on preceding words, until the word is exited to the right. Total reading time is the sum of all fixations on a word; in other words, it is the sum of first-pass reading times and re-reading times. Each word served as a region of interest. All data points recorded with zero ms for these fixation measures (about 25% of the data) were removed, and the data analysis was done on log-transformed reading times to achieve approximate normality of residuals. Most of the zero ms fixations were due to short words being skipped entirely; this is quite normal in eyetracking data.
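The three fixation measures can be illustrated with a small sketch that derives them from a temporally ordered fixation sequence. This is not the em2 implementation: the function name `reading_measures` and the input format are hypothetical, and the sketch assumes the conventional criterion that first pass ends as soon as the eyes leave the word in either direction, and that a word skipped during first pass receives zero first-pass and regression-path values.

```python
def reading_measures(fixations, word):
    """Return (first-pass time, regression-path duration, total reading time) in ms.

    `fixations` is a list of (word_index, duration_ms) pairs in temporal order.
    """
    total = sum(dur for w, dur in fixations if w == word)
    first_pass = regression_path = 0
    for start, (w, _) in enumerate(fixations):
        if w > word:
            break  # a later word was fixated first: the word was skipped in first pass
        if w == word:
            # First pass: consecutive fixations on the word itself.
            i = start
            while i < len(fixations) and fixations[i][0] == word:
                first_pass += fixations[i][1]
                i += 1
            # Regression path: everything until a word further right is fixated.
            i = start
            while i < len(fixations) and fixations[i][0] <= word:
                regression_path += fixations[i][1]
                i += 1
            break
    return first_pass, regression_path, total
```

For the sequence `[(0, 200), (1, 250), (0, 150), (1, 100), (2, 300), (1, 120)]` and word 1, the sketch returns a first-pass time of 250 ms, a regression-path duration of 500 ms, and a total reading time of 470 ms. Zero values would then be dropped and the remaining times log-transformed before modeling, as described above.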
We also computed the length of the outgoing saccade (in syllables) from each word. This is defined as the length of a rightward saccade from a given word to a subsequent word during any pass, first-pass, or a revisit. The distribution of the saccade lengths can be modeled as an exponential distribution, with rate 0.36. Minimum
8(2):3, 1-12 Husain, Vasishth, Narayanan (2015) Integration and prediction difficulty in Hindi
2013)). Moreover, it is well known at least since Rayner (1979) that the length of an outgoing saccade depends partly on the length of the word fixated next; this is because the reader attempts to direct the saccade to the preferred viewing location of the next word. This preferred viewing position is slightly to the left of the center of a word. There is of course much more to be said about constraints on saccade launch and landing; but since our primary interest is in measures of sentence comprehension difficulty, we do not discuss these details any further.

Reading time and outgoing saccade length analysis
In log first pass reading times, we see effects of syllable length and bigram frequency in the expected directions: an increase in syllable length leads to slower reading times, and higher bigram frequency leads to faster reading times. The credible intervals for unigram frequency include 0, but the posterior probability of the coefficient for frequency being less than 0 is 0.79. The integration cost (IC) and storage cost (SC) metrics also have credible intervals including 0; the posterior probability of the IC coefficient being positive is 0.88, and that of the SC coefficient is 0.67. Thus, there is only weak evidence for distance cost playing a role even in this relatively early measure of reading difficulty. Finally, although the credible interval for the effect of session includes 0, the posterior probability of the coefficient for session being less than 0 is 0.94; in other words, in the second session, readers tended to read faster. None of the interactions between session and the other factors seem to have a large effect.
In log regression path durations, we see effects of syllable length and bigram frequency in the expected directions. The credible intervals for all other predictors include 0. Perhaps surprisingly, the coefficient for storage cost is negative, with the posterior probability of the coefficient being negative at 0.91. Thus, in log regression-path duration, we see faster reading times with increasing storage cost. We return to this point in the general discussion.
In log total reading time, we see effects of syllable length, unigram and bigram frequency, in the expected directions. In addition, we see an effect of storage cost, with higher cost leading to longer log total reading time. There is evidence for a session effect as well, with the second session leading to faster log reading time. None of the interactions between session and the other predictors seem to be relevant.
In log outgoing saccade length, we find effects of syllable length and unigram and bigram frequency. The effect of syllable length of the current word on log saccade length is consistent with the findings reported by Rayner (1979). As expected, the length (in syllables) of the word fixated next also has an effect: the outgoing saccade length is longer if the word fixated next is longer. This is due to the preferred viewing location effect discussed earlier. Regarding the syntactic distance measures, increasing integration cost leads to shorter saccade length, and increasing storage cost leads to longer saccade length. No effect of session seems to be present, and no interactions between session and the other predictors appear to have an impact.

Discussion
To summarize the results, in log first pass reading times we see stronger effects of "low-level" predictors than of syntactic-level predictors of processing difficulty such as integration cost; we also see some weak evidence for a session effect, with the second session showing faster reading times. In log regression path duration, we see clear effects of syllable length and frequency, and weak evidence for faster reading time with increasing storage cost. Log total reading time shows effects of syllable length and frequency in the expected directions, with an effect of storage cost, such that increasing SC results in longer reading times. Session effects are also seen: the second session is read faster. Finally, consistent with previous work on reading, log outgoing saccade length shows effects of syllable length: longer syllable length leads to longer log saccade length. Frequency also shows a clear effect: increasing frequency leads to longer outgoing saccades. Finally, increased integration cost leads to shorter saccade length.
The effects of low-level predictors on reading times are consistent with the findings in the literature on reading: longer syllable length leads to longer fixations, and higher frequency leads to shorter fixations. Perhaps surprisingly, we don't find a reliable effect of graphemic complexity on reading difficulty. This is surprising because Vaid and Gupta (2002) did find effects of graphemic complexity. However, this absence of an effect may be due to several reasons. First, Vaid and Gupta did not test natural reading, but rather presented isolated words to subjects to read out. It is possible that in natural reading, readers process complex graphemes as a unit and are not affected by mismatches between character order and pronunciation order. A second possibility is that our graphemic complexity metric may not characterize the sources of difficulty correctly. A third possibility is that it may simply be a question of low statistical power. A larger scale study can clarify this point.
The effects of increasing word frequency on saccade length are as expected: increasing frequency (unigram and bigram) leads to longer outgoing saccades. This frequency effect is easily explained: higher frequency translates to greater processing ease, which may allow the current fixation to process more letters, thereby allowing a saccade to be programmed further to the right (Rayner et al., 2004; Wei et al., 2013).
The effects of sentence-level processing difficulty are discussed next. We see reliable effects of dependent-head distance (integration cost) in log outgoing saccade length, but only weak evidence for this complexity metric in the reading time measures. The effect of integration distance on outgoing saccade length is perhaps not surprising: increased distance cost represents greater integration difficulty, which could lead to shorter outward saccades due to greater processing load. Storage cost shows an effect in log total reading times and outgoing saccade length; increased storage cost leads to longer total reading times, and longer outgoing saccades. Since no effect was seen in first-pass reading time, the total reading time result suggests that the storage cost effect is driven by re-reading times. In other words, it seems to be a late-emerging effect. It is difficult to be certain that storage cost does not have any effect in early measures such as first-pass reading time; it is possible that we failed to find a storage effect in these measures due to the relatively small sample size (30 participants; compare this to the Potsdam Sentence Corpus of Kliegl et al. (2006), which had over 200 participants). With a larger sample size, storage cost may well have an effect on early measures. It is interesting that increased storage cost leads to longer outgoing saccades. Although speculative, one possible explanation for this result could be that increased storage cost encourages the reader to look further to the right in order to verify whether the predicted head appears further downstream. This is a possibility worth investigating in a planned experiment.

General Discussion
This study reveals several interesting facts about Hindi sentence comprehension difficulty. A new result, not noticed in previous work on eyetracking corpora from other languages, is that both integration and storage cost impact reading difficulty, but only when we consider so-called late measures (regression-path duration and total reading time) and outgoing saccade length; we did not find strong evidence that the early measure, first-pass reading time, is affected by these variables.
Integration cost estimates the difficulty with which co-dependents are integrated while parsing a sentence. A standard assumption, going back to Just and Carpenter (1992) but more fully worked out by Gibson (2000) and Lewis and Vasishth (2005), is that the greater the dependent-head distance, the greater the difficulty in completing the dependency. The cause for this so-called locality effect could lie in decay (this is how the Dependency Locality Theory explains this, see Gibson, 2000), or in interference or some combination of interference and decay (this is how the cue-based retrieval model of Lewis & Vasishth, 2005 explains it; also see Lewis, 1996). Whatever the underlying explanation, there is clear evidence for locality effects in planned experiments (e.g., Grodner & Gibson, 2005; Bartek, Lewis, Vasishth, & Smith, 2011). However, there are several important counterexamples too; examples are the German studies done by Konieczny (2000), and the experiments involving Hindi by Vasishth and Lewis (2006). Konieczny suggests a variant of the idea that delaying the appearance of a head (effectively increasing head-dependent distance) can facilitate processing if the conditional probability of the head appearing increases with distance (Levy, 2008). The Vasishth and Lewis proposal is that if the intervening material activates the upcoming head, the dependent-head integration could be facilitated due to the head being reactivated. It has been suggested by Levy, Fedorenko, and Gibson (2013) that these so-called anti-locality effects may be restricted to head-final languages. Our results show that, while that could be correct, at least in the present Hindi data, when dependency distance is increased, there is some evidence that processing difficulty generally increases. The fact that even a head-final language like Hindi shows effects of integration difficulty is, however, not inconsistent with the finding by Husain, Vasishth, and Srinivasan (2014) that integration effects occur in Hindi when expectation strength is low. These mutually inconsistent effects due to increased dependency distance (facilitation in some published work, greater difficulty in others) suggest that the inter-relationship between expectation and locality effects needs much more study (for a recent attempt that explores the effect of individual differences on locality and expectation effects, see Nicenboim, Vasishth, Gattei, Sigman, & Kliegl, 2015).
The effect of storage cost is also quite interesting. Storage cost characterizes the effort required to maintain predictions of upcoming heads. For example, when reading a main clause, readers may predict an upcoming verb (a storage cost of 1). If a sentence with an embedded clause is read, then the reader would predict two heads (one for the embedded clause, and the other for the main clause), leading to a storage cost of 2. Although some evidence does exist for storage cost (Chen, Gibson, & Wolf, 2005), the present work may be the first eyetracking study using naturally-occurring sentences that investigates this metric. The evidence in favor of storage cost has interesting implications for theories of expectation-based processing. The current view in the field of sentence processing is that the dominant predictor of expectation cost is surprisal: the conditional probability of an upcoming part of speech or word given the left context (Hale, 2001). Our study shows that, at least in this head-final language, the number of expected heads may also play a role. An obvious question this raises is whether surprisal-based expectation has a larger effect size than integration- and storage-cost effects. To answer this question, a probabilistic parser needs to be developed for Hindi, and the surprisal metric computed. This would allow us to investigate the relative effect size of storage vs surprisal cost. We expect to take up this and other issues in future work.

Conclusions
This is, to our knowledge, the first study of Hindi sentence processing difficulty using an eyetracking corpus containing naturally occurring text. We show that the standard so-called "low-level" predictors influence reading time in the expected manner. In addition, we show

Appendix A
Details of the word complexity metric
1. Vowel diacritic appears to the left of a consonant: The short, unrounded, high front vowel (/i/ इ), when appearing with a consonant, is represented as the diacritic ◌ि and precedes the consonant in the text. For example, in दिन /d̪ɪn̪/ the vowel ◌ि /ɪ/ is written before the consonant द /d̪ə/ but is pronounced after it. In effect, the written vowel appears displaced with respect to the point of its utterance. In related work, Vaid and Gupta (2002) found that words with such vowels lead to slower naming latencies and more naming errors compared to controls. In all such cases we posit a complexity cost of 1; for example, the complexity cost of दिन /d̪ɪn̪/ would be 1. In addition, there is a cost for the distance of displacement of the vowel. For example, in पब्लिसिटी /pəblɪsɪʈi/ there is an additional consonant ब् /b/ between the vowel ◌ि /ɪ/ and the consonant it is associated with, ल /lə/. In such cases the cost becomes 1+d, where d is the number of intervening consonants between the vowel and the consonant it is associated with. Note that this situation arises when the preceding consonant appears without its inherent vowel. So the total cost for a word like पब्लिसिटी would be 3.5 = 2 (for the ◌ि in ब्लि) + 1 (for the ◌ि in सि) + .5 (for the ब्; see below).
2. Diacritic above a consonant: Although all vowels in Hindi have an independent form, when they combine with a consonant some of them can appear above the consonant. In all such cases (see Table A1) we assume a complexity cost of .5.

Table A2
Diacritics appearing below a consonant
◌ु /u/, ◌ू /U/, ◌़ (see footnote 1)

4. Consonant without inherent vowel: Consonants, when occurring without the inherent vowel, are written with a slightly different form. In many cases the vertical bar associated with the consonant is missing (e.g., घ् + ट → घ्ट /gʰʈ/), while in some cases a special diacritic called halant (◌्) is added below the character (e.g., ट् + क → ट्क /ʈk/). This arises when the consonant is part of a conjunct consonant. In all such cases a cost of .5 is assumed.
5. Ligatures and composite characters: Unlike the above cases of conjunct consonants, the lack of a vowel on one of the consonants can sometimes cause the ligature to take a complex form (for example, /t/ + /rə/ = त् + र → त्र; /r/ + /tə/ = र् + त → र्त; /k/ + /tə/ = क् + त → क्त). In all such cases a cost of 1 is posited. A cost of 1 is also posited for composite characters such as क्ष /kʂə/ and ज्ञ /gyə/. In addition, in cases such as कार्निवाल /kɑrnɪval/, where /r/ has been displaced further due to the intervening ◌ि /ɪ/, the cost incorporates the distance of displacement.
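As a worked illustration of how the costs in this appendix add up, the following Python sketch scores two of the rules only: the displaced short-i diacritic (rule 1, cost 1+d) and consonants without their inherent vowel (rule 4, cost .5). It is a partial sketch under stated assumptions, not the scoring procedure actually used: the function name is hypothetical, words are assumed to be in Unicode logical order (vowel sign stored after its consonant), clusters are assumed to contain no nukta or other marks, and the rules for diacritics above or below a consonant and for ligatures are deliberately left out.

```python
I_MATRA = "\u093f"  # short i vowel sign, drawn to the left of its consonant
HALANT = "\u094d"   # virama: marks a consonant without its inherent vowel


def partial_complexity(word: str) -> float:
    """Score rules 1 and 4 only, for a word in Unicode logical order."""
    cost = 0.0
    for pos, ch in enumerate(word):
        if ch == HALANT:
            cost += 0.5  # rule 4: consonant without inherent vowel
        elif ch == I_MATRA:
            # Rule 1: base cost of 1, plus d = number of halant-marked
            # consonants in the cluster that the vowel sign is displaced over.
            d = 0
            k = pos - 2  # where a halant would sit just before the base consonant
            while k >= 1 and word[k] == HALANT:
                d += 1
                k -= 2
            cost += 1 + d
    return cost


din = "\u0926\u093f\u0928"  # the word for "day" used in rule 1
publicity = "\u092a\u092c\u094d\u0932\u093f\u0938\u093f\u091f\u0940"
print(partial_complexity(din), partial_complexity(publicity))
```

For the two words discussed under rule 1, the sketch yields 1 and 3.5 respectively, matching the totals given there; words that involve the omitted rules would be under-scored by this partial version.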

Table 1
Experiment session groups.

Table 2
Minimum, first quartile, median, mean, third quartile and maximum values of all the predictors.

Table 3
The upper triangular correlation matrix for the predictors.

Table 4
The effect of the predictors on log first-pass reading time and regression path duration.
Notes: The columns present the results of the Bayesian hierarchical linear models; we show the estimated mean effect of each predictor, along with 95% credible intervals. All effects that have intervals excluding 0 are in bold. Int: intercept; sl: syllable length; comp: word complexity; freq: word unigram frequency; bifreq: word bigram frequency; IC: integration cost; SC: storage cost; session: session id.

Table 5
The effect of the predictors on total reading time and outgoing saccade length.