Phonological and semantic aspects of German intonation

Jörg Peters (Oldenburg)


1 Introduction

For more than 25 years phonological descriptions of German intonation have adopted the autosegmental-metrical (AM) framework (Uhmann 1991; Féry 1993; Grabe 1998; Grice/Baumann 2002; Grice/Baumann/Benzmüller 2005). Intonational analyses in the AM framework have dealt with various aspects of intonational function, with a particular interest in information structure including the signaling of focus and thematic relations (e. g. Uhmann 1991; Féry 1993; Baumann 2006; Braun 2005; Féry/Kügler 2008). Whereas for English a number of general models of intonational meaning in the AM framework have been proposed (e. g. Pierrehumbert/Hirschberg 1990; Bartels 1999; Truckenbrodt 2012; Steedman 2014; for an overview, see Büring 2016), general accounts are largely missing for German. The aim of this paper is to give an outline of such an account for Standard German providing a feature-based interpretation of intonational units along the lines of the seminal paper of Pierrehumbert/Hirschberg (1990).

The phonological description presented here adopts Gussenhoven’s (1983, 2005) AM approach to Dutch intonation, which gave rise to the ToDI annotation (Transcription of Dutch Intonation, Gussenhoven/Terken/Rietveld 2003). This model has been adapted for the analysis of Standard German by Peters (2006, 2014, 2016) and Fuhrhop and Peters (2013). Sec. 2 gives an outline of the AM account of the tonal system of Standard German as spoken in north-western Germany. This model differs from the classical AM approach underlying ToBI annotations, such as the German Tone and Break Indices (GToBI; Grice/Baumann 2002; Grice et al. 2005), in a number of respects, which will be summarized in sec. 2.7. In sec. 3 a feature-based semantic model that derives from earlier accounts is set out. Both the phonological and the semantic model are largely based on the analysis of conversational speech recorded from speakers of Standard German (Peters 2006; Peters et al. 2015). As the analysis of intonation contours in a conversational framework would go beyond the scope of this paper, the presentation will be limited to fictitious examples.

2 Intonational phonology

2.1 Preliminaries

Intonational phonology deals with the distinctive use of pitch. The speech melodies of single utterances can be characterized as variants of distinct pitch contours, which are described either as configurations, such as falls and rises, or as sequences of discrete tonal units, such as high and low tones, as will be done in the remainder of this chapter. In a wider perspective, the use of intonation involves three aspects of utterance production, which Halliday (1967) has called tonality, tonicity, and tone, and which we refer to as intonational phrasing, accent allocation, and contour choice, respectively.

Intonational phrasing is the division of utterances into intonational phrases (IPs). We assume IPs to be those parts of an utterance that form the domain for a complete intonation contour. Accent allocation involves the decision how many items shall be accented in a given IP and the decision where to locate these accents within the IP. These decisions depend on both information structure and syntactic structure. The information structure determines which constituent shall be focused, whereas syntactic rules determine which syntactic unit needs to be accented in order to project the focus to the whole constituent. Contour choice determines the phonological form of the syntactically allocated accents by selecting particular pitch accent types. In addition, contour choice involves the allocation of boundary tones to the beginning and end of the intonational phrase and possibly other types of phrases.

The following account will focus on contour choice, which is that aspect of intonation that is least dependent on syntactic structure. We will therefore largely ignore issues related to the formation of focus domains, as these depend on the location of pitch accents rather than on the type of pitch accents chosen, even if accent choice may add to focus interpretation, as in the case of I-topicalization (Jacobs 1997) or in models which assume distinct pitch accents for contrastive focus (e. g. Grice et al. 2005). We will also largely ignore the semantic effects of the phonetic realization of tones (e. g. Gussenhoven 2004; Chen 2005; Michalsky 2015a).

2.2 Tonal units

For Standard German, we assume three tone classes: starred tones (H*, L*), accompanying tones (H, L), and boundary tones, or edge tones, which specify the beginning of the IP (%H, %L) or its end (H%, L%). Accompanying tones are called “leading tones” if they precede the starred tone to which they belong. Otherwise they will be called “trailing tones”.

The presence of a starred tone depends on the availability of an accented syllable, and the pitch targets of these tones are usually synchronized with this syllable. When an accented syllable becomes deaccented both the starred tone and any accompanying tone are lost. If an accent moves from one syllable to another, both the starred tone and its accompanying tone move as well. For Standard German we do not assume “phrase accents”, or “phrase tones”, as in the classical model (Bruce 1977; Pierrehumbert 1980; Beckman/Pierrehumbert 1986; Grice/Ladd/Arvaniti 2000) and GToBI (Grice/Baumann 2002; Grice et al. 2005) (see sec. 2.7).

Standard German has four pitch accents, the falling accent (H*L), the high accent (H*), the rising accent (L*H), and the low accent (L*). All these accents may occur in nuclear and in prenuclear position. The presence of the boundary tones depends on the availability of an IP boundary. The final IP boundary may lack a boundary tone. In this case, the end of the pitch contour is specified by the preceding tone, which undergoes tone spreading. When a stretch of pitch is specified by two equal tones it may follow an overall declination trend. When it is specified by a single tone spreading rightwards it resists the declination trend, which results in a plateau, or level pitch, with no or only little declination. (1) illustrates tone spreading by comparing (a) the H*LL% contour (no spreading) with (b) the H*L0% contour (spreading). In (1b), the right arrow indicates the spreading of the low trailing tone and 0% the missing final boundary tone (following Grabe 1998). Accented syllables are underlined and high and low pitch targets specified by the tone sequence are highlighted by black dots.

In both Dutch and German the contrast between H*LL% and H*L0% is usually enhanced by raising the final plateau of H*L0% to lower-mid level. Gussenhoven (2004: 299f., 2005) refers to this contour as the half-completed fall.

2.3 Nuclear contours

Table 1 lists eight nuclear contours that are commonly used in Standard German. The contours in the same row share the nuclear accent. The contours in the same column share the tonal specification of the final boundary.

Table 1: Nuclear contours of Northern Standard German

For all contours except the Fall we assume tone spreading indicated by the right arrow, which yields three level contours in the right column, where a final boundary tone is missing. In case of the High Rise and the Low Rise the final H% is upstepped after the preceding H tone. The phonetic realization of the nuclear contours can be accounted for by the phonetic implementation rules in (2).



(2) a. Spreading rule

If a H tone is preceded by another tone within the same nuclear contour that is not part of a pitch accent associating to the same or an adjacent tone-bearing unit, the first tone specifies an additional pitch target before the second tone.

b. Dissimilation rule (Upstep rule)

If a H tone follows after another H tone within the same nuclear contour, the target of the second tone is raised.
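The effect of the two rules can be made concrete with a small sketch. It is a simplified reading, not the paper's formalization: pitch levels are abstract integers (L = 0, H = 1, upstepped H = 2), the function name `pitch_targets` and the `>` suffix for a spread target are illustrative choices, and the condition on adjacent tone-bearing units in the Spreading rule is not modelled. Spreading is applied before H% and before a missing boundary tone (0%), but not before L%, in line with the statement that all contours except the Fall involve tone spreading.

```python
# Abstract pitch levels (an assumption of this sketch, not values from the paper).
LEVELS = {"L": 0, "H": 1}


def pitch_targets(accent_tones, boundary):
    """Return (label, level) targets for a nuclear contour.

    accent_tones: tones of the nuclear pitch accent, e.g. ["H*", "L"].
    boundary: "H%", "L%", or "0%" (missing final boundary tone).
    """
    if boundary not in ("H%", "L%", "0%"):
        raise ValueError(f"unknown boundary: {boundary!r}")

    # One target per tone of the pitch accent.
    targets = [(tone, LEVELS[tone.rstrip("*")]) for tone in accent_tones]
    last_label, last_level = targets[-1]

    if boundary == "L%":
        # Fall: no spreading, the contour simply ends low.
        targets.append(("L%", LEVELS["L"]))
        return targets

    # Spreading: the last accent tone specifies an additional target,
    # either just before H% or at the end of the IP (0%).
    targets.append((last_label + ">", last_level))

    if boundary == "H%":
        # Dissimilation (upstep): H% directly after a H tone is raised.
        upstep = 1 if last_level == LEVELS["H"] else 0
        targets.append(("H%", LEVELS["H"] + upstep))
    return targets


if __name__ == "__main__":
    print(pitch_targets(["H*", "L"], "H%"))  # Fall-Rise
    print(pitch_targets(["H*"], "H%"))       # High Rise
```

Under these assumptions the Fall-Rise receives a low stretch before a plain H%, whereas the High Rise ends on a raised H%.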

In (3) the application of these rules is illustrated for the Fall-Rise (a) and the High Rise (b).

2.4 Accent modifications

Whereas tonal spreading and dissimilation derive from phonetic implementation rules, a given language may allow pitch accents to be modified in systematic ways to express additional semantic meanings (see sec. 3.5). Standard German has at least three accent modifications, which are called Accentual Downstep, Late Peak, and Early Peak (for the latter two cf. Kohler 1991). Accentual Downstep is found both in nuclear and prenuclear H*L, H*, and L*H accents and is indicated here by an exclamation mark (!H*L, !H*, L*!H). It causes a lowering and compression of the pitch range of that part of the utterance that begins with the accented syllable. The extent to which the high target is lowered may vary, which in case of the H*L accent has been described as partial or total downstep (Grabe 1998: 89f., 185–187). The Late and Early Peak result from variation of the synchronization of the f0 peak with the accented syllable. The Late Peak modification moves the nuclear peak to the next syllable if one is available. The Early Peak modification retracts the nuclear peak to the preceding syllable (Kohler 1991; for a detailed account of further dimensions of phonetic variation see Niebuhr 2007).

Following Gussenhoven (2004: 306f.) we represent the late peak modification of H*L by L*HL, which results from prefixing a L tone which occupies the position of the accent tone and moves the H tone rightwards. If one or more syllables follow after the nuclear syllable the nuclear peak occurs on the first postnuclear syllable, as in (4a). If the nuclear syllable occurs in IP-final position, the nuclear peak moves towards the end of this syllable and part of the falling pitch movement is truncated, as in (4b) (the dashed line indicates the pitch contour without the late peak modification).

The Early Peak is often combined with Accentual Downstep. We represent this modification by a prefixed H tone, which leads to HH*L, or H!H*L. (5) gives an illustration of the combination of Early Peak with Accentual Downstep.

Whereas Accentual Downstep may be applied both to prenuclear and to nuclear accents if there is a preceding H tone within the same IP, the Late Peak and Early Peak modifications are possibly restricted to nuclear accents. Accentual Downstep may co-occur with both the Early Peak and the Late Peak.
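The notational side of the three modifications can be summarized as simple rewriting operations on accent labels. The following sketch covers only the label changes spelled out above; the function names are hypothetical, and the combination of Late Peak with Accentual Downstep is left out because no notation for it is given here.

```python
def accentual_downstep(accent):
    """Mark the H tone of H*L, H*, or L*H with '!'."""
    if accent not in ("H*L", "H*", "L*H"):
        raise ValueError(f"downstep is not defined here for {accent!r}")
    i = accent.index("H")
    return accent[:i] + "!" + accent[i:]


def late_peak(accent):
    """Prefix a L tone that takes the accent position: H*L becomes L*HL."""
    if accent != "H*L":
        raise ValueError(f"late peak is only shown for H*L, not {accent!r}")
    return "L*HL"


def early_peak(accent):
    """Prefix a H tone: H*L becomes HH*L, !H*L becomes H!H*L."""
    if accent not in ("H*L", "!H*L"):
        raise ValueError(f"early peak is only shown for (!)H*L, not {accent!r}")
    return "H" + accent


if __name__ == "__main__":
    print(accentual_downstep("L*H"))
    print(late_peak("H*L"))
    print(early_peak(accentual_downstep("H*L")))
```

Applying `early_peak` to the output of `accentual_downstep("H*L")` yields the combined label used for the Early Peak with Accentual Downstep.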

Accentual Downstep can trigger the lowering of final plateaus, as in !H*L0% or !H* 0%. We also use downstep to represent the lowering of the final plateau of the chanted call. In Standard German, this calling contour consists of two plateaus, a high plateau beginning on the nuclear syllable and a lowered plateau beginning on the stressed syllable of the first postnuclear foot. We represent this contour by a rightward-spreading nuclear H* accent and a rightward-spreading downstepped H tone, which aligns with the first postnuclear stress, as illustrated in (6a). (6b) shows that tone spreading also applies when no more than one syllable is available for the second plateau. In this case, even schwa syllables are lengthened to guarantee that a pitch plateau can be realized. (6c) shows a case where a single syllable with a lax vowel is lengthened so that it can be produced with two pitch plateaus (for more variants of the German calling contour see Gibbon 1976: chap. 4.3).

2.5 Prenuclear accents

In prenuclear position the same four pitch accents are attested that occur in nuclear position. There is no restriction on the combination of these accents, but there is a tendency to use no more than two types of prenuclear accents within the same IP. Various transitions between successive pitch accents can be observed, which have been characterized by Gussenhoven (1983) as different forms of “tone linking”. We account for these transitions by a different alignment of accompanying tones and the optional use of tone spreading. Table 2 illustrates three types of transitions between the prenuclear falling accent (H*L) and the nuclear fall (H*LL%) for the phrase Maria und Anastasia (for a more detailed account see Peters 2014: 51f.).

Table 2: Transitions between prenuclear H*L and nuclear H*LL% on the phrase Maria und Anastasia (The tones of the prenuclear accent are given in boldface)

2.6 Intonational phrasing

The intonation phrase (IP) is a prosodic constituent in the Prosodic Hierarchy (Nespor/Vogel 2007; Selkirk 1995), which is immediately dominated by the utterance phrase (UP). It is the domain for the production of complete intonation contours. In Standard German, single sentences and even smaller phrases can be subdivided into one or more IPs as illustrated in (7).

Both syntactic and intonational phrasing convey information about information structure. A mismatch between the information conveyed by levels of phrasing may reduce the acceptability of the utterance.

IPs can be identified by both global and local cues. Global cues, such as an overall declination trend, help in deciding whether an utterance is divided over one or more IPs. Local cues help in identifying the beginning and end of an IP. Such cues derive from discontinuities in the time and frequency domain. In the time domain typical cues for IP boundaries are phrase-final lengthening, pauses with or without inhalation, and a switch to allegro style at the beginning of the new IP. In the frequency domain new IPs often start with pitch reset resulting in a higher initial pitch and a higher scaling of pitch accents.

Full IPs are autonomous in the sense that they may occur irrespective of the presence of other IPs. There are also clitic IPs, which require a preceding IP, from which they “copy” the last two tones (Gussenhoven 1990). Following Gussenhoven (2004: 291f.), clitic contours will be represented without an initial boundary tone suggesting that they are expansions of the preceding IP. An illustration is given in (8) with copied tones highlighted in boldface.

GToBI (German Tone and Break Indices, Grice/Baumann 2002; Grice et al. 2005) is an adaption of the classical ToBI (Tone and Break Indices) annotation (Beckman/Ayers 1997; Beckman/Hirschberg/Shattuck-Hufnagel 2005) to Standard German. The model adopted here, which is abbreviated as ToGI (Transcription of German Intonation) in table 3, is an adaption of the AM annotation model ToDI (Transcription of Dutch Intonation) (Gussenhoven 2005; Gussenhoven et al. 2003) and has been applied to English by Gussenhoven (2004: chap. 14–15).

Table 3 illustrates differences between ToBI- and ToDI-style annotations of common nuclear contours. The British School labels for the contours are adopted from O’Connor/Arnold (1973) and Ladd (2008: 91). ToBI annotations are adopted from Beckman/Ayers (1997) and GToBI annotations from Grice et al. (2005).

Table 3: Notation of common English contours. The boxes indicate the position of the nuclear accented syllable

At first sight, ToDI and ToGI annotations can be derived from ToBI and GToBI annotations by simple translation rules. For example, in the High Fall, the Low Fall, and the Fall-Rise the low phrase accent L- of ToBI and GToBI translates to the trailing tone L of ToDI and ToGI. But the differences illustrated in Table 3 are not just notational differences. They result from differences in the definition of tone classes and differences in the modelling of pitch movements that are linked to accented syllables. (i)–(iii) summarize the three most important differences between ToBI and ToDI, which likewise apply to their adaptions to German, GToBI and ToGI (see also Gussenhoven 2004: 316–319).

(i) Tone classes. In ToDI and ToGI tone classes are defined in purely structural terms, whereas in ToBI systems both structural and phonetic criteria can be used to define tone classes. One example is the trailing tone, which in ToBI is characterized as a tone which belongs to the preceding starred tone and occurs immediately after this tone. In ToDI and ToGI, trailing tones are tones which are structurally dependent on a preceding starred tone irrespective of their timing behavior, which is guided by language-specific implementation rules and may vary from one dialect to the other (see Peters/Hanssen/Gussenhoven 2015 for a discussion). Similarly, Grice et al. (2000) characterize phrase accents as accents (or tones) that derive from the presence of a phrase boundary, which is the final boundary of the intermediate phrase, and that in English are attracted by metrical stress. In ToDI and ToGI, phrase accents, or phrase tones, would be characterized as tones deriving from intermediate phrases irrespective of their timing. As trailing tones are not bound to a position close to the starred tone and may likewise be attracted by metrical stress, there is no need to account for the “elbow” of the Fall or Fall-Rise by a phrase accent. Also note that Barnes et al. (2010) and Peters et al. (2015) have shown for American English and for varieties of Dutch, High German, Low German, and Frisian that there is no evidence that the “elbow” of the nuclear Fall is stress-seeking as suggested by its representation with a phrase accent. More generally, a definition of tone classes without reference to timing makes it possible to account for variation in the phonetic alignment of non-starred tones without the need to assume different tonal structures. If, for example, a given dialect aligns the “elbow” with the first postnuclear metrical stress, whereas a closely related dialect aligns it close to the starred tone, ToBI models would need to represent the two contours distinctly by (L+)H* L-L% and (L+)H*+L L-L%, whereas in ToDI and ToGI both contours can be represented by H*LL%, with an implementation rule accounting for the variation in the timing of the trailing tone. The analysis of nuclear falls without phrase accents is paralleled by the analysis of nuclear rises, which in ToDI and ToGI are represented by L*H rather than by L* H-, with H- indicating a high phrase accent or a high phrase tone.

The differences in the representation of nuclear falls and rises derive from a more general difference in the use of bitonal accents, which also affects the representation of prenuclear contours. Whereas ToBI annotations use bitonal accents primarily to account for the pitch movement leading towards an accented syllable adopting the approach by Pierrehumbert (1980), in ToDI and ToGI bitonal accents account for the pitch movement leading off the accented syllable, which is more in line with the British School characterizations of falls and rises as pitch movements starting on the accented syllable (Crystal 1969; O’Connor/Arnold 1973). Gussenhoven (2004: 127–128) characterizes these two approaches as “on-ramp analysis” and “off-ramp analysis”, respectively. The difference between the two analyses is illustrated in (9), with (...) ip indicating intermediate phrase boundaries.

(ii) Upstep of H%. ToBI and ToDI as well as their adaptions to German have an upstep rule accounting for the extra-high level of H% after a high tone in the High Rise and the Low Rise. In ToBI H% is upstepped after a high phrase tone H-. In ToDI, H% is upstepped after a high tone, which may be an accent tone or a trailing tone, as in H* H% and L*HH%, respectively. Note that GToBI differs from ToBI by the notation of upstepped H% as ^H% rather than as H% which is realized extra-high.

(iii) Level contours. In ToBI, the final high pitch level of the Stylized Fall, the Stylized High Rise, and the Stylized Low Rise results from an upstepped final low boundary tone, leading to H*+LH-L%, H* H-L%, and L* H-L%. In GToBI, the notation of upstepped H% by ^H% makes it possible to account for level contours without recourse to an upstepped L%. Note that Grice et al. (2005) do not distinguish between L-% and L-L% and between H-% and H-H% (for L-% vs. L-L% see Grice et al. 2005, footnote 6). In ToDI and ToGI, the final plateau results from tonal spreading of a single tone, which specifies the beginning and end of the final plateau. As a consequence, there is no need for a final boundary tone. This missing boundary tone is indicated by 0% (after Grabe 1998). The mid-level plateau of the stylized fall results in ToBI and GToBI from a downstepped H tone, triggered by H*+L, whereas in ToDI a phonetic implementation rule guarantees that the low level plateau is raised, which makes it distinct from the final low pitch of the H*LL% contour. In GToBI, the stylized rise and the stylized fall could be represented as L* H-% and H*+L H-%, respectively, but Grice et al. (2005) do not include these contours in their account of Standard German intonation. The ‘stylized step down’ of Grice et al. (2005: 72, 74) is restricted to the use as a calling contour.

3 Intonational meanings

3.1 Basic assumptions

It is a common view that intonation is significant in the sense that the choice of pitch contours contributes to utterance meaning in systematic ways (e. g. O’Connor/Arnold 1973). In the following we propose a feature-based account of intonational meanings for Standard German, which starts from three heuristic assumptions. First, intonational meanings are constant across utterances and sufficiently abstract to account for tonal contrasts across sentence types and different types of speech acts (cf. Gussenhoven 1983). Generally, we do not assume that intended types of speech acts are linked to distinctive contours or that the abstract semantic interpretation of tones varies depending on the speech act performed (see Peters 2014: 53–55, 86). Second, intonational meanings are compositional. In a feature-based approach, complex meanings derive from the meanings of semantic features attached to smaller tonal units. It cannot be ruled out that there are instances of idiomatic meanings attached to whole contours, such as in the case of the chanted call, but we do not start with the general assumption that intonational meanings are attached to whole contours (for a discussion see Liberman/Sag 1974; Sag/Liberman 1975; Cutler 1977; Ladd 1980; Bolinger 1982 and Gussenhoven 1983). Finally, the minimal units that bear abstract semantic features are single tones. These tones may be characterized as “tonal morphemes” in the sense of Gussenhoven (1983). Intonational meanings are linked to tonal contrasts, which are established by the choice of tones, by the presence or absence of tones of a particular tone class, and by accent modifications.

In sec. 3.2, we present tonal contrasts that are established by the choice of pitch accents and boundary tones. In sec. 3.3–3.6, the semantic relevance of these tonal contrasts will be illustrated by comparing intonation contours that differ by one or more contrasting units. These analyses suggest that tones differing by tone class bear semantic features that relate to different aspects, or levels, of communication, which are the mutual belief space, information packaging, conversational structure, thematic structure, conceptual structure, and speaker attitudes. Sec. 3.7 summarizes this view and sec. 3.8 points out the benefits of a semantically motivated model of intonation as outlined in the preceding sections.

3.2 Tonal contrasts

For Standard German, it seems advisable to establish tonal contrasts within each tone class (cf. sec. 2.2). In the following we determine tonal contrasts separately for accent tones, accompanying tones, and IP boundary tones.

In the position of the accent tone, H contrasts with L. This contrast distinguishes both H* from L* and H*L from L*H. There is no need to postulate an additional tonal contrast between the trailing L and H in H*L and L*H. In the system proposed in sec. 2 for Standard German the tone quality of the trailing tone is predictable from the tone quality of the accent tone. After H* the trailing tone is low and after L* it is high. Accordingly, H*L and L*H could be rewritten as H*T and L*T, respectively, with T denoting a tone that is not specified for tone quality. In the position of the trailing tone, the presence of a trailing tone contrasts with its absence, as in H*L vs. H* and in L*H vs. L*. The semantic features proposed in the following sections are intended to account for these tonal contrasts in both nuclear and prenuclear accents. At the initial and final IP boundaries H contrasts with L (%H vs. %L and H% vs. L%). At the final IP boundary, there is an additional contrast between the presence of a boundary tone and its absence, as in H*LL% vs. H*L0%, H* H% vs. H* 0%, and L*HH% vs. L*H0%. Finally, a semantic model has to account for the accent modifications proposed in sec. 2.4, which are accentual downstep (H*L vs. !H*L, H* vs. !H*, L*H vs. L*!H), the Late Peak (H*L vs. L*HL), and the Early Peak (H*L vs. HH*L).
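One way to make these contrasts explicit is to decompose each nuclear contour into three positions and compare contours position by position. The sketch below is an illustration under stated assumptions: the tuple encoding, the names `CONTOURS`, `POSITIONS`, and `contrasts` are hypothetical, the two boundary contrasts (H vs. L, presence vs. absence) are collapsed into one position with three values, and accent modifications and initial boundary tones are left out.

```python
# (accent tone, trailing tone present?, final boundary tone or None for 0%)
CONTOURS = {
    "H*LL%": ("H", True, "L"),    # Fall
    "H*LH%": ("H", True, "H"),    # Fall-Rise
    "H*L0%": ("H", True, None),   # level contour
    "H* H%": ("H", False, "H"),   # High Rise
    "H* 0%": ("H", False, None),  # level contour
    "L*HH%": ("L", True, "H"),    # Low Rise
    "L*H0%": ("L", True, None),   # level contour
    "L* H%": ("L", False, "H"),   # Low Low Rise
}

POSITIONS = ("accent tone", "trailing tone", "final boundary")


def contrasts(a, b):
    """List the positions in which contours a and b differ."""
    return [
        name
        for name, x, y in zip(POSITIONS, CONTOURS[a], CONTOURS[b])
        if x != y
    ]


if __name__ == "__main__":
    print(contrasts("H*LL%", "H*LH%"))
    print(contrasts("L*HH%", "H*LH%"))
    print(contrasts("H*LL%", "L* H%"))
```

The trailing tone is stored only as present or absent, which reflects the point that its tone quality is predictable from the accent tone.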

3.3 Nuclear contours

3.3.1 Fall and Fall-Rise

The Fall (H*LL%) differs from the Fall-Rise (H*LH%) by the tone quality of the final boundary tone. It is a common view that the choice of H% instead of L% signals incompleteness (e. g. Cruttenden 1981; for an overview see Michalsky 2015a: chap. 2). But incompleteness on what level? Further analysis shows that the choice of the final boundary tone refers to the status of the IP within a larger stretch of speech (cf. Pierrehumbert/Hirschberg 1990). The use of final H% in declaratives is largely limited to IPs which do not occur at the end of a conversational exchange. If they do, the conversational exchange appears to be incomplete. For this reason the use of the Fall-Rise may appear less acceptable when the IP is used to conclude an interview, as illustrated in (10).

Whereas in statements the use of H*LH% indicates that the speaker may be expected to continue, the function of H*LH% in questions is less straightforward. On the one hand, H% may indicate incompleteness in the sense that a question needs to be complemented by an answer. On the other hand, it seems possible that H% refers to the completeness of the conversational exchange including the question-answer pair, suggesting that a follow-up question or further comments will follow after the current question has been answered. In the latter case the final H% would make the Fall-Rise suitable for signaling the incompleteness of the ongoing conversational exchange irrespective of whether the contour is used in statements or questions. There is growing evidence, however, that the scaling of H% may provide cues as to whether an utterance is to be interpreted as a statement or a question. As shown by Haan (2002) for Dutch and by Michalsky (2014, 2015a, 2015b) for German, questions in these languages tend to have higher-ending rises than non-final statements and the final pitch level of rising contours serves as a perceptual cue to distinguish questions from statements (on the general issue of telling apart linguistic and paralinguistic meanings of intonation see Gussenhoven 2004: chap. 4 and Ladd 2008: chap. 1.4).

3.3.2 High Rise

The High Rise (H* H%) differs from the Fall-Rise (H*LH%) by the missing trailing tone of the nuclear accent. By using H*L the accented unit is classified as being informationally complete. By using H* without the trailing tone the accented unit is classified as being informationally incomplete.

The use of H* instead of H*L in nuclear position is illustrated in (11). In this case, both contours of the first IP are equally acceptable but suggest different interpretations. H*L in (11a) suggests that Paul had three visitors, whereas H* in (11b) suggests that Paul had only two visitors. In (11a) meine Schwester and Anastasia are presented as two information units suggesting that the two expressions refer to different persons. In (11b) meine Schwester und Anastasia are presented as parts of a single information unit suggesting that the two expressions refer to the same person.

3.3.3 Low Rise

The Low Rise (L*HH%) differs from the Fall-Rise (H*LH%) by using a low accent tone rather than a high accent tone. As indicated in sec. 3.2, there is no need to assume an additional contrast between the low and high trailing tone as the tone quality of the trailing tone is always opposite to the tone quality of the accent tone. According to Pierrehumbert and Hirschberg (1990: 286), pitch accents of English convey information about the status of discourse referents, modifiers, predicates, and relationships specified by accented lexical items. The H* accent conveys that the items made salient by the accent are to be treated as “new” in the discourse. IPs which contain only H* accents signal “that the open expression is to be instantiated by the accented items and the instantiated proposition realized by the phrase is to be added to H’s mutual belief space” (Pierrehumbert/Hirschberg 1990: 289f.). The L* accent, on the other hand, marks items that the speaker wants “not to be instantiated in the open expression that is to be added to H’s mutual beliefs” (Pierrehumbert/Hirschberg 1990: 291). The same distinction turns out to be useful for Standard German. Accordingly, we link the tonal contrast between high and low in the position of the starred tone to the feature ± to be added to the mutual beliefs of speaker and hearer. This feature applies to the information that is conveyed by the accented unit, which in case of a prenuclear accent may be a single object or event referred to by a noun phrase and in case of the nuclear accent may be an object, an event, or a whole proposition.

In statements, the contrast between H and L in nuclear H*L and L*H shows up in the relation between what is said in the current IP and what has to be added or inferred from the context. Both statements with H*LH% and statements with L*HH% are incomplete on a conversational level due to H% (see sec. 3.3.1). In addition, L*HH% is incomplete on the level of the communicative relevance of what is said. Whereas the proposition expressed by a statement with H*LH% is to be added to the mutual belief space of the speaker and hearer irrespective of what follows, the proposition expressed by a statement with L*HH% is not to be added to the mutual belief space independently of something else which will be uttered or can be inferred by the hearer. For this reason, L*HH% is preferred over H*LH% when the speaker wants to signal that the proposition expressed is to be extended or modified by a second statement. In (12), the scope of the proposition expressed in the first IP is restricted by the second IP by emphasizing that the proposition in question is only partly true.

In questions, L*HH% signals that more is expected than an answer providing just the information requested. In the case of (13a), for example, a simple yes or no may be a sufficient answer. In (13b), however, a simple yes or no may not be sufficient. Here, the hearer is expected to give more information such as the title of the movie or whether the speaker enjoyed it. In this sense, the conversational exchange in (13b) is not complete. In line with this, Selting (1995: chap. 3) observes that questions with the rising contour are typically used to start a longer conversational exchange.

3.3.4 Low Low Rise

The Low Low Rise (L* H%) differs from the Low Rise (L*HH%) by the missing trailing tone H, and from the High Rise (H* H%) by using L* in place of H*. As in the Low Rise, L* signals that what is said is not to be added to the mutual beliefs of speaker and hearer. In statements with L* H% the proposition realized is not to be added to the mutual belief space independently of something else that will be uttered or can be inferred by the hearer. In questions, the relevance of the question is not restricted to the information requested. Additionally, the missing trailing tone signals that what is said is informationally incomplete, as in the H* H% contour. Both the use of L* in place of H* and the use of L* in place of L*H restrict the uses of this contour considerably, which is in line with its infrequent use in natural conversations (cf. Peters 2006). The difference between L* H% and L*HH% may have consequences similar to those resulting from replacing H*LH% by H* H% in (11), which are illustrated in (14).

The use of L*HH% in the first IP of (14a) suggests that Paul had three visitors: his sister, Anastasia, and Angelique. The use of L* H% in (14b) suggests that Anastasia is the sister of Paul. Hence, Paul had only two visitors: Anastasia and Angelique. It turns out that the use of L* H% in place of L*HH% has the same consequences as the use of H* H% in place of H*LH% in (11). Note that this generalization would be missed in a model which assumes tonal contrasts between whole pitch accents rather than between single tones and between the presence and absence of a tone in a structural position like that of the trailing tone.

3.3.5 Level contours

The level contours H*L0%, H* 0%, and L*H0% signal that what is said is to be taken as an element of a multi-part unit such as a list, in which several objects are represented as instances of the same category (Schiffrin 1994), or as a routine, in which some event is represented as an instance of the same repetitive or uniform activity (cf. Ladd 1978). Prototypical instances of lists are enumerations consisting of ordered numerals, as in (15).

The enumerations in (15) are instances of open lists as the three numerals stand for an unlimited sequence of numbers starting with the first number. They can be distinguished from closed lists, which contain a limited number of list members (cf. Selting 2004). In closed lists the number of list elements is often mentioned in advance, as in (16). The last member of a closed list typically has a nuclear fall with accentual downstep (!H*LL%) rather than a level contour. Non-initial members of closed lists encompassing full IPs usually have a lower pitch level than the preceding one, which may be a result of an overall declination trend or of phrasal downstep.

Level contours are not restricted to members of a list. In Standard German the use of level contours is quite common in single IPs with which an event is presented as part of a routine. When a level contour is used in a conversational activity which is not expected to be part of a routine, such as the offer of something to drink, the offer may appear rude or impolite. (17) illustrates this effect by contrasting H* 0% with H* H% in the offer of a cup of coffee.

Note that impoliteness is not an effect that is linked to a particular contour. In (17b), it results from using a level-contour signaling routine in a situation where routine is not appropriate.

3.4 Prenuclear accents

3.4.1 H*L vs. H* and L*H vs. L*

As in nuclear position, prenuclear H*L and L*H differ from H* and L*, respectively, by signaling informational completeness. This difference is illustrated in (18). In (18a) both sisters are introduced together into the mutual belief space, whereas in (18b) they are introduced separately. The two contours in (18) appear equally acceptable, which is in line with the fact that it is up to the speaker whether the two individuals are introduced in one or two informational chunks. However, when H*L and H* are used to highlight a unit which can hardly be presented in two informational chunks, as in (19) and (20), the use of the two pitch accents is not equally acceptable. Both the person named Angelique and the elephant are entities that can hardly be divided into two information units. Hence, acceptability decreases if prenuclear H* is replaced by H*L.

3.4.2 H*L vs. L*H and H* vs. L*

As in nuclear position, the use of starred H and L in prenuclear position depends on whether the information conveyed by the accented unit is to be added to the mutual belief space. H*L and H* are used to highlight information that is to be added to the mutual belief space, whereas L*H and L* are used to highlight information that the speaker assumes to be accessible to the interlocutors either by being mentioned before or by inference and as such is at best part of an informational unit that is to be added to the mutual belief space. Hence, the acceptability decreases when H*L or H* is used to highlight accessible information. In (21) H* and H*L seem to be less acceptable on sie (‘she’) than L*, L*H, or no accent, as the object to which sie refers is already introduced by Maria in the preceding question.

3.5 Accent modifications

Accent modifications in Standard German intonation convey information about attitudes of the speaker or hearer towards what is said.

3.5.1 Accentual downstep

In statements, downstep of the nuclear accent adds an aspect of finality: This is what I mean and I don’t want/need to talk about it anymore. For this reason, the use of accentual downstep in statements appears inappropriate when the statement is supplemented by an encouragement to make further comments on the issue, as illustrated in (22b).

In yes-no questions the use of accentual downstep may indicate that the speaker wants nothing more than to get the information requested. In alternative questions, the use of accentual downstep may suggest an exclusive reading (either ... or), as illustrated in (23).

According to (23a), the speaker wants to know whether the hearer is married or divorced without excluding alternatives like living together, being separated, or being single. The accentual downstep in (23b), on the other hand, suggests that the speaker assumes that the hearer is either married or divorced.

The fact that accentual downstep can restrict a set of alternatives may explain why in some circumstances the use of a downstepped accent does not seem to be suitable in making a kind offer. The offer in (24b) appears to be less polite as the use of !H*L suggests that the speaker leaves the hearer with no other choice than to drink coffee or not. In (24a), on the other hand, the addressee may feel free to express other wishes.

3.5.2 Late peak

The late peak modification adds an aspect of unexpectedness. In statements, the late peak suggests a qualification like ‘This is the case even if you didn’t expect it to be the case’ (cf. Kohler 1991). Accordingly, (25b) can be paraphrased as ‘Anna is from Oldenburg even if you did not expect that’.

In yes-no questions, the late peak suggests that a positive answer may be unexpected to the speaker. Accordingly, (26b) may be interpreted as saying ‘Is she really from Oldenburg? I didn’t expect that’.

Similarly, in wh-questions the late peak seems to suggest that the issue on which more information is requested is unexpected to the speaker. Accordingly, the Late Peak in the answer of speaker B in (27) suggests a further qualification like ‘Do you really live there?’.

3.5.3 Early peak

The Early Peak marks the information conveyed by the IP as being established or to be expected. According to Kohler (1991) the Early Peak can be used to conclude an argumentation. Baumann and Grice (2006) found the early peak, which they represent as H+L*, to be preferred in scenarios where the referent was predictable from the contextually given schema or frame. Suggesting such a scenario for (28b), the utterance may be paraphrased as ‘She is from Oldenburg and you could have known that’.

The Early Peak is also found in confirmation-seeking yes-no questions. In those questions the early peak often co-occurs with the use of the particle also, as in Sie ist also eine Oldenburgerin? (‘So she is from Oldenburg?’).

3.6 Initial boundary tones

As noted in sec. 3.3.1, final boundary tones convey information about the relation between the current and subsequent IPs. In particular, these boundary tones signal whether the current IP is qualified as a final or non-final part in the ongoing conversational exchange. Initial boundary tones, on the other hand, convey information about the relation of the current IP to the preceding IP. They signal whether the current IP elaborates or expands the topic of the preceding discourse or initiates a new topic. In the latter case, the initial boundary tone signals thematic discontinuity, which may be achieved by starting with a high boundary tone when the preceding IP ends with a low tone (%H after L%) or with a low boundary tone when the preceding IP ends with a high boundary tone (%L after H%).
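The discontinuity condition just described can be stated as a small check. The following Python sketch is an illustration only: the function name `signals_thematic_discontinuity` and the string labels are assumptions introduced here, and tone pairs that the text does not mention (for instance an IP following 0%) are simply treated as not signaling discontinuity.

```python
# Illustrative sketch; the names below are hypothetical and not part of the model.

# Pairs of (final boundary tone of the preceding IP, initial boundary tone of the
# current IP) that the text describes as signaling thematic discontinuity.
DISCONTINUITY_PAIRS = {("L%", "%H"), ("H%", "%L")}


def signals_thematic_discontinuity(prev_final: str, initial: str) -> bool:
    """Return True for %H after L% or for %L after H%."""
    return (prev_final, initial) in DISCONTINUITY_PAIRS


print(signals_thematic_discontinuity("L%", "%H"))  # high onset after a low ending
print(signals_thematic_discontinuity("L%", "%L"))  # low onset after a low ending
```

The check deliberately compares only the two adjacent boundary tones, since these are the only tones the description above refers to.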

3.7 Synopsis

In the preceding sections we proposed abstract semantic features which relate tonal contrasts to different levels of communication. The contrast between high and low accent tones refers to the mutual belief space of the speaker and hearer, and the tonal contrast involving the trailing tone refers to informational packaging. The contrast between high and low final boundary tones conveys information about the status of the current IP in the ongoing conversational exchange and thus refers to the conversational structure. The contrast between the initial boundary tones conveys information about the relation between the current IP and the preceding IPs, indicating thematic coherence or lack thereof. The contrast between the presence and absence of final boundary tones conveys information about the conceptualization of single objects or events, which may be presented as self-contained entities or as elements of a list or routine. Finally, the three accent modifications were interpreted with respect to doxastic attitudes towards the proposition realized by a given IP. Table 4 summarizes the tonal contrasts, the tone classes involved, the (minimally) contrasting tones or accents, and the domains of their semantic interpretation.

Tonal contrast | Tone class | Contrasting accents | Semantic domain
--- | --- | --- | ---
H vs. L | Accent tone | H* – L*, H*L – L*H | Mutual belief space
T vs. Ø | Trailing tone | H*L – H*, L*H – L* | Information packaging
H vs. L | IP boundary tone | H% – L% | Conversational structure
H vs. L | IP boundary tone | %H – %L | Thematic structure
T vs. Ø | IP boundary tone | H% – 0%, L% – 0% | Conceptual structure
X vs. !X | Pitch accent | H*L – !H*L, H* – !H*, L*H – L*!H | Speaker attitudes
X vs. L*X | Pitch accent | H*L – L*HL | Speaker attitudes
X vs. HX | Pitch accent | H*L – HH*L | Speaker attitudes

Table 4: Summary of tonal contrasts. T indicates a tone being either H or L. X indicates a pitch accent.
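The compositional idea behind Table 4 can be made concrete with a short Python sketch that reads a nuclear contour label tone by tone and collects one gloss per semantic domain. Everything in it is an assumption made for illustration: the string notation (starred tone, optional trailing tone, final boundary tone), the function name `interpret`, and the wording of the glosses, which paraphrase sec. 3.3 rather than reproduce official feature names. The sketch covers neither initial boundary tones nor accent modifications, and it does not check whether a label belongs to the inventory of Standard German.

```python
import re

# Assumed label notation, e.g. "H*LH%", "L* H%", "H*L0%".
CONTOUR = re.compile(r"([HL])\*\s*([HL])?\s*([HL0])%")

# Glosses paraphrasing the text; the wording is hypothetical.
ACCENT_TONE = {
    "H": "to be added to the mutual belief space",
    "L": "not to be added independently",
}
BOUNDARY_TONE = {
    "H": "non-final in the conversational exchange",
    "L": "possible closing unit of the exchange",
}


def interpret(contour: str) -> dict:
    """Map each tone of a nuclear contour to a gloss in its semantic domain."""
    match = CONTOUR.fullmatch(contour.strip())
    if match is None:
        raise ValueError(f"unrecognized contour label: {contour!r}")
    starred, trailing, boundary = match.groups()
    return {
        "mutual belief space": ACCENT_TONE[starred],
        "information packaging": "complete" if trailing else "incomplete",
        # 0% carries no H/L value, so this domain is left unspecified (None).
        "conversational structure": BOUNDARY_TONE.get(boundary),
        "conceptual structure": (
            "element of a list or routine" if boundary == "0" else "self-contained"
        ),
    }


for label in ("H*LL%", "L* H%", "H* 0%"):
    print(label, interpret(label))
```

Because each tone contributes its gloss independently, a change in one structural position (for example dropping the trailing tone) alters exactly one entry of the result, which mirrors the generalization discussed for (11) and (14).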

3.8 Well-formedness and missing contours

The main purpose of the current description was to give an illustration of a phonological analysis which starts from tonal contrasts between single tones which are interpreted within a feature-based compositional semantics along the lines of Pierrehumbert and Hirschberg (1990).

The current model can also be used to demonstrate the benefits of a semantically motivated model of intonation when compared to a model that proposes an intonational analysis on purely phonological grounds. Take, for example, the lack of nuclear contours such as H* L% and L*HL% in Standard German. In an intonational model which is not semantically interpreted, those contours have to be excluded from the set of possible contours of German by a rule that qualifies them as ill-formed, such as the no-slump rule proposed by Gussenhoven (2004: 301) to account for the missing H* L% contour in English. In the present account, the absence of H* L% and L*HL% from Standard German intonation can be explained by a mismatch between the semantic features attached to tonal components of these contours. In the case of H* L%, H* signals informational incompleteness of the accented unit due to the missing trailing tone. L%, on the other hand, qualifies the current IP as a possible closing-unit of the current conversational exchange. Hence, H* L% signals incompleteness and completeness at the same time. Even if H* and L% refer to different levels of communication, there will be hardly any opportunities in the conversational exchange for using this contour. The second contour, L*HL%, likewise bears two semantic features that do not match well. Here, the possible completeness of the conversational exchange signaled by L% is in conflict with the lack of independence on the level of the formation of a mutual belief space signaled by L*H. Whereas L% qualifies the current IP as a possible closing-unit of the current conversational exchange, L*H signals that what is said needs to be complemented or elaborated on further. Again, there will be few opportunities for using this contour in natural conversations.
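The two mismatches described above can be written down as a small check over contour labels. The Python sketch below is illustrative only: the label notation, the function name `semantic_clashes`, and the wording of the clash descriptions are assumptions introduced here. It encodes just the two conflicts named in the text, both of which involve L%.

```python
import re

# Assumed label notation: starred tone, optional trailing tone, final boundary tone.
CONTOUR = re.compile(r"([HL])\*\s*([HL])?\s*([HL0])%")


def semantic_clashes(contour: str) -> list:
    """List the feature mismatches described in the text for a nuclear contour."""
    match = CONTOUR.fullmatch(contour.strip())
    if match is None:
        raise ValueError(f"unrecognized contour label: {contour!r}")
    starred, trailing, boundary = match.groups()
    clashes = []
    if boundary == "L":  # L%: possible closing unit of the exchange
        if trailing is None:
            # Missing trailing tone: the accented unit is informationally incomplete.
            clashes.append("informational incompleteness vs. possible closing unit")
        if starred == "L":
            # L*: what is said needs to be complemented or elaborated on.
            clashes.append("lack of independence vs. possible closing unit")
    return clashes


for label in ("H*LL%", "H*LH%", "L*HH%", "H* L%", "L*HL%"):
    print(label, semantic_clashes(label))
```

Under these two rules the sketch would also flag a hypothetical L* L% for both reasons; that is a consequence of the rules as encoded here, not a claim taken from the text.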

Compared to rules of well-formedness the recourse to semantic features has the further advantage that it allows us to account for differences in the frequency of use of particular contours. L* H%, for example, is much less frequent than H*LL% in natural conversations, which may be explained by the fact that L* H% restricts the use of the respective utterance much more than H*LL% does. In contrast to H*LL%, L* H% is bound to IPs that occur non-finally in a conversational exchange, which are informationally incomplete, and which are to be complemented or elaborated on in the further discourse. In general, the recourse to semantic features in place of well-formedness rules frees us from the need to make a clear distinction between contours that belong to the inventory of a given language and those that do not. There may be, for example, contours whose usage is restricted by their semantic features to such a degree that these contours will be rarely used by some speakers and totally avoided by others. One example may be the application of accentual modifications to the Fall-Rise, such as the Late Peak or the combination of Downstep and Late Peak.
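The comparison between L* H% and H*LL% can be illustrated by listing, for each contour, the usage restrictions that its tones impose. The following Python sketch is a crude illustration under stated assumptions: the function name `usage_restrictions`, the tuple encoding of contours, and the idea of simply counting restrictions are introduced here and are not part of the model; the count is not meant to predict actual frequencies.

```python
# Illustrative sketch; names and encoding are hypothetical.


def usage_restrictions(starred, trailing, boundary):
    """Collect the restrictions on use named in the text for one nuclear contour."""
    restrictions = []
    if boundary == "H":
        restrictions.append("non-final in the conversational exchange")
    if trailing is None:
        restrictions.append("informationally incomplete")
    if starred == "L":
        restrictions.append("to be complemented or elaborated on")
    return restrictions


# Contours encoded as (starred tone, trailing tone or None, final boundary tone).
contours = {
    "H*LL%": ("H", "L", "L"),
    "H*LH%": ("H", "L", "H"),
    "H* H%": ("H", None, "H"),
    "L*HH%": ("L", "H", "H"),
    "L* H%": ("L", None, "H"),
}

# Print the contours from least to most restricted.
for label, tones in sorted(
    contours.items(), key=lambda item: len(usage_restrictions(*item[1]))
):
    print(label, usage_restrictions(*tones))
```

In this encoding H*LL% carries none of the three restrictions and L* H% carries all of them, which corresponds to the contrast drawn in the paragraph above.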

4 Conclusion

In this paper, a phonological analysis of Standard German intonation has been proposed along the lines of the AM model developed by Gussenhoven (1983, 2005) for Dutch and English. This model differs from classical AM approaches like those underlying the ToBI and GToBI annotations by a definition of tone classes in purely structural terms (see sec. 2.7).

In sec. 3 a feature-based compositional model of intonational meaning has been proposed. This model suggests that the distinctive use of the most common pitch contours of Standard German can be accounted for by a small set of abstract semantic features that are ascribed to single tones of different tone classes rather than to pitch accents or larger tone configurations. This account allows for generalizations that would be missed when linking semantic features to tonal units larger than single tones. Finally, the proposed model has been used to demonstrate the benefits of a semantically motivated model of intonation when it comes to the explanation of the absence or infrequent use of particular intonation contours.


References

Barnes, Jonathan et al. (2010): “Turning points, tonal targets, and the English L-phrase accent”. Language and Cognitive Processes 25: 982–1023.

Bartels, Christine (1999): The intonation of English statements and questions. A compositional interpretation. New York: Garland.

Baumann, Stefan (2006): The intonation of givenness: Evidence from German. Tübingen: Niemeyer.

Baumann, Stefan/Grice, Martine (2006): “The intonation of accessibility”. Journal of Pragmatics 38: 1636–1657.

Beckman, Mary E./Ayers, Gayle (1997): Guidelines for ToBI labelling. [19.08.2017].

Beckman, Mary E./Hirschberg, Julia B./Shattuck-Hufnagel, Stefanie (2005): “The original ToBI system and the evolution of the ToBI framework”. In: Jun, Sun-Ah (ed.): Prosodic typology. The phonology of intonation and phrasing. Oxford, Oxford University Press: 9–54.

Beckman, Mary E./Pierrehumbert, Janet B. (1986): “Intonational structure in Japanese and English”. Phonology Yearbook 3: 15–70.

Bolinger, Dwight (1982): “Intonation and its parts”. Language 58/3: 505–533.

Braun, Bettina (2005): Production and perception of thematic contrast in German. Oxford etc.: Lang.

Bruce, Gösta (1977): Swedish word accents in sentence perspective. Lund: Gleerup.

Büring, Daniel (2016): Intonation and meaning. Oxford: Oxford University Press.

Chen, Aoju (2005): Universal and language-specific perception of paralinguistic intonational meaning. Utrecht: LOT.

Cruttenden, Alan (1981): “Falls and rises: meanings and universals”. Journal of Linguistics 17: 77–91.

Crystal, David (1969): Prosodic Systems and Intonation in English. Cambridge: Cambridge University Press.

Cutler, Anne (1977): “The context-dependence of intonational meanings”. In: Beach, Woodford/Fox, Samuel/Philosoph, Schulamith (eds.): Papers from the 13th regional meeting. Chicago, Chicago Linguistic Society: 104–115.

Féry, Caroline (1993): German intonational patterns. Tübingen: Niemeyer.

Féry, Caroline/Kügler, Frank (2008): “Pitch accent scaling on given, new and focused constituents in German”. Journal of Phonetics 36: 680–703.

Fuhrhop, Nanna/Peters, Jörg (2013): Einführung in die Phonologie und Graphematik. Stuttgart: Metzler.

Gibbon, Dafydd (1976): Perspectives of intonation analysis. Frankfurt a. M.: Lang.

Grabe, Esther (1998): Comparative intonational phonology: English and German. Nijmegen: University of Nijmegen. (= Nijmegen: MPI Series in Psycholinguistics 7).

Grice, Martine/Baumann, Stefan (2002): „Deutsche Intonation und GToBI“. Linguistische Berichte 191: 267–298.

Grice, Martine/Ladd, D. Robert/Arvaniti, Amalia (2000): “On the place of phrase accents in intonational phonology”. Phonology 17: 143–185.

Grice, Martine/Baumann, Stefan/Benzmüller, Ralf (2005): “German intonation in Autosegmental-Metrical Phonology”. In: Jun, Sun-Ah (ed.): Prosodic typology: The phonology of intonation and phrasing. Oxford, Oxford University Press: 55–83.

Gussenhoven, Carlos (1983): A semantic analysis of the nuclear tones of English. Indiana: Indiana University Linguistics Club. [19.08.2017].

Gussenhoven, Carlos (1990): “Tonal association domains and the prosodic hierarchy in English”. In: Ramsaran, Susan (ed.): Studies in the pronunciation of English. A commemorative volume in honour of A. C. Gimson. New York, Routledge: 27–37.

Gussenhoven, Carlos (2004): The phonology of tone and intonation. Cambridge: Cambridge University Press.

Gussenhoven, Carlos (2005): “Transcription of Dutch intonation”. In: Jun, Sun-Ah (ed.): Prosodic typology: The phonology of intonation and phrasing. Oxford, Oxford University Press: 118–145.

Gussenhoven, Carlos et al. (2003): ToDI: transcription of Dutch intonation. 2nd ed. [19.08.2017].

Haan, Judith (2002): Speaking of questions: An exploration of Dutch question intonation. Utrecht: LOT.

Halliday, Michael A. K. (1967): Intonation and grammar in British English. Berlin: de Gruyter.

Jacobs, Joachim (1997): „I-Topikalisierung“. Linguistische Berichte 168: 91–133.

Kohler, Klaus J. (1991): “Terminal intonation patterns in single-accent utterances of German: phonetics, phonology and semantics”. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel (AIPUK) 25: 15–185.

Ladd, D. Robert (1978): “Stylized intonation”. Language 54: 517–539.

Ladd, D. Robert (1980): The structure of intonational meaning. Evidence from English. Bloomington: Indiana University Press. [= rev. PhD thesis Cornell University 1978].

Ladd, D. Robert (2008): Intonational Phonology. 2nd ed. Cambridge: Cambridge University Press.

Liberman, Mark/Sag, Ivan (1974): “Prosodic form and discourse function”. In: La Galy, Michael W./Fox, Robert A./Bruck, Anthony (eds.): Papers from the Tenth Regional Meeting, Chicago Linguistic Society, April 19–21, 1974. Chicago, The Society: 416–427.

Michalsky, Jan (2014): “Scaling of final rises in German questions and statements”. In: Proceedings of Speech Prosody 7, Dublin, May 20–23: 978–982.

Michalsky, Jan (2015a): Frageintonation im Deutschen. Zur intonatorischen Markierung von Interrogativität und Fragehaltigkeit. Berlin/Boston: Mouton de Gruyter. (= Linguistische Arbeiten 566).

Michalsky, Jan (2015b): “Pitch scaling as a perceptual cue for questions in German”. In: Proceedings of Interspeech 2015, September 6–10, 2015.

Nespor, Marina/Vogel, Irene (2007): Prosodic phonology. 2nd ed. Berlin: de Gruyter.

Niebuhr, Oliver (2007): Perzeption und kognitive Verarbeitung der Sprechmelodie: Theoretische Grundlagen und empirische Untersuchungen. Berlin: de Gruyter.

O’Connor, Joseph D./Arnold, Gordon F. (1973): Intonation of colloquial English: A practical handbook. 2nd ed. Vol. 1. London: Longman.

Peters, Jörg (2006): Intonation deutscher Regionalsprachen. Berlin/New York: de Gruyter.

Peters, Jörg (2014): Intonation. Heidelberg: Universitätsverlag Winter. [= KEGLI 16].

Peters, Jörg (2016): „Intonation“. In: Duden-Grammatik. 9. Aufl. Mannheim, Duden-Verlag: 95–128.

Peters, Jörg/Auer, Peter/Selting, Margret (2015): „Untersuchungen zur Struktur und Funktion regionalspezifischer Intonationsverläufe im Deutschen“. In: Kehrein, Roland/Lameli, Alfred/Rabanus, Stefan (eds.): Regionale Variation des Deutschen. Projekte und Perspektiven. Berlin/Boston, de Gruyter: 53–80.

Peters, Jörg/Hanssen, Judith/Gussenhoven, Carlos (2015): “The timing of nuclear falls: Evidence from Dutch, West Frisian, Dutch Low Saxon, German Low Saxon, and High German”. Laboratory Phonology 6: 1–52.

Pierrehumbert, Janet B. (1980): The phonology and phonetics of English intonation. Indiana: Indiana University Linguistics Club.

Pierrehumbert, Janet B./Hirschberg, Julia (1990): “The meaning of intonational contours in the interpretation of discourse”. In: Cohen, Philip R./Morgan, Jerry/Pollack, Martha E. (eds.): Intentions in communication. Cambridge/MA, MIT: 271–311.

Sag, Ivan/Liberman, Mark (1975): “The intonational disambiguation of indirect speech acts”. In: Grossman, Robin E. et al. (eds.): Papers from the Eleventh Regional Meeting, Chicago Linguistic Society, April 18–20, 1975. Chicago, The Society: 487–497.

Schiffrin, Deborah (1994): “Making a list”. Discourse Processes 17: 377–406.

Selkirk, Elisabeth O. (1995): “Sentence prosody: Intonation, stress, and phrasing”. In: Goldsmith, John A. (ed.): The handbook of phonological theory. Oxford, Blackwell: 550–569.

Selting, Margret (1995): Prosodie im Gespräch. Aspekte einer interaktionalen Phonologie der Konversation. Tübingen: Niemeyer.

Selting, Margret (2004): „Listen: Sequenzielle und prosodische Struktur einer kommunikativen Praktik – eine Untersuchung im Rahmen der Interaktionalen Linguistik“. Zeitschrift für Sprachwissenschaft 23: 1–46.

Steedman, Mark (2014): “The surface-compositional semantics of English intonation”. Language 90: 2–57.

Truckenbrodt, Hubert (2012): “Semantics of intonation”. In: Heusinger, Klaus/Maienborn, Claudia/Portner, Paul (eds.): Semantics. An international handbook of natural language meaning. Berlin, de Gruyter: 2039–2069. [= Handbooks of linguistics and communication science 33.1].

Uhmann, Susanne (1991): Fokusphonologie. Eine Analyse deutscher Intonationskonturen im Rahmen der nicht-linearen Phonologie. Tübingen: Niemeyer.


1 Note that the half-completed fall proposed by Gussenhoven (2004) for English has a right-aligned trailing tone, whereas the plateau observed in the Standard German realization of this contour suggests a left-aligned trailing tone that spreads rightwards.

2 Most examples in sec. 2 and 3 are adopted from Peters (2014) or Peters (2016).

3 Bartels (1999) ascribes this aspect to the phrase accent rather than to the accent tone. Truckenbrodt (2012) presents an alternative analysis by recasting the analysis of H* by Pierrehumbert and Hirschberg (1990) in terms of salient propositions, a concept borrowed from Bartels (1999).