Tracing a standard language in Austria using methodological microvariations of Verbal and Matched Guise Technique

Based on the key hypothesis that there are heterogeneous conceptualizations of “standardness” within the German speaking countries, this paper tackles, both methodologically and empirically, main aspects of “standard in Austria” from the perspective of perceptual variationist linguistics. Two series of comprehensive listener judgment tests based on 536 informants, considering different sociolinguistic parameters and assumptions on model speakers, indicate shifts away from competing (country-specific) conceptualizations towards heterogeneous dimensions of “standard in Austria” with complex evaluative patterns.


Introduction
This paper sets out to critically review, discuss, and ultimately disentangle the complex phenomenon of (spoken) standard language use in the German-speaking context in general, and of an "Austrian standard" in particular, from the perspective of "perceptual variationist linguistics", or the study of laypersons' perceptual and attitudinal linguistic evaluations. 1 In the following, we begin by outlining why and how exactly the phenomenon "standard language" is so complex in the present context (section 2). Subsequently, we review the current state of research on standard in Austria from a layperson perspective (section 3), describing central findings from attitude and perception studies. This serves as a starting point for the presentation of our methodology, and specifically of the research design of our listener judgment tests (section 4). There, we present our data, drawn from more than 500 informants who participated in a test series featuring several methodological (micro)variations regarding both the verbal guise technique (VGT) and the matched guise technique (MGT). We conclude the paper in section 5 with a critical discussion of our findings and methodology, and an outlook on where to go from here. 2

"Standard language" in the context of Austria - a highly complex phenomenon
Any discussion of standard language use in Austria with laypersons inevitably gives rise to numerous designations for the phenomenon (connected to various conceptualizations), of which "High German" (Hochdeutsch) is verbalized more often than any other (cf. Koppensteiner/Lenz 2017: 65-68). To a certain extent, such terminological and also conceptual heterogeneity is actually reflected within the scientific community as well, considering that different terms are at times used for the notion of "standard German" by different researchers (cf. Soukup/Moosmüller 2011: 40, fn 3).
To navigate these complexities, when we speak of standard (and standard in Austria in particular) in the present article, this term simply acts as a "placeholder" for concrete yet perhaps differing attitudinal conceptualizations among laypersons when they are referring to something like the ("highest") high variety. In sociolinguistic theory, this variety is imagined as situated at the apex of a "diaglossic pyramid model", the basis of which is made up by local dialects, and in which "intermediate varieties" (regiolects) form a vertical continuum between dialect and standard (cf. Chambers/Trudgill 1980: 10f.; Auer 2005, and for German, Lenz 2010). Yet, in practice, the term standard (language) in particular is first and foremost an academic linguistic projection that has, as we have seen time and again, minor (if any) lifeworld relevance and meaning for non-linguists in the German speaking area.
In this vein, closer perceptual and attitudinal examination of Hochdeutsch "High German" and its presumed differentiations typically yields heterogeneous findings. 3 This becomes evident, for example, in answers to questions such as where and by whom reines "pure", korrektes "correct", or schönes "beautiful" Hochdeutsch "High German" actually is spoken, 4 and how this (stereotypical) register correlates with or is distinguished from other kinds of (near-)standard registers (e. g., typically realized by prototypical speakers such as [Austrian] newscasters; cf. Koppensteiner/Lenz 2017). 5 In short, with regard to conceptualization(s) of "standard in Austria" from a lay perspective, there are still numerous pieces of the puzzle to be found and put together. 6 What we know so far is, for one, "that speakers do conceptualize as well as perceive and evaluate 'standard' and 'dialect' as distinguishable entities and use them as contextualization resources in a differentiated way" (Soukup 2009: 42; emphasis in original). Further, (supra-regional) comprehensibility seems to be a central, because recurrently mentioned, (functional) feature of "standard in Austria" for its speakers.
Still, "researchers usually find their informants to hold rather vague and ambivalent notions, if any at all, regarding a specifically Austrian standard language usage" (Soukup/Moosmüller 2011: 41; emphasis in original). As already mentioned, the ways in which "standard in Austria" can be grasped terminologically vary on both the lay and scientific level (cf. fn 7). Thus, it is not clear from a linguistic perspective how "standard in Austria" differs from other standard(-oriented) varieties in other German speaking countries (cf. Soukup/Moosmüller 2011; Koppensteiner/Lenz 2017), let alone from supra-national forms of "an abstract kind of standard language detached from all situational and national / regional conditions" 7 (Kaiser 2006: 242; own translation). In addition, questions concerning areal-national and social positioning of the (or: an) Austrian standard are yet to be satisfactorily addressed. This concerns especially the linguistic and sociolinguistic relationship between the (or: an) Austrian standard on the one hand and other German standards in other countries on the other from an attitudinal-perceptual perspective.
In light of the host of complexities just listed, and on the evidence gathered so far in existing studies, the present paper, in its investigation of the phenomenon "standard language" in Austria, explicitly builds on and commits to the key hypothesis that we are dealing with heterogeneous conceptualizations of "standardness" within the German speaking countries. These conceptual differences are manifest in (at least) a) the particular sociolinguistic parameters ascribed to "standard", such as e. g. "social force", b) the particular relations / weighting of these parameters, as well as c) the particular type of speech / speaker that represents these parameters best.
In our present paper, we propose to make a start by focusing on two sociolinguistic aspects to capture and describe the particularities of the notion of "standardness" in the Austrian context: a (lay linguistic) ideal of (language) "pureness" and (alleged) "model speakers". The former is derived from previous research findings indicating the presence of a certain "ideology of pureness" (cf. section 3.4). The latter parameter draws on and explores the assumption that especially newscasters (of certain media formats) are said to be model speakers with regard to standard, at least for the German speaking area (cf. section 3.3).
By way of investigating these two parameters, the central goal of our paper is thus to present empirical evidence on reigning "standard language ideologies" (Milroy 2001) in Austria, addressing how Austrians perceive and conceptualize standard or near-standard varieties of the German language spectrum. The particular methodology we deploy to this end is a series of listener judgment tests using the VGT and MGT in various configurations and permutations, involving changes of task types, question labeling, text types, and speakers (regarding their origin and professional training). 8 The idea is that such design permutation and subsequent comparison of results allows us to isolate the specificities and relations of the parameters "pureness" and "model speaker" in attitudinal-perceptual evaluations.
Incidentally, with this study design, we also contribute significantly to a methodological discussion in attitudinal and perceptual research at large, which concerns the effects of different aspects of the research design and context on outcomes. While typical studies in this area are only able to implement one particular set-up, we created a total of five trials, manipulating one particular design aspect between each. Thus, the following parameters of evaluation were tested in a total of two series of listener judgment tests, "Series One" and "Series Two":
This section focuses on the state of language attitude research in Austria, with an emphasis on attitudes towards the standard language. The following overview provides the foundation for our own perceptual linguistic studies (cf. section 4), which significantly build on previous attitudinal-perceptual results from Austria with regard to their methodological approach.
A comprehensive review of the theories, definitions, and methods of language attitude research at large is beyond our scope, but note that quite a variety of definitions of the object of study can be found (cf. Garrett 2010; Niedzielski/Preston 2000; Preston 2010). Often, a tripartite model of "attitude", featuring a cognitive, an affective-evaluative and a conative component, is assumed, with perhaps a focus on certain parameters within (cf. Purschke 2018: 245). 9 However, the "reduction" of attitudes to these components has its own problems (cf. Preston 2010: 8). In tribute to the conceptual and definitional complexity of this field of research, Preston (2017: 17), for one, has actually suggested the term "language regard" to refer more comprehensively to the study of "the cognitive foundations of language attitudes, metalinguistic beliefs about language, and language ideologies", a circumscription of the area of study that in fact fits well with our current purposes. 10

Areal and social parameters of "standard in Austria"
To date, there are few studies available that deal with the regional location of German standard varieties in Austria from an attitudinal perspective. Such results often situate "standard in Austria" in urban regions, most prominently in the capital city of Austria, Vienna, which is located in the East-Central Bavarian area (cf. Fig. 1). Interview data and listener judgment tests (collected in the cities Vienna, Graz, Salzburg and Innsbruck) suggest that "higher social classes" ("höhere soziale Schichten") of that city speak "High language" ("Hochsprache") with a presumed supra-regional range (cf. Moosmüller 1991: 21f.; own translation). 11 The finding of such prestige-focused evaluation is replicated by Steinegger (cf. 1998: 377), whose written questionnaire data (comprehensively collected across Austria) indicate a correlation of "standardness" with cultural and educational centers like Vienna and Salzburg (the capital city of the federal state of the same name, cf. Figure 1). 12 These attitudinal findings with country-wide reach, while scarce, are supported by a substantially higher number of non-attitudinal, variationist studies. These studies attest to some kind of "influential force" (i. e. "reputation" in the broader linguistic sense), especially of Vienna. 13 The sociolinguistic importance of the capital of Austria is also shown by perceptual analyses: Moosmüller (1991), for example, indicates that certain inhabitants of Vienna are more frequently perceived as speaking some kind of "high language" than inhabitants of other Austrian cities (cf. Moosmüller 1991: 27). When these regionally oriented results were complemented by further social evaluations (e. g. social stratification), Salzburg was added to this picture as well (cf. Moosmüller 1991: 28f.). 14 Soukup sums up the complex situation on (perceived) "standard in Austria", as also sketched by Moosmüller (1991):
Thus, while Vienna can be assumed to function as a center for national standard-setting, influencing provincial capitals such as Graz, Linz, and Salzburg, these capitals as well as smaller regional centers in turn function as standard-setting focal points for their own local periphery. What may be perceived as 'standard' in Innsbruck, for example, may thus not be perceived as 'standard' in Vienna. Standard Viennese speech, however, appears to have at least some linguistic and iconic currency as a super-regional norm. (Soukup 2009: 39)
Both in inner-city contexts and even more so in rural areas, standard usage in Austria seems to be of pronounced domain-specificity. Its typical domains include more formal/public situations of interaction, e. g. with public officials of civil services, in schools and other educational facilities, in the media (news, TV, radio), with foreigners (cf. Wiesinger 1992, 2014; Steinegger 1998; Soukup 2009; Soukup/Moosmüller 2011; Kleene 2017). Yet, standard also serves to (allegedly) "increase one's social profile" 16 (cf. Steinegger 1998: 372; own translation). By contrast, standard registers do not seem to play an important role in run-of-the-mill language use. In everyday situations, as previous studies state, standard registers seem to be used on fairly rare occasions (cf. Steinegger 1998; Ender/Kaiser 2009; Soukup/Moosmüller 2011; Wiesinger 2014; Kleene 2017).
10 Amongst other definitions that influence this article's perspective, Purschke (2015: 49) conceptualizes attitudes as "relevance-driven targeting and evaluation routines on a high level of activation that sediment in an individual's stock of knowledge and are situationally (re)constructed in interaction", a view he embeds within a broader theory of listener judgments (Purschke 2011). For further review of concepts and methodologies in language attitude research in general, see also Soukup (2019).
11 However, there seem to be differences between Eastern and Western Austria, as interviewees from Western regions of the country "located the standard in other parts of Austria as well" (Soukup/Moosmüller 2011: 40).
12 Salzburg is the fourth-largest city in Austria, with approx. 154,000 inhabitants, and famous for the Salzburg Festival (Salzburger Festspiele), one of the biggest cultural events in Austria, taking place every summer.
13 This "Viennese influence" is discussed on several linguistic levels, e. g. in Clyne (1995b); Ebner (1988, 2008); Hornung (1999); ÖWB (2012); Wiesinger (2014) and Wolf (1994). From an international perspective, capital cities are often said to play special roles with regard to (also perceived) (near-)standard usage, e. g. Oslo for the Norwegian context (cf. Thelander 2011), whereas there are also completely diverging situations, e. g. Helsinki for the Finnish context (cf. Nuolijärvi/Vaattovaara 2011).
14 However, this was not the case for Innsbruck and Graz in cases of perceived (salient) presence of regionally specific characteristics (cf. Moosmüller 1991: 29).
Evaluations of actual speakers largely fall in step with general evaluations of language in Austria. Thus, using speaker evaluation experiments, Soukup finds that "standard speakers were perceived as more polite, intelligent, educated, gentle, serious, and refined, but also as sounding more arrogant" in comparison with dialect speakers (Soukup 2009: 127). 17 Based on their own review of attitudinal-perceptual research, Soukup and Moosmüller (2011) recap the areal and social parameters associated with "standard in Austria" as follows:
We can therefore conclude that Standard Austrian German is generally seen as a 'non-dialectal' variety spoken by the educated people from the middle-Bavarian [i. e. 'Central-Bavarian' in Figure 1; ANL/WK] region […], meaning that, while it should not show any salient regional features, it does in reality have a middle-Bavarian [i. e. 'Central-Bavarian', cf. above comment; ANL/WK] basis with respect to non-perceptually salient aspects of phonology, phonetics, and prosody. (Soukup/Moosmüller 2011: 41)
Given that their assessment still seems to hold today, we return to this statement as an input hypothesis in the context of our own survey (cf. section 4).

Austrian standard versus German German standard? Aspects of pluricentricity
Another aspect of the complexities surrounding "standard in Austria" concerns the discussion of the relationship to the "standard in Germany". This is, of course, due to the fact that Austria and Germany (and other German-speaking countries and areas) are part of a pluricentric sociolinguistic context, 18 i. e. a context in which a standard language (in our case, the German standard language) has spread over various "centers" 19 that belong to/represent different countries. 20 Clyne (1995a: 22) assumes the German-speaking area to be a case of "asymmetrical pluricentricity", in which Germany is hierarchically classified as "dominant center" (cf. also Auer 2013). This (alleged) asymmetry between the "centers" of the German-speaking areas has in fact been found to be reflected in speakers' attitudes (cf. Clyne 1995a, Schmidlin 2011, Ammon/Bickel/Lenz 2016: LII).
Some themes raised by the "pluricentricity" discourse, and influential extra-linguistic factors that oscillate around Austria's "relationship to Germany" and its "(global) economic integration", including factors like economic development, mobility, tourism, as well as media consumption, time and again make for an emotion-, identity- and ideology-driven discussion in this context (cf. Moosmüller 1991; Steinegger 1998; Pfrehm 2009; Wiesinger 2015), which is, however, beyond the scope of our paper. 21 Yet, these influential factors need to be considered when dealing with language attitudinal aspects and especially "standard in Austria" from a speaker's and listener's perspective. 22
Comprehensive analyses regarding the evaluation of co-existing linguistic variants from Germany, Austria and Switzerland currently only exist in Schmidlin (2011), who places a special focus on (written) lexical items. Her results support the hypothesis that "pluricentricity" (in its linguistic-scientific understanding only; laypersons' understanding might and probably does deviate from that) is not reflected in this respect among non-linguists (cf. also Steinegger 1998; Kaiser 2006; Pfrehm 2009).
Another study on "standardness" within the written domain was conducted by Pfrehm (2009), who had German and Austrian participants rate lexical items (18 per "variety") on a four-point scale (umgangssprachlich / nicht standardsprachlich vs. standardsprachlich; i. e. "colloquial / non-standard" vs. "standard"; own translation). He concluded "that the Austrians regard the 18 ASG [i. e. Austrian Standard German; ANL/WK] items presented in the quantitative questionnaire as standard, but not as standard as the 18 GSG [German Standard German; ANL/WK] items, and decidedly less standard than the Germans regard their own 18 words." (Pfrehm 2009: 90; emphasis in the original.) This, in addition to the areal and social parameters (cf. above), once more suggests that transnational evaluation differences exist. 23
to certain varieties: "functional prestige" was assigned to dialect speakers as well, a result that was not observed by Moosmüller (1991).
18 On the concept of pluricentricity/pluriareality in general cf. e. g. Ammon (1995); Clyne (1989); Schmidlin (2011), and especially Auer (2013 Leitner 1992).
21 For an overview see e. g. Glauninger (2013); Soukup/Moosmüller (2011); Steinegger (1998).
22 Research efforts of our project, thus, also include the exploration of laypersons' equivalents to the linguistic concept of pluricentricity, especially with regard to standard varieties. The necessity to differentiate linguistic and lay levels of data need to be considered in this approach, as always (cf. Herrgen 2015; Lenz 2003, 2011; Mattheier 1983).
23 In comparable tasks, similar results were found, e. g. cf. Ammon (1995); with regard to language attitudinal studies in Austrian schools; see also  and de Cillia/Ransmayr (2015). Project part PP10 of the SFB "German in Austria" deals with perceptions of and attitudes towards varieties and languages at Austrian schools, too, cf. DiÖ (2017).

Newscasting in Austria - Sociolinguistic background
When dealing with the "standardness" of German, the media sphere has to be taken into account as well. The hypothesis is that media (formats) with high communicative range may (indirectly) propagate certain models of "standard". Thus, they have to be taken into account in studies on language attitudes in particular. 24 In Ammon's model of a "social force field of a standard variety" 25 ("Soziales Kräftefeld einer Standardvarietät"; Ammon 1995: 80; own translation), such media clearly represent so-called "model writers" (Modellschreiber) and "model speakers" (Modellsprecher). As the focus of our paper is on "spoken Standard in Austria", further aspects with regard to "model speakers" in Ammon's diction need to be considered.
The most prominent role in newscasting in Austria is taken up by the ORF (Österreichischer Rundfunk), the state-owned, public Austrian broadcasting station. Its main TV channels ORF1 and ORF2 together reach an unmatched market share of 48.6%. 26 This sheer market dominance and unique official position is reflected attitudinally-perceptually:
Language attitude research also shows that Austrians do commonly associate standard language with the Austrian broadcast media -at least in the context of supra-regional distribution, and specifically in connection with news-casting. (Soukup/Moosmüller 2011: 42)
Results of two Austria-wide questionnaire surveys in 1984/1985 and 1991 (cf. Steinegger 1998) show that over 80% of the participants actually "demanded" that radio and TV broadcasts use (undifferentiated) "High German" (Hochdeutsch) on air (Wiesinger 2014: 95).
24 This is not an Austria-specific situation (cf. Soukup/Moosmüller 2011) but the case in numerous countries, see e. g. Coupland/Kristiansen (2011) for Denmark, Garrett/Selleck/Coupland (2011) for England or Östman/Mattfolk (2011) for Finland, Vandenbussche (2010) for Flanders. In general, today, mass-media influence (not restricted to standard language) has to be taken into consideration in all sociolinguistic analysis: "[M]odern media are increasingly flooding our lives with an unprecedented array of social and sociolinguistic representations, experiences and values, to the extent (to put the case negatively) it is inconceivable that they have no bearing on how individuals and communities position themselves and are positioned sociolinguistically" (Coupland/Kristiansen 2011: 31).
The particular type of speech ORF newscasters produce has been subjected to (phonetic) linguistic analyses as well, indicating a language behavior that on the one hand conforms to a standard norm, yet on the other hand is to some extent hybrid. This is probably caused by a heterogeneity in the sources used in the speech training of newscasters, varying codices of reference (e. g. Kleiner/Knöbel/Dudenredaktion 2015; Siebs 1958) as well as a joint pronunciation database by the ORF and a number of German broadcasting stations (cf. Moosmüller 1991, 2015; Soukup/Moosmüller 2011; Moosmüller/Schmid/Brandstätter 2015). 27 Former (rather) normative guidelines (set by persons expressly in charge of monitoring and coaching "adequate" language use within e. g. ORF newscasts) seem to have been reduced, favoring the development of a rapprochement to "actual usage by presumed 'standard speakers' in non-professional contexts" (Soukup/Moosmüller 2011: 43). This "cultivated, yet less pronounced High German" should serve the purpose "to accommodate to listeners" (Wiesinger 2014: 95; own translation). 28 The number of linguistic publications on the ORF and its speech training and language policy is quite small. 29 In order to complement the above-sketched "outside-view" to a certain extent, Austrian ORF newscasters were interviewed. 30 Their statements suggest that supra-regional broadcasts throughout Austria aim at high common comprehensibility, not necessarily ruling out any "idiomatic peculiarities", in favor of an orientation towards general language use in Austria. In addition to certain guidelines of a former so-called ORF "chief speaker" (Chefsprecher, i. e. the resident expert in charge of monitoring "adequate" language use), valuable audience feedback is considered as well. With regard to pronunciation, a joint "pronunciation database" of a couple of broadcasting companies is available, too.
26 Source: ORF Medienforschung, regarding daily reach in the main target group adults 12 years and older (ORF Medienforschung 2018). The figures also indicate that ORF2, the main channel broadcasting nation-wide news, has maintained a constant market share over the last 10 years, whereas ORF1 shows declining figures.
27 This character of "hybridity" might stem from the lack of a codified norm in Austria, as Moosmüller (1991: 178; own translation) explains: "As there does not exist any codified standard norm in Austria, with the exception of the Siebs addendum, the electronic media are forced to solve this problem quasi themselves." (Original quote in German: "Da es in Österreich mit Ausnahme des Siebschen Beiblattes keine kodifizierte hochsprachliche Norm gibt, sind die elektronischen Medien vor die Aufgabe gestellt, dieses Problem sozusagen selbst zu lösen.").
28 Original quote in German: "Das änderte sich aber […] weil inzwischen eine Rundfunkreform dahingehend durchgeführt worden war, daß an die Stelle distanziert in Hochlautung angesagter Programme moderierte Sendungen in zwar gepflegtem, doch weniger prononcierten Hochdeutsch traten, was den Hörern entgegenkommen und den weiterhin indirekten Kontakt verringern sollte." (Wiesinger 2014: 95).
29 Cf. Ehrlich (2009), Mohn (2017); in preparation, Wonka (2015).
30 Disclaimer: All utterances are expressions of the personal opinions and experience of the respective interviewees and do not reflect the official corporate policy of ORF.
Summarizing the briefly sketched "outside" and "inside" views, the following conclusion can be drawn: The ORF as the state broadcasting media institution is of critical importance for the concept of "standard in Austria", from both the speakers' and listeners' perspective. Due to its nationwide reach and its dominance in the media market, it may arguably exert a major -including linguistic -influence on its viewers. In turn, viewers articulate their opinions on language and the way it should be spoken directly and thus influence the ORF as well, which again strives to take their evaluations and demands into account.
Also, even at the level of language use in newscasting, just as we have seen in other contexts (see our discussion above), there seems to be an antagonistic differentiation between "pure High German" and other forms of "High German" that the ORF seems to deal with.

"My own High German" -Standard in the individual repertoire
As repeatedly pointed out, "High German" (Hochdeutsch) is a highly frequent label with regard to the notion of "standard language", in Austria as well as in Germany and Switzerland. 31 In earlier research (cf. Koppensteiner/Lenz 2017), based on more than 150 interviews conducted within the framework of the same project the current paper springs from, we were able to show that this lay term is used equally across different regions of Austria. 32 Comprehensive content analyses of the interviews reveal that "High German" (Hochdeutsch) typically refers to a base category that nearly all interviewees (at least gradually) subdivide further into "speech styles". 33 The following issues coming up in this research are relevant to our present purposes:
(1) These interviews demonstrate that the type of speech used by a newscaster of the ORF is not simply labeled as "High German" per se and without restrictions. It is rather conceived as a specific type of "High German" (cf. Koppensteiner/Lenz 2017). 34 On the one hand, these specifications are verbalized by attributions such as "High German with an Austrian accent" (Hochdeutsch mit österreichischem Akzent), "an Austrian type of High German" (österreichische Art von Hochdeutsch) or "Austrian High German" (österreichisches Hochdeutsch). On the other hand, this is evident in the fact that the interviewees also propose additional designators or paraphrases that can be subsumed under "High German" as an umbrella category:
a. "German" (Deutsch) as in "comprehensible German" (verständliches Deutsch) and "Austrian German" (Österreichisches Deutsch),
b. "domain specifications" like "TV-language" (Fernsehsprache) and "school language" (Schulsprache) or
c. "country specifications" such as "Austria German" (Österreichdeutsch) and "Austrian" (Österreichisch).
d. "written speech" (Schriftsprache): The "ORF-newscaster's type of speech" is also often associated with this sub-category, expressed by labels like "(talking) according to what's written" (i. e. orthography/writing conventions; nach der Schrift (reden)).
31 For reference, see the comparable findings e. g. by Lenz (2003) for Western Central Germany, by Kleene (2017) for the Upper German area, by Koppensteiner/Lenz (2017) for Austria, by Christen et al. (2010) for Switzerland.
32 These interviews were conducted within the project modules PP03 and PP08 of the SFB 'German in Austria', in 13 locations all over Austria and with different socio-demographic groups (cf. fn 1 and Lenz 2018; Koppensteiner/Lenz 2017; see also, again, DiÖ 2017).
33 These are considered to be base categories with regard to prototype-theory (cf. e. g. Geeraerts 1989).
34 These findings correspond with e. g. Soukup (2009) and Kleene (2017).
Obviously, the labels of these subordinate categories (with fuzzy boundaries) indicate varying domains of association that are connected with "High German". In addition, the findings indicate that the interviewees conceptualize their "individual High German" as divergent from an alleged prototypical speaker of "real" High German, e. g. an ORF-newscaster (cf. Koppensteiner/Lenz 2017: 67):
(2) The "individual High German" is rarely reported as "High German". Instead, we observe strategies of relativization which make use of attributions such as "amateur High German" (Amateurhochdeutsch) and "Half High German" (Halbhochdeutsch), or respondents may resort to paraphrases and further designators like "mishmash" (Mischmasch) and "common speech" (Umgangssprache), integrate regional markers, e. g. including the location's name "[location] High German" ([Ort] Hochdeutsch) 35 and region "Tyrolian German" (Tirolerisch-Deutsch), or stress individual shortcomings like "broken High German" (gebrochenes Hochdeutsch) and "attempted High German" (versuchtes Hochdeutsch).
As previously pointed out, "real High German" obviously encompasses both comprehensibility and the lack of any salient regional features that might act restrictive on the former respectively. Prototypical "High German" is considered "pure", i. e. free of any interferences. This ideological aspect of "pureness" directly relates to findings of previous surveys in the German language areas as well: 36  (2015) Herrgen (2015) conducted one of the most recent perceptual studies on "standard in Austria". As his listener judgment test is of high relevance for this paper's methodological approach, its methods and results are discussed in greater detail in the following. Herrgen made use of the VGT, using as stimuli eight spoken samples from eight different speakers, six of whom used (near-)standard, and two of whom dialectal varieties. Three male professional newscasters, from Germany, Switzerland and Austria, were used to represent what Herrgen calls "the standard of trained speakers" (Standard geschulter Sprecher) (cf. Herrgen 2015). Two of them read an actual news text, and one read some of the now-classic sentences used to elicit variation in pronunciation in the groundbreaking survey by the German dialectologist Georg Wenker at the turn of the last century (cf. Fleischer 2017; Kim 2019). 38 Three young academics (one male voice from Germany, one female voice from Austria and one female voice from Switzerland), who had not received speech training, read the fable The North Wind and the Sun, a text highly used within linguistic empirical research, in their "intended standard" manner [intendierter Standard]. Finally, two dialect samples from older speakers (one male and one female retirees from Germany and Switzerland) reading some of the sentences from Wenker's questionnaires (translated into their individual dialects) completed the stimulus set. The order of the different stimuli within the VGT was randomized before being played back to the informants. 
39 As response scheme for speaker evaluation, Herrgen used a seven-point rating scale with the extreme poles "deepest dialect" (tiefster Dialekt) and "pure High German" (reines Hochdeutsch).
Herrgen's (2015) results show that, while the entire scale was fully utilized from the top to the bottom, only one professional speaker was transnationally perceived to speak "pure High German", namely the newscaster from Germany. Still, the Austrian listeners rated "their own" newscaster as being roughly on the same level as his German counterpart (no significant differences). This leads Herrgen (2015: 155) to suggest that "two alternative standard norms of orality" might exist in Austria: [T]he evaluation of the speech sample S-A, i. e. Standard of a trained speaker from Austria, is remarkable. The evaluating informants from Austria (in contrast to those from Germany and Switzerland) accept this speech sample just like the speech sample from Germany as »pure High German«. Put differently: for Austria, there apparently exist two alternative norms of orality, which are accepted to the same degree: an Austrian and a German Standard. (Herrgen 2015: 155) 40
The concept "norm of orality" is a substantial part of the "theory of language dynamics" (Sprachdynamiktheorie) by Schmidt/Herrgen (2011) and may be roughly paraphrased as "the spoken shape of the written standard". 41 According to Schmidt/Herrgen (2011), in each (German speaking) country (especially Germany, Austria, Switzerland) an individual norm of orality is predominant. Thus, Herrgen's (2015) listener judgment results indicate the following: 1) that there actually is an individual norm of orality in Austria (represented by ORF-newscasters) which, 2), is perceived as on par with the norm of orality from Germany both with regard to the "doctrine of two purenesses" and from a "vertical status" point of view among Austrian informants. 42
41 Cf. Schmidt/Herrgen (2011) and Schmidt (2010) for extensive discussion of the theory of language dynamics.

Summary and implications
Several of the aspects of "standard in Austria" presented in sections 3.1-3.5 provide the grounds for our own study (cf. section 4), which picks up the following central considerations. Previous research results from Austria suggest that a single attitudinal-perceptual (normative) conceptualization of "standard" does not meet the complex linguistic situation adequately. As Moosmüller (1991: 21; own translation) states "the majority of the respondents affirmed the existence of a discrete high language in Austria, but there is disagreement with regard to which variety it actually is". 43 Against the background of Herrgen's (2015) results, the question arises: Is there more than one candidate? Kaiser (2006: 242; own translation) discusses "an 'abstract' standard language, detached from all situational and national/regional requirements and without 'colorings' of any kind". 44 Does this indicate that there might exist a marked "variety" apart from such a transnational unmarked one as well? Moosmüller (2015: 172) concludes that the type of speech used in the ORF is a variety on its own: could this be interpreted to hint at more than one (conceptualization of a) standard variety in Austria? And finally, we have to come back to the diverging evaluation results in different attitudinal-perceptual studies in Austria as discussed above: do these discrepancies point to heterogeneous, differentiating conceptualizations of "standard" from a lay perspective? According to the results by Steinegger (1998: 353f.), informants assign diverse (social and regional) dimensions to different registers: what is the motivation behind these acts? The indistinct localization of "standard", as described by Soukup (cf. 2009: 39, cited in section 3.1) must be taken into account as well: do these evaluations potentially reflect heterogeneous ascriptions with regard to a concept of "standard in Austria"?

4 Microvariations of Listener Judgment Tests on "standard in Austria" in two series

Preliminaries
The analyses we present in this section are embedded in the SFB 'German in Austria. Variation - Contact - Perception'. 45 The SFB consists of nine project parts conducting research at three Austrian universities (Vienna, Salzburg and Graz) and the Austrian Academy of Sciences. It investigates diverse dimensions of language variation, contact between varieties and languages as well as language perception and attitudes.
In order to (con)test our initial hypothesis (cf. section 2), to take up the preceding discussion and to address some of the outlined desiderata with regard to "standard in Austria" (cf. section 3), two series of listener judgment tests were developed, by which we seek to address the following questions: 1) How and by whom are standard or near-standard varieties perceived and conceptualized in Austria? What implications do the findings have for Schmidt/Herrgen's (2011) hypothesis of "norms of orality"? 2) (How) do differing methods and variations within listener judgment tests provide differing results and information regarding the perception of standard varieties from a lay perspective? Here, focal points will include determining the (evaluative) influence of varying types of text, the origin of the speakers, the degree of speech training, the type of task to fulfill as well as the labeling of the questions.
In total, two series of listener judgment tests were conducted. Each series focuses on different aspects, as will be shown in the corresponding sections. Although they complement each other in terms of findings, both series share certain characteristics to ensure inter-comparability:
• The audio stimuli used are of identical length (approx. 20 seconds each).
• Approximately 1,000 listeners took part in both series, of whom 66% are of Eastern Austrian origin (including the federal states of Lower Austria, Burgenland and Vienna; cf. Fig. 1). However, for practical reasons this article will focus on selected aspects of listener judgment tests. Thus, the number of listeners analyzed is lower.
• This article only reports on the sample of 536 informants for both series who fulfill the following socio-demographic criteria: raised/grown up in Austria, at least one parent also raised/grown up in Austria. 46
• For both series, the vast majority of informants consists of students with a background in linguistics, as a convenience sample. The tasks, i. e. rating audio stimuli according to questions phrased in lay terms, did not explicitly evoke their (potential) linguistic qualifications, though. Whenever possible, courses scheduled early in the curriculum (low entry level to university) were selected. The social data collected from all listeners encompass age, sex, occupation type, (current) place of residence, the place where the listeners grew up, as well as the place(s) where their parents (father/mother separately) grew up.
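The socio-demographic inclusion criteria stated above can be read as a simple filter over the collected social data. The following is a minimal sketch of that reading, not the authors' actual procedure; the record type `Listener` and its field names are hypothetical and merely mirror the criteria given in the text.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    # Hypothetical record; the field names are assumptions for illustration.
    raised_in: str
    father_raised_in: str
    mother_raised_in: str


def meets_criteria(listener: Listener) -> bool:
    """Raised in Austria, and at least one parent also raised in Austria."""
    parent_ok = "Austria" in (listener.father_raised_in, listener.mother_raised_in)
    return listener.raised_in == "Austria" and parent_ok


# Invented example records, not data from the study.
listeners = [
    Listener("Austria", "Austria", "Germany"),  # kept
    Listener("Austria", "Germany", "Germany"),  # dropped: no parent raised in Austria
    Listener("Germany", "Austria", "Austria"),  # dropped: not raised in Austria
]
sample = [person for person in listeners if meets_criteria(person)]
print(len(sample))  # 1
```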

"Series One": Investigating the stability of the "pureness ideology"
The first test series was designed to tackle questions on the stability of the "pureness ideology" of standard language(s) in German speaking areas (as also reflected in Herrgen's (2015) results (cf. section 3.5)). The same stimuli as in Herrgen (2015) were used to design a new set of listener judgment tests, henceforth called "Series One". 47 Five sub-studies and their results, i. e. ratings from 253 students, 48 are discussed in this article (cf. Tab. 1).
All of the informants taking part in the sub-studies were students at the University of Vienna, and the tests were conducted in 14 different (linguistics) university courses. For the presentation of the (digitally recorded) stimuli, a computer with portable speakers was used, and the informants all heard the same voices in the same order. The test permutations were implemented in the (paper) questionnaires, which were distributed to the students in random order. Five different questionnaires (featuring one design permutation each) were distributed, and every type of questionnaire was used in every course. 49 Thus, the following micro-variations (sub-studies) were tested in the course of our VGTs: order effects of stimuli, length of scales, change of the labeling of the extreme poles, influences of tasks and formulation of questions. In addition, the extent to which standard and near-standard stimuli meet the requirements of the concept of "pureness" was to be analyzed. In a first modification (cf. New Order 1 and New Order 2), the stimuli were played back in two different orders, so that the existence of order effects could be determined and evaluated. The original test in Herrgen (2015) begins with the "Intended Standard" sample from Switzerland (in Fig. 3 labeled as Herrgen (2015)). We tested two alternative orders, beginning with the "standard of trained speakers" sample read by the Swiss newscaster (cf. Fig. 3: New Order 1) and the Swiss dialect sample respectively (cf. Fig. 3: New Order 2). The permutation, however, did not show significant order effects, regardless of whether the test began with a dialect or a standard language example: the results of the listener judgments were stable.
47 Joachim Herrgen was so kind as to authorize the re-use of the stimuli of Herrgen (2015), for which we thank him on this occasion. We also used the same filler voice as Herrgen (2015): a male (German Standard German speaking) voice indicating the consecutive number of the stimulus ("Language Sample X" / "Sprachbeispiel X") as coming up next. Filler voices are typically used to "distract" listeners to a certain extent from the actual voices to be judged if, e. g., all audio samples are played to them directly one after the other.
48 In total, 341 informants (fulfilling the above-mentioned socio-demographic requirements) took part in all substudies/micro-variations of Series One. A further control group, consisting of informants that did not fulfill the "autochthonous" requirements, is not considered in this paper.
49 We thank all our participating colleagues for supporting our data survey.
A second modification targeted the extreme poles (cf. Tab. 1: version Pure-Pure): Does the denomination of the extreme poles affect listener judgments and if yes, how? The adjective "deep" as in "deepest dialect" [tiefster Dialekt] used in Herrgen (2015) was changed to "pure dialect" [reiner Dialekt], corresponding to the opposite pole of the scale indicating "pure High German" [reines Hochdeutsch] (cf. Fig. 3: Pure-Pure). 50 The change of the label of the dialectal extreme pole did not lead to any significant difference in the outcome. However, the ratings of the newscaster from Germany (arithmetic mean = 1.73, standard deviation = 0.92, 1 = "pure High German" and 7 = "pure dialect") and the newscaster from Austria (arithmetic mean = 1.38, standard deviation = 0.53) differ significantly (in both scale label set-ups). 51 As a third modification, we changed the number of scale increments from seven to six (cf. Tab. 1: version 6-point scale). Again, there were no significantly diverging results obtained between the two permutations.
While the modification from "deepest dialect" to "pure dialect" was a minimal adaptation of the rating scale, the type of scale was changed to a Likert scale in the fifth modification of the survey method (cf. Fig. 3: Likert). Here, the informants' task was to agree (or disagree) with the statement: "I consider what I heard as pure High German" (Das Gehörte halte ich für reines Hochdeutsch). Thus, in contrast to the scale that had been used previously, the listeners were not asked to place the recordings between dialect and standard extreme poles but were only asked to rate the extent to which they matched the standard language prototype "pure High German". Unexpectedly for us, the results still matched the ones of the other four modifications, thus showing no significant effect of the type of scale used.
50 The hypothesis that "deep" may bear different regional connotations within Austria (especially as regards East vs. West) will be the subject of further listener judgment test series.
51 Modification Pure-Pure: Austrian newscaster vs. German newscaster, p=.021, T-Test (paired samples), Cohen's d=.338.
Figure 3: Results of listener judgment tests ("Series One") in contrast to Herrgen (2015). Y-axis labeling: 1 = "Pure High German", 7 = "Deepest Dialect" for all sub-studies except Likert (1 = "totally disagree", 7 = "totally agree").
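The sub-studies of "Series One" differ in scale length (seven vs. six points) and, for the Likert version, in polarity. The text does not state how such ratings were made comparable. One common option, shown here purely as an illustrative assumption, is to map every rating linearly onto the interval from 0 to 1 and to flip scales whose "standard" pole sits at the high end. The function name and its parameters are hypothetical.

```python
def normalize(rating: float, points: int, reverse: bool = False) -> float:
    """Map a rating on a 1..points scale onto the interval [0, 1].

    0 corresponds to the first pole (here: "pure High German"),
    1 to the opposite pole. Use reverse=True for scales whose polarity
    runs the other way, e.g. an agreement scale on which the highest
    value means "totally agree".
    """
    if not 1 <= rating <= points:
        raise ValueError("rating outside scale")
    value = (rating - 1) / (points - 1)
    return 1 - value if reverse else value


print(normalize(1, 7))                # 0.0
print(normalize(6, 6))                # 1.0
print(normalize(7, 7, reverse=True))  # 0.0
print(normalize(4, 7))                # 0.5
```

Under this sketch, a midpoint rating of 4 on the seven-point scale and full agreement on the reversed Likert scale become directly comparable values.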
The results of listener judgment tests "Series One" (cf. Fig. 3) can be interpreted and summed up as follows: Our findings suggest a perhaps unexpectedly stable and robust listener assessment regarding the concept of "pure High German" and the vertical placement of the stimuli on the "pure/deep dialect"-"pure High German"-axis. This outcome does not exhibit any (significant) fluctuation even if certain micro-variational modifications in the design are carried out. Thus, a -yet to be tested -hypothesis could be that we appear to be dealing with a quite constant, stable concept of "pure High German" in Austria. In all five methodological variations, both the newscaster from Germany and the newscaster from Austria emerge as being the two best representatives of a conceptualization of "pure High German", at least for our informants, i. e. for linguistic students from (predominantly the Eastern part of) Austria. Like Herrgen (2015), these sub-studies of "Series One" hence suggest that both the Austrian newscaster and the German newscaster fit a concept of "pure High German" to the same extent.

"Series Two": "prototypical" model speakers
The results of "Series One" support the (sub-)hypothesis that there is a certain stability with regard to selected perceptual parameters of "standard in Austria", namely: the "vertical" relation of "pure/deep dialect" and "pure High German" as conceptualized by lay persons seems to "withstand" micro-variational modifications that are based primarily on scales (both numbers and their formulation) as well as on stimuli order. These results also back Soukup's (2009) findings (cf. section 3), which suggest complementary conceptualizations of the two poles, standard and dialect, and which were replicated vertically by "Series One". Both from scientific literature and our SFB-corpus on attitudinal-perceptual data (cf. section 4.1 and for examples section 3.4) we are aware that the parameters examined and described in "Series One" only account for one of several aspects with regard to dimensions of evaluation relevant for lay conceptualizations of "standard". Therefore, we developed a second series of listener judgment tests (in the following "Series Two"), which focused on "prototypical" model speakers and further parameters (cf. Tab. 2):
• The representation of newscasters as "prototypical" model speakers. 52 This was operationalized by changing the evaluation question to a perceived fit of a speaker as ORF-newscaster. More precisely, the Likert version was enlarged to consider the (alleged/perceived) "standard-sphere" more comprehensively. Thus, in addition to "pure High German" ("I consider what I heard as pure High German." [Das Gehörte halte ich für reines Hochdeutsch]) the suitability of being an "ORF-newscaster" was tested, too ("This person would optimally fit as ORF-newscaster." [Diese Person wäre als ORF-Nachrichtensprecher/in bestens geeignet]). 53 Both the labelling of the extreme poles ("totally disagree" - "totally agree") and scale length (seven steps) remained unchanged from "Series One" (cf. Likert 2 and Likert 4).
• Influence of speech training: This was operationalized by using either only newscasters or both newscasters and persons without professional speech training as stimuli, originating from Austria and Germany respectively. 54 This way, (transnational) evaluation differences could be put into perspective.
• Influences of individual/different speakers, tested by combinations of MGT and VGT: In addition to the VGT setup used in "Series One", further MGT setups complement "Series Two". In the VGT versions all speakers read out the same five (unrelated) Wenker sentences (cf. Likert 3 and Likert 4). In the MGT set-ups, Austrian newscasters read three different texts, i. e. a news text, a fable and Wenker sentences (cf. Likert 1 and Likert 2).
• The relation of "writtenness" and "standard" was operationalized by varying the degrees of "conceptual writtenness" (cf. Koch/Oesterreicher 1986) within the stimuli used: in order to measure to what extent text types influence listener judgments, three different types of textual stimuli were used: a real news text (topic: a maritime disaster), the fable The Northwind and the Sun (henceforth Northwind & Sun) as well as five selected yet unrelated sentences from Wenker's questionnaires. 55 These modifications were implemented in Likert 1 and Likert 2 (cf. Tab. 2).
52 As deduced from previous empirical research results in Austria, cf. section 3.
53 Thus, in order to evaluate different (possible) aspects of "standard in Austria" (i. e. an ideology of "standardness"), both "pureness" and the "ORF newscaster's sphere" are taken into consideration within this article. Further aspects are likely to exist but cannot be discussed here for pragmatic reasons.
54 As "professional speech training" we define the kind of training newscasters typically undergo before being allowed to perform live "on air".
55 The same five Wenker sentences were used in each sub-study for reasons of comparability. The following sentences were read out (own translation): Es hört gleich auf zu schneien, dann wird das Wetter wieder besser (#2; "It will soon stop snowing, then the weather will be better again."); Er ist vor vier oder sechs Wochen gestorben (#5; "He died four or six weeks ago."); Wo gehst du hin? Sollen wir mit dir gehen? (#12; "Where are you going? Should we go with you?"); Als wir gestern Abend zurück kamen, da lagen die Andern schon zu Bett und waren fest am schlafen (#24; "When we were coming back yesterday evening, the others already lay in bed and were fast asleep."); Ihr dürft nicht solche Kindereien treiben! (#28; "You must not engage in such puerilities!"); see Wenker (1888-1923) for full list of sentences.
Table 2: Overview of modifications in the listener judgment tests ("Series Two").
The selection of speakers used for the stimuli was derived from previous findings indicating perceived "standardness" in Eastern parts of Austria, especially the Viennese region. 56 Thus, three of the four Austrian stimuli speakers (both young academics and the female newscaster) are of that origin. 57 The listener judgment tests were conducted in 18 different courses at the University of Vienna in 2017. The students could use their smartphones together with headphones to take part in the listener judgment test, as the test setting was converted into an online survey (using LimeSurvey). Additionally, the link was distributed via the Internet, including social media, mailing lists and online courses at different universities in Austria, and also in non-academic surroundings via the "snowball" principle. The different versions of listener judgment tests (cf. Tab. 2) were selected randomly. As in "Series One", all stimuli had an average length of 20 seconds, and each sub-study consisted of 8 audio stimuli to be judged. This time, no filler voice was used, as each stimulus started only after the previous one had been evaluated. The data pool of "Series Two" included 283 persons in total for the four versions of these listener judgment tests. The results of each sub-study are discussed hereafter, followed by concluding remarks in the last section.
Preliminary explanatory remarks regarding statistical methodology: For pairwise comparisons, paired T-tests together with corresponding effect size measures (Cohen's d) were used. Following Cohen (1992), d-values are interpreted as small (.2), medium (.5) or large (.8) effects (i. e. differences between means), respectively. The significance level was set to 5% (i. e. .05); calculations did not take Bonferroni correction into account (see criticism by Nakagawa 2004; Pernegger 1998). Selected statistical results are discussed within this article; for a table of results of T-tests cf. appendix.
56 A systematic review of that hypothesis has yet to be addressed by (extensive) series of listener judgment tests.
57 The male Austrian newscaster is from Western Austria.
Figure 1: Results from the sub-study Likert 1 of "Series Two": Newscasters read different types of texts; 1 = totally agree, 7 = totally disagree (arithmetic means); incl. standard deviation.
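The statistical procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' analysis script: the ratings are invented, the function names are hypothetical, and Cohen's d is computed as the mean of the paired differences divided by their standard deviation, which is one of several variants (the text does not say which one was used). The p-value is omitted because it needs the t-distribution, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) would supply.

```python
import math
import statistics


def paired_t_and_d(ratings_a, ratings_b):
    """Return (t statistic, degrees of freedom, Cohen's d) for paired ratings."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("paired samples need equal length")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff  # assumed variant: d based on the differences
    return t_stat, n - 1, cohens_d


def effect_label(d):
    """Label an effect size using the thresholds from Cohen (1992)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Invented ratings by eight listeners for two stimuli (1 = totally agree).
stimulus_a = [2, 1, 3, 2, 2, 1, 3, 2]
stimulus_b = [3, 2, 3, 4, 2, 2, 4, 3]
t_stat, df, d = paired_t_and_d(stimulus_a, stimulus_b)
print(f"t({df}) = {t_stat:.2f}, d = {d:.2f}, {effect_label(d)} effect")
```

With these thresholds, the value d = .338 reported in footnote 51 would count as a small effect.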

Sub-study Likert 1
In the sub-study Likert 1, newscasters read different types of texts (cf. Figure 1). They were evaluated regarding the statement: "This person would optimally fit as ORF newscaster". The order of the stimuli was not randomized, but it was not the order displayed in Figure 1. All numbers above the bars indicate arithmetic means.
The stimulus type proves relevant for the evaluational outcome: textual stimuli that are news (reporting a maritime disaster) lead to the lowest arithmetic means (i. e., the "best" rating as fit for the newscasting job). An explanation might be the perceived context of "newscasting": a text with news content fits in here "best". Northwind & Sun can be considered at least a coherent text and, as such, something that newscasters on TV might (still) actually read. Wenker sentences fit "worst" in this task. One of the reasons could be their unrelated, isolated character: every sentence stands on its own and no story is told. These observations are drawn from the outcome of Likert 1 as follows: The arithmetic mean of the female Austrian newscaster (ATV_w) when reading an actual news text (i. e. 1.55) differs from the result she achieves when reading the fable (1.81) and significantly with regard to her Wenker sentences (3.23). 58 Similar results can be observed for the male Austrian newscaster (ATV_m). Here, again, the news text fits "best" (lowest arithmetic mean) and differs significantly from both the fable and the Wenker sentences. When reading the news text, he receives an average score of 2.99, followed by Northwind & Sun (3.10) and Wenker sentences with the "worst" score (4.35). 59 However, when the two Austrian newscasters are compared to each other, they differ significantly in nearly every single audio stimulus. 60 Both newscasters are employed and trained by the same TV station in Austria, which basically excludes the type of training as an explanation for these highly significant differences. However, they differ in their regional origin within Austria: the female Austrian newscaster comes from Vienna (Eastern part of Austria, Bavarian dialect base, cf. Ill. 1) while her male counterpart is from Vorarlberg (Western part of Austria, Alemannic dialect base, cf. Figure 1). 61 Thus, the evaluations might include an east-west contrast as well: speech from eastern Austria may be perceived as closer to the standard than speech from western parts, which would correspond with previous findings (cf. section 3.1). Apart from this assumption, gender could play a role as well. We return to both aspects below, as other results of "Series Two" will shed further light on possible influential factors for evaluation. At this point, we note both factors as potential influences.
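The comparisons in this sub-study rest on differences between arithmetic means. As a worked illustration, the following sketch tabulates the gaps by text type and by speaker using only the means reported above for the two Austrian newscasters, not the underlying ratings. The dictionary layout and the keys N, F and W (news, fable, Wenker sentences) are a convention chosen for this sketch.

```python
# Arithmetic means reported in the text for Likert 1 (1 = totally agree).
means = {
    "ATV_w": {"N": 1.55, "F": 1.81, "W": 3.23},
    "ATV_m": {"N": 2.99, "F": 3.10, "W": 4.35},
}

# Gap between Wenker sentences and news text, per speaker.
text_gap = {spk: round(m["W"] - m["N"], 2) for spk, m in means.items()}

# Gap between the male and the female newscaster, per text type.
speaker_gap = {t: round(means["ATV_m"][t] - means["ATV_w"][t], 2) for t in "NFW"}

print(text_gap)     # {'ATV_w': 1.68, 'ATV_m': 1.36}
print(speaker_gap)  # {'N': 1.44, 'F': 1.29, 'W': 1.12}
```

On these figures, both kinds of gap exceed one scale point in every case. Whether a given gap is significant cannot be read off the means alone; that depends on the paired tests reported in the appendix.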

Sub-study Likert 4
Figure 2: Results from the sub-study Likert 4 of "Series Two": Newscasters and persons without speech training read just one type of text: Wenker sentences; 1 = totally agree, 7 = totally disagree (arithmetic means); incl. standard deviation.
As stated in the beginning (cf. Tab. 2), in Likert 4 only Wenker sentences (i. e., identical texts) were read (cf. Figure 2). While the statement for evaluation remained unchanged ("This person would optimally fit as ORF newscaster"), speakers without professional speech training (young, linguistically educated academics) were now presented as stimuli as well. 64 With regard to the German newscasters, at least two things are noteworthy. First, their arithmetic means both fall in the upper ("better") half of the results, leaving the male Austrian newscaster behind. According to this result, both German newscasters are more suitable as newscasters for ORF than their Austrian male colleague (or at least equally suited). The difference between the German male newscaster and the leading female Austrian newscaster was considerably reduced, to 0.7 points. 65 Second, the difference between the two German newscasters 66 is now significant, 67 whereas neither this clear difference nor its significance shows up in Likert 1 (cf. Tab. 2). These findings are compared with additional versions of "Series Two" listener judgment tests.
64 These untrained speakers are abbreviated with "A" if they are from Austria and with "G" if they are from Germany; the lack of "TV" indicates lack of newscaster-specific speech training. Both Austrian untrained speakers are from the Eastern part of Austria, and their German colleagues both come from Northern Germany.
65

Sub-study Likert 2
In Likert 2 (cf. Tab. 2) the statement was altered to "I consider what I heard as pure High German". As in Likert 1, only newscasters are evaluated, reading different types of textual stimuli.
Figure 3: Results from the sub-study Likert 2 of "Series Two": Newscasters read different types of texts; 1 = totally agree, 7 = totally disagree (arithmetic means); incl. standard deviation.
What becomes immediately evident in Figure 3 is the rather flat graph of evaluation: While the difference between the stimuli in the first and the last place in Likert 1 amounts to 2.89 points, it reduces to 1.2 in Likert 2 as the arithmetic means are closer together. The evaluations are more balanced, and most of them oscillate between scale points two and three. The female Austrian newscaster (ATV_w) again scores "best" when reading the news text. It must be kept in mind that the textual stimuli are not identical in this version of listener judgment tests. The male German newscaster (GTV_m) is positioned second, reading a mixture of (five) Wenker sentences. When reading Northwind & Sun, the female Austrian newscaster (ATV_w_F) scores higher than he does, and the difference even grows to a significant level when she reads Wenker sentences (ATV_w_W). 68 The female Austrian newscaster reading news (ATV_w_N) does not differ significantly from the German male newscaster (GTV_m_W). That could be interpreted as both the female Austrian newscaster and her male German counterpart being evaluated as speaking "pure High German" to a comparable degree. However, the female Austrian newscaster differs significantly from her German colleague and the Austrian male newscaster. 69 The male Austrian newscaster scores "worse" than any other newscaster, regardless of their country of origin. Especially when speaking Wenker sentences (ATV_m_W), he significantly differs from any other stimulus within Likert 2. 70 The female German newscaster (GTV_w_W) is positioned in the midfield of evaluations, ranging behind the male German newscaster and the female Austrian newscaster reading news and the fable. When both female newscasters read Wenker sentences, the German speaker is "in front" (i. e. lower arithmetic mean). This impression is reinforced by the results of Likert 3 (cf. Tab. 2).

Sub-study Likert 3
Figure 4: Results from the sub-study Likert 3 of "Series Two": Newscasters and persons without speech training read just one type of text: Wenker sentences; 1 = totally agree, 7 = totally disagree (arithmetic means); incl. standard deviation.
In Likert 3, both persons with and without professional speech training read the same five Wenker sentences (cf. Figure 4). The trend towards (more) homogeneous evaluations, as already discovered in Likert 2, continues (and intensifies to a certain extent): 0.69 points separate the "best" and the "worst" evaluation. Because such low evaluative differences were not observed in Likert 1, judging the suitability of being ORF newscaster, this leaves room for interpretation regarding the conceptualizations behind the terms put up for evaluation, i. e. "pure High German" and "ORF newscaster" respectively. Does a flat curve of evaluation as in Likert conceptual-evaluative closeness to "ORF newscaster" (representing the parameter "model speaker") amongst our Austrian participants.
Apart from these cross-test-version perspectives, there are further remarkable results to report within Likert 3: speakers from Germany seem to fit the concept of "pure High German" to a higher degree than any of the tested speakers from Austria. Both German newscasters are positioned noticeably ahead of the Austrian ones. This time, the female German newscaster (GTV_w_W) ranks ahead of her male colleague (GTV_m_W), while that order is typically the other way round in other sub-studies of "Series Two". The Austrian newscasters even fall behind the untrained speakers from Austria (A_w_W and A_m_W). Can this be interpreted such that the type of speech spoken by newscasters of the ORF is also from a perceptual point of view (i. e. by non-linguists) "hybrid" or "different" and thus in line with the (objective-linguistic) thesis as pointed out in section 3.3? Considering all speakers from Germany, the woman without speech training (G_w_W) came in first, ahead of the professionally trained speakers, which could possibly extend the idea of an artificial kind of speech on TV beyond the Austrian borders as well. However, the data are not yet strong enough to back up this interpretation. Thus, that aspect has to be left for further investigation.
Both female German speakers (G_w_W and GTV_w_W) differ significantly from their male Austrian counterparts (ATV_m_W and A_m_W) 71 while this is not the case regarding the male German newscaster (GTV_m_W). Overall, it has to be stated that apart from these findings our data does not support the hypothesis that gender did play a major role in evaluation within the listener judgment tests conducted.

Synopsis, discussion and research desiderata
The initial starting point and motivation for this paper arose from our interest in disentangling and describing the parameters of evaluation that are conceptually connected to "standardness" in Austria according to scientific literature and to our own SFB-research corpus. We set as our key hypothesis that we are dealing with heterogeneous conceptualizations of "standardness" within the German speaking countries, manifest, amongst other things, in different sociolinguistic parameters and assumptions about model speakers (cf. section 2). This hypothesis was tested in two ways: first, methodologically, by addressing the question: (1) Do methods and variations within listener judgment tests (VGT and MGT) provide different results and information with regard to the perception of standard varieties from a lay perspective? Second, contributing empirically to the main research question: (2) How do Austrians perceive and conceptualize standard or near-standard "varieties/registers" of the German language spectrum?
To address these issues, two series of comprehensive listener judgment tests were conducted. The groups of informants were homogeneous with regard to age, (formal) level of education and region of origin: the majority of our listeners were students of the University of Vienna who were raised in Austria, as was at least one of their parents. Roughly two thirds of these informants come from the Eastern part of Austria (here defined as the federal states Lower Austria, Burgenland and Vienna). 72
71 With a significance level of p=.026 or lower (T-Test paired samples), cf. appendix.
Regarding research question 1, using the same audio stimuli as Herrgen (2015), we undertook several methodological microvariations in a sequence of listener judgment tests ("Series One"). Given the surprisingly comparable, uniform results, we concluded that a) a reasonably stable concept of "pure High German" might exist, 73 which b) is reliably placed within individual variation repertoires by the informants. A second series of listener judgment tests ("Series Two") evaluated the robustness of these perceptual results and added further important perspectives. According to our data, the type of text spoken as a stimulus is of striking evaluative importance from an attitudinal-perceptual perspective and supersedes the other modifications tested. Our data indicate that the degree of "conceptual writtenness" tested in various types of texts (always read out) receives particular attention in the evaluative process.
Regarding research question 2, in a test series using stimuli selected for presumed perceived "standardness" (especially from eastern parts of Austria, in accordance with the findings presented in section 3), 74 we conclude that both (country of) origin and the degree of (speech) training are factors in the process of evaluation. Focusing specifically on Austria, the results allow the following assessment: for a positive evaluation (i. e. one closer to "standard"), it is more important that the speaker is from Austria (particularly from the eastern part / Vienna) than that the speaker underwent speech training. The informants exhibited diverging patterns of evaluation depending on the parameter of evaluation: the differences in arithmetic means between the audio stimuli were less pronounced when rating "pure High German" (again, similar to Herrgen's [2015] finding comparing the newscasters from Austria and Germany only) than when rating whether or not a speaker qualifies as "ORF newscaster".
72 For reasons of scope, inter-individual differences among the informants will be addressed in future judgment tests. Among other things, such test series will also include a larger number of informants from the western parts of Austria, corresponding with the (methodological) perspective of the respective test series. As pointed out, this article's focus is on Eastern Austria(n stimuli).
73 This hypothesis will be addressed in further tests.
74 In our test series, we did not test possible intra-national differences (e. g. Eastern-Western), as the focal points were set differently (e. g. on contrasting trained vs. untrained speakers, contrasting German speakers with Austrian ones etc.). Based on our results, this perspective seems to be promising as well: the two Austrian newscasters differ with regard to their origin (Vienna, East Austria, vs. Vorarlberg, West Austria), but not in their type of speech training (both are employed at the same TV station). Nevertheless, the female Austrian newscaster is always evaluated (significantly) more favorably than her colleague from the western part of the country. However, this aspect must (and will) be focused on at a later stage of analysis and is thus excluded here, as it would go beyond the scope of this paper.
To conclude, and to match these results with our key hypothesis, this paper provides evidence for the following sub-hypotheses (i. e. operationalized aspects of the key hypothesis):
1.) The results of the two series of listener judgment tests (i. e. the several microvariations) indicate that for Austrian participants, the target group of our analyses, the initially assumed "ideology of pureness" appears to be very stable and insensitive to microvariations such as different text types and response scheme wording.
2.) Audio stimuli, regardless of their German or Austrian "origin", may correspond to such an ideal of "pureness" for both Austrian and German informants. However, Austrian informants seem to subscribe to such an ideal to a greater extent than German informants do (especially with regard to audio stimuli from Austrian speakers).
3.) In Herrgen (2015), where the parameter "pureness" suggested "standardness", the only speaker perceived as speaking "pure High German" turned out to be the German newscaster. The situation in Austria appears to be different, though: some of the audio stimuli are perceived as "pure" but not necessarily as adequate for (ORF) newscasting, and vice versa. Thus, in Austria the parameter of "pureness" alone is insufficient to capture "standard" if the notion is also to include presumed model speakers.
Considering the findings and our interpretations thereof, the results contribute to the discussion of Herrgen's (2015) thesis of "two alternative standard norms of orality" as follows: there seem to be fundamental evaluative frictions and incongruities regarding conceptualizations and parameters of "standard in Austria" in the minds of speakers and listeners. "Standard in Austria" is closely linked to highly heterogeneous dimensions of evaluation. In particular, the parameters "pure High German" and "being suitable for ORF newscasting", which show diverging evaluative patterns, play major roles in the perception of "standardness". However, there are decisive perceptual differences between Austrian and German results, which indicates a shift in focus away from competing country-specific conceptualizations of "pure High German" within the German-speaking countries towards different and highly heterogeneous dimensions of "standard in Austria". These dimensions, along with their parameters, will accordingly have to be the focus of future analyses. Evidently, there are at least transnational differences in evaluation.
Although a considerable body of research has already been amassed, further comprehensive analytical efforts will be necessary, as extensive parts and aspects of the data are yet to be added to the overall picture. In particular, this includes substantial comparisons of test series that use semantic differentials and that integrate, for comparison, allochthonous informants as control groups. Using semantic differential scales, questions such as "Which linguistic features should a voice suitable for ORF include?" will be tackled. Evaluative differences that can be traced back to individual characteristics of certain speakers are likely to affect patterns of judgment, too. This includes both phonetic/phonological aspects such as voice quality or speech rate and affective-evaluative ratings of, e. g., comprehensibility and correctness, sympathy and personality. A third series of listener judgment tests ("Series Three", currently "in the field") targets further important aspects such as the perceived region of origin of each stimulus used within "Series One" and "Two".