The VARSUL Database

This study introduces the Project that gave rise to one of the most important databases of spoken language in Brazil. The Project on Urban Linguistic Variation in the South of Brazil (VARSUL), which started in 1990, initially comprised the three federal universities of the three states of Southern Brazil: Federal University of Santa Catarina (UFSC), Federal University of Paraná (UFPR) and Federal University of Rio Grande do Sul (UFRGS). In 1993, the Project was joined by the Pontifical Catholic University of Rio Grande do Sul (PUC–RS). The VARSUL Project aims at storing speech samples from inhabitants of socially representative urban areas in each of the three states of the South of Brazil, stratified by location, age range, gender and education.


Introduction
The VARSUL database results from the Project on Urban Linguistic Variation in the South of Brazil, conceived in 1984 by Leda Bisol, who had gathered a group of researchers in Porto Alegre. The Project she proposed was based on the pioneering sociolinguistic survey in Brazil: the Linguistic Census Project of Rio de Janeiro, coordinated by Anthony Julius Naro and carried out in the late seventies at the Federal University of Rio de Janeiro (UFRJ), with its first results published in the early eighties. The Census Project limited data collection to the city of Rio de Janeiro, with interviews conducted in different districts, representative of different local communities, especially from the social viewpoint.
Although the data collection model was that of the Census, a consensus was reached that, in order to capture the ethnic diversity of the region, it would not be enough to survey the capital cities of the three states (Paraná, Santa Catarina and Rio Grande do Sul); it would also be necessary to include some of the ethnic groups representative of the diversified ethnic occupation not only of the South but also of each state individually. The reason was that the project intended to verify whether the Portuguese spoken in the region differed from the other dialects of Brazilian Portuguese (PB) as a consequence of the distinct colonization of these states (the region was nearly unpopulated at the time the highest number of slaves came to Brazil). That is why bilingual informants were interviewed in some places and monolingual Portuguese speakers in others.
Another difference relative to the Census Project concerns the data transcription system and the storage of interview transcripts. For VARSUL, the system selected allowed the storage to be done on microcomputers, making data access easier. Because of the particular characteristics of the Southern region, data transcription required a system for marking idiosyncrasies, especially in pronunciation; hence the selection of a three-line transcription system: the first line records the actual syntax of the informants' speech; the second line records pauses and variable phonetic aspects; and the third line records the morphosyntactic classification and marks prosodic aspects such as speech emphasis and speed.

A brief history of the VARSUL Project
On August 19, 1982, in Porto Alegre, at the invitation of the Applied Linguistics Center, coordinated by Margot Levi Mattoso, of the Instituto de Letras of the Federal University of Rio Grande do Sul (UFRGS), then under the direction of Nora Then Thielen, professors representing the three universities of Southern Brazil, namely the Federal University of Santa Catarina (UFSC), the Federal University of Paraná (UFPR) and the Federal University of Rio Grande do Sul (UFRGS), gathered for a meeting. Their objective was to discuss resources and means to invigorate studies in the areas of Linguistic Geography, Bilingualism and Linguistic Variation. Thus a regional project was born, formed by three work groups: (i) Linguistic and Ethnographic Atlas; (ii) Bilingualism; (iii) Linguistic Variation. This project was meant to reach graduate programs so that descriptive material on the spoken language of Southern Brazil would be produced.
Regarding data survey methodology, the Labovian line was followed, and the transcription of interviews was inspired by the work of the Census Project team in Rio de Janeiro, although now configured for storage on microcomputers. Data were transcribed in three lines: the first line holds the orthographic transcription; the second line indicates variations, which allows an immediate electronic link to be established between the orthography of a form, usually uniform, and its diverse realizations; the third line holds the morphosyntactic classification of the items as well as some records of speech style.
Once transcribed, data were electronically stored. The interviews, originally recorded on cassette tapes, are being gradually transferred to CDs, a stage already concluded at PUCRS and still ongoing in the other centers.
It is worth noting that the VARSUL Database has been expanding with the addition of new samples. To the basic sample, formed by informants stratified by three education levels, gender and age range, others have been added: a new age range (15-24 years) and one more education level (graduates), the latter only for capital cities. VARSUL has also become a key reference in the training of new researchers, opening its doors to undergraduate students (with scientific initiation scholarships) and to master's and doctoral students.

The Region included in the sampling
The VARSUL Project was conceived with the objective of setting up a linguistic database in the short term that would later allow the description of urban linguistic variation in the Southern Region of Brazil and its local dialects.
The database was initially built with data from the capital city and one inland city of each state. At a second stage, these data were supplemented to cover urban areas more representative of the regions that stood out historically, socially and culturally in each state of the Southern Region of Brazil. Thus, four cities were selected for each state, representing the groups shown to be relevant to its occupation process. The selection criteria for these groups and for the municipalities representing them are presented below.

Rio Grande do Sul
The state, with a tradition strongly based on the peasant life of the farms, inhabited by indigenous people, Castilians and troopers who arrived from other parts of the country, was only attached to the Portuguese Crown around 1750, under the name of Província de São Pedro.
Its history is marked by three important immigrations. The Azoreans arrived around 1750, a little less than a thousand people in groups of sixty couples (the so-called "couples of number"); distributed along the areas set aside for settlement, along river margins, they gave rise to the cities of Porto Alegre (formerly Porto dos Casais), Taquari and Osório, among others. Together with the Portuguese who arrived later, they devoted themselves to cattle breeding, spreading through the pampas (grassland), from which part of the state's culture derives.
The German immigration, represented in the sample by Panambi, started in 1824 with 43 immigrants, was interrupted in 1830 and resumed in 1844, reaching a large contingent. The immigrants received colonial land lots along the fertile margins of the Sinos and Caí rivers and kept spreading along the Serra Geral (local mountain range). They devoted themselves to cattle breeding and small-scale handicraft, the origin of the current industry, notably the shoe industry. They are credited with founding cities such as São Leopoldo, Novo Hamburgo, Taquara and Panambi (of late immigration), among others.
Flores da Cunha represents the Italian immigration, which started with three Milanese families in 1875. The immigrants faced the challenges of the plateau and its difficult access routes, which in a way directed farming toward vine growing. They were granted the colonies of Conde d'Eu, nowadays the city of Garibaldi, and Dona Isabel, nowadays the city of Bento Gonçalves. They grew wheat and vines and went on to found towns such as the ones mentioned above and others like Veranópolis, Farroupilha and Caxias do Sul.
In addition to these, the populations along the borders with Spanish-speaking Argentina and Uruguay form an important part of the population of the State of Rio Grande do Sul, with specific socio-cultural and economic traditions. Among the cities of this region, Livramento, Itaqui, Uruguaiana and São Borja stand out; the last of these was chosen for the sample.

Santa Catarina
The sample for the State of Santa Catarina was intended to represent Portuguese as spoken by the descendants of the most significant ethnic groups of the state: Azoreans (Florianópolis), Italians (Chapecó), Germans (Blumenau) and the highlanders (Lages).
The Azoreans arrived on the coast of Santa Catarina between 1748 and 1756. They basically occupied São Francisco do Sul, Nossa Senhora do Desterro (presently Florianópolis) and Santo Antônio dos Anjos de Laguna.
The colonization and settlement of the territory of Santa Catarina resumed in the middle of the nineteenth century and was characterized mainly by European immigration flows. The first colony in Santa Catarina occupied by Germans was the Colony of São Pedro de Alcântara. Two decades later, the large German immigration flow into the state took place, with the colonization of the middle valley of the Itajaí River and of the north-eastern lands of the state, near São Francisco do Sul. The Blumenau colony was founded in 1850 as a private enterprise by Herman Blumenau, on the middle Itajaí-Açu River. It was followed by the colonies of Dona Francisca (1851), Itajaí-Brusque (1860) and Ibirama (1899).
The first Italians to arrive in Santa Catarina were settled in the basin of the Itajaí River: in the colony "Blumenau", along the margins of the tributaries of the Itajaí-Açu River, and in the colony "Brusque", along the margins of the Itajaí-Mirim River and its tributaries; settlement then moved from the valley of the Itajaí-Mirim to the valley of the Tijucas, along the Braço River and its tributaries. At a later stage, still within the basin of the Itajaí River, a colony named after the Luís Alves River was founded along its margins. In a later flow, a large number of Italian settlers came to the valley of the Tubarão and then, little by little, moved on to other valleys such as Urussunga, Mãe Luiza and eventually Araranguá. Finally, the middle and far West of Santa Catarina were occupied by Italian and German immigrants from Rio Grande do Sul, represented in the sample by the city of Chapecó.
Lages was also colonized by Italians, at a period later than its initial occupation: it was founded in the eighteenth century by people from the administrative division of São Vicente/São Paulo (together with indigenous groups from the Jesuit Missions), and was subsequently occupied by gauchos who settled there because of the troopers' trail opened from Viamão (RS) to Sorocaba (SP).

Paraná
The State of Paraná presents an extremely diversified linguistic panorama. The main reason for such diversity lies in the various origins of the state's population, formed by different population flows: the Portuguese colonizers of the first centuries, the European and Asian immigrants (19th and 20th centuries) and the Brazilian migrants of recent decades, especially from Minas Gerais, São Paulo and Rio Grande do Sul. Therefore, several varieties of Portuguese are spoken in the state.
Since there is no systematic survey of such varieties, the project has tried to capture this ethnic occupation by considering the following areas. The North of the state was populated by people from the states of Minas Gerais and São Paulo during the period of coffee plantation expansion (in 1930), and the city of Londrina is representative of this occupation. In the South-west and West, the language brought by the colonists from the states of Rio Grande do Sul and Santa Catarina, responsible for the agricultural occupation of that area of the state, is represented by the city of Pato Branco.
In the region that concentrated the immigration of Slavic peoples (Russian, Polish and Ukrainian) and that remains partially bilingual, the city of Prudentópolis (Ukrainian) would be more representative; however, since it is not an urban center sufficiently stratified for the sample, the city of Irati (Polish and Ukrainian), a larger town (although the majority of its population is not bilingual), was chosen instead. In the Center-south, also called Old Paraná, the area that most distinguishes the state from the linguistic standpoint, Curitiba, the capital city of the state, was selected.

Sampling constitution
In an initial definition of the profiles required for sampling the urban population, the following social characteristics, considered significant in previous sociolinguistic research, were taken into account: gender (male and female); age range (25-45 and over 50); and education (elementary, middle and high school).
It was determined that every municipality should be represented in the sample by a group of 24 interviews, corresponding to 12 profiles (2 genders × 3 education levels × 2 age ranges), each represented by two interviewees. Once these profiles were defined, informants were sought in different districts with a considerable permanent population.
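The stratification just described can be enumerated mechanically. The short Python sketch below is only an illustration and not part of the Project's tooling: the variable names are hypothetical, the city list is assembled from the state-by-state descriptions in this text (with Porto Alegre assumed to be the capital-city sample for Rio Grande do Sul), and the overall total is simply the arithmetic that follows from 24 interviews in each of four cities per state.

```python
from itertools import product

# Social characteristics of the initial design, as listed in the text.
GENDERS = ("male", "female")
EDUCATION_LEVELS = ("elementary", "middle", "high school")
AGE_RANGES = ("25-45", "over 50")
INTERVIEWEES_PER_PROFILE = 2

# Every combination of the three characteristics is one profile.
profiles = list(product(GENDERS, EDUCATION_LEVELS, AGE_RANGES))
interviews_per_city = len(profiles) * INTERVIEWEES_PER_PROFILE

# Cities per state, assembled from the descriptions in this text
# (an assumption of this sketch, not an official list).
CITIES = {
    "Rio Grande do Sul": ("Porto Alegre", "Panambi", "Flores da Cunha", "São Borja"),
    "Santa Catarina": ("Florianópolis", "Chapecó", "Blumenau", "Lages"),
    "Paraná": ("Curitiba", "Londrina", "Pato Branco", "Irati"),
}
total_cities = sum(len(cities) for cities in CITIES.values())
total_interviews = total_cities * interviews_per_city

print(len(profiles), interviews_per_city, total_interviews)
```

Under these assumptions the grid yields 12 profiles and 24 interviews per city, which matches the figures given above.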
In addition, the speakers necessarily had to fulfill the following prerequisites: (i) to speak only Portuguese (a requirement for interviewees in the capital cities, but not in bilingual areas); (ii) to have lived in the city for at least 2/3 of their lives; (iii) not to have lived outside the region for more than a year during the period of native language acquisition (2 to 12 years); (iv) not to strike other inhabitants of the region as out of place.
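Prerequisites (i) to (iii) are quantitative enough to be written as a simple screening rule. The following Python sketch is a hypothetical illustration only: the `Candidate` record and its field names are assumptions of this example, and prerequisite (iv), being a judgment made by the community, is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of what the interviewer knows about a speaker."""
    age: int
    years_in_city: float
    years_outside_region_age_2_to_12: float
    speaks_only_portuguese: bool


def is_eligible(candidate: Candidate, bilingual_area: bool = False) -> bool:
    """Apply prerequisites (i)-(iii); (iv) is not modeled here."""
    # (i) Monolingual Portuguese speakers only, except in bilingual areas.
    if not bilingual_area and not candidate.speaks_only_portuguese:
        return False
    # (ii) At least 2/3 of the speaker's life spent in the city.
    if 3 * candidate.years_in_city < 2 * candidate.age:
        return False
    # (iii) No more than one year outside the region between ages 2 and 12.
    if candidate.years_outside_region_age_2_to_12 > 1:
        return False
    return True


print(is_eligible(Candidate(30, 20, 0, True)))
print(is_eligible(Candidate(45, 29, 0, True)))
```

The comparison in (ii) uses integer multiplication rather than a floating-point 2/3 so that a speaker who has spent exactly two thirds of his or her life in the city is accepted.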
Neither illiterate people nor graduates were included in the initial stage because they are the target populations of studies on dialect (illiterate speakers) and on formal regional urban norms (graduates).
The age range below 20 was not considered because it does not present the linguistic consistency required for the objectives of the initial stage (frequent idiosyncrasies). The decision not to include this age range also took the sample size into account. Since the database had to include 4 municipalities in each state, the number of informants for each municipality was reduced, considering only the social characteristics shown to be significant in previous studies.

Data collection
Data collection was carried out in two stages. With the help of local leaders (priests, teachers, community leaders), a speaker with a compatible profile was sought in each of the districts. The interviewer, usually accompanied by a person from the community who was an acquaintance of the interviewee, introduced himself/herself as a university student and asked for cooperation with an academic paper about how the actual inhabitants of the city lived, what they thought, how they entertained themselves and what they believed.
Once consent to the interview was obtained, a recording session of about 5 to 15 minutes took place. It was justified to the speaker as a test of his/her voice on the recorder, although in reality it served to confirm data and record social characteristics of the informant, such as: home history (to confirm family and school history), family history (to confirm occupational history), school history (to confirm the education level of the interviewee and his/her acquaintances), occupation history (to verify social level, jobs, salary level, ambitions etc.), reading habits, TV, radio, sports, parties and hobbies (to verify socio-cultural characteristics), and contact with speakers of other languages and of other dialects of Portuguese (to verify any possibility of foreign interference).
Once the first interview was recorded, the second one was scheduled. It lasted up to one hour and was conducted whenever possible at the interviewee's home, so that he/she felt as little tension as possible despite the normal interference of the recorder (observer's paradox).
The second interview was always to be based on the information obtained in the first, so that the interviewer could prepare a script of topics that would allow the interviewee to feel comfortable and speak most of the time, producing linguistically varied speech in terms of vocabulary, structures, verb tenses and moods, pronouns etc. This interview also followed the blind technique: as far as the speaker knew, the interview aimed at collecting information about how the actual inhabitants of the city lived, not about their linguistic features.
In many cases, a third person took part in the interview, labeled the intervenient, who could be either someone accompanying the interviewer or a family member of the interviewee. The interview style, despite the degree of informality achieved, can be characterized as semi-relaxed, although many informants were able to relax during the interview.

What do the three lines stand for?
The second line brings (i) information about the deletion and insertion of sounds; (ii) indications of pauses (short and long), whether or not they correspond to punctuation; and (iii) phonetic specifications of phonemes realized in different ways across the various cities, for example the pronunciation of /t/ before /i/ and before /e/ in an unstressed syllable, which can be either apical dental or palatalized (with raising of the e).
The third line contains the morphological classification of all words.
Engesis developed a program to edit the interviews which, along with the Interpretator, allows searches that combine items from two or more lines. This software was developed for a Windows environment and is aimed at facilitating data search.
Data can be used in several analyses according to the social factors recorded in each interview (gender, age range, education level, location), as shown by some studies already available on the site http://www.pucrs.br/fale/pos/varsul/index.php.
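The idea of searching by combining items from two or more transcription lines can be pictured with a small Python sketch. It is not the Interpretator and does not reproduce the VARSUL file format: the `Token` structure, the field names, the tag values and the three sample words are all invented for illustration, assuming only that each word carries one entry on each of the three lines.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """One word with its entry on each of the three transcription lines."""
    orthographic: str           # line 1: orthographic form
    realization: Optional[str]  # line 2: marked variant, None if unmarked
    morph_class: str            # line 3: morphological classification


def search(tokens, **conditions):
    """Return the tokens whose fields match every given condition."""
    return [
        tok for tok in tokens
        if all(getattr(tok, field) == value
               for field, value in conditions.items())
    ]


# Invented toy data, not taken from any VARSUL interview.
tokens = [
    Token("tia", "dental", "noun"),
    Token("tinha", "palatalized", "verb"),
    Token("leite", "palatalized", "noun"),
]

# Combine a line-2 condition with a line-3 condition.
hits = search(tokens, realization="palatalized", morph_class="noun")
print([tok.orthographic for tok in hits])
```

Keeping the three lines aligned word by word is what makes such cross-line queries straightforward; a query on a single line is the special case with one condition.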

Database Use Standards
The VARSUL Linguistic Database is open for consultation and data supply to professors and students of the universities participating in the project (UFRGS, UFPR, UFSC and PUCRS), as well as to researchers and graduate students connected to national and international teaching and research institutions, according to the rules found on the VARSUL site, mentioned below.

Present Status of the VARSUL Project and Database
On the VARSUL web page (http://www.pucrs.br/fale/pos/varsul/index.php) it is possible to find the present structure of the Project's coordination groups as established at the last meeting, on October 29, 2008, during the VIII CELSUL (Círculo de Estudos Lingüísticos do Sul, Circle of Linguistic Studies of the South), held at UFRGS in the capital city of Porto Alegre.
On the project's web page it is also possible to find further information about the VARSUL Project and the Database, as well as about the studies produced, such as master's and doctoral dissertations, graduation monographs and scientific initiation work. The areas of research concentration are: Phonological Variation, Morphological Variation, Syntactic Variation, Sociofunctionalist Studies, Sociolinguistics, Syntactic-Discursive-Pragmatic and Text Linguistics.
With the expansion through the addition of the graduate range, the group in Rio Grande do Sul also decided to make a re-contact sample of NURC-POA, with the purpose of working not only with research in apparent time, an important objective of the Project, but also with short-term real-time research (NURC was carried out in 1970). This will make it possible to recover NURC data and to prepare panel studies (linguistic variation in the community and in the individual, according to LABOV, 1994).
Another objective of the group of professors of the VARSUL Project is the construction of a digital sample to be made available online to interested researchers. Under the coordination of Professor Izete Lehmkuhl Coelho, a recently approved funding project for the production of this sample will make it possible to expand part of the Database and make access to it easier.
Presently, the database is being expanded with the inclusion of a younger age range, between 15 and 25 years (12 interviews), and of graduates, with 15 years of education and up (8 interviews). This stage is nearly concluded in the State of Rio Grande do Sul and is at the digitalization and transcription stage in the States of Paraná and Santa Catarina.
Jointly with O. Pereira da Silva Menon, E. Domingos Fagundes and L. Loregian-Penkal: [...], coinciding with the period of the Círculo de Estudos Lingüísticos do Sul (CELSUL, Circle of Linguistic Studies of the South).
The Linguistic Variation group initially discussed Leda Bisol's proposal to organize a linguistic database of the Southern Region of Brazil, comprising the states of Paraná, Santa Catarina and Rio Grande do Sul. The idea was to develop research along the same lines as the Linguistic Variation Census of the state of Rio de Janeiro, currently known as PEUL (Programa de Estudos sobre o Uso da Língua, Program of Studies on Language Use) and coordinated by Anthony Naro. In 1983, the second meeting of the large group took place in Florianópolis. In October 1984, on the occasion of a master's examination board at UFRGS, Leda Bisol brought together in Porto Alegre professors Carlos Alberto Faraco from UFPR, Solange Lira from UFSC and Gisele Machline de Oliveira e Silva, of the Census Project staff from Rio de Janeiro. The objective of that meeting was to discuss the preparation of the Linguistic Variation Project. The meeting was also attended by professors Odete Pereira da Silva Menon (UFPR) and Clarice Knies (UFRGS) and by students Laura Quednau (scientific initiation) and Cristina Schmitt (master's course).