English corpora resources
Order transcripts of the three New Zealand English corpora from the School of Linguistics and Applied Language Studies.
Linguists at Victoria University of Wellington have been involved in collecting New Zealand English for three different corpora: one spoken, one written, and a third which includes both spoken and written data. The transcripts from the corpora are now available as text files on CD. The WWC and WSC are available on the same CD. The prices for the corpora are outlined below. All amounts are in New Zealand dollars and are subject to 15% GST for New Zealand purchasers.
How to order
Each of the corpora is subject to conditions of use. Fill out the relevant agreement form and email it to firstname.lastname@example.org. On receipt of payment and the appropriate forms, the corpora will be sent out.
You can pay here.
WWC and WSC CD
Individual licence $100
Institution licence $200
To order a copy of the WWC and WSC CD the Wellington Corpora of English Conditions of Use Form must be filled in and returned.
ICE-NZ CD
Individual licence $50
Institution licence (single copy) $100
Institution network licence (multiple copy) $200
To order a copy of the ICE-NZ CD, the first page of the ICE-NZ Licensing Agreement must be filled in and returned.
For more information contact
Archive of New Zealand English
School of Linguistics and Applied Language Studies
PO Box 600
The Wellington Corpus of Written New Zealand English (WWC)
One million words of written New Zealand English collected from writings published in the years 1986 to 1990.
The WWC has the same basic categories as the Brown Corpus of written American English (1961) and the Lancaster-Oslo-Bergen corpus (LOB) of written British English (1961). The corpus also parallels the structure of the Macquarie Corpus of written Australian English (1986). The WWC consists of 2,000-word excerpts on a variety of topics. Text categories include press material, religious texts, skills, trades and hobbies, popular lore, biography, scholarly writing and fiction. (For further information see Bauer, Laurie 1993. Manual of Information to Accompany the Wellington Corpus of Written New Zealand English. Wellington: Department of Linguistics, Victoria University of Wellington.)
The Wellington Corpus of Spoken New Zealand English (WSC)
One million words of spoken New Zealand English collected in the years 1988 to 1994. The corpus consists of 2,000-word extracts (where possible) and comprises different proportions of formal, semi-formal and informal speech. Both monologue and dialogue categories are included, and there is broadcast as well as private material collected in a range of settings. Seventy-five percent of the corpus is informal dialogue.
A brief outline of the Wellington Corpus of Spoken New Zealand English (WSC) is provided here.
The New Zealand component of the International Corpus of English (ICE-NZ)
One million words of spoken and written New Zealand English collected in the years 1989 to 1994. ICE-NZ consists of 600,000 words of speech and 400,000 words of written text.
The Wellington Corpus of Spoken New Zealand English and the spoken component of ICE-NZ share nine categories. Because informal conversational data in particular was so difficult to collect, the two corpora overlap by 339,530 words (173 files) to economise on data collection.