Websites on English Corpora

Michigan Corpus of Academic Spoken English (MICASE)
Online collection of transcripts of academic speeches presented at the University of Michigan (may also include some sound recordings of speeches). Intended for use in a research project examining the characteristics of academic speech.

Corpus of Contemporary American English (COCA)

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English. The corpus contains more than 450 million words of text, divided equally among spoken, fiction, popular magazine, newspaper, and academic texts.

International Corpus of English (ICE)

The International Corpus of English (ICE) began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-six research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989.

Statistical Natural Language Processing and Corpus-based Computational Linguistics

An annotated list of resources from Stanford University.

Dictionary of Old English Corpus

This database gathers 3,037 individual texts comprising a complete record of Old English. It is arranged as a contextual dictionary: a search for a word or word fragment returns every place it appears in an Old English text.

Reference Books on English Corpora