

Websites on English Corpora

MICASE: Michigan Corpus of Academic Spoken English
An online collection of transcripts of academic speech recorded at the University of Michigan; some transcripts may be accompanied by sound recordings. It is intended for research on the characteristics of academic speech.

Corpus of Contemporary American English (COCA)

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English and the only large, balanced corpus of American English. It contains more than 450 million words of text, divided equally among spoken language, fiction, popular magazines, newspapers, and academic texts.

International Corpus of English (ICE)

The International Corpus of English (ICE) began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-six research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989.

Statistical Natural Language Processing and Corpus-Based Computational Linguistics

An annotated list of resources from Stanford University. 

Reference Books on Corpora