What students need (and want): semantically-oriented queries in large online corpora

2010

SYNAPS - A Journal of Professional Communication 24 (2010), pp. 27-39

The 400-million-word Corpus of Contemporary American English (COCA) [1990-2009] is the only large, balanced, up-to-date corpus of English that is publicly available. Many features of this corpus allow learners of English to perform semantically-oriented queries quickly and easily. These include the following: 1) one-step collocate searches (limited by part of speech, and sorted and limited by Mutual Information score), 2) comparison of collocates across genres (e.g. collocates of “chain” in fiction and academic), 3) comparison of the collocates of two words (e.g. sheer / utter), 4) use of the integrated thesaurus (entries for 60,000+ words) to see the frequency of all synonyms of a word (including by genre) and to create more powerful queries (e.g. all forms of all synonyms of “clean” + a noun in a particular semantic domain), and 5) customized wordlists (including hundreds or thousands of words in a semantic domain).

NHH
