On the Perils of Using Corpus Linguistics to Interpret Statutes

Anya Bernstein, Legal Corpus Linguistics and the Half-Empirical Attitude, 106 Cornell L. Rev. 1397 (2021).

In Legal Corpus Linguistics and the Half-Empirical Attitude, Professor Anya Bernstein provides an illuminating and forceful critique of the claim that corpus linguistics (the study of patterns of language usage across a wide array of English-language sources) should be used to “empirically” derive the ordinary meaning of words used in legal texts. Corpus linguistics has been a hot topic in statutory and constitutional interpretation for the past several years, as a growing number of judges, scholars, litigants, and amici curiae have pressed for its use in cases that turn on the meaning of a legal term or phrase. Perhaps most notably, in an article titled Judging Ordinary Meaning, Utah Supreme Court Associate Chief Justice Thomas R. Lee and his former law clerk Stephen Mouritsen have argued that the concept of “ordinary meaning” implicates empirical questions that the field of corpus linguistics is well designed to answer, and they have urged courts to “import [corpus linguistics] methods into the modern theory and practice of interpretation.”

Professor Bernstein’s thoughtful article astutely identifies several serious flaws in such an interpretive move, calling into question both the push to use corpus linguistics to determine statutory or constitutional meaning and the broader effort to add an empirical dimension to the search for ordinary meaning. Her central critique is that using corpus linguistics to determine the meaning of legal texts mismatches methods and goals. She contends, for example, that while corpus linguistics as practiced in linguistics makes an empirical claim about how language in the corpus is actually used, its use in legal interpretation misuses empirical methods to make a normative claim: that the usage patterns identified through corpus analysis ought to influence the interpretation of legal texts. Bernstein labels this attempt to treat normative claims as empirical ones a “half-empirical” attitude, and she meticulously questions the assumptions underlying it.

First, Bernstein explains that legal corpus linguistics focuses on different data (the frequency and collocation of words in the corpus) than does the larger-scale search for hidden but pervasive patterns in language structure, which underlies corpus linguistics research’s “most exciting findings.” (She explains, for example, the linguistic concepts of syntagm and paradigm, which focus on what is communicated by what is left out of a linguistic phrase, such as the unused alternatives when we say “I like ice cream” rather than “I love ice cream” or “Isaiah likes ice cream.” Whereas corpus linguistics in linguistics cares about such omissions, corpus linguistics in service of identifying ordinary meaning ignores such subtleties.)

Second, Bernstein questions the actual database of English-language usage (i.e., the corpus itself) that legal interpreters have tended to use, and have advocated using, to determine statutory and constitutional meaning. Specifically, she notes that COCA (the Corpus of Contemporary American English), which collects language used in “fiction, popular magazines, newspapers, academic texts” as well as TV and radio programs, has been touted as reflecting ordinary, naturally occurring conversational usage, but in reality it reflects professionally planned, edited writing or broadcast performances that differ markedly from unscripted everyday speech. Moreover, Bernstein points out, an emphasis on COCA ignores the genre that is arguably most relevant to legal language: the language of legislators. Proponents of corpus linguistics might therefore be better served by advocating a corpus built from the Congressional Record, C-SPAN recordings, and committee reports, rather than one based on talk shows and the like.

Relatedly, Bernstein argues that the use of corpus linguistics in legal interpretation completely ignores legal context. Using specific case examples, she deftly shows that courts sometimes use corpus linguistics to ask the wrong questions—e.g., what a particular word in a statutory phrase means in everyday conversation rather than whether precedent dictates that the entire phrase has a specific legal meaning. Worse yet, she argues that courts sometimes use corpus linguistics to obscure legal judgment calls or to provide a false air of scientific certainty or neutral objectivity to their decisions. In this sense, the judicial use of corpus linguistics falls prey to some of the same problems as the judicial use of the canons of construction and other supposedly neutral interpretive tools, as I and others have written about elsewhere.¹

Bernstein ends by offering some suggestions for how legal scholars and practitioners might use corpus analysis, not to determine how legal terms appear in non-legal language sources, but to determine how legal language is typically structured and how it compares to the language of other genres.

Ultimately, Professor Bernstein’s article is remarkably insightful and valuable: for its careful explanation of how corpus linguistics works within linguistics, for its detailed analysis of the limitations of specific corpora, and for its insights into where judicial use of corpus linguistics has gone wrong. The article is a must-read for anyone who wishes to understand what exactly corpus linguistics is, and what its limitations are as a tool of legal interpretation.

¹ See, e.g., Anita S. Krishnakumar, Backdoor Purposivism, 69 Duke L.J. 1275 (2020); James J. Brudney & Lawrence Baum, Oasis or Mirage: The Supreme Court’s Thirst for Dictionaries in the Rehnquist and Roberts Eras, 55 Wm. & Mary L. Rev. 483, 548 (2013) (arguing that the Court sometimes employs dictionary definitions to “lend[] a patina of objectivity and legitimacy” to its statutory constructions).

Cite as: Anita Krishnakumar, On the Perils of Using Corpus Linguistics to Interpret Statutes, JOTWELL (April 1, 2022) (reviewing Anya Bernstein, Legal Corpus Linguistics and the Half-Empirical Attitude, 106 Cornell L. Rev. 1397 (2021)), https://lex.jotwell.com/on-the-perils-of-using-corpus-linguistics-to-interpret-statutes/.