For those of us who are not engineers or programmers, magical results appear when we run searches in legal databases. However, we have little understanding of the machinations behind the ever-present e-wall. What kind of confidence can we have when the underlying structure of legal databases is hardwired with human biases? We must ask ourselves the question posed to then-Senator Obama and Senator McCain at a Town Hall Debate in 2008, “What don’t you know and how will you learn it?”
When I teach legal research, my students compare the same searches in different databases. One goal is to demonstrate that there are different results. But a more nuanced goal is to examine the results closely enough to provide insights into which databases might be more useful for updating, for case searching, for browsing statutes, and for other research tasks. Susan Nevelow Mart’s study will elevate these discussions because of her focus on human-engineered algorithms and the inherent biases in the databases used for legal research. This study will also guide researchers to think more about search strategy and will help set more realistic expectations about search results.
Mart studied the impact of human judgment and bias at every step of the database search process. Her study explains how bias is hardwired into the human-engineered algorithm of each database. Further layers of human judgment and bias come with the choice of database, the date and time of the search, the search terms, the vendor’s classification scheme, and the fact that searchers typically browse only the first 10 sometimes-relevant results. Mart introduces us to the concept of algorithmic accountability or “the term for disclosing prioritization, classification, association, and filtering.” Mart contends that algorithmic accountability, or understanding a bit more about the secret sauce in the inputs, will help researchers produce more accurate search results.
Mart’s research sought to test hypotheses about search algorithms by examining the results of the same searches in the same jurisdiction across six databases: Casetext, Fastcase, Google Scholar, Lexis Advance, Ravel, and Westlaw. When examining the relevance of the top 10 results, it is unsurprising that Lexis Advance and Westlaw lead in the relevancy rankings because they have the longest standing in the market. However, it is surprising that the top 10 results for those two vendors were relevant only 57% and 67% of the time, respectively.
Mart found that each of the six databases averages 40% unique cases in the top 10 results. Mart also explores how many of the unique results are relevant in each database’s results. Again, it is unsurprising that Westlaw (at 33%) and Lexis Advance (at about 20%) lead in these two categories. It is surprising, however, that so many relevant cases are unique results when the same search is performed in each database. And because we don’t know what is in the secret sauce, it is difficult to improve these outcomes.
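For readers curious about what a uniqueness figure of this kind measures, the following is a minimal Python sketch of one way to compute it. It is not Mart's method or data: the function name `unique_share`, the database labels, and the case identifiers are all hypothetical placeholders, and the sketch assumes each result can be matched across databases by a single identifier.

```python
from collections import Counter


def unique_share(top_results):
    """For each database, return the fraction of its results that no other database returned."""
    # Count how many databases returned each case (a case counts once per database).
    counts = Counter(
        case for cases in top_results.values() for case in set(cases)
    )
    shares = {}
    for database, cases in top_results.items():
        distinct = set(cases)
        # A case is unique to this database if only one database returned it.
        unique = [case for case in distinct if counts[case] == 1]
        shares[database] = len(unique) / len(distinct) if distinct else 0.0
    return shares


# Hypothetical placeholder results for one search; not drawn from the study.
sample = {
    "Database A": ["case1", "case2", "case3", "case4"],
    "Database B": ["case2", "case3", "case5", "case6"],
    "Database C": ["case3", "case7", "case8", "case9"],
}

shares = unique_share(sample)
for database, share in shares.items():
    print(f"{database}: {share:.0%} of results appear in no other database")
```

Even in this toy example, a researcher who ran the search in only one database would miss every case that appears solely in the other two, which is the practical concern the uniqueness figures raise.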
There are a number of takeaways from Mart’s study. First, algorithmic variations lead to variations in the unique, and in the relevant, results returned from each database. Second, database vendors want us to have confidence in their products, but it is still necessary to run the same search in more than one database to improve the chances of yielding the most comprehensive, relevant results. Third, while some of the newer legal databases yield fewer unique and fewer relevant results, they can bring advantages depending on the research topic, the time period, and other contextual details.
This well-researched and well-written article is required reading for every attorney who performs research on behalf of a client and for every professor who teaches legal research or uses legal databases. Because we often don’t know what we don’t know, Mart’s work pushes us to think more deeply about our search products and processes. Mart’s results provide an opportunity to narrow the gap in knowledge by learning a bit about what we don’t know. Learning from this scholarly yet accessible article brings the reader closer to understanding how to derive the optimal output even without knowing the ingredients in the secret sauce.