Ricoh eDiscovery

Our Flawed Love of Keyword Searches

Posted by Sean Lynch

Sep 26, 2017 11:15:00 AM


In eDiscovery, lawyers often consider themselves experts at crafting keyword search terms that reduce the volume of documents for manual review. Unfortunately, lawyers, and human beings in general, are not good at stringing together Boolean search terms that find all relevant, or potentially relevant, documents.

The difficulty of developing highly accurate keyword search terms usually stems from one of two things:

  1. The human tendency toward confirmation bias: “the tendency to search for, interpret, and recall information in a way that confirms one’s pre-existing beliefs or hypotheses” (The Psychology of Judgment and Decision Making, p. 233).

  2. The fact that humans don’t know what they don’t know, or what I like to call the “unknown unknowns”.

The result? Keyword search strings inherently miss the “unknown unknowns” and are therefore incomplete.

Here's an example...

Let’s say that any discussion of “dog” is likely to make a document in your data set relevant.

You craft a keyword search for all documents that contain the terms (dog OR doggy OR doggie OR pup OR puppy), intending to capture every potential variation of the word “dog”. But what you don’t know, or failed to consider, is that one of your client’s custodians refers to his dog as “Ralph”.

Because your initial keyword search for variations of “dog” returned a number of relevant documents, you assume that your search was a success (confirmation bias), but you missed every document that discussed Ralph (unknown unknowns).
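To make the “Ralph” problem concrete, here is a minimal Python sketch of how a Boolean OR keyword search behaves. The document IDs and text below are hypothetical, invented for illustration, and the simple lowercase word-splitting stands in for whatever tokenization a real review platform uses. A document counts as a hit if it contains at least one of the terms, so the relevant “Ralph” document never surfaces even though the search appears to work.

```python
import re

# Hypothetical miniature collection; IDs and text are illustrative only.
documents = {
    "DOC-001": "Took the puppy to the vet after it ate the invoice.",
    "DOC-002": "Walked Ralph before the board call; he chewed the contract draft.",
    "DOC-003": "Quarterly revenue figures attached for review.",
}

# The Boolean OR query from the example: a match on any one term is a hit.
DOG_TERMS = {"dog", "doggy", "doggie", "pup", "puppy"}


def tokenize(text):
    """Lowercase and split on non-letters so 'puppy,' still matches 'puppy'."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_hits(docs, terms):
    """Return sorted IDs of documents containing at least one search term."""
    return sorted(
        doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))
    )


hits = keyword_hits(documents, DOG_TERMS)
not_returned = sorted(set(documents) - set(hits))
print("hits:", hits)
print("not returned:", not_returned)
```

DOC-001 comes back as a hit, which feels like confirmation that the terms work, while DOC-002, the custodian’s note about Ralph, lands in the “not returned” pile alongside genuinely irrelevant material.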

Keyword Searching vs. Technology-Assisted Review (TAR): Three Key Takeaways from the Cummins Decision

In the recent decision FCA US LLC v. Cummins, Inc., Judge Avern Cohn ruled that using Technology-Assisted Review (TAR) to reduce the volume of data to be reviewed (i.e., to pinpoint the most relevant documents for human review), rather than keyword search terms, was the Court’s preferred approach. Unfortunately, the judge did not provide detailed reasoning, but there are still things we can unpack from the decision:

1) TAR helps address concerns about the human inclination toward confirmation bias. 

Rather than relying on teams to develop vast lists of complicated keyword search strings in the hope that those terms will unearth relevant material for review, TAR significantly reduces the risk of confirmation bias and, when properly trained, is good at finding relevant material.

2) TAR addresses the inherent dangers of not searching for the inevitable unknown unknowns.

The TAR process is designed to identify documents based on the concepts they contain, and it is not limited to exact keywords, wildcards, or correctly estimated proximity limits. Running a TAR process and leveraging other advanced analytics, such as conceptual categorization, can also help uncover those unknown unknowns. With TAR, documents that discuss “Ralph” are much less likely to be overlooked.
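Commercial TAR tools use far more sophisticated models than anything shown here, and their internals vary by vendor. The Python sketch below is only a toy, assuming a single reviewer-coded relevant document and plain word-count cosine similarity as a stand-in for a trained classifier. It illustrates the general idea of ranking unreviewed documents by their resemblance to known relevant ones rather than by keyword matches, which is why a document that never uses the word “dog” can still rise to the top. All documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words counts; a crude stand-in for the features real TAR tools use."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical: one document a reviewer has already coded as relevant.
seed = vectorize("Took the dog to the vet after he chewed the contract draft.")

# Hypothetical unreviewed documents; neither contains any "dog" keyword.
unreviewed = {
    "DOC-002": "Walked Ralph to the vet after he chewed the contract draft.",
    "DOC-003": "Quarterly revenue figures attached for review.",
}

# Score each unreviewed document against the relevant example, then rank.
scores = {d: round(cosine(seed, vectorize(t)), 2) for d, t in unreviewed.items()}
ranking = sorted(unreviewed, key=lambda d: scores[d], reverse=True)
print(ranking)
print(scores)
```

Because the Ralph document shares most of its surrounding language with the relevant example, it ranks first for human review even though no keyword list would have caught it. A real system would also discount common words such as “the”, which inflate these toy scores.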

3) TAR eliminates the need for lengthy negotiations with opposing counsel over the content of keyword search term strings.

Gone will be the days of arguing ad infinitum about whether the proximity limit should be “w/5” or “w/10”, or of waiting for hit results only to keep refining the terms until you reach an arbitrary number of documents for review that “seems proportional”.
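Why does the choice between “w/5” and “w/10” matter so much? The Python sketch below shows how a proximity operator changes the hit set. It assumes that “w/n” means the two terms appear within n word positions of each other; review platforms differ in exactly how they count, so treat this as an illustration rather than any particular tool’s syntax. The `within` helper and the documents are hypothetical.

```python
import re


def tokens(text):
    """Lowercase and split into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def within(text, term_a, term_b, n):
    """True if term_a and term_b occur within n word positions of each other.

    Assumption: 'w/n' means |position difference| <= n; platforms vary.
    """
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


# Hypothetical documents for illustration.
documents = {
    "DOC-010": "The dog chewed the contract.",
    "DOC-011": "The dog slept all afternoon while we finally signed the contract.",
    "DOC-012": "The contract was signed.",
}

# Run the same two-term query at both proximity limits under negotiation.
results = {
    n: [d for d, t in documents.items() if within(t, "dog", "contract", n)]
    for n in (5, 10)
}
for n, hits in results.items():
    print(f"dog w/{n} contract -> {hits}")
```

Widening the limit from 5 to 10 pulls in DOC-011, and every such adjustment changes the review population in ways that are hard to predict in advance, which is what fuels the back-and-forth over search term strings.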

Relying on time-consuming keyword search terms to locate potentially relevant documents in a collection gives a false sense of control and only partial insight into what the data contains. As collections continue to grow, confirmation bias and unknown unknowns can become significant problems that may also raise the ire of the court (and clients) if productions prove incomplete, and correcting them adds even more time and cost. TAR helps control the dangers of keyword searching and reduces costs by yielding a more accurate review population.


By combining human understanding of an issue with objective, advanced analytical software, even smaller teams can analyze and categorize large collections of electronically stored information efficiently and cost-effectively.




