Ricoh eDiscovery

Boolean Searches 101: How to locate the documents you actually need

Posted by Michael Truelove | 2 minute read

Nov 19, 2019 1:54:40 PM


Have you ever run a complex Boolean search and not known why some documents were showing up in the results?

Boolean searches are used to search the text of documents in a database (or a subset of documents). They can be used early in the process to decide which documents should be processed or exported to the review platform, during the review to prioritize documents, or simply to find and review documents on the fly.

I’ve experienced Boolean logic in three very different fields throughout my life: when designing and building digital circuits, as its own branch of algebra and, most recently, in the world of eDiscovery.

Even with all this experience, I often question whether a Boolean search is actually finding what it’s meant to find. In short, "Boolean" means there are only two possible options: True/False, 1/0 and so on. So I look at the search and break it down into its different parts. I work out what the individual True/False questions really are and ask whether each part is really needed for a document to be considered a hit.

Take, for example, these three different (but VERY similar) searches:

1. "here" OR "there" OR "everywhere" AND "beside" OR "lead" AND "life"

2. "here" OR ("there" OR "everywhere" AND "beside" OR "lead") AND "life"

3. ("here" OR "there" OR "everywhere") AND "beside" OR ("lead" AND "life")

The only thing that’s changed is the parentheses, but each of these searches has different criteria for what is required for a document to be considered a "hit".
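To see how much the parentheses matter, here is a minimal Python sketch that tries every combination of the six terms being present or absent and counts how many combinations each search would hit. It assumes AND is evaluated before OR, which is how Python's own `and`/`or` behave; your search platform may order operators differently. The names `search_1`, `search_2`, `search_3` and `count_hits` are made up for this illustration.

```python
from itertools import product

TERMS = ["here", "there", "everywhere", "beside", "lead", "life"]

# Each function mirrors one search string. `t` maps a term to True/False
# (is the term in the document?). Python evaluates `and` before `or`,
# which is the operator order this sketch assumes.
def search_1(t):
    return t["here"] or t["there"] or t["everywhere"] and t["beside"] or t["lead"] and t["life"]

def search_2(t):
    return t["here"] or (t["there"] or t["everywhere"] and t["beside"] or t["lead"]) and t["life"]

def search_3(t):
    return (t["here"] or t["there"] or t["everywhere"]) and t["beside"] or (t["lead"] and t["life"])

def count_hits(search):
    """Count the present/absent combinations of the six terms that hit."""
    hits = 0
    for values in product([False, True], repeat=len(TERMS)):
        if search(dict(zip(TERMS, values))):
            hits += 1
    return hits

hit_counts = [count_hits(s) for s in (search_1, search_2, search_3)]
print(hit_counts)  # one count per search, out of 64 possible combinations
```

If the three counts differ, the searches cannot be asking for the same thing, even though they contain exactly the same terms and operators.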

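To tell easily what will and won’t cause a hit, break each search into its separate Boolean choices. The breakdown here treats AND as binding more tightly than OR, so terms joined by AND stay together as one choice: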

1. |"here" OR| "there" OR| "everywhere" AND "beside" OR| "lead" AND "life"|

2. |"here" OR| (#"there" OR# "everywhere" AND "beside" OR# "lead"#) AND "life"|

3. |(#"here" OR# "there" OR# "everywhere"#) AND "beside" OR| (#"lead" AND "life"#)|

I’ve put "|" at the beginning and end of each search string, and after any OR outside of parentheses. I’ve also put "#" at the beginning and end of every set of parentheses, and after every OR inside parentheses. We can use these markers to change each piece into True/False. Let’s see what that looks like for a document that has "there", "beside" and "lead" in it, but none of the other terms:

1. |False OR| True OR| False OR| False|

2. |False OR| (#True OR# False OR# True#) AND False|

3. |(#False OR# True OR# False#) AND True OR| (#True AND False#)|
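The two steps so far (splitting at each OR outside parentheses, then swapping every term for True or False) can be sketched in Python. This is only an illustration: `split_top_level_or` and `substitute_truth_values` are hypothetical helper names, and the sketch assumes every term is wrapped in straight double quotes and that a document is just a set of whole terms.

```python
import re

def split_top_level_or(query):
    """Split a query at every OR that sits outside parentheses (the "|" markers)."""
    clauses, depth, start = [], 0, 0
    # Quoted terms are matched first so an OR inside quotes is never treated as an operator.
    for m in re.finditer(r'"[^"]*"|\(|\)|\bOR\b', query):
        tok = m.group()
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif tok == "OR" and depth == 0:
            clauses.append(query[start:m.start()].strip())
            start = m.end()
    clauses.append(query[start:].strip())
    return clauses

def substitute_truth_values(clause, doc_terms):
    """Replace each quoted term with True or False for this document."""
    return re.sub(r'"([^"]*)"', lambda m: str(m.group(1) in doc_terms), clause)

doc_terms = {"there", "beside", "lead"}
search_3 = '("here" OR "there" OR "everywhere") AND "beside" OR ("lead" AND "life")'

clauses = split_top_level_or(search_3)
substituted = [substitute_truth_values(c, doc_terms) for c in clauses]
for clause, truth in zip(clauses, substituted):
    print(clause, "->", truth)
```

A document hits the search as soon as any one of the printed clauses works out to True, so each clause can be checked on its own.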

We can then work through the logic in and around the parentheses:

1. |False OR| True OR| False OR| False|

2. |False OR| False|

3. |True OR| False|

We can see that this document would hit on searches 1 and 3, but not on search 2. You can use this same method to work out what your search is really searching for. Plus, it still works with more complex searches, such as ones that use proximity searches, NOTs, stemming or fuzzy searching.
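If you would rather let a script do the True/False bookkeeping, the walkthrough above can be approximated with a small evaluator. This is a rough sketch under stated assumptions, not how any review platform works internally: it only understands quoted terms, AND, OR and parentheses, it assumes AND is evaluated before OR, and it treats a document as a plain set of terms. The function names (`tokenize`, `evaluate`) are invented for this example.

```python
import re

def tokenize(query):
    """Pull out quoted terms, parentheses and the AND/OR operators."""
    return re.findall(r'"[^"]*"|\(|\)|\bAND\b|\bOR\b', query)

def evaluate(query, doc_terms):
    """Return True if a document containing `doc_terms` hits on `query`."""
    tokens = tokenize(query)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            right = parse_and()  # always parse the right side before combining
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            pos += 1  # skip the closing ")" (assumes balanced parentheses)
            return result
        return tok.strip('"') in doc_terms

    return parse_or()

searches = [
    '"here" OR "there" OR "everywhere" AND "beside" OR "lead" AND "life"',
    '"here" OR ("there" OR "everywhere" AND "beside" OR "lead") AND "life"',
    '("here" OR "there" OR "everywhere") AND "beside" OR ("lead" AND "life")',
]
doc_terms = {"there", "beside", "lead"}
results = [evaluate(s, doc_terms) for s in searches]
print(results)  # [True, False, True]
```

For the example document, this agrees with the manual breakdown: a hit on searches 1 and 3, and no hit on search 2.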

If you need help with formulating your searches, you can always reach out to Ricoh eDiscovery or comment below.

Topics: Tuesday's Tip, Michael Truelove

