Ricoh eDiscovery

Boolean Searches 201: Connectors and Wildcards

Posted by Michael Truelove | 6 minute read

Jul 13, 2021 4:11:00 PM


When we first launched our Tuesday Tip series, I wrote the introductory post Boolean Searches 101: How to Locate the Documents You Actually Need. Today, I’m taking it a step further and sharing how connectors and wildcards can be used in text searches to get the best results from your search queries. These principles apply to all programs, including Relativity, Eclipse, eCapture and Nuix.

Using 'AND' in Boolean Searches

‘AND’ is probably the most commonly used connector. You can add it between any two terms, and the search will require both terms to appear in a document before returning it. Here is an example:

  • “here” AND “there”

When including 'AND', both “here” and “there” must exist for a document to be considered a hit.
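For readers who like to see the logic spelled out, here is a minimal Python sketch of an ‘AND’ search. It is an illustration only, not how any particular review platform is implemented; the helper names `tokens` and `hits_and` are made up for this example, and the word splitting is deliberately simple.

```python
import re

def tokens(text):
    """Lowercase the text and split it into a set of words (simplified)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def hits_and(text, left, right):
    """'left AND right': both terms must appear somewhere in the document."""
    words = tokens(text)
    return left in words and right in words

print(hits_and("We looked here and there.", "here", "there"))  # True
print(hits_and("We only looked here.", "here", "there"))       # False
```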

Using 'OR' in Boolean Searches

The second most commonly used connector is ‘OR’. Unlike ‘AND’, which combines multiple searches and requires both terms to be present, ‘OR’ separates the searches into two. Think of this connector as asking the search engine to “find me either one of these two terms.” If the term or phrase to the left of the ‘OR’ is found, the search will hit on the document. If the left side isn’t found but the right side is, the search will still hit on the document.

Naturally, this can sometimes cause confusion. I’ve seen many searches that were written so a single term would cause a hit, despite that not being the intention of the searcher. Here are some examples:

  • “here” OR “there”
  • “here” AND “there” OR “anywhere”

In the first example, only one of the terms needs to exist in a document for it to be a hit. In the second example, the ‘AND’ is evaluated first, so the search is read as (“here” AND “there”) OR “anywhere”. That means “anywhere” on its own would cause a document to hit, but “there” would also require “here” to exist in the document for it to be a hit. I’ve seen lots of searches similar to the second example, sometimes with many more ‘AND’s on the left side. Often, what the searcher really wants is for the ‘OR’ to apply only to the last two terms. To achieve this, you need to use parentheses.
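Python happens to evaluate `and` before `or` as well, so the second example can be sketched directly. This is only an illustration, assuming your search tool gives ‘AND’ priority over ‘OR’ as described above; the function name `unparenthesized` and the sample documents (shown as sets of words) are made up for the example.

```python
def unparenthesized(words):
    """Mimics: "here" AND "there" OR "anywhere" (AND is evaluated first)."""
    return "here" in words and "there" in words or "anywhere" in words

# Each document is represented as the set of words it contains.
for doc in ({"anywhere"}, {"there"}, {"here", "there"}):
    print(sorted(doc), unparenthesized(doc))
# ['anywhere'] True
# ['there'] False
# ['here', 'there'] True
```

The first document hits on “anywhere” alone, which is exactly the surprise described above.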

Using Parentheses in Boolean Searches

Here’s where things get a bit more complicated. Think back to when you first learned about BEDMAS (the order of operations) in the seventh grade. When programs carry out searches, they essentially practice a branch of math: Boolean algebra. Just as with regular mathematical equations, parentheses change the order of operations. Incorporating parentheses into your search queries works in exactly the same way.

Drawing from the previous example, here is how adding parentheses changes the results of your search:

  • “here” AND “there” OR “anywhere”
  • “here” AND (“there” OR “anywhere”)

As we discussed in the previous section, the first search would hit if “anywhere” alone was in the document, even without “here” or “there”. If “anywhere” wasn’t in the document, both “here” and “there” would be required for a hit. The second search always requires “here”, along with at least one of “there” or “anywhere”.

Just like with math, there are times when the parentheses won’t matter at all. In a formula like 5+(6-3)+2, the parentheses do not matter; you could remove them and the result would be identical. The same thing applies to text searching. In a search like “here” AND (“there” AND “anywhere”), the parentheses serve no purpose. You could include or remove them, and it would have no effect on the results.
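To check this reasoning, the short Python sketch below tries every combination of the three terms being present or absent. It is an illustration only: the function names are made up, and each argument simply records whether that term appears in a document.

```python
from itertools import product

def first_query(here, there, anywhere):
    """Mimics: "here" AND "there" OR "anywhere"."""
    return here and there or anywhere

def second_query(here, there, anywhere):
    """Mimics: "here" AND ("there" OR "anywhere")."""
    return here and (there or anywhere)

combos = list(product([False, True], repeat=3))

# Combinations (here, there, anywhere) where the two searches disagree.
differing = [c for c in combos if first_query(*c) != second_query(*c)]
print(differing)  # [(False, False, True), (False, True, True)]

# With only ANDs, the parentheses never change the outcome.
redundant = all((a and (b and c)) == (a and b and c) for a, b, c in combos)
print(redundant)  # True
```

The two searches only disagree when “here” is missing and “anywhere” is present, which is the case the parentheses are meant to exclude.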

Using w/# and pre/# Proximity Searches

Next come proximity searches. These are a little like an ‘AND’ search, but instead of requiring both terms to simply exist within a document, they need the terms to be close to one another in order to return a hit.

First, we’ll look at the within search. In a query, it is written as ‘w/#’, where # is a number that sets how close together the two terms must be, counted in words. For instance:

  • “here” w/5 “there” would require the two terms to be within five words of each other.
  • Therefore, a document that has “here are twelve soldiers, there on the dock” would be a hit. But “here are twelve soldiers, sitting on the dock there” would not be a hit, because too many words exist between the two terms.
  • “there were twelve soldiers here” would also cause a hit, because the within search doesn’t specify an order of words.

Alternatively, when using 'pre/#', the order of words does matter. The term to the left of the pre/# must come before (or precede) the term on the right. Everything else about it works identically to the w/# search. Take the same three examples above, replacing w/5 with pre/5. The first one would still hit, because the term on the left precedes the term on the right. The second one still wouldn’t hit, because the terms are too far apart. The last one would no longer hit, because the right term precedes the left term.
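The two proximity searches can be sketched in Python as follows. This is a simplified illustration, not any tool's actual engine: the helper names are made up, and it assumes “within five words” means the two word positions differ by at most five. Real tools may count slightly differently, so check your platform's documentation.

```python
import re

def words(text):
    """Lowercase the text and split it into an ordered list of words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(toks, term):
    """Return every position at which the term occurs."""
    return [i for i, tok in enumerate(toks) if tok == term]

def within(text, left, right, n):
    """'left w/n right': the terms occur within n words, in either order."""
    toks = words(text)
    return any(i != j and abs(i - j) <= n
               for i in positions(toks, left) for j in positions(toks, right))

def pre(text, left, right, n):
    """'left pre/n right': left occurs before right, within n words."""
    toks = words(text)
    return any(0 < j - i <= n
               for i in positions(toks, left) for j in positions(toks, right))

examples = [
    "here are twelve soldiers, there on the dock",
    "here are twelve soldiers, sitting on the dock there",
    "there were twelve soldiers here",
]
for doc in examples:
    print(within(doc, "here", "there", 5), pre(doc, "here", "there", 5))
# True True
# False False
# True False
```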

Wildcard* Boolean Searches

A wildcard, written as an asterisk, stands in for any number of characters. It lets you search for the beginning, middle or end of a word without specifying how many characters are missing. Here are some examples:

  • *ing
  • stop*
  • *hen*
  • apple * apple

With the first example, any word that ends with ‘ing’, such as coming, stopping or running, would appear in the results. It would also find the term ‘ing’ on its own. The second example would hit on any word that starts with “stop”, including stopped and stopping, as well as just stop. Because of the asterisks at the beginning and end of the term in the third example, the search would hit on any word with “hen” inside it, for instance hen, prehensile, Henning or then. The last example uses an asterisk to allow any single word between the two words. A passage like “Let’s pick that apple. The apple was delicious.” would hit on that search.
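Here is a rough Python sketch of the four wildcard examples. It borrows `fnmatchcase` from Python's standard library, which matches shell-style patterns where `*` stands for any run of characters; this is an assumption made for illustration, and a review platform's own wildcard engine may behave differently. The helper names `words` and `wildcard_hit` are made up.

```python
import re
from fnmatch import fnmatchcase

def words(text):
    """Lowercase the text and split it into an ordered list of words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def wildcard_hit(text, pattern):
    """Match a pattern of one or more words against consecutive words."""
    toks = words(text)
    parts = pattern.lower().split()
    span = len(parts)
    # Slide a window over the text; every word must match its pattern part.
    return any(
        all(fnmatchcase(tok, part) for tok, part in zip(toks[i:i + span], parts))
        for i in range(len(toks) - span + 1)
    )

print(wildcard_hit("We are stopping now", "*ing"))      # True
print(wildcard_hit("The car stopped", "stop*"))         # True
print(wildcard_hit("A prehensile tail", "*hen*"))       # True
print(wildcard_hit("Let's pick that apple. The apple was delicious.",
                   "apple * apple"))                    # True
print(wildcard_hit("apple apple", "apple * apple"))     # False
```

In the last two calls, a lone asterisk has to match exactly one word, so “apple apple” with nothing in between is not a hit.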

Using Quotations in Boolean Searches

The last thing we’ll look at is quotation marks. Quotes indicate that you’re searching for a string of words exactly as written. They also allow you to include words like ‘or’ and ‘and’ in your search without having them treated as connectors. Depending on the tool you’re using, you may need to put quotes around any phrase of two or more words you want to search, though not all tools require this. Let’s look at an example and see how it would search differently with and without quotes:

  • “Let’s go to the park, or maybe for dinner and a movie”

With the quotes, this would simply search for that sentence. Without the quotes, your search tool would recognize the ‘OR’ and ‘AND’ in the sentence and assume they’re being used as connectors. In that case, it would really be searched as “Let’s go to the park” OR “maybe for dinner” AND “a movie”. That means just having “Let’s go to the park” would hit, or, if that didn’t exist, having both “maybe for dinner” and “a movie” in the document would cause a hit.
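The difference can be sketched with a toy query parser in Python. This is a deliberately simplified illustration: the `parse` and `hits` functions are made up, the parser only understands a single quoted phrase or an unquoted string split at ‘or’ and ‘and’, and real search tools parse queries in far more detail.

```python
import re

def parse(query):
    """Return OR-groups, each a list of phrases that must all be present."""
    query = query.strip()
    # One pair of quotes around the whole query: treat it as a single phrase.
    if query.count('"') == 2 and query[0] == '"' and query[-1] == '"':
        return [[query[1:-1]]]
    # Otherwise 'or' and 'and' act as connectors (AND is grouped first).
    return [
        [term.strip(" ,") for term in re.split(r"\s+and\s+", group, flags=re.IGNORECASE)]
        for group in re.split(r"\s+or\s+", query, flags=re.IGNORECASE)
    ]

def hits(text, query):
    """A document hits if every phrase in at least one OR-group appears."""
    lowered = text.lower()
    return any(all(p.lower() in lowered for p in group) for group in parse(query))

sentence = "Let's go to the park, or maybe for dinner and a movie"
print(parse(sentence))
# [["Let's go to the park"], ['maybe for dinner', 'a movie']]

doc = "Let's go to the park tomorrow."
print(hits(doc, sentence))              # True
print(hits(doc, '"' + sentence + '"'))  # False
```

Without quotes, the short document hits because it satisfies the left side of the ‘OR’; with quotes, it does not, because the full sentence never appears.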

---

In the next article on Boolean Searching, we’ll go over how the index is created and what it contains, what Stop Words and Noise Words are, and how they affect the index and search. Subscribe to our blog to be notified as soon as it’s published.

Have questions about your Boolean searches? Get in touch with our team today.

