5 Ways to Deal with Large Data Dumps

Posted by Esther Labindao | 2 minute read

Apr 21, 2020 10:24:27 AM


“Help! I received 1 TB of data from my client and I don’t know what’s in there and what’s relevant!”

Does this sound familiar? If so, you’re not alone. It is common to feel overwhelmed after receiving a large data set from a client, especially when you’re unsure how much of it needs to be reviewed. Remember: just because data was collected doesn’t mean it all needs to be processed and reviewed.

The next time you receive a large data dump, try these five practices as you build out your strategy. All of them can be done remotely, and they can help you through even the most frustrating cases.

1. DeNIST

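DeNISTing removes known system and application files (operating-system and program files) that are unrelated to a matter. It works by comparing each file's hash value against a standard list of known software files published by the National Institute of Standards and Technology (NIST). Most eDiscovery software allows administrators to load the NIST list so that matching files are removed automatically during processing. This eliminates unwanted files and reduces the document count.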
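The following minimal Python sketch shows the hash-matching idea. It is not how any particular eDiscovery product implements deNISTing. It assumes you have already loaded the SHA-1 values from a NIST known-file hash list into a Python set of uppercase hex strings, called `known_hashes` here. Parsing the published list is not shown. The function names `sha1_of_file` and `denist` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        # Read fixed-size chunks so very large files never load fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def denist(paths, known_hashes):
    """Split files into (kept, removed) by matching SHA-1 hashes.

    known_hashes: hypothetical set of uppercase SHA-1 hex strings
    loaded beforehand from a NIST known-file hash list.
    """
    kept, removed = [], []
    for path in paths:
        if sha1_of_file(Path(path)) in known_hashes:
            removed.append(path)  # known system/application file
        else:
            kept.append(path)  # potentially relevant; keep for processing
    return kept, removed
```

In this sketch, matching files are returned in a separate list rather than deleted. That way the removal can be logged and reviewed if anyone questions it later.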

2. Deduplicating

Deduplicating, otherwise known as deduping, removes exact duplicates based on the hash value of each electronic document. Depending on the tool being used, there are different methods of deduping your data. The most common method is to dedupe at the family level, where a parent email and its attachments are treated as one unit. If multiple custodians received the same email with the same attachments, only one copy of that family is kept. However, suppose a Word document is attached to two different emails sent to different recipients. In that case neither copy is removed, because the parents (the emails) differ. Deduping is a common practice and an easy way to reduce data across multiple custodians.
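Below is a minimal sketch of family-level deduplication in Python, under simplifying assumptions. Each document is represented by the hypothetical `Document` class. A family's fingerprint is an MD5 hash built from the parent's hash followed by each attachment's hash, in attachment order. Real tools commonly hash emails on a chosen set of fields (such as sender, recipients, subject, sent date, and body) rather than raw bytes. Treat this as an illustration of the logic only.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record: a parent item plus its attachments."""
    doc_id: str
    content: bytes
    attachments: list[Document] = field(default_factory=list)


def family_hash(parent: Document) -> str:
    """Fingerprint a family: MD5 over the parent's hash, then each attachment's hash in order."""
    digest = hashlib.md5()
    digest.update(hashlib.md5(parent.content).hexdigest().encode())
    for attachment in parent.attachments:
        digest.update(hashlib.md5(attachment.content).hexdigest().encode())
    return digest.hexdigest()


def dedupe_families(families: list[Document]) -> list[Document]:
    """Keep the first occurrence of each family and drop later exact duplicates."""
    seen: set[str] = set()
    unique: list[Document] = []
    for parent in families:
        fingerprint = family_hash(parent)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(parent)
    return unique
```

The family hash includes the parent, so the same attachment sent in two different emails produces two different fingerprints. Both families are therefore kept, which matches the Word-document case described for family-level deduping.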

3. Date Range Filtering

If your matter covers a specific time frame, you can apply a date range to the data. Only the documents dated within that range are then processed or exported to your designated review tool.
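A date filter is simple to express in code. This Python sketch assumes each document is a dictionary with a hypothetical `"date"` key. That key holds a `datetime.date`, such as an email's sent date or a file's last-modified date, or `None` when no date could be extracted. The sketch sets undated documents aside for a separate decision rather than silently dropping them. That is a design choice of the sketch, not part of any specific tool.

```python
from datetime import date


def filter_by_date(docs, start: date, end: date):
    """Split docs into (in_range, undated); in-range means start <= date <= end.

    Each doc is a hypothetical dict whose "date" key holds a datetime.date or None.
    Dated documents outside the range are excluded from both lists.
    """
    in_range, undated = [], []
    for doc in docs:
        doc_date = doc.get("date")
        if doc_date is None:
            undated.append(doc)  # no usable date: set aside for a separate decision
        elif start <= doc_date <= end:
            in_range.append(doc)
    return in_range, undated
```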

4. Search Terms

Applying search terms can greatly reduce your data. Search terms can be applied either during processing (so that only potentially relevant data is exported) or after the data is uploaded to your review tool. Depending on the complexity of your case and the tool being used, your searches can range from a simple list of keywords to complex queries.
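The sketch below shows one simple way to run a list of search terms in Python. It uses case-insensitive, whole-word matching and treats a trailing `*` as a root expander. That syntax is an assumption modeled on common search conventions. Each review tool has its own syntax, so check your tool's documentation. The function names are hypothetical. Reporting hits per term, as `search_hits` does, can help show which terms drive the volume.

```python
import re


def compile_term(term: str) -> re.Pattern:
    """Turn a term into a case-insensitive whole-word regex.

    A trailing '*' expands the root, e.g. 'contract*' also matches 'contracts'.
    """
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def search_hits(docs: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each term to the IDs of the documents whose text contains it."""
    patterns = {term: compile_term(term) for term in terms}
    hits: dict[str, list[str]] = {term: [] for term in terms}
    for doc_id, text in docs.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[term].append(doc_id)
    return hits


def responsive_ids(hits: dict[str, list[str]]) -> list[str]:
    """Documents hitting any term (OR logic), sorted and without repeats."""
    return sorted({doc_id for ids in hits.values() for doc_id in ids})
```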

5. File Analysis

Whether you call it File Analysis, Early Case Assessment, or Data Assessment, this process analyzes your data and provides crucial insights. Through programs such as Active Navigation, you can:

  • Get a high-level view of the folder structures within the data dump
  • See an overview of the types of data you're dealing with
  • Determine whether specific folders can be completely excluded from processing
  • Discover the true value of your data

Each insight will help you pare down the data and expedite the task of sorting through large data sets.
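Active Navigation is a commercial product, and the following is not its API. It is a standalone Python sketch, using only the standard library, of the kind of high-level overview described above. It produces a file count per extension and the total bytes under each top-level folder. The function name `profile_tree` is hypothetical.

```python
import os
from collections import Counter, defaultdict
from pathlib import Path


def profile_tree(root):
    """Summarize a directory tree.

    Returns (files per lowercase extension, bytes per top-level folder).
    Files sitting directly in the root are grouped under ".".
    """
    root = Path(root)
    by_ext = Counter()
    bytes_by_folder = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            path = Path(dirpath) / name
            by_ext[path.suffix.lower() or "(none)"] += 1
            bytes_by_folder[top] += path.stat().st_size
    return by_ext, dict(bytes_by_folder)
```

For example, a top-level folder made up almost entirely of `.dll` or `.exe` files may be a candidate for exclusion before processing.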


If you have questions about managing large data sets or would like to learn more about Active Navigation, reach out to us today.

Topics: Tuesday's Tip, Unstructured File Analysis

   
