Metadata is data about data. In a Word file, the document holds the data itself: words, arranged in paragraphs and displayed on pages. Behind the scenes, however, Word also keeps metadata about the document, including the author’s name and the dates it was created, last modified and last printed.
In today's Tuesday Tip, we'll show you how to collect, copy and move data without accidentally changing these hidden (and extremely important) details about the file. Word also lets you create your own custom fields to keep track of information that may be useful or pertinent for future reference, although most eDiscovery tools won’t capture unusual metadata like this.
Keep reading to learn how to properly manage your files during the data collection process.
Where is metadata found?
The Word examples above show all the metadata fields that exist within the file type. If you change anything in these fields, the file’s hash will change, so it will no longer register as a duplicate of the original file (although near-duplicate detection would still recognize the two files as 100 per cent matches, since their contents have not changed).
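To see why a metadata edit breaks exact-duplicate matching, here is a minimal Python sketch using the standard `hashlib` module. It is a simplification: a real .docx is a ZIP package whose core properties (author, dates) are stored in `docProps/core.xml`, so the two toy text files below, which differ only in a hypothetical "Author" line, merely stand in for that. SHA-256 is an assumption; eDiscovery platforms commonly deduplicate on MD5 or SHA-1, and every one of these hashes changes completely when a single byte changes.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file's raw bytes, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in files: identical body text, but one byte of
        # the "metadata" line differs (e.g. a corrected author name).
        original = Path(tmp) / "original.txt"
        edited = Path(tmp) / "edited.txt"
        original.write_bytes(b"Author: J. Smith\nBody text of the document.\n")
        edited.write_bytes(b"Author: J. Smyth\nBody text of the document.\n")
        print(file_hash(original) == file_hash(edited))  # False
```

Because the hash covers every byte of the file, exact deduplication treats the two files as different documents, which is why near-duplicate detection (comparing extracted text rather than bytes) is needed to pair them up.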
Metadata is also found outside the file. We call this File System Metadata because it is tracked and saved by whichever file system the file resides in. Different file systems track different information; however, they all track the File Name, and most also track the Date Created and Date Modified (email attachments are an exception).
For example, a TXT file attached to an email has no internal metadata, so it typically carries no dates of its own. It’s not unusual for such an attachment to have no date at all, or for its only date to be the date it was processed, such as when it was extracted from the email and saved as a stand-alone file. ZIP files, on the other hand, preserve the File System Metadata of the files stored inside them.
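The File System Metadata described above can be inspected with Python’s `os.stat`. This is a read-only sketch; the file name is hypothetical, and which fields are available depends on the operating system and file system. In particular, a true creation ("birth") time is not exposed everywhere, so the sketch reports `None` where it is missing rather than guessing.

```python
import os
import tempfile
import time
from pathlib import Path


def file_system_metadata(path: Path) -> dict:
    """Collect the metadata the file system keeps *outside* the file's bytes."""
    st = os.stat(path)
    # st_birthtime exists on macOS/BSD and on Windows with Python 3.12+;
    # Linux builds generally don't expose it, so this may be None.
    birth = getattr(st, "st_birthtime", None)
    return {
        "name": path.name,
        "size_bytes": st.st_size,
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
        "created": time.ctime(birth) if birth is not None else None,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "notes.txt"  # hypothetical file
        sample.write_text("hello")
        for field, value in file_system_metadata(sample).items():
            print(f"{field:>10}: {value}")
```

None of these values live inside `notes.txt` itself, which is why they can disappear when a file is pulled out of one system and saved into another.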
How to avoid changing metadata
It’s important to avoid inadvertently changing metadata during data collection. If you copy a file, the “Date Created” of the new copy will be the time it was copied. If you move a file, these fields don’t change.
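The copy-versus-move difference can be demonstrated with Python’s `shutil`, with one important caveat: `shutil` is not Windows Explorer, and creation time is not reliably readable on every platform, so this sketch tracks the modification time as the observable stand-in. A plain `shutil.copy` gives the new file fresh timestamps, `shutil.copy2` also carries the access and modification times across (via `shutil.copystat`), and a move on the same file system is just a rename, so the timestamps stay put. The file names and the back-dated timestamp are hypothetical.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

# A fixed, obviously old local timestamp so the effect is easy to see.
OLD_MTIME = time.mktime((2020, 1, 15, 10, 30, 0, 0, 0, -1))


def compare_copy_and_move(tmp: Path) -> dict:
    """Return the modification time each transfer method leaves behind."""
    src = tmp / "memo.txt"
    src.write_text("original content")
    os.utime(src, (OLD_MTIME, OLD_MTIME))  # back-date atime and mtime

    plain = tmp / "plain_copy.txt"
    shutil.copy(src, plain)        # data + permission bits; fresh timestamps
    preserved = tmp / "copy2.txt"
    shutil.copy2(src, preserved)   # also copies atime/mtime via copystat

    moved = tmp / "moved" / "memo.txt"
    moved.parent.mkdir()
    shutil.move(str(src), str(moved))  # same file system, so just a rename

    return {
        "plain copy": plain.stat().st_mtime,
        "copy2": preserved.stat().st_mtime,
        "move": moved.stat().st_mtime,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for method, mtime in compare_copy_and_move(Path(d)).items():
            print(f"{method:>10}: {time.ctime(mtime)}")
```

Even `copy2` is only a partial fix: it does not set a creation date, which is one reason the containerizing approach below is more dependable than trying to copy files carefully one by one.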
The best way to ensure you’re not changing the metadata of the data you collect is to "containerize" it. This means putting the data into a container, such as a ZIP file or a PST, which you can then copy and move without affecting the metadata you see in your review platform.
As an example, let’s say a client’s IT department is collecting emails. Typically, they’d go into their Exchange server, run searches on the selected accounts, and export the hits as PSTs, one PST per custodian or account. In this example, the metadata of the PST itself would show the IT user as the file’s creator, and its Created and Modified Dates would be when the PST was exported from Exchange. That information is unlikely to be useful in your case; what matters is the metadata of the emails inside the PST, which the container preserves.
If there’s a folder on the network you want to collect for review, don’t just copy the data; the best thing to do is ZIP it. Windows can create ZIP files on its own, with some limits (most versions cannot create ZIP files larger than 4 GB). However, free programs such as 7-Zip (which can be downloaded from https://7-zip.org) can ZIP huge data sets. I’ve created ZIPs well over 200 GB with 7-Zip.
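Python’s standard `zipfile` module can illustrate the containerizing idea on a small scale. Treat it as a sketch, not a replacement for 7-Zip: the classic ZIP entry header records only each file’s last-modified date (at 2-second resolution), so this sketch does not capture creation dates; other tools may store additional timestamps, so check your tool’s documentation. The folder layout, file names and date are hypothetical.

```python
import os
import shutil
import tempfile
import time
import zipfile
from pathlib import Path

OLD_TIME = (2020, 1, 15, 10, 30, 0)  # hypothetical "original" modified date


def zip_folder(folder: Path, archive: Path) -> None:
    """Containerize every file under `folder`, keeping relative paths."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:  # ZIP64 lifts the 4 GB limit
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # write() copies the file's modification time (from os.stat)
                # into the entry header, at 2-second precision.
                zf.write(path, arcname=path.relative_to(folder).as_posix())


def entry_dates(archive: Path) -> dict:
    """Map each entry name to the modified date stored inside the ZIP."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}


def demo(tmp: Path) -> dict:
    collection = tmp / "network_share"
    (collection / "contracts").mkdir(parents=True)
    stamp = time.mktime(OLD_TIME + (0, 0, -1))
    for name in ("contracts/lease.txt", "budget.txt"):
        f = collection / name
        f.write_text(f"contents of {name}")
        os.utime(f, (stamp, stamp))  # back-date the "original" files

    archive = tmp / "collection.zip"
    zip_folder(collection, archive)

    # Copy the container elsewhere: the ZIP itself gets new file-system
    # timestamps, but the dates stored for its entries are unchanged.
    delivered = tmp / "delivered"
    delivered.mkdir()
    copied = Path(shutil.copy(archive, delivered / "collection.zip"))
    return entry_dates(copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, stamp in demo(Path(d)).items():
            print(name, stamp)
```

Both entries in the copied archive still report the back-dated 2020 timestamp, even though the copied ZIP file itself was created moments earlier.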
Once your data is containerized, you can copy or move those containers as often as you like without worrying about the metadata being affected. All the metadata you care about stays safe inside your containers.
Looking to better understand metadata? We can help. Contact us for assistance.
Don't forget, if you have a question for our team or a topic you'd like us to cover, be sure to comment below.