Near-Duplication Detection | Ricoh eDiscovery

Statistics have shown that document collections typically contain between 25% and 50% near-duplicates: similar documents that differ in formatting, text or both. These are distinct from duplicates, which are exact copies. Near-duplicates include files with:

A percentage of textual differences (by far the most common)
Variances in formatting (such as bold or italicized fonts)
Different file types (such as an MS Word file converted to PDF)

The return on investment of near-duplicate detection is unquestionable. Industry studies have shown that the cost of legal document review has a significant impact on the overall cost of litigation. When near-duplicate documents are not identified and grouped together, there is a significant risk that similar documents (paper or electronic) will be reviewed multiple times by different lawyers, resulting in wasted time, extra cost and subjective coding inconsistencies. Near-duplicate detection costs pennies per document and allows lawyers to review similar documents in groups, dramatically increasing the speed of document review while lowering the associated costs by 25% or more.

The Ricoh eDiscovery near-duplicate solution, which identifies both duplicate and near-duplicate documents, can be used with electronic data, scanned/OCR'd collections or a combination of both to identify and group documents prior to a full document review. The results are then output in a format suitable for many standard litigation support software packages.
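
For readers curious how documents with a percentage of textual differences can be identified and grouped, here is a minimal sketch in Python. It is an illustration only and does not describe how the Ricoh eDiscovery solution works. The technique (Jaccard similarity over three-word shingles), the function names and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.5):
    """Group document ids whose pairwise similarity meets the threshold.

    docs maps a document id to its extracted text. The threshold is an
    illustrative assumption, not an industry standard.
    """
    ids = list(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    parent = {doc_id: doc_id for doc_id in ids}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair; fine for a sketch, too slow for large collections.
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if jaccard(sets[a], sets[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for doc_id in ids:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(group) for group in groups.values())


docs = {
    "a": "The quarterly report shows revenue grew by ten percent over the prior year",
    "b": "The quarterly report shows revenue grew by twelve percent over the prior year",
    "c": "Please schedule the deposition for next Tuesday morning",
}
groups = group_near_duplicates(docs)
```

In this example, documents "a" and "b" differ by a single word and land in the same group, while "c" stays on its own. Because the comparison works on extracted text, differences in formatting or file type do not affect the score. A production system would need a faster comparison strategy than checking every pair.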
