As a litigation support trainer with AccessData, I’ve had the opportunity to travel across the U.S. and the U.K., training users on Summation. Despite the different workflows and judiciary systems, the goal remains the same for all – find the relevant data, review it ASAP, and hope to find that smoking gun. The fastest and most efficient way to find this data is through data culling.
Data culling is the process of searching your original data and isolating documents based on specific criteria, such as date ranges and keywords. It simply hides documents that may not be relevant to your case, allowing a quick and efficient review of documents before production.
We used to rely heavily on vendors for this process because many of the products out there did not have Early Case Assessment (ECA) capabilities and certain products were not forensically sound. With Summation, you have all the ECA capabilities, can seamlessly cull your data, and give reviewers access to the relevant documents – all without ever having to leave the Summation platform! Sound difficult? It really isn’t. Like many of our Summation users, I came from the iBlaze days and was a bit intimidated by the robust functionality Summation offers. Don’t be. Technology is evolving and change is good… well, in the world of Summation, at least. Culling data in-house can save you time and money, so why not dive in and give it a try?
Let’s discuss all walks of data culling, pre- AND post-processing:
- DeNISTing is the removal of non-user-created data, such as system files, operating system files, and other file formats that generally hold no evidentiary value. This is a simple way to reduce non-relevant documents during data processing.
- Deduplication is the identification and separation of duplicate documents and emails. These documents and emails are flagged across the entire case and filtered by the custodian, allowing a faster review of relevant data without repeatedly seeing the same files.
- Cluster Analysis, or Near Dupe Analysis, is a process that compares the contents of documents and emails in order to group similar ones. This allows reviewers to view the data as a group, which saves time.
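For readers curious about what happens under the hood, here is a minimal Python sketch of the ideas behind these three methods. It is not how Summation implements them. The function names, the SHA-256 hash, the three-word shingles, and the 0.5 similarity threshold are all assumptions chosen for illustration, and the "known system hashes" set is a hypothetical stand-in for a real known-file list.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Fingerprint a file's bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(content).hexdigest()


def denist(files, known_hashes):
    """Drop files whose hash appears in a list of known system files."""
    return {name: data for name, data in files.items()
            if file_hash(data) not in known_hashes}


def deduplicate(files):
    """Keep the first file seen per hash; map each duplicate to its original."""
    first_seen = {}          # hash -> name of the first file with that hash
    unique, duplicates = {}, {}
    for name, data in files.items():
        digest = file_hash(data)
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            unique[name] = data
    return unique, duplicates


def shingles(text, k=3):
    """Split text into overlapping runs of k lowercase words."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_groups(docs, threshold=0.5):
    """Greedy grouping: a document joins the first group it resembles."""
    groups = []              # list of (representative shingles, member names)
    for name, text in docs.items():
        doc_shingles = shingles(text)
        for representative, members in groups:
            if jaccard(doc_shingles, representative) >= threshold:
                members.append(name)
                break
        else:
            groups.append((doc_shingles, [name]))
    return [members for _, members in groups]


# Hypothetical sample data for illustration only.
files = {
    "kernel32.dll": b"system library bytes",
    "memo.txt": b"Please review the merger agreement before Friday.",
    "memo_copy.txt": b"Please review the merger agreement before Friday.",
    "memo_v2.txt": b"Please review the merger agreement before Monday.",
    "lunch.txt": b"Anyone want pizza for lunch today?",
}
known_system_hashes = {file_hash(b"system library bytes")}

user_files = denist(files, known_system_hashes)
unique, duplicates = deduplicate(user_files)
texts = {name: data.decode("utf-8") for name, data in unique.items()}
groups = near_duplicate_groups(texts)

print(duplicates)
print(groups)
```

In this sample, the system file is dropped, the exact copy of the memo is flagged as a duplicate, and the two versions of the memo end up grouped together while the unrelated lunch email sits on its own. A production tool would use far more sophisticated comparisons, but the culling logic follows the same order: remove known files, remove exact duplicates, then group what looks alike.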
All three of these pre-processing data culling methods are available to you in Summation and, even better, in one location within the user interface.
Once the data is processed, there may still be thousands of documents and emails that are simply not relevant to the case. Summation offers many options for removing the rest of the non-relevant data, such as filtering. Filters are simply predefined categories populated with metadata during ingestion and processing of the data. Evidence can be filtered by criteria such as custodians, dates or date ranges, file size, display names, and many other metadata fields. Filtering in Summation is by far one of the easiest and most intuitive features to use.
Keyword searching can take the data culling process further. Within Summation’s advanced search feature, you can build complex queries without having to know the syntax, parentheses, operators, and other details of crafting a search. This is a great advantage when combined with advanced search variations like stemming, phonic, synonym, and fuzzy searching.
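To make the post-processing side concrete, here is a rough Python sketch of metadata filtering followed by a keyword search with optional stemming and fuzzy matching. Again, this is an illustration, not Summation's search engine. The `Doc` fields, the function names, the very naive suffix-stripping stemmer, and the 0.8 fuzzy threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Doc:
    name: str
    custodian: str
    sent: date
    size_bytes: int
    text: str


def filter_docs(docs, custodians=None, start=None, end=None, max_size=None):
    """Keep documents that satisfy every metadata criterion that was supplied."""
    kept = []
    for doc in docs:
        if custodians is not None and doc.custodian not in custodians:
            continue
        if start is not None and doc.sent < start:
            continue
        if end is not None and doc.sent > end:
            continue
        if max_size is not None and doc.size_bytes > max_size:
            continue
        kept.append(doc)
    return kept


def naive_stem(word):
    """Strip a few common English suffixes (a toy stand-in for real stemming)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(doc, keyword, stemming=False, fuzzy=None):
    """True if any word matches the keyword exactly, by stem, or by similarity."""
    target = keyword.lower()
    for token in tokens(doc.text):
        if token == target:
            return True
        if stemming and naive_stem(token) == naive_stem(target):
            return True
        if fuzzy is not None and SequenceMatcher(None, token, target).ratio() >= fuzzy:
            return True
    return False


# Hypothetical sample data for illustration only.
docs = [
    Doc("a.msg", "Smith", date(2015, 3, 2), 12000, "We are merging the two accounts next week."),
    Doc("b.msg", "Smith", date(2015, 6, 9), 8000, "The mergr paperwork is attached."),
    Doc("c.msg", "Jones", date(2015, 3, 15), 5000, "Merge approved by the board."),
    Doc("d.msg", "Smith", date(2014, 1, 1), 900, "Lunch on Friday?"),
]

# Cull by custodian and date range first, then search what is left.
filtered = filter_docs(docs, custodians={"Smith"},
                       start=date(2015, 1, 1), end=date(2015, 12, 31))
stem_hits = [d.name for d in filtered if matches(d, "account", stemming=True)]
fuzzy_hits = [d.name for d in filtered if matches(d, "merger", fuzzy=0.8)]

print([d.name for d in filtered])
print(stem_hits)
print(fuzzy_hits)
```

The order matters here: filtering on metadata first shrinks the set cheaply, so the more expensive text search only runs on documents that could still be relevant. Stemming catches word variations such as "accounts" for "account", while fuzzy matching catches typos such as "mergr" for "merger".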
Data culling helps us separate relevant data from non-relevant data and prioritize what matters. That saves time and money, and it gets us to the prize: an efficient production set for review.