With apologies to Samuel Taylor Coleridge, I want to take a few moments to track what happens to data that is sent to a law firm or vendor as part of the discovery process, and to challenge us all to think about the implications of how many copies of this data we end up with.
In the collection process, we first identify the information we need to put a hold on, and then decide what data to collect and pass on for attorney review. Generally, the client will make a copy of the data being sent off to either a vendor or outside counsel.
So that’s one copy of the data. Within the law firm, the first thing I would do as a litigation support professional is take the media delivered from the client and make a working copy in our network environment (Copy number 2). Then I would file away the original media for safekeeping.
From there, we’ll load this set of data into a processing or ECA (early case assessment) tool to either process as is or do some culling. The tool will take the working set of the data, extract all of the compressed formats (PST, NSF, ZIP, etc.), and import them (Copy number 3).
Now that it’s in the ECA or processing tool, we’re going to export it out to a review platform so that it can be reviewed by a case team. This may or may not be a full copy of the dataset. Hopefully, the tool has allowed us to cull some of the data, but we now have, at the very least, a fourth copy of some of that data.
From the review platform, we will produce another copy of the data for opposing counsel, and possibly a subset for an expert witness or two. Meanwhile, attorneys are printing copies of documents to read offline, creating notebooks for depositions, and making any number of other copies for various purposes.
As you can see, this becomes very difficult to keep track of, and I haven’t even talked about attempts to keep this data secure, which could be another article in itself. It also gets expensive. Storage may be cheap, but building the infrastructure for all of this storage and keeping the data backed up sure isn’t.

This is one reason, though not nearly the only one, that I see two trends developing in the legal industry. The first is the move to integrated Collection/ECA/Review/Production tools. If one set of data is used across the different stages of the EDRM, it eliminates the large number of copies that would otherwise be left out there. The second is that more organizations with the resources are moving toward hosting these platforms themselves, or working directly with vendors to host the data and make it available to outside firms. This gives an organization much more control, not only over the number of copies of data that exist, but also over when the data can be removed. Unfortunately, law firms aren’t exactly known for getting rid of data in a timely fashion, and data that isn’t purged within the normal data lifecycle policy becomes a risk.
Regardless of these trends, though, it’s important for organizations to create and communicate expectations for data handling to their outside firms, vendors, and other third parties, especially timelines for archiving or purging non-active case data. These policies and expectations should be agreed to at the beginning of any engagement and followed up on after the case is over. In my experience, this is where most communication fails. The case team finishes a case and moves on to the next thing. Meanwhile, the folks who actually handle the database administration, in the absence of direction from the case team, leave the data where it is. A reminder from the client can go a long way toward letting those folks know when to get rid of inactive data, and believe me, they want to get rid of data when they can!
It will take some work, and there are lots of variables to consider in each case, but the cost of having your data sit for years on networks beyond your control might just prove more painful.