From the people who advocated the backup system “Ghost”1 as an attempted eDiscovery collection solution comes the extremely bizarre claim that “in traditional civil litigation – even the behemoth eDiscovery cases that get all the bloggers blogging – forensic imaging simply is not required or needed.”2 Now I can understand Symantec changing its tune, since as best I can tell Ghost has long since been pushed out of the eDiscovery market and Symantec has acquired a new vehicle, Clearwell Systems, that doesn’t have a long history in collection. But such an extreme 180 seems, at best, disingenuous. Good marketing is meant to educate, but when your view of the world is informed solely by the capabilities of your products, that marketing quickly loses its meaning.
As for the merits of the argument that forensics or bit-by-bit copies are not really needed for civil e-discovery, let’s break down, one by one, the points made by Clearwell’s Brandon D’Agostino in the blog post “Bit By Bit: Building a Better E-Discovery Collection Solution.”
By my reading, the first point D’Agostino makes is that bit-by-bit imaging of hard drives “dramatically increase[s] the cost associated with electronic discovery – this process adds unnecessary complexity in downstream phases of eDiscovery and leads to vast over-collection.” This argument is both true and false, but largely misses the point. One of the reasons it’s true is that companies like Clearwell charge customers based on the number of gigabytes they process. Given that pricing model, if you use a similar product for your data processing, there is no doubt that you will pay more the more data you process, and that as a result it will make sense to collect less. However, if you move away from vendors with per-gig pricing models and instead purchase one of the many solid products that don’t charge in this way, you can feed in as much data as you want without inflating the cost of the discovery process, which makes this argument moot.
Furthermore, there are good reasons to “over-collect,” or be more inclusive, once the cost burden of paying per gig is eliminated. Collection is a difficult process that can be disruptive to IT and the custodians involved. If you are collecting from a laptop owned by a mobile user, it is not uncommon to get only one shot at collecting the data. In that case you really should over-collect, and bit-by-bit copies are often the fastest way to do that. They also have the added value of ensuring you have everything, so that if the relevant search terms change in the future or something happens to the target machine, you have performed adequate due diligence. This is exactly why almost every major e-discovery consultancy uses forensic collection tools. It isn’t because they think deleted data or slack space is going to be relevant; it is because bit-by-bit is a faster and more thorough process that ensures one never needs to return to the well.
The next assertion D’Agostino makes is that the top experts in the field have decided the days of imaging everything are over:
…So, with the top experts in the field saying the days of “image everything” should be over, why does it still happen? Why are the victims of this antiquated workflow still paying the exorbitant costs of a solution that does not really meet their requirements?
There are two points I would make in response here. The first is that the top experts aren’t saying the days of imaging everything are over at all. What the experts are saying is that opposing counsel can’t require you to image everything if you don’t want to. That doesn’t mean you can’t or shouldn’t, for the reasons previously mentioned. Second, D’Agostino’s repeated contention that this workflow results in exorbitant costs continues to fall flat. For example, a company can purchase FTK, which enables the user to collect full images (or to apply a targeted methodology such as keyword, file type, or date range) across the network, for approximately $3k. So full disk imaging isn’t expensive at all. In fact, the tools needed to do it are extremely inexpensive. Again, full disk imaging only inflates cost if the customer is using a Clearwell-like appliance that charges per gig for processing. If that piece of the puzzle is removed, the cost of a collection is not a function of the amount of data collected.
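To make the pricing point concrete, here is a minimal back-of-the-envelope sketch in Python. The only figure taken from this post is the approximate $3k license price; the per-gigabyte rate is a purely hypothetical number chosen for illustration, not anyone's actual price list, and the function names are my own. The sketch also ignores storage, hosting, and review costs, which still exist under either model.

```python
# Illustrative cost comparison only. The per-GB rate is a hypothetical
# assumption; the ~$3,000 flat fee is the approximate figure cited above.

def per_gb_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Processing cost when the vendor charges for every gigabyte."""
    return gigabytes * rate_per_gb


def flat_license_cost(gigabytes: float, license_fee: float = 3000.0) -> float:
    """Cost under a flat license: the volume collected does not enter into it."""
    return license_fee


def break_even_gb(rate_per_gb: float, license_fee: float = 3000.0) -> float:
    """Volume at which per-GB pricing starts to cost more than the flat fee."""
    return license_fee / rate_per_gb


if __name__ == "__main__":
    hypothetical_rate = 50.0  # dollars per GB, assumed for illustration
    for volume in (20, 60, 500):
        print(
            f"{volume:>4} GB: per-GB ${per_gb_cost(volume, hypothetical_rate):,.0f}"
            f" vs flat ${flat_license_cost(volume):,.0f}"
        )
    print(f"break-even at {break_even_gb(hypothetical_rate):.0f} GB")
```

Under per-gig pricing the bill grows linearly with what you collect, which is where the pressure to collect less comes from. Under a flat fee the volume term drops out of the collection cost.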
Next under attack are logical imaging containers. For those that don’t know what these are, they are proprietary formats from companies like AccessData and Guidance Software that allow users to better maintain and prove chain of custody. The logic D’Agostino uses is that these containers cause problems because downstream systems need to,
…unpack or parse these proprietary container formats for processing and analysis. In fact, even software from the vendors who created these container formats must “crack them open” to get to the contents within. This seems to add a layer of complexity that has not been needed since the days of the external examiner coming in with her forensic toolkit to do drive images.
This is another statement that seems more like marketing hyperbole than education. In reality, image containers are one option among many formats offered by both Guidance Software and AccessData. Also, having the option to ensure the validity of chain of custody via a logical imaging container is a good thing, not a bad one. Finally, the idea of “cracking them open” is misleading. Opening these types of files takes seconds for any comprehensive processing product. So the “layer of complexity” D’Agostino describes does not really exist and shouldn’t scare users away from a viable option for stronger chain of custody preservation. I invite D’Agostino to download a copy of AccessData’s popular FTK Imager product to first create and then “crack” into an AD1 file. It’s about as simple as creating and reading a basic text document.
D’Agostino then takes aim at the practice of using court-vetted technology to satisfy chain of custody concerns in court, stating that he has an alternative approach. He describes this process as one where he computes hash values and checksums of all the files he collects as a way of validating their integrity. I assume he does this for each file and then maintains some form of reporting to show that none has been lost or altered. To be honest, that doesn’t seem like the most reliable practice, and D’Agostino even acknowledges its weakness when he suggests that “opponents will still bring up claims that the evidence must have been altered, or the expert familiar only with forensic imaging technologies will try to use the argument that only vendor X’s technology is ‘court vetted,’ so any other method is not acceptable.” I don’t doubt D’Agostino is enough of an expert to testify on the stand about his methodologies and have that be sufficient to satisfy chain of custody concerns, but for companies taking this process in-house and relying on internal IT, it’s a different situation. Do they really want to be in the position of having internal IT on the stand defending their collection process and its chain of custody details? Honestly, I think it makes a lot of sense for internal IT to have a forensics container supported by case law to fall back on instead of having to testify in detail as to why their homegrown approach is sufficient.
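For readers who want to picture what a per-file hashing workflow might look like, here is a minimal sketch in Python. To be clear, this is my own illustration of the general idea, not D’Agostino’s or Clearwell’s implementation, which I have not seen. The function names, the choice of SHA-256, and the manifest layout are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Record a SHA-256 for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest: dict) -> list:
    """Return a list of problems: files that are missing or whose hash changed."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
        elif sha256_of_file(target) != expected:
            problems.append(f"altered: {relative_path}")
    return problems
```

Note what a sketch like this leaves open: the manifest is just another file that has to be protected, timestamped, and explained. Someone still has to testify about who ran it, when, and how the manifest itself was kept from being altered, which is exactly the burden I would not want to put on internal IT.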
D’Agostino ends his post by arguing for targeted, “forensically-sound collection of ESI using streamlined and automated solutions that maintain custodian relationship.” I find it hard to argue with that conclusion. After all, it is exactly what companies like AccessData have been enabling for years. (Why some are claiming this capability is “next generation” is a bit of a mystery.) I don’t, however, understand why or how that conclusion is in conflict with forensics containers or the ability to collect forensic images when desired. It seems to me that the real solution is one that is flexible enough to allow users to decide not only how to collect their data, but also how to store and maintain those same files. The only reason I can see to argue against this type of flexibility is if the solution you are trying to sell simply can’t provide it, which seems to be the situation here.