NOTE: Eric Killough is the author of Virtual Canary in the Digital Mine, a bi-monthly column at AbovetheLaw. This article was previously published in modified form as a three-part series there.
As I said last week, this radical new technology that folks in our field just can’t talk about enough (TAR, CAR, Predictive Coding, etc.) is not really all that radical, and it’s really nothing to fear.
Technology Assisted Review is, in both form and function, not very different from a spam filter. You do a bit of preliminary review on a representative sample to train the system and your new high-tech best buddy says:
Oh, I see. If you like these 1,000 documents as potentially relevant, then I recommend you also take a look at these 10,000 documents over here. You’re going to love them! But, please, don’t waste too much of your time or your client’s money on those 90,000 documents over there. I’m 98.769% sure that there is nothing of interest for you there. Just to be on the safe side, however, I’ve put a little flag on them to remind you that, later, if the case hasn’t settled yet, you might want to randomly sample them as well — just to be sure I haven’t mis-categorized anything.
Isn’t that comforting? Aren’t you ready to let go?
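The spam-filter comparison is more than a metaphor: under the hood, these systems score each new document against word statistics learned from the reviewed sample. Here is a toy sketch in Python, assuming a bare-bones naive Bayes scorer and made-up documents; no actual TAR product works exactly this way, but the shape of the idea is the same:

```python
import math
from collections import Counter

def train(labeled_docs):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"relevant": Counter(), "not_relevant": Counter()}
    totals = Counter()
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def score(text, counts, totals):
    """Return P(relevant | text) via naive Bayes with add-one smoothing."""
    vocab = set(counts["relevant"]) | set(counts["not_relevant"])
    log_odds = math.log((totals["relevant"] + 1) / (totals["not_relevant"] + 1))
    for word in text.lower().split():
        p_rel = (counts["relevant"][word] + 1) / (sum(counts["relevant"].values()) + len(vocab))
        p_not = (counts["not_relevant"][word] + 1) / (sum(counts["not_relevant"].values()) + len(vocab))
        log_odds += math.log(p_rel / p_not)
    return 1 / (1 + math.exp(-log_odds))

# Train on a tiny reviewed sample, then triage unreviewed documents.
sample = [
    ("merger agreement draft attached", "relevant"),
    ("quarterly revenue figures for the deal", "relevant"),
    ("office holiday party rsvp", "not_relevant"),
    ("cafeteria menu for next week", "not_relevant"),
]
counts, totals = train(sample)

for doc in ["revised merger agreement", "holiday party menu"]:
    p = score(doc, counts, totals)
    bucket = "review now" if p >= 0.5 else "flag for random sampling later"
    # Higher-scoring documents go to the front of the review queue.
    print(f"{p:.2f}  {bucket}  <- {doc!r}")
```

Real engines use far richer features and models, but the division of labor is exactly this: you supply a coded sample, the machine supplies a ranking and a confidence score.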
I’d like to close the loop here with two related but separate topics: (1) how predictive coding engines work (generally) and (2) some recent case law that not only validates but, in some instances, mandates their use.
1) How predictive coding works (generally)
My hope is that, at this point, you might be at least a little convinced that you should be interested in investing in a technology that will assist in reviewing the documents I know are piling up in your workflow. Traditional manual review, as I’ve described it, is rarely, if ever anymore, the best choice for a first-pass review. There are simply too many documents and too little time. But how do you convince those who hold the purse strings that TAR is a viable alternative? You need at least a basic understanding of how the technology works because, at a basic level, it actually makes sense.
Generally, there are two methods employed by predictive coding applications:
SAMPLING AND CONVERGENCE
- A subject matter expert (an attorney) sits down with a random sample of documents. She “codes” while the computer “watches,” building a decision-making model based on the attorney’s choices. Then the computer “predicts” the coding of another set. When the computer’s predictions sufficiently match the attorney’s choices, it has learned all it needs to complete the batch itself.
- A team of reviewers (not experts) begins to code while the computer “watches” and compares each response with all of the other responses. It makes its own predictions at the same time, and when its predictions match the reviewers’, it has learned all it needs to complete the batch itself.
As you can see, the differences are in details so fine that you and I do not really need to know about them. What’s important to understand is that, through the miracles of parallel processing, machines can now watch, practice and eventually mimic the choices made by humans as the humans are making the choices. This means a real-time, actionable feedback cycle that makes it possible for us to trust that, yes, we can program a machine to think like us. As I’ve said before, for now, all we’re asking the machine to do is work that humans were never really well-suited to do in the first place: digest and make binary choices about non-binary information at speeds not humanly possible.
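That feedback cycle (code, watch, predict, compare, repeat) can be sketched as a loop. In this hypothetical Python illustration, `expert_review`, `train`, and `predict` are stand-ins for the human reviewer and whatever model a given vendor uses; the only point is the convergence check that hands the remaining documents to the machine:

```python
import random

def tar_loop(documents, expert_review, train, predict,
             target_agreement=0.95, batch_size=50):
    """Alternate human coding with machine prediction until the machine's
    predictions converge with the human's, then let the machine finish."""
    pool = list(documents)
    random.shuffle(pool)
    labeled = []
    while pool:
        batch = [pool.pop() for _ in range(min(batch_size, len(pool)))]
        expert_labels = [expert_review(doc) for doc in batch]
        converged = False
        if labeled:
            # Model trained on everything coded so far predicts the new
            # batch; agreement is measured against the expert's calls.
            model = train(labeled)
            predictions = [predict(model, doc) for doc in batch]
            agreement = sum(p == e for p, e in zip(predictions, expert_labels)) / len(batch)
            converged = agreement >= target_agreement
        labeled.extend(zip(batch, expert_labels))
        if converged:
            # Converged: the machine codes the remaining documents.
            model = train(labeled)
            return labeled + [(doc, predict(model, doc)) for doc in pool]
    return labeled

# Toy demo: the "expert" applies a simple rule, and the stand-in "model"
# happens to apply the same rule, so convergence is immediate.
docs = [f"doc {i} deal memo" if i % 2 == 0 else f"doc {i} lunch plans"
        for i in range(200)]
expert = lambda d: "relevant" if "deal" in d else "not relevant"
train = lambda labeled: None          # pretend to build a model
predict = lambda model, d: "relevant" if "deal" in d else "not relevant"

coded = tar_loop(docs, expert, train, predict)
print(f"{len(coded)} documents coded")  # prints "200 documents coded"
```

In practice the agreement threshold, batch size, and sampling strategy are where the vendors differentiate themselves; the loop itself is common to both methods above.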
2) Some important case law to guide you into the future
OK, so let’s assume I’ve convinced you and you’ve convinced the powers that be to let you bring on a TAR platform. Let’s further assume that your clients are on board and that your team has been re-deployed to address the new workflow. You’ve submitted a reduced budget proposal — you’ve gotten lean and mean — and you’re ready to unleash the power of TAR. This is the moment you’ve been waiting for: go, machine, go!
And then you talk to the other side. “Now listen here, y’all need to wait just a dadburnt minute,” says the opposition’s comically antiquated trial attorney assigned to give you a hard time. The guy who still uses a pen. Who takes “notes” in a “notebook.” “Wait,” says he, “what is all this hocus-pocus rigmarole you’re saying about a computer reading documents? Even if you are just pulling my leg, and I sincerely hope that you are, I beg you to stop right here. This is lawyer’s work, and I know Judge So-and-So; she will not go in for shenanigans masquerading as techno-mumbo-jumbo.” Maybe this happens in Judge So-and-So’s chambers, even, and your opponent suddenly disavows any knowledge of computers beyond pocket-calculator capabilities. Maybe, still, Judge So-and-So seems to agree and raises a skeptical eye at your assumption that TAR will be acceptable.
Well, 2012 was a big year for machines in the courts. Either the judiciary has simply grown exhausted from the lack of assistance during discovery, or attorneys and expert witnesses have become more eloquent at explaining how the various technologies work, or it is some combination of both. As a result, today there is really no excuse for not pursuing all available means of technological assistance in document review. Neither your opponent nor the judge can dismiss TAR without very good reason. TAR is now precedent.
Here are three important cases to put next to your new TAR user’s manual:
- February 2012, “Da Silva Moore”: Da Silva Moore v. Publicis Groupe (S.D.N.Y. Feb. 24, 2012). The first federal case to recognize Computer-Assisted Review as “an acceptable way to search for relevant ESI in appropriate cases.” (NOTE: as of July 10, Da Silva Moore is under review at the Supreme Court. Petitioners argue that Judge Peck abused his discretion due to an untoward affection for predictive coding. See here.)
- July 2012, “NDLON”: National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency (S.D.N.Y. July 13, 2012). A FOIA case in which a District Judge held that “most custodians cannot be ‘trusted’ to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities” and that “[b]eyond the use of keyword search, parties can (and frequently should) rely on . . . machine learning to find responsive documents. Through iterative learning, these methods (known as ‘computer-assisted’ or ‘predictive’ coding) allow humans to teach computers what documents are and are not responsive to a particular . . . discovery request and . . . significantly increase the effectiveness and efficiency of searches.”
- October 2012: EORHB v. HOA Holdings LLC (Del. Ch. Oct. 19, 2012). The first case in which a court directed the parties to use Predictive Coding as a replacement for Manual Review (or to show cause why this was not an appropriate case for Predictive Coding), absent a request from either party to employ it.
Predictive coding, or TAR, or CAR, or whatever, is here.
It is time for you to use it. If you do not, you will need a solid argument for why not. (Hint: “because I want to run up the cost of discovery in hopes that my opponent will settle out of court” and “because I want to run up the cost of discovery because I like to bill for document review” are not good arguments.)
Technology Assisted Review is also just beginning. For now, we will let the algorithms decide Responsive and Non-responsive, just like we let our emails decide junk and not-junk.
But, full disclosure, I also let my email decide “important” when a message comes from my wife or my boss or my boss’s boss. I let it prioritize those emails over others that, while not junk, are not from anyone with whom I have a particularly special relationship. I also let my email decide “relevant to this project” when a message comes from someone involved in the project and carries a keyword or two associated with it. And, someday, I will cook up the master search query that will let my email decide “important, drop everything and do it now” because it is able to understand key phrases like “do this now.” Like all trailblazing, this is a work in progress.
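Those hand-rolled inbox rules amount to little more than a few set lookups and keyword checks. A toy sketch, with made-up addresses and keywords standing in for my actual filters:

```python
# Hypothetical senders and keywords; substitute your own.
VIPS = {"spouse@example.com", "boss@example.com", "bigboss@example.com"}
PROJECT_PEOPLE = {"boss@example.com", "colleague@example.com"}
PROJECT_KEYWORDS = {"phoenix", "rollout"}

def triage(sender, subject):
    """Tag a message using the same kinds of rules my inbox applies."""
    tags = []
    if sender in VIPS:
        tags.append("important")
    if sender in PROJECT_PEOPLE and PROJECT_KEYWORDS & set(subject.lower().split()):
        tags.append("relevant to this project")
    if "do this now" in subject.lower():
        tags.append("drop everything")
    return tags or ["everything else"]

print(triage("boss@example.com", "Phoenix rollout: do this now"))
# prints ['important', 'relevant to this project', 'drop everything']
```

The difference between these rules and TAR is only that TAR learns its rules from your coding decisions instead of making you write them by hand.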
What’s all but certain is that soon, maybe even by the time this post goes live, you will also have the choice to let our algorithms decide “relevant to this issue,” “probably privileged,” or even “hot.”
And, if we train our TAR engines well, we will find that their decisions are correct. At least, we’ll find that their decisions are the same as ours.
For today, please review AccessData’s entry into the predictive coding arena and please sign up here for a free live webinar. We’re very excited about the future of TAR and our place in it. We hope that you will be too.