NOTE: Eric Killough is the author of Virtual Canary in the Digital Mine, a bi-monthly column at AbovetheLaw. This article was previously published in modified form as a three-part series there.
First, here’s a little story about me: my life in the legal world began as a paralegal nearly a decade ago. My first case was a GIANT patent infringement case that was already six years old at the time I joined and had involved as many as five companies, multiple US courts, the ITC and an international standards committee. And I, a former librarian who knew a little bit about a number of things, knew nothing about any of this.
On my first day, my supervisor (a paralegal with at least eight other cases driving her batty) sat me down in front of a Concordance database with 100,000+ patents and patent file histories. “Code these,” she said. I learned that “coding”, for the purposes of this exercise, meant manually typing the inventor’s name, the title of the patent, the assignee, the file date, and other objective data for each document. That, friends, was pretty much the extent of my training. I worked on that project – and only that project – for at least the first six months of my job. I’m not sure about the actual duration of the project: after a week or so, time began to blur.
What I know, in retrospect and with absolute certainty, is this: as time began to blur, so did my judgment. So did my attention to detail. If you could tell me that I did not make at least one mistake a day – one inconsistent spelling, one reversed day and month, one incorrectly spaced title – I frankly would need to see your evidence. I would not believe it. The human mind can be trained to do machine-repeatable tasks but it is not a machine.
After some time, I became slightly senior simply because others were hired after me. I went on to code on other projects. And then I began to manage teams who coded still other projects. Most of these projects covered the same ground: coding objective data (author, date, doctype, etc.) so that the attorneys could perform subjective searches for it, maybe, someday. But other projects were a little more advanced: we examined doc dates to determine relevancy, or maybe we examined names and email addresses to determine privilege. For the truly advanced, if you saw something that was handwritten or that had a handwritten note, it was immediately suspect and perhaps “Attorney Work Product”.
Now, consider this: the firm I worked for charged its clients at least three times what it paid me for my time (billed, of course, by the hour) and for the time of my colleagues. And I made more doing this than I’d ever made before, especially when I worked overtime (which was pretty much any time I wanted to). Meanwhile, other firms billed their clients the same way for the time of my peers at other firms. Actually, time may have been the most accurate measure of cost, because the only thing really required of us was to give our time to a truly robotic, thoughtless practice. The only explanations I can come up with as to why this was accepted are that either (1) clients were so successful that they no longer reviewed invoices or (2) they were so beaten down by years of steadily increasing attorney bills that they assumed they had no other choice.
So, my robot brethren and I continued, coding like fallible machines.
And then 2008 happened.
And the bottom line is an engine of change. Clients began to ask for the unthinkable: a better way to review documents. They began to (shudder to think) pay attention to the work folks like me were doing and weigh the value of that effort against the price they had been paying . . .
In the meantime, back in the world outside of law firms and their mega-sized clients, the 00’s had been a decade of startlingly fast change. “Geeks” became cool, “googling” became a lower-case verb and librarians had begun pondering extinction. The Yellow Pages had long been reduced to a doorstop, Encyclopaedia Britannica stopped printing and Wikipedia became a respected secondary source instead of a punchline. Why? How? Because “technology” now “assisted” humans in “review”ing all of the information we could possibly need without ever having to crack a book or a newspaper. The evolution of the human-computer had begun. Just ask my two-year-old, if you can pry her away from “her” iPad.
OK, I got to write about my daughter. Thank you very much.
It’s time to pan out a bit. Why have I told you all this that you probably already knew? Because I want to tell you now that I really didn’t think there was anything wrong with what was going on. It was, crazy as it sounds to me now, natural. And then, not too long ago, a few years maybe, I started to hear rumors of strange forces beyond the horizon. It started with wireless internet connections, then VCR-less television recorders, then cashier-less checkout lines at the supermarket, and then cashier-less checkout lines at Home Depot. Then, murmurs even among the hushed halls of the law firms: something was coming. Predictive, wait for it, coding. Horror of horrors! Like the travel agents of yore, like the persons at 4-1-1, like the reference librarians — the document reviewer was endangered! How could this be? How could a computer possibly learn to tell A from B, responsive from non-responsive? And what about the legal world’s sacred duty to diligently lay a human eye on every single piece of paper? If we replace doc reviewers, what’s next for competent representation?
What I know now is that this was simply crazy thinking. What I hope you and everyone you work with will soon learn is that the introduction of a “predictive” technology into your review workflow should not be an unwelcome event.
Let’s turn now to the present day and consider a handful of formerly amazing technologically-assistive sidekicks that are so useful (if occasionally a nuisance) that we no longer consider them innovative because they have become essential.
Here’s a short roll call:
- Spam filters
- “Adult” content filters
- Explicit image searches
- Targeted advertising
- iOS’s auto-correct
I know, I know. You’re thinking, “Wait, I thought he was going to try and convince me that predictive coding is a good thing”.
Sure, we’ve all missed an important email because the spam filter caught it. Or we’ve tried to do a legitimate search for, I don’t know, “chicken breasts”, only to be foiled by an overzealous content filter. Or we’ve attempted to use an explicit image search to filter out, ahem, questionable scenes only to find that the filter really didn’t understand that that picture was a picture of that. We’ve also been persistently irritated by Google’s insistence that a “recommended” site for us might be this when actually we were looking for this. And the pithy things I can say about auto-correct would literally write themselves if I were using my iPhone to write this.
So, why would I, as the boxers say, lead with my nose? Because these examples are all hilarious exceptions to the rule otherwise established by the widespread adoption of these technologies. The rule is this: “you can absolutely teach a machine to think like a human when what you’re asking it to do is a task humans have been trained to do like machines”. The spam filter, just like you would, sees an email from an address that does not look like a real person’s address, or a message with a suspiciously generic subject, and it trashes it without opening it. When it first gets going, you know you need to QC its work. And when an email you’ve been promised fails to arrive, go look in the spam folder. If you find that email, or you find something else that you value more than the machine predicted you would, you say to it, “this is not spam”. At that point, it writes a new rule for itself and no longer trashes emails from that source. The machine has learned to think like you.
And you rejoice. What’s more, when you go to look for the missing email, you will see just how much assistance you’ve been receiving from your technology. The folder was full of spam! Of mind-numbingly awful subject lines and incomprehensibly randomized From: addresses. And again, you rejoiced: you never had to look at all that garbage with your precious human eyes.
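That feedback loop is simple enough to sketch in a few lines of code. What follows is a deliberately toy illustration (not any real mail product’s algorithm, and the rules and names are my own invention): the filter trashes mail by a couple of crude rules, and clicking “this is not spam” teaches it a new rule by whitelisting the sender.

```python
# Toy sketch of the spam-filter feedback loop described above.
# The rules here are illustrative assumptions, not a real product's logic.

GENERIC_SUBJECTS = {"hello", "urgent", "act now", "re:"}  # assumed examples

class ToySpamFilter:
    def __init__(self):
        self.whitelist = set()  # senders the user has vouched for

    def is_spam(self, sender: str, subject: str) -> bool:
        if sender in self.whitelist:
            return False  # a learned rule overrides everything else
        # Rule 1: the address doesn't look like a real person's
        # (crude stand-in: digits in the local part)
        if any(ch.isdigit() for ch in sender.split("@")[0]):
            return True
        # Rule 2: a suspiciously generic subject line
        if subject.strip().lower() in GENERIC_SUBJECTS:
            return True
        return False

    def mark_not_spam(self, sender: str):
        # The "this is not spam" click: the filter writes a new rule
        # for itself and no longer trashes mail from this source.
        self.whitelist.add(sender)

f = ToySpamFilter()
print(f.is_spam("xk42q9@randomhost.biz", "urgent"))  # True: trashed unread
f.mark_not_spam("xk42q9@randomhost.biz")             # the user corrects it
print(f.is_spam("xk42q9@randomhost.biz", "urgent"))  # False: it has learned
```

Real filters, of course, use statistics rather than hand-written rules, but the human-in-the-loop correction works the same way: the machine’s judgment is cheap and tireless, and yours is the final word.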
And that is “Technology Assisted Review”. The future is here and, as always, it turns out that it’s been here all along. When we talk about “Technology Assisted Review”, it is not controversial and untested pseudo-science. It’s not really anything all that “new”. It is, quite simply, litigation-grade spam filtering.