When you jump online and search for something, do you love that the search engine is smart enough to recognize the phrase and try to finish it for you? How about this one . . . you’re shopping online and suddenly you’re directed to what “others” looked at or purchased that is similar to what you’re looking at. Based on what the system knows, it has sped up the process for you. Earlier users taught the system, and the system then applied what it learned so that the next user benefits from a faster process. Just because the engine has finished the phrase for you or filtered similar items to the top doesn’t mean you have to proceed with that result or purchase everything on the page; it’s just a ‘better’ starting point from which to use your human eye to decide what you want to view or purchase. It also doesn’t mean that there aren’t other relevant items out there, but at least you’re off to a good start. You may recall that before these nifty features existed, it was sometimes exhausting to search the internet or shop online.
The same holds true for document review. What I’ve described above is predictive coding. Document review becomes tedious, and the accuracy of the review begins to decline after a while. The features you’re already familiar with are now at your fingertips for document review. On the eve of the next release of Summation, I think it’s important for our users to understand that they can feel just as comfortable with predictive coding as they did when they embraced these features for searching or shopping online. The process has three steps: teaching the system, applying the system’s learning, and performing quality control.
First, in order to teach the system, identify a data subset that represents the file types existing in the entire collection of documents you wish to predictively code. This subset will be referred to as “the seed set”. Have a group of your reviewers code these documents using a specific coding layout. The reviewers will identify whether or not each document is responsive or privileged and capture keywords along the way. Meanwhile, the application will monitor the reviewers’ coding and will generate an algorithm to use in performing its own analysis.
Second, once your team has completed its review of the subset, you engage the predictive coding tool. It will utilize its learned algorithm and conduct its own review of the seed set. Then, the system will generate a confidence level score letting you know how accurate it was and how confident it feels about going forward on its own. If the score is not high enough, you should continue coding additional documents and reapply the tool until you reach the desired level of confidence. Once you reach the level you desire, allow the system to analyze the remaining document collection.
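For readers who like to see the moving parts, here is a minimal sketch of the teach-and-apply loop in Python. It is only an illustration, not how Summation works internally: I am assuming a simple word-count (naive Bayes) model as a stand-in for whatever the tool actually learns, approximating the “confidence level score” as the rate at which the model agrees with the reviewers on the seed set, and picking a 0.9 threshold out of thin air. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def train(seed_set):
    """Learn word counts per label from (text, label) pairs coded by reviewers."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of seed documents with that label
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log score for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in the seed set are ignored
                # Add-one smoothing so an unseen word/label pair is not fatal.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def agreement(model, coded_docs):
    """Fraction of human-coded documents where the model matches the reviewer."""
    hits = sum(predict(model, text) == label for text, label in coded_docs)
    return hits / len(coded_docs)

# Hypothetical seed set coded by reviewers.
seed_set = [
    ("merger pricing agreement draft", "responsive"),
    ("pricing terms for the merger", "responsive"),
    ("lunch menu for friday", "not responsive"),
    ("office holiday party friday", "not responsive"),
]
remaining = ["revised merger pricing", "friday lunch party"]

DESIRED_CONFIDENCE = 0.9  # assumed threshold; your team chooses its own

model = train(seed_set)
score = agreement(model, seed_set)
if score >= DESIRED_CONFIDENCE:
    # Confident enough: let the model code the rest of the collection.
    predictions = {text: predict(model, text) for text in remaining}
else:
    # Not confident enough: code more documents, retrain, and check again.
    predictions = {}
```

In this toy run the model agrees with the reviewers on every seed document, so it goes on to code the two remaining documents. In practice you would measure agreement on documents the model has not already seen, since checking it against its own training material flatters the score.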
Finally, in order to perform quality control, filter on the documents that were just coded and confirm the accuracy. Now, you can organize by predicted degree of relevance or responsiveness and distribute them for further review with human eyes to bring the review process full circle.
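The quality control step can be sketched the same way. The Python below is an assumption-laden illustration rather than a description of any product: it supposes the tool gives each document a score between 0 and 1 for predicted responsiveness, and the document IDs, scores, and function names are all made up. It ranks the documents, pulls a random sample for a human accuracy check, and splits the ranked list into batches for reviewers.

```python
import random

def rank_for_review(predictions):
    """Sort (doc_id, score) pairs so the most likely responsive come first."""
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)

def qc_sample(predictions, sample_size, seed=0):
    """Draw a random sample of coded documents for a human accuracy check."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(predictions, min(sample_size, len(predictions)))

def split_batches(ranked, batch_size):
    """Cut the ranked list into batches to hand out to reviewers."""
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

# Hypothetical output from the predictive coding step.
predictions = [
    ("DOC-001", 0.35),
    ("DOC-002", 0.92),
    ("DOC-003", 0.10),
    ("DOC-004", 0.78),
    ("DOC-005", 0.55),
]

ranked = rank_for_review(predictions)
sample = qc_sample(predictions, sample_size=2)
batches = split_batches(ranked, batch_size=2)
```

Here the first batch holds DOC-002 and DOC-004, the two documents the system considers most likely to be responsive, so your reviewers see the most promising material first.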
There is a lot of hype in the industry about predictive coding, and much of it centers on judicial opinions. Many of you may be waiting for that nod of approval encouraging the use of predictive coding.
Here’s an exercise for you: Take a moment away from this blog and Google “Judge Peck”. Did you notice that the computer automatically wanted to take you to “Judge Peck predictive coding”?
Predictive coding is simply the use of already-tested technology to cull your data so that what is most likely relevant or privileged is filtered for you, saving time and improving accuracy. It is still necessary for you and your team to view these documents and make a final decision.
The beauty is that you taught the system to begin with.