Last week I wrote a post on why lawyers can no longer ignore math and on the role of sampling in eDiscovery. In summary, I said that failing to employ sampling techniques during eDiscovery review and production is shortsighted at best and, at worst, might be grounds for sanctions. When practiced correctly, sampling protects against inadvertent disclosure, strengthens defensibility and helps control the high cost of the process.
How can sampling be implemented practically?
By definition, sampling means that only a portion of the total dataset is directly analyzed and reviewed. Many requesting attorneys have difficulty wrapping their heads around this concept because a “crucial piece of information could be missed in the sampling process”.
Historically, a document review wasn’t “complete” unless and until every document had been examined by an attorney. Given the exploding volume of eDiscovery material in today’s litigation, we need to integrate scientific methodologies to reduce the high costs of document review without sacrificing accuracy. The concern is not one-sided. Those producing the information worry too: they may feel that sampling takes away their control over what is revealed to the requesting party. The fear is that if sampling is used, data that should not be released may end up with the opposing side.
How can these concerns be addressed?
There are two primary ways to integrate sampling into the discovery process in a systematic manner. The first and best method is to use the FRCP Meet and Confer requirements, which push opposing parties to work collaboratively in designing eDiscovery plans. As part of these discussions, sampling techniques can be built into a collaborative discovery plan. As discussed in the last post, this means agreeing on a sampling protocol that covers precision, confidence level, margin of error and the type of test to employ. When sampling data, the results usually end up in three categories:
• Relevant – data tagged for referral to the requester
• Privileged – data that is protected from discovery
• Irrelevant – data irrelevant to the request
When the sampling conditions are agreed to by both parties, there is little room for argument when the results are delivered. If a data producer’s sampling methodology isn’t transparent to the requester, there will be a lack of trust in the results, particularly in the documents that end up in the “Irrelevant” category.
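To make the negotiated parameters a little more concrete, here is a rough sketch of how a confidence level and margin of error translate into a number of documents to review. It assumes a simple random sample and uses the standard sample size formula for a proportion with a finite population correction. The function name, the default values and the 50% expected rate (the most conservative choice) are my own assumptions for illustration, not part of any agreed protocol.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin_of_error=0.02, expected_rate=0.5):
    """Documents to review in a simple random sample (illustrative only).

    population      -- total number of documents in the collection
    confidence      -- agreed confidence level, e.g. 0.95
    margin_of_error -- agreed margin of error, e.g. 0.02 for +/- 2%
    expected_rate   -- assumed share of relevant documents; 0.5 is the
                       most conservative value (hypothetical default)
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population.
    n0 = z ** 2 * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    # Finite population correction: smaller collections need fewer documents.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical collection of one million documents.
    print(sample_size(1_000_000, confidence=0.95, margin_of_error=0.02))
```

The useful point for a Meet and Confer discussion is that the required sample depends far more on the margin of error than on the size of the collection, so the number of documents to review stays small relative to a very large dataset.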
Employing a two-tier process is the second way to integrate sampling into the discovery process and address any concerns by the opposition about document production. In a two-tier discovery strategy, the data provider first reviews documents without using a statistical model and forwards the information selected as “Relevant” to the opposition. As the review of this data proceeds, the requesting party makes a second request for all data not previously produced, excluding exact duplicates, system files and privileged information. Once these three sub-categories of documents have been removed from the irrelevant document set, the remainder can be searched using a set of statistical protocols designed solely by the producing party. Applying statistical sampling to this reduced data set still results in a production that can be defended in court if required, without forcing a manual review of every document in the irrelevant population.
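As an illustration of the second tier, the sketch below removes the three excluded sub-categories from the not-yet-produced set, draws a reproducible random sample from what remains, and puts a confidence interval around the share of relevant documents found in that sample. This is only one possible approach, not a prescribed protocol: the function names, the fixed seed and the choice of a Wilson score interval are all assumptions of mine.

```python
import random
from statistics import NormalDist


def draw_sample(doc_ids, excluded_ids, sample_size, seed=2024):
    """Simple random sample from the not-produced set after exclusions.

    excluded_ids is assumed to hold the exact duplicates, system files
    and privileged documents identified earlier (hypothetical input).
    """
    excluded = set(excluded_ids)
    # Sort so the draw does not depend on the order the IDs arrived in.
    eligible = sorted(d for d in set(doc_ids) if d not in excluded)
    # A fixed seed lets the same draw be reproduced if it is challenged.
    rng = random.Random(seed)
    return rng.sample(eligible, min(sample_size, len(eligible)))


def relevance_rate_interval(relevant_found, reviewed, confidence=0.95):
    """Wilson score interval for the share of relevant documents
    remaining in the sampled population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = relevant_found / reviewed
    denom = 1 + z ** 2 / reviewed
    centre = (p + z ** 2 / (2 * reviewed)) / denom
    half = z * (p * (1 - p) / reviewed + z ** 2 / (4 * reviewed ** 2)) ** 0.5 / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical numbers: 50,000 unproduced documents, 5,000 excluded.
    picked = draw_sample(range(50_000), range(5_000), sample_size=1_500)
    # Suppose reviewers find 3 relevant documents among the 1,500 sampled.
    low, high = relevance_rate_interval(3, len(picked))
    print(f"Estimated relevant share: {low:.2%} to {high:.2%}")
```

Recording the seed, the exclusion list and the interval alongside the production is what makes the process easy to explain to the other side or to the court later.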
The ultimate goal of integrating statistical sampling into eDiscovery is to reduce the overall cost of production and document review, providing your client with the security that only the required documents have been produced and supplying the court with a defensible process.

