Managing the Production of Structured Data

James Vint, David Turner and Colleen Casey Voshell
Executive Counsel February/March 2009

E-discovery has focused primarily on unstructured data, such as emails, documents stored on file shares, spreadsheets and other “loose” documents. But due to a number of factors, including amendments to the Federal Rules of Civil Procedure governing production of electronically stored information (“ESI”) and the increasing sophistication of litigants, the production of so-called structured data may become more routine in future large matters.

Structured data is information contained in financial, transactional and operational databases that utilize software programs such as Oracle, Cognos, SAP and PeopleSoft. Structured data also includes data held in custom-built applications unique to an organization.

This kind of data generally supports both internal and external business operations, and it may indeed be relevant in litigation. Some of it may be readily available, while some, due to complexity, age, and unique customization, will not be available unless restored from archives, back-ups or legacy systems. Because it is a relatively new target in civil discovery, production of structured data presents challenges, from initial identification through production.

There are several distinct phases, each of which must be addressed when preparing structured data for disclosure.

IDENTIFICATION AND ACQUISITION

The dynamic nature of structured data sets requires robust data management knowledge and analytical tools. Expertise in enterprise information management systems and database technologies, including Oracle, SAP, and PeopleSoft, is important in understanding how various data sets fit together and how information must be extracted.

Companies can facilitate the identification process by creating data flow diagrams or entity relationship diagrams. These documents provide answers to basic questions such as:

• Where is the data that might need to be produced?

• Which of this data is currently available and which is stored in backups, archives or legacy systems? The answer to these questions may be important in evaluating whether information is “not reasonably accessible” under the Federal Rules of Civil Procedure — and thus potentially not discoverable.

• What is the cost and burden of restoration, versus the expected benefit?

• How is this data organized for access by employees?

• In what format can the company produce the information in order to be in accord with discovery rules?

While the questions above are traditionally directed at an organization’s email and document sources, they bear equally on the structured data that it has employed and stored. Given the complexities, companies may save time and overall costs by thinking proactively about these issues as they relate to structured data.

Particular care should be exercised with custom and legacy systems, which may not be fully documented and can present significant challenges when it comes to understanding the data elements.

Companies should expect that unique acquisition strategies may be required for each system at issue. For example, with smaller databases, often a full copy of the database might be acquired by the technical specialists who are helping to prepare the data for production. With other systems, particularly multi-terabyte systems too voluminous to fully copy, targeted acquisitions may be advisable.

In order to ensure that the correct historical information is acquired, and also to ensure that information will continue to be preserved properly, it is critical during the acquisition process to understand how data sets change over time.

PRESERVATION

In many cases, the routine updating of a database will not overwrite existing data. However, there may be situations where relevant data is in fact overwritten or deleted.

At the outset of litigation, companies should make sure they understand how their systems retain data so they can decide whether they need to make system changes in order to preserve potentially discoverable information. Keep in mind that many legacy databases were not originally designed to meet the requirements of today’s litigation environment. Therefore, assessing the preservation environment for each data set is critical.
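One way to see whether routine updates overwrite or delete records is to compare two point-in-time copies of the same table. The following is only a minimal sketch: the snapshot layout (a dictionary keyed by record identifier) and the sample invoice records are hypothetical, and a real system would need a comparison suited to its own schema.

```python
def diff_snapshots(earlier, later):
    """Compare two {key: record} snapshots of the same table.

    Returns (deleted, overwritten): keys missing from the later snapshot,
    and keys whose record content changed between the two snapshots.
    """
    deleted = sorted(k for k in earlier if k not in later)
    overwritten = sorted(
        k for k in earlier if k in later and earlier[k] != later[k]
    )
    return deleted, overwritten


# Hypothetical snapshots taken before and after a nightly update.
monday = {
    "INV-1": {"status": "open", "amount": 100.0},
    "INV-2": {"status": "open", "amount": 250.0},
    "INV-3": {"status": "paid", "amount": 75.0},
}
tuesday = {
    "INV-1": {"status": "open", "amount": 100.0},
    "INV-2": {"status": "paid", "amount": 250.0},  # updated in place
}

deleted, overwritten = diff_snapshots(monday, tuesday)
```

Here `INV-3` is reported as deleted and `INV-2` as overwritten, the kind of finding that could prompt a change to retention settings.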

PROCESSING AND ANALYSIS

The processing/analysis step highlights one of the significant differences between producing structured data as opposed to more traditional ESI. Keyword or concept searches are the generally accepted methods used to cull traditional ESI for potentially responsive information. But these techniques may not be applicable to structured data, which is largely centered around transactions, not words.

Instead, processing and analyzing structured data may require profiling transactions by specific fields or identified issues, using criteria such as dates, journal or general ledger codes, or even approval fields. The review of the structured data undertaken during the analysis phase may provide additional insight into related unstructured data and suggest new or different lines of inquiry for identifying potentially relevant information in that unstructured data.
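Field-based profiling of this kind can be sketched as a database query. The example below is illustrative only: the `journal_entries` table, its columns and the sample rows are hypothetical, and an in-memory SQLite database stands in for whatever enterprise system actually holds the data.

```python
import sqlite3

# Hypothetical ledger table; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal_entries ("
    "entry_id INTEGER PRIMARY KEY, posted_on TEXT, gl_code TEXT, "
    "amount REAL, approved_by TEXT)"
)
conn.executemany(
    "INSERT INTO journal_entries VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2007-03-15", "4000", 1200.00, "jsmith"),
        (2, "2007-11-02", "6100", 560.50, None),
        (3, "2008-01-20", "6100", 9800.00, "mlee"),
        (4, "2008-06-30", "4000", 75.25, "jsmith"),
    ],
)


def profile_entries(conn, start, end, gl_codes):
    """Return entries posted within [start, end] that carry one of gl_codes."""
    placeholders = ", ".join("?" for _ in gl_codes)
    sql = (
        "SELECT entry_id, posted_on, gl_code, amount, approved_by "
        "FROM journal_entries "
        "WHERE posted_on BETWEEN ? AND ? "
        f"AND gl_code IN ({placeholders}) "
        "ORDER BY entry_id"
    )
    return conn.execute(sql, [start, end, *gl_codes]).fetchall()


# Profile by date range and general ledger code, then by approval field.
hits = profile_entries(conn, "2007-06-01", "2008-03-31", ["6100"])
unapproved = [row for row in hits if row[4] is None]
```

With the sample rows, `hits` contains entries 2 and 3, and `unapproved` narrows that to entry 2, which has no approver recorded.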

REVIEW

Once potentially responsive structured data has been analyzed and processed, it may need to be reviewed again for responsiveness and privilege. There are no pre-defined tools for reviewing and redacting data contained in structured systems, but there are some general guidelines.

The more straightforward approach involves withholding or redacting specific value fields in their entirety, such as fields that contain personal privacy information that is protected from disclosure. This approach is more efficient and less costly because once attorneys define the fields that should be withheld or redacted, a technical specialist writes a script to exclude the entire field of information prior to production. Other than quality control, no further attorney review of individual fields of data is required.
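A field-level exclusion script of the sort described above might look like the following sketch. The field names and sample records are hypothetical; in practice the list of withheld fields would come from counsel, and the script would run against the actual extract rather than an in-memory list.

```python
# Hypothetical field names; counsel would supply the real withheld list.
WITHHELD_FIELDS = {"ssn", "home_address"}


def exclude_fields(records, withheld):
    """Return copies of the records with every withheld field dropped."""
    return [
        {name: value for name, value in record.items() if name not in withheld}
        for record in records
    ]


def leaked_fields(records, withheld):
    """Quality-control check: list any withheld field names still present."""
    return sorted(
        {name for record in records for name in record if name in withheld}
    )


source_rows = [
    {"employee_id": 101, "ssn": "000-00-0000",
     "home_address": "1 Example St", "salary_grade": "B"},
    {"employee_id": 102, "ssn": "000-00-0001",
     "home_address": "2 Example St", "salary_grade": "C"},
]

production_rows = exclude_fields(source_rows, WITHHELD_FIELDS)
```

The `leaked_fields` helper corresponds to the quality-control step: it should return an empty list for `production_rows` before anything is produced.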

There are times, however, when the content of individual fields of structured data, such as notes or memo fields, requires attorney review. This type of review is complicated because the information that requires review tends to be stored in a structured manner but contains unstructured data, such as free text that has no parameter constraints on length or format. Technical specialists are needed to develop database-specific tools to present this information in a reviewable and potentially redactable format. Certain electronic culling methods can be employed to reduce the overall volume of information that requires attorney review, but generally attorney review is required.
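As one example of an electronic culling method, empty memo fields can be skipped and identical memo text grouped so that each distinct note is reviewed once. This is a sketch under assumptions: the `id` and `memo` field names and the sample rows are hypothetical, and exact-match grouping is only one of many possible culling techniques.

```python
from collections import defaultdict


def build_review_queue(rows, memo_field="memo"):
    """Group record ids by whitespace-normalized memo text, skipping empty memos."""
    queue = defaultdict(list)
    for row in rows:
        # Collapse runs of whitespace so trivially different copies match.
        text = " ".join((row.get(memo_field) or "").split())
        if text:
            queue[text].append(row["id"])
    return dict(queue)


# Hypothetical records with a free-text memo field.
rows = [
    {"id": 1, "memo": "Per counsel, hold payment"},
    {"id": 2, "memo": ""},
    {"id": 3, "memo": "Per counsel,  hold payment "},
    {"id": 4, "memo": None},
    {"id": 5, "memo": "Vendor called re: invoice"},
]

review_queue = build_review_queue(rows)
```

Five records reduce to two distinct memos for attorney review, and each memo keeps the list of record ids to which a privilege or redaction decision would apply.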

PRODUCTION

Once the review is complete, the final phase begins – production. These data sets must be produced in a reasonably usable format, meaning that both the producing and receiving parties are able to review and analyze the information without excessive software licensing or time-consuming manipulation.
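A delimited text file is one format that most parties can open without specialized licenses. The sketch below assumes the parties have agreed on comma-separated output; the field names and rows are hypothetical, and the record count is returned only as a simple control total that both sides could compare.

```python
import csv
import io


def export_delimited(rows, fieldnames):
    """Write rows as CSV text with a header row; return (text, record_count)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",   # silently drop fields not being produced
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), len(rows)


# Hypothetical reviewed rows ready for production.
rows = [
    {"entry_id": 2, "posted_on": "2007-11-02", "description": "Office supplies"},
    {"entry_id": 3, "posted_on": "2008-01-20", "description": "Freight, inbound"},
]

text, count = export_delimited(rows, ["entry_id", "posted_on", "description"])

# The receiving party can read the file back with any standard CSV reader.
parsed = list(csv.DictReader(io.StringIO(text)))
```

The `csv` module quotes the description that contains a comma, so the file parses back into the same two records on the receiving side.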

It is important to ensure communication between the parties during this process so that the receiving party is equipped to receive the information in the form that the producing party provides it. This communication should begin early in the discovery process, ideally once the producing party has a general understanding of the systems at issue. Early communication enables the parties to agree to the formats for production and avoid multiple iterations of the same information, which is inefficient and costly to both sides.

Even in the absence of litigation, in-house counsel may want to acquire a basic, up-to-date understanding of the company’s major systems and identify the information services personnel who have the appropriate technical knowledge and expertise to assist in the event of litigation.

James Vint is a managing director in the FTI technology consulting practice and leads the London Financial and Enterprise Data Analytics Practice. For ten years he has provided IT consulting services in the United States and the EU, with regard to identification, preservation, collection, analysis and production of electronically stored information.

David Turner is a managing director in the FTI Technology Consulting practice and leads the Washington D.C. Financial and Enterprise Data Analytics Practice. For more than ten years, he has consulted with Fortune 500 clients in the area of structured data analytics, focusing on event-based transactions.

Colleen Casey Voshell is a director in FTI’s Technology group. She provides strategic advice to clients on e-discovery issues. Before joining FTI, she was an associate at a large law firm, concentrating her practice in mass torts and product liability defense.