Harold Burt-Gerrans talks about how to deal with structured data in eDiscovery cases.
I’m baaaaa–ack! And you thought I was done with my earlier six-part series, but I have a new topic to
add to my rants and raves. For review, the previous series was:
- Standards and De-Duplication Levels
- New approach to managing Duplicative Documents
- Family Level Coding
- Recursive De-Duplication and Time Zones
- Multi-Languages and my vision of document utopia
And today’s new topic – Structured Data – What is it and how to deal with it
We typically classify the documents we see in eDiscovery as emails, PowerPoint presentations, spreadsheets,
etc., and these comprise the majority of the documents in a review. However, this document
classification is really a second-level classification. The primary level is “Unstructured Data” vs. “Structured Data.”
Unstructured Data: This data is free-flowing, without any rigid rules as to how the content is organized.
Written documents such as Word files, and all the file types listed above, are typically considered
unstructured data.
Structured Data: This data is organized with defined rules – the most common types of files would be
database files, where the database consists of data tables, and each table consists of a set of columns
containing a specific type of data.
In the strictest interpretation, data files are either one or the other: structured or unstructured. That said,
there are times when there may appear to be crossover points, such as a spreadsheet that contains only a
single worksheet with a defined set of columns, where each row is an individual record. For example, Column A is
‘Name,’ Column B is ‘Street Address,’ Column C is ‘City,’ and so on. Conversely, there can also be
free-form text fields inside a database, such as a product description or some other field containing text
describing an object or situation.
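As a rough illustration of that crossover, the following minimal Python sketch builds a small in-memory SQLite table with a defined set of typed columns plus one free-form text field. The table name `contacts`, the column names, and the sample row are all hypothetical and chosen only to mirror the Name / Street Address / City example above.

```python
import sqlite3

# Hypothetical schema: table name, columns and sample data are made up for illustration.
SCHEMA = """
CREATE TABLE contacts (
    contact_id     INTEGER PRIMARY KEY,
    name           TEXT,
    street_address TEXT,
    city           TEXT,
    notes          TEXT
)
"""

def build_example_db():
    """Create an in-memory database holding one structured record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # The 'notes' value is free-form text living inside a structured table.
    conn.execute(
        "INSERT INTO contacts (name, street_address, city, notes) VALUES (?, ?, ?, ?)",
        ("A. Example", "1 Main St", "Toronto",
         "Customer called about a late delivery and asked for a callback."),
    )
    conn.commit()
    return conn

def column_types(conn, table):
    """Return {column name: declared type} for a table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

if __name__ == "__main__":
    conn = build_example_db()
    for column, declared_type in column_types(conn, "contacts").items():
        print(f"{column}: {declared_type}")
```

Every row must fit the declared columns, which is what makes the table structured, even though the contents of `notes` read like unstructured text.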
The eDiscovery industry has been managing unstructured data for years, so I don’t need to discuss how
well we do it, beyond the suggestions I made in the previous six articles.
Dealing with Structured Data
There are going to be exceptions to the situations I present below; however, there is a good possibility
that if you are receiving structured data, it can be managed as I will describe. Oddly, the processes being
discussed are designed to convert structured data into unstructured document types that can be
handled in typical eDiscovery platforms.
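To give a sense of what such a conversion can look like in its simplest form, here is a minimal Python sketch that renders each row of a database table as a plain-text, document-like record. It is an illustration under stated assumptions, not the specific process described in this series: the `tickets` table, its columns, and the helper names `row_to_document` and `table_to_documents` are all hypothetical.

```python
import sqlite3

def row_to_document(table, columns, row):
    """Render one database row as a plain-text, document-like string."""
    lines = [f"Source table: {table}"]
    for column, value in zip(columns, row):
        # NULL values are rendered as an empty field rather than the word 'None'.
        lines.append(f"{column}: {'' if value is None else value}")
    return "\n".join(lines)

def table_to_documents(conn, table):
    """Return one text document per row of the given table."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    return [row_to_document(table, columns, row) for row in cursor.fetchall()]

if __name__ == "__main__":
    # Hypothetical sample data, used only to exercise the functions above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (ticket_id INTEGER, status TEXT, summary TEXT)")
    conn.execute("INSERT INTO tickets VALUES (1, 'open', 'Printer on floor 3 is offline')")
    for document in table_to_documents(conn, "tickets"):
        print(document)
```

Each resulting string could be saved as its own text file and loaded into a review platform like any other document, with the column names preserved as labels so the reviewer keeps some of the context the database provided.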
1) Packaged Application Data: There are a lot of packaged applications where the data is stored in
an underlying database. A very common example of this is an accounting system such as SAP or
QuickBooks (one extreme to the other). Depending on the nature of the matter, accounting data
may be significant. Typically, packaged applications come with either a defined set of
reports or a built-in report-writer tool that can be used to extract the data in a format that
makes sense, because the application provides the contextual structure needed to
understand the data.
The best approach to working with this data is to work with the client staff who usually create
or use the data, to pull reports that contain the information that would be relevant to the
matter. For example, in a Mergers and Acquisitions matter, you would work with the accountants
to extract General Ledger (GL) reports, Balance Sheets, A/R and A/P subledger reports, etc.
2) Proprietary Databases: These are becoming more common as companies implement systems to track company-specific data. They may be built by in-house development staff or by leveraging external application hosting environments, such as implementing a customized complaint tracking system within Salesforce or a support ticketing system within a case management tool.
How do we handle these? Stay tuned for part 8!
About The Author
Harold Burt-Gerrans is Director, eDiscovery and Computer Forensics for Epiq Canada. He has over 15 years’ experience in the industry, mostly with H&A eDiscovery (acquired by Epiq, April 1, 2019).