
Sound extraction from large databases

10 Posts
5 Users
0 Reactions
522 Views
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
Topic starter  

There is a 500TB database in a widely distributed cluster.

Most of the database, except the target subset, is dynamic and constantly changing.


Describe a forensically sound extraction of a 1 GB subset.

Do not just think MS SQL or Lotus Notes, etc. Think proprietary database.

How would you do it?


   
binarybod
(@binarybod)
Reputable Member
Joined: 17 years ago
Posts: 272
 

I would think that the only real alternative to using the proprietary interface is to write your own code and hook into the API, which I suppose would be difficult as the source is likely closed. If you can't pin down the physical location of the data, then you are pretty much hosed in terms of what you can do.

How about contacting the people who wrote the DB?

Paul


   
 IanF
(@ianf)
Trusted Member
Joined: 17 years ago
Posts: 55
 

Are we allowed a couple of questions?

If so -
Does the system have to remain available to users at all times?
What types of interfaces are available?
Does the database operate like a traditional RDBMS - can you quiesce the data files, suspend transactions, lock records, etc.?
What type of backup infrastructure is in place?
When was the data in question created, and can it be altered through the application or another interface?


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
Topic starter  

The system must remain online.

The system may be able to provide some sort of delimited information dump.

There is a possibility it can be queried in some languages, such as SQL (a new discovery for me, and this may still be wrong).

The records can possibly be locked, but just as they can be locked programmatically, they can be unlocked.

I am not aware of their backup scheme, or whether there is one.

The data was created between 2002 and 2008. The application can alter the information.


   
(@douglasbrush)
Prominent Member
Joined: 16 years ago
Posts: 812
 

Volume Shadow Copy? That way you can make a hashable data set, provided it is an MS server environment.
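For what it's worth, on a Windows Server box the snapshot itself could be scripted roughly like this (illustrative only; the volume letter is a placeholder, and the vssadmin commands are only available on the Server editions with admin rights):

import subprocess

# Create a Volume Shadow Copy of the volume holding the database files.
subprocess.run(["vssadmin", "create", "shadow", "/for=D:"], check=True)

# List existing shadow copies so the snapshot ID can be recorded in the notes.
subprocess.run(["vssadmin", "list", "shadows"], check=True)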


   
 IanF
(@ianf)
Trusted Member
Joined: 17 years ago
Posts: 55
 

Hi Jhup,

One final question - what are the datatypes of the columns you will be trying to harvest? Reason I ask is that specific datatypes such as RAW, LONG, BLOBs, etc. can only be extracted into a similarly typed data store. If that is the case, dumping out to a flat file may be an option (rough sketch below).

I would explore the backup/recovery capability a bit more tbh. Most database systems have a mechanism to allow you to do a selective backup/restore (this may just be the export to flat file you mentioned) while providing the ability to migrate data across systems.

Is there any further info on the system? Even when/how it was developed, what file structures it uses to store the data, and what interfaces you might have found.
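On the flat-file idea, a minimal Python sketch of what a defensible dump step could look like - purely illustrative, since we don't know the real interface; fetch_rows() and the column names are made up:

# Write a delimited dump deterministically (fixed column order, fixed
# encoding, sorted rows) so that repeated extractions of the same data
# produce byte-identical files, then record a SHA-256 hash alongside it.

import csv
import hashlib

COLUMNS = ["record_id", "created", "payload"]   # hypothetical column names

def write_dump(rows, path):
    """Write rows to a CSV file with a fixed column order and sorted rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for row in sorted(rows):                 # deterministic row order
            writer.writerow(row)

def sha256_of(path):
    """Hash the finished dump so later copies can be compared against it."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (fetch_rows() stands in for whatever export the system provides):
# write_dump(fetch_rows(), "subset_dump.csv")
# print(sha256_of("subset_dump.csv"))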


   
(@seanmcl)
Honorable Member
Joined: 19 years ago
Posts: 700
 

Too little information, IMHO. As others noted, is there an API or a raw interface to the database, or is the only interface the actual application that uses it?

Also, when you say it is "proprietary" what, specifically, do you mean? That the database architecture, itself, is a trade secret?

And what constitutes forensically sound with respect to a database?

To illustrate: we had a case involving alleged kickbacks in which the principal data were located in an Oracle Applications (OA) database. OA is extremely complicated because the application logic is embedded both in the application itself and in various triggers. The tables to which the user is exposed are frequently materialized views rather than the raw tables themselves. It is possible that a user could bypass the application logic and alter the data in the raw tables so as to make the change all but undetectable by the actual application software, but it would require a great deal of knowledge beyond what the normal application user would possess.

So, for practical purposes, the best way to dump records which may be of use in a forensic audit is to use the application, itself, to create reports, since these would preserve the actual financial records as they would be represented in the system.

But if you suspected that the raw tables had been altered, this would not be sufficient.

As I said, more information would be necessary to provide a finer grained answer.


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
Topic starter  

Thanks all.

It is proprietary, as in the underlying database engine (not just the structure) is proprietary - which is why I have a problem saying it is an RDBMS or something else.

There is no API.

Forensically sound, as in: the extract matches the original as closely as possible, all steps are documented, I can explain what each step does, and I can defend why it was taken.

I think what I am going to end up doing is using one historical backup of the data, one current backup, and one post-current backup.

I will use the middle backup as the master, and the pre- and post-backups as proof of integrity. Thanks IanF.

Of course, this only proves that the backups are consistent, not that the actual database is.

If I can get them to restore all 3 backups, then I can use seanmcl's report suggestion, sequentially proving consistency not just within the backups but also against what the application/database itself shows.
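For the comparison step, something along these lines (Python, purely illustrative - the dump file names are placeholders, and it assumes each backup's target subset has already been exported to CSV):

import csv
import hashlib

DUMPS = ["subset_pre.csv", "subset_master.csv", "subset_post.csv"]

def sha256_of(path):
    """Hash a dump file so every copy can be documented and compared."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def rows_of(path):
    """Load a dump as a set of rows for localizing any differences."""
    with open(path, newline="", encoding="utf-8") as f:
        return {tuple(row) for row in csv.reader(f)}

# Record the hash of every dump, then compare pre and post against the master.
hashes = {path: sha256_of(path) for path in DUMPS}
for path, digest in hashes.items():
    print(f"{path}: {digest}")

master = DUMPS[1]
for other in (DUMPS[0], DUMPS[2]):
    if hashes[other] == hashes[master]:
        print(f"{other} is byte-identical to {master}")
    else:
        # Symmetric difference shows exactly which rows changed between dumps.
        diff = rows_of(other) ^ rows_of(master)
        print(f"{other} differs from {master}: {len(diff)} rows differ")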

Again, thank you all!


   
 IanF
(@ianf)
Trusted Member
Joined: 17 years ago
Posts: 55
 

jhup - how did your analysis go ?


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
Topic starter  

Settled out of court.

I dumped data through the system into CSV, then dumped it twice more.

As all three matched, that is the best I could do to show that the data most likely is not changing.

I was able to get an older backup from the beginning of '09 'restored' and dumped, which matched again.

It showed that the data did not change in over a year and is most likely consistent with the original data...


   