Aaarrrrrr, Be troubled I am
We are looking to assist our in house counsel with a large tape restoration piece, approximately 30,000-plus tapes.
The tapes are a mixture of LTO 2-4 and at this stage we are trying to estimate cost feasibility/time frames.
There are some technologies and software that could assist us in completing the work, e.g. Index Engines, but the cost associated with using this technology is more than our in house counsel is willing to pay.
Aspects to consider are the following
* A mixture of backup tapes
* A mixture of backup (legacy) agents used by our IT department no longer available
* A reasonable time frame eg 1 or 2 yrs
* A full restore would require a few petabytes and is hence not our preferred option
* As usual, limited funds available
We are open to views on how this can be completed, as this cannot be new to the forensic market. Ideally we envisage that during processing we would obtain a 60-70% de-duplication rate on the data, and indexing would be ideal.
Is there really only one provider for tape restorations? And are large tape restorations always going to be this costly?
The Angry Pirate
Aaaarrrr???
This line struck me as interesting for the type of job you are mentioning
" is more than our in house counsel is willing to pay."
You will likely hear that for any solution that you offer up to them.
Whether you go with more manpower, a robotic loader, or more machines, you're looking at an incredibly high figure to do this. Then you threw in "As usual, limited funds available". Could you define what they consider limited funds for the whole job: software, hardware, manpower? $100k? $500k?
AND you want 60-70% dedupe?
The price is just going to be really expensive, as well as the dedupe part and the ability alone to load up 30k tapes. You'd likely also run into a stonewall on your end product data being so large that software might not be able to index it, so more licenses for that and breaking down the data. In addition to that, one small slip up with the naming conventions of the data could be catastrophic for the end result.
Are you allowed to hire out of house?
This is why Information Management is on the left hand side of the eDiscovery Reference Model (EDRM) and why retention rules matter.
The only good news is that with that subject matter, a 60-70% dedupe is likely to be conservative. Considering the quantity of redundancy likely, I'm guessing you'll get more in the range of 80%+.
There are a limited number of companies who are likely to be in litigation or under an investigation this size, which is why I couldn't discuss strategies unless I knew who you were and who you work for.
DISCLAIMER NOT A LAWYER. NOT LEGAL ADVICE You know generally in US litigation you are not required to produce ESI from backup media which is not reasonably accessible? Although since you only identify your location as "Americas" it's hard to tell if you're working under US rules or some other. More information would help.
You also need to consider what data you are looking at.
Do you want to look at any SQL backups, bare metal backups….
Exchange EDB’s for off line backups are easy enough to extract along with associated STM’s but then the data from these needs to be fed back into the system somehow. The same for PSTs, zips….
Mailbox backups/Online exchange backups (sometimes called brick level) are more difficult to deal with and most companies would want to restore this part of the backup to a cloned server. The backup software won’t restore it elsewhere. However, you often see a full backup on the same tape/set as a brick level so the data is duplicated anyway
Duplicate data, as mentioned, is likely to be 80%+ which is good news. The slightly less good news (depending upon your view point) is that in order to determine whether you have a duplicate document you need to extract it and hash it. i.e. you need to do a full extract hashing the data as you go and then throw away the duplicates. You will however need some log as it may well be relevant that a particular file was found on backups x, y and z.
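To make the "hash as you go, throw away duplicates, keep a log" idea concrete, here is a minimal sketch. It assumes each extracted file arrives as a (tape id, path, content) triple and uses SHA-256; the function name, the tape labels and the sample data are all made up for illustration, and a real job would stream content from disk rather than hold it in memory.

```python
import hashlib
from collections import defaultdict

def dedupe_extracted(files):
    """files: iterable of (tape_id, path, content_bytes) in extraction order.

    Returns (unique, seen_on):
      unique  maps digest -> the first (tape_id, path) that was kept
      seen_on maps digest -> every (tape_id, path) where that content appeared
    """
    unique = {}
    seen_on = defaultdict(list)
    for tape_id, path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        # Log every sighting, duplicate or not, so provenance is preserved.
        seen_on[digest].append((tape_id, path))
        if digest not in unique:
            unique[digest] = (tape_id, path)
    return unique, dict(seen_on)

# Hypothetical sample: the same report turns up on three different tapes.
sample = [
    ("TAPE-X", "/home/a/report.doc", b"quarterly report"),
    ("TAPE-Y", "/home/a/report.doc", b"quarterly report"),
    ("TAPE-Y", "/home/b/notes.txt", b"meeting notes"),
    ("TAPE-Z", "/archive/report.doc", b"quarterly report"),
]
unique, seen_on = dedupe_extracted(sample)
dedupe_rate = 1 - len(unique) / len(sample)
print(f"kept {len(unique)} of {len(sample)} files, dedupe rate {dedupe_rate:.0%}")
```

The point of keeping `seen_on` separate from `unique` is that the duplicate content can be discarded while the record that it existed on backups x, y and z survives.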
What about backups that span multiple media? Can you easily identify which tapes belong to which backups before you start? Can your solution deal with an orphaned backup tape (i.e. the second or third tape in a series for which you don’t have the first).
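One way to answer the "which tapes belong to which backup" question before loading anything is to check the inventory for gaps. The sketch below assumes you already have, for each tape, a barcode, a backup set identifier and a sequence number within the set (from labels or a catalogue export); those field names and the sample inventory are hypothetical.

```python
from collections import defaultdict

def find_incomplete_sets(tapes):
    """tapes: iterable of (barcode, set_id, seq), with seq starting at 1.

    Returns {set_id: [missing sequence numbers]} for sets with gaps.
    Note: a missing tape at the END of a set cannot be detected this way,
    because the inventory alone does not say how long the set should be.
    """
    by_set = defaultdict(dict)
    for barcode, set_id, seq in tapes:
        by_set[set_id][seq] = barcode
    problems = {}
    for set_id, members in by_set.items():
        expected = set(range(1, max(members) + 1))
        missing = sorted(expected - set(members))
        if missing:
            problems[set_id] = missing
    return problems

# Hypothetical inventory: the second set has lost its first tape.
inventory = [
    ("A001", "full-2006-01", 1),
    ("A002", "full-2006-01", 2),
    ("B117", "full-2006-02", 2),
    ("B118", "full-2006-02", 3),
]
problems = find_incomplete_sets(inventory)
print(problems)
```

Any set that shows up in `problems` with sequence 1 missing is an orphan in the sense described above and needs a tool that can start reading mid-series.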
Contact eMag Solutions in Atlanta
They have a service center concentrating on such jobs. Also, the capability to handle a very large range of current and 'mature' tape formats.
30K tapes is a lot, and I expect they will contain many different backup formats / tape structures as well as many file structures. As Paul mentioned, there is also a major issue of handling multi-vol datasets if the tape sequences have not been recorded.
JFYI, there was a recent thread somewhat related
http://www.forensicfocus.com/Forums/viewtopic/t=9354/
The "numbers" involved are appalling.
They talk of having GREATLY sped up things passing from a US$ 8,000,000 for 480 tapes in 2005 to a mere US$ 1,900,000 for 800 tapes in 2009.
I won't dare to divide US$ 1,900,000 by 800 and multiply the result by 30,000 😯 , but I can certainly take the 9 months of work that it reportedly took, divide it by 800 and multiply it by 30,000:
30,000*9/800=337.5 months
337.5/12=28.125 years
On the other hand, if you actually want to "restore" a tape, see here
http//
an "average"
DDS-5 (DAT 72), 36 gigabytes, 3.5 megabytes/second
good ol' time tape has a theoretical speed of 3.5 MB/s and a capacity of 36 GB, i.e. in relative terms (roughly)
36,000/3.5=10,286 seconds/tape or 10,286/3600=2.86 hours/tape
and an "average"
Super DLTtape I (SDLT 1), 160 gigabytes, 16 megabytes/second
more recent tape
160,000/16=10,000 seconds/tape or 10,000/3600=2.78 hours/tape
and an "average"
LTO 3, 400 gigabytes, 80 megabytes/second
Ultrium "latest"
400,000/80=5,000 seconds/tape or 5,000/3600=1.39 hours/tape
Not knowing the details of the type of tapes involved, 4 hours each (including finding the tape, inserting it, tensioning, dealing with the "odd" or "problematic" one, etc.) is not too bad an estimate, so
30,000*4=120,000 hours, or 120,000/24/365=13.70 years (24/7)
more likely, let's say that you can have between 2 and 3 tapes restored in a day's work on a single machine; on a "normal working year" of 365/7*5-30=230 working days/year you have
2.5*230=575 tapes/year.
Provided that a single operator can somehow manage 10 machines, you will be doing 5750 tapes/year, so you need something like 5-6 operators (and 50-60 machines) to do it in one year.
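For anyone who wants to replay this arithmetic with their own figures, here is a small sketch. It only restates the assumptions given above (theoretical drive speeds, 2.5 tapes per machine per working day, 230 working days, 10 machines per operator); the function names are made up, and real throughput will be lower than the theoretical read times.

```python
import math

def hours_per_tape(capacity_gb, speed_mb_s):
    """Theoretical time to read one full tape end to end, in hours."""
    return capacity_gb * 1000 / speed_mb_s / 3600

def tapes_per_year(tapes_per_day, working_days, machines):
    """Tapes one operator gets through in a year across their machines."""
    return tapes_per_day * working_days * machines

# (capacity in GB, speed in MB/s), as quoted above
formats = {"DDS-5": (36, 3.5), "SDLT 1": (160, 16), "LTO 3": (400, 80)}
for name, (capacity, speed) in formats.items():
    print(f"{name}: {hours_per_tape(capacity, speed):.2f} hours/tape")

working_days = int(365 / 7 * 5 - 30)                    # 230
per_operator = tapes_per_year(2.5, working_days, 10)    # 5750.0
operators = math.ceil(30_000 / per_operator)            # 6
print(f"{per_operator:.0f} tapes/operator/year, {operators} operators, "
      f"{operators * 10} machines for a one-year job")
```

Changing the 2.5 tapes/day or the 10 machines per operator shows quickly how sensitive the headcount is to those two guesses.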
jaclaz
30,000 tapes in 2 years is about 50 per day. Working 24 hr shifts you should manage about 5 or 6 tapes per machine per day (assuming no problems), so in reality you are looking at running 10 machines in parallel. Having done similar (although only on hundreds of tapes, not thousands), you need to have a robust procedure set up to ensure that a) you are running at near to max throughput and b) everything is fully documented and double checked.
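The machine count depends a lot on whether you count calendar days or working days and on the per-machine rate, so a rough sketch of the sum may help. The function name and the 260 working days figure are assumptions for illustration; the 5 and 6 tapes per machine per day come from the estimate above.

```python
import math

def machines_needed(total_tapes, years, days_per_year, tapes_per_machine_day):
    """Machines required to finish in time, assuming no downtime at all."""
    tapes_per_day = total_tapes / (years * days_per_year)
    return math.ceil(tapes_per_day / tapes_per_machine_day)

for days_per_year in (365, 260):          # calendar days vs. assumed working days
    for rate in (5, 6):                   # tapes per machine per day
        n = machines_needed(30_000, 2, days_per_year, rate)
        print(f"{days_per_year} days/year at {rate} tapes/machine/day: {n} machines")
```

The answers range from 7 to 12 machines, so 10 in parallel is a reasonable planning figure only if throughput really does stay near the maximum.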
How would I do it?
Firstly, I have in-house software, TC2, that works on a tape image and supports a lot of different tape formats. This would need to be modified to work on a tape directly, to remove the overhead of the imaging process.
TC will automatically generate an MD5 on an extracted file and create a log and CSV of the content. So I would need to modify it to check a central database to see if a file had already been extracted and, if so, delete the duplicate as soon as it is extracted*
Any tape that has something unusual, and by this I mean anything that is not really a file (exchange brick level backups, databases….) would need to be flagged and a decision made later as to how to proceed with it
* As tapes need to be read linearly, it makes no sense to calculate the checksum and then only extract if the file has not been seen, as this would involve the tape seeking backwards (very, very time consuming), so better imo to extract everything and then delete dupes.
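The extract-first, delete-later approach against a central database could look roughly like the sketch below. This is not TC2 code: it uses SQLite as a stand-in for the central database, and the table layout, function name and demo files are all assumptions made for illustration.

```python
import hashlib
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE files(md5 TEXT PRIMARY KEY, kept_path TEXT);
CREATE TABLE sightings(md5 TEXT, tape_id TEXT, path TEXT);
"""

def register_or_delete(db, tape_id, path):
    """Hash an already-extracted file; keep it if new, delete it if seen before.

    Every sighting is logged either way. Returns True if the file was kept.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):   # 1 MiB at a time
            md5.update(chunk)
    digest = md5.hexdigest()
    already_seen = db.execute(
        "SELECT 1 FROM files WHERE md5 = ?", (digest,)).fetchone() is not None
    db.execute("INSERT INTO sightings(md5, tape_id, path) VALUES (?, ?, ?)",
               (digest, tape_id, path))
    if already_seen:
        os.remove(path)                    # duplicate: drop the extracted copy
    else:
        db.execute("INSERT INTO files(md5, kept_path) VALUES (?, ?)",
                   (digest, path))
    db.commit()
    return not already_seen

# Demo with throwaway files standing in for extracted tape content.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
workdir = tempfile.mkdtemp()
extracted = [("TAPE-1", "a.doc", b"same"), ("TAPE-2", "b.doc", b"same"),
             ("TAPE-2", "c.doc", b"different")]
kept = []
paths = []
for tape_id, name, content in extracted:
    path = os.path.join(workdir, name)
    with open(path, "wb") as fh:
        fh.write(content)
    paths.append(path)
    kept.append(register_or_delete(db, tape_id, path))
print(kept)
```

With several machines writing to one database, the check and the insert would need to happen in a single transaction (or rely on the primary key rejecting the second insert), which this single-process sketch does not attempt.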
The real problem here is that backup formats on tapes change very regularly and differ depending upon what's being backed up. For instance, a backup of a Novell server with Mac namespace installed using ARCserve 7 will be different from a backup of the same machine without Mac namespace. ARCserve 8 would be different again. On nearly every occasion when I have needed to deal with a backup set I have needed to modify my code to cater for something I have not seen before.
I would refuse to do it, in a very nice way. Unless this is a multi-million dollar case, your lawyer needs to review her Sedona notes.
Presuming this is eDiscovery, non-criminal case -
First, if you have never accessed backup tapes other than for emergency recovery, as Tony said, you are not required to produce them. This is a catch-22 by the way: if my internal attorney asks for tape backups to be restored because they think there is something useful on them, I would still refuse because it would open the door for future cases.
Second, FRCP 26(b) allows for cost shifting. I never say to demands "will not do it", simply present them the costs. If they are willing to eat it, then by all means go for it.
Third, FRCP 68 provides for an "offer of judgment", whereby your company offers a settlement, comparable to the cost of the case to the plaintiff. If they decline, the offer is deemed withdrawn. Basically, saying "look Joe, it is going to cost me $1mm, take the million and we call it settled." Saves court time, and prevents carpal tunnel for the squirrels.
Finally, proportionality - we get eD demands that are multiple times the possible compensation the other side would ever get. For example, an HR case where at best the person would get $10,000, but demands recovery that would cost $100,000. Courts do not look kindly on such demands.
This is where your attorney needs to be savvy, and make the right decisions at the initial meet & confer.
Sadly, I rarely get to sit in (or see other FIs invited) and provide aid to the lawyers. Usually one side ends up in knots and with a mess of an eD project.
I recommend contacting eMag Solutions as well. Thousand tape jobs are the norm for them.
Hi All,
Many thanks for the replies and wealth of information.
A lot to consider and some interesting points to return to our in house counsel with. At this stage I've only been included on what "I need to know", although it seems that a lot of justification would be required for my team to pursue this (without committing a sizeable portion of our lifespan).
I do however find this topic of information management and eDiscovery quite interesting, especially relating it to the use of Tape as a backup medium and the cost associated in restoration.
It's quite evident that tape is and will be a cost-effective backup medium at the backup stage, but these cost savings seem to be lost quite quickly come restoration.
I personally know of organisations who create in excess of 3 terabytes' worth of tape backups every month, without making use of de-duplication. This almost seems like a clear misuse of the tape backup medium.
I guess the closing questions are
* Will suitable tape restoration packages come out that can assist the restoration process (improving time and cost effectiveness),
* Will organisations need to be smarter about backup regimes,
* Or will the use of such excessive backup practices help lawyers and the like avoid the disclosure of evidence due to the cost involved.
The Angry Pirate