deleted data survival times  

tootypeg
(@tootypeg)
Active Member

Just wondered if anyone has done any research (or knows of any) into the survival time of deleted data (before being overwritten) on FAT and NTFS devices under different formatting conditions, cluster sizes etc.?

I was thinking about doing some work in this area, but -

a) wasn't sure if it was worth it - is there any forensic value in it?
b) wasn't sure how to test it - should it provide context to, say, browser cache turnover of cached content before it is overwritten?

What do people think?

Posted : 26/07/2017 2:00 am
tracedf
(@tracedf)
Active Member

It really depends on the size of the drive, utilization, activity, and whether it's SSD. With TRIM, unused blocks on an SSD can be wiped shortly after they are freed. Assuming a spinning hard drive, the data could sit around for years or get overwritten in minutes depending on multiple factors.

If the drive is smaller and/or mostly full, any recently freed blocks are more likely to get reused soon because there are fewer free blocks to choose from.

If the drive does not see a lot of write activity, the data could remain for quite a long time. If the drive is active, e.g. the user is trying to download dozens of pirated movies, the data could get overwritten relatively soon.

In general, I would be optimistic about some data remaining for a long time. For example, let's say the user deleted hundreds of contraband images (CP). Unless the user wipes or fills the drive, there's a good chance that some of those images will remain months or years later although many of them could be overwritten.

From a forensics standpoint, I don't know how useful it is; in most cases, we're just going to search/carve to see what is there. But, I would be interested in seeing how long the data remains under varying levels of use. For testing, I would recommend using another computer to create some unique files with an identifiable byte pattern. Load the files onto the test computer, then delete them. See what you can recover after 5 days, 30 days, 90 days. Repeat the experiment with varying levels of disk activity. E.g. do it once with light use (email and web browsing). Do it again with heavy use (lots of bit torrent, trying new games from Steam, etc.). What % of the data are you able to recover each time?
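The experiment tracedf describes can be sketched in a few lines. This is a minimal illustration only, not a tested protocol: the marker string, file names and sizes are invented placeholders, and a real run would scan a raw device image of the test machine rather than a small file.

```python
import os

MARKER = b"EXPT-TAG-"  # hypothetical, carvable signature (placeholder)

def make_test_files(directory, count=100, size=64 * 1024):
    """Write `count` files, each filled with a repeating unique tag so
    that any surviving fragment can later be attributed to its file."""
    for i in range(count):
        tag = MARKER + b"%06d" % i
        payload = (tag * (size // len(tag) + 1))[:size]
        with open(os.path.join(directory, "test_%06d.bin" % i), "wb") as f:
            f.write(payload)

def recovery_rate(image_path, count):
    """Scan a raw image for surviving tags; return the % of test files
    for which at least one identifiable fragment was found."""
    with open(image_path, "rb") as f:
        data = f.read()  # fine for a small test image; stream a real disk
    found = set()
    pos = data.find(MARKER)
    while pos != -1:
        ident = data[pos + len(MARKER):pos + len(MARKER) + 6]
        if len(ident) == 6 and ident.isdigit():
            found.add(int(ident))
        pos = data.find(MARKER, pos + 1)
    return 100.0 * len(found) / count
```

Load the files onto the test machine, delete them, use the machine for the chosen interval, image the disk, and compare `recovery_rate` across the 5/30/90-day and light/heavy-use runs.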

-tracedf

Posted : 26/07/2017 4:46 am
jaclaz
(@jaclaz)
Community Legend

It seems to me rather pointless - not because it would not be interesting (it would be), but because it would have, IMHO, no practical use.

I have seen "office" computers (the kind a secretary uses, where the activity is mostly e-mails and updating a number of pre-existing documents) where deleted data could be recovered after years (usually the disk is used at - say - at most 20 or 30% of capacity), and I have seen systems used during the day to do some work and during the nights and weekends to senselessly and blindly download gigabytes of torrents/movies, where on Monday you couldn't get back something deleted the previous Friday.

Also, since Windows 7 - if I recall correctly - there is automatic/scheduled defrag, which, while not necessarily wiping the data, may well make a mess of it by partially overwriting files.

And of course automatic Windows Updates.

But let's say that we can categorize a given PC into one of (say) five "types", each with a given "recoverability rating":
10%
30%
50%
70%
90%
Then we empirically establish that we can (still say) subtract 1% for every 24 hours of activity of the PC after the deletion.

So you have a "10%" PC where the file was presumably deleted 15 days ago (of which 11 were working days of around 8 hours each).

Our nice hypothetical formula will tell us that we have a probability of (10 - 11*8/24)% = 6.33% of recovering the deleted file.
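As a sanity check on the arithmetic (these are purely the hypothetical numbers above, with no empirical basis):

```python
# jaclaz's hypothetical example: a "10%" machine, deletion ~15 days ago,
# of which 11 were working days of ~8 hours, losing 1 percentage point
# per 24 hours of machine activity.
base_rating = 10.0           # the machine's "recoverability rating", in %
uptime_hours = 11 * 8        # 88 hours of post-deletion activity
penalty = uptime_hours / 24  # 1 point per 24 h of activity
probability = base_rating - penalty
print(round(probability, 2))  # 6.33
```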

What do you do?

1) You look for the file (but with a seriously apathetic and uninspired attitude)
2) You don't look for the file (since you decided previously that anything below 10% is not worth even the attempt)

What would you do if you didn't know that the particular item was rated at 6.33%?
1) You look for the file (optionally with an optimistic or neutral attitude)
2) You don't look for the file (because it is too much work, would cost too much, it is not possible, etc. [1])

jaclaz

[1] This is what the corporate IT guys usually say whenever you ask them anything that could - even hypothetically - result in some work for them.

Posted : 26/07/2017 5:47 pm
tootypeg
(@tootypeg)
Active Member

Thanks both for your replies.

I kind of feel the same, in the sense that you are both drawing reference to 'practical use'. But then in terms of triage (maybe I'm trying too hard to find a use), could we infer something, for example from event logs showing active times, which may then provide a % chance of recovery and therefore tell us whether it is worth implementing a long carving process?

I wonder if there is some way to determine when a carve would be useful to make things more efficient.

Posted : 26/07/2017 6:40 pm
jaclaz
(@jaclaz)
Community Legend

But then in terms of triage (maybe I'm trying too hard to find a use), could we infer something, for example from event logs showing active times, which may then provide a % chance of recovery and therefore tell us whether it is worth implementing a long carving process?

I wonder if there is some way to determine when a carve would be useful to make things more efficient.

That would still be a simple formula:
How much (percentage) of the disk is free? F
How many hours (of running machine) have passed since the probable deletion (as taken from event logs)? H
Which OS is it running? V
Is it a hard disk or an SSD? k

Probability P=F-(k*V/10*H/c)/100

F in the range 15%<=F<=100%
H in the range 0<=H
V in the range 5<=V<=10
where 5 means XP or earlier, 6 Vista, 7 7, etc.
k in the range 1<=k<=2
where HD=1, SSD=2

c is a correction factor that might need to be more finely tuned from experiments; initially 1, but it could increase up to - say - 50.

Examples:
1) Windows 10 on an SSD, filled up to the brim
F=15%
H=50 (h)
V=10
k=2
c=1
P=15%-(2*10/10*50/1)/100=-85% 😯

2) "secretary PC", Windows XP, HD
F=70%
H=50 (h)
V=5
k=1
c=1
P=70%-(1*5/10*50/1)/100=+45%

3) "average" (?) machine, Windows 7, HD
F=40%
H=50 (h)
V=7
k=1
c=1
P=40%-(1*7/10*50/1)/100=+5%
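For what it's worth, the toy formula can be wrapped in a few lines (function and parameter names are mine; the constants are the illustrative values above, with no empirical basis):

```python
def recovery_probability(free_pct, hours, os_version, ssd=False, c=1.0):
    """Toy triage score P = F - (k * V/10 * H/c), in percentage points.
    free_pct: F, % of disk free; hours: H, machine-on hours since deletion;
    os_version: V (5 = XP or earlier ... 10 = Windows 10);
    k is 2 for SSD, 1 for hard disk; c: correction factor to be tuned."""
    k = 2.0 if ssd else 1.0
    return free_pct - k * (os_version / 10.0) * (hours / c)

# The three examples above:
print(recovery_probability(15, 50, 10, ssd=True))  # -85.0
print(recovery_probability(70, 50, 5))             # 45.0
print(recovery_probability(40, 50, 7))             # ~5.0 (modulo float rounding)
```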

OR, maybe use this special device 😉
http://stevendiebold.com/wp-content/uploads/2011/01/How-Can-You-Build-a-Marketing-Machine.gif

jaclaz

Posted : 26/07/2017 9:20 pm
tootypeg
(@tootypeg)
Active Member

Sort of agree, but I think that's surely over-simplified. Things like cluster size, allocation algorithms etc. - are they also a factor?

I think the length of time used is not as important as the number of transactions taking place (though likely one is linked to the other).

Maybe it's worth simplifying to, say, a simple USB memory stick. Say, for example, a suspect saves x, y and z on there. Yes, time isn't an issue here, but how likely is natural overwriting to occur? For example, if a file is saved and then deleted, will NTFS reuse that portion of the disk at the next opportunity, as opposed to writing to other areas of the disk? If so, even a small amount of usage will likely lead to natural overwriting - it doesn't need to be a lot?

Again, I think in practical terms it's not massively useful, but knowledge-wise maybe interesting.

…Another point: often it's not possible to tell how long something has been deleted for. Could understanding deletion and overwriting processes maybe lead to estimates? Hey, I dunno, just throwing random thoughts out there.

Posted : 26/07/2017 9:45 pm
jaclaz
(@jaclaz)
Community Legend

Sort of agree, but I think that's surely over-simplified. Things like cluster size, allocation algorithms etc. - are they also a factor?

Naah, cluster size is totally irrelevant AFAICT (of course unless we are talking of very small files that remain resident in the $MFT - on NTFS that would more likely be those of around 720 bytes or less, see
https://www.forensicfocus.com/Forums/viewtopic/t=10403/ )
as *almost any* NTFS filesystem will use 4096 bytes as cluster size.

More relevant would be the actual size(s) of the single file(s) (each and every single file EVER written after the presumed deletion), which is a known unknown.

More or less (take this piece of info with a pinch of salt), the "algorithm" revolves around:
1) "let me find the first contiguous stretch on the disk where I can fit the whole file"
2) "if I didn't find one soon enough, let me start writing the file wherever I see fit; if it is not contiguous it doesn't matter, I will use one or more additional extents as I see fit"
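That two-pass behaviour can be modelled with a toy allocator (a deliberate simplification; a real NTFS allocator also deals with the MFT zone, bitmap caching and so on, so treat this as a sketch of the idea only):

```python
def allocate(bitmap, clusters_needed):
    """Toy first-fit allocator in the spirit of the description above:
    prefer a single contiguous run of free clusters, otherwise fragment
    across whatever free clusters exist. `bitmap` is a list of booleans
    (True = cluster in use); returns the allocated cluster indices."""
    # Pass 1: look for one contiguous free stretch big enough.
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == clusters_needed:
            start = i - clusters_needed + 1
            for j in range(start, i + 1):
                bitmap[j] = True
            return list(range(start, i + 1))
    # Pass 2: no contiguous stretch - take free clusters as found.
    picked = [i for i, used in enumerate(bitmap) if not used][:clusters_needed]
    if len(picked) < clusters_needed:
        raise OSError("disk full")
    for i in picked:
        bitmap[i] = True
    return picked
```

On an empty "disk" the file lands contiguously at the front (overwriting whatever deleted data sat there); on a fragmented one it scatters into the first free extents, which is why even light use can clobber remnants.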

And SURE the formula is over simplified (by design).

We could add a number of additional parameters, or insert a quadratic expression, but all without substantially improving the actual accuracy (which is "next to none" - of very limited use and definitely for "triage" only).

I think length of time used is not as important as the number of transactions taking place (likely one is linked to the other).

Yes and no; both could be needed in a "better" formula. Any "weekly scheduled" activity (such as, normally, the automatic defrag) has a 100% probability of having run at least once if the stretch of time is more than one week.
Up to not so long ago, having one or more "Patch Tuesdays" within the elapsed time would potentially make a big difference.

Maybe it's worth simplifying to, say, a simple USB memory stick. Say, for example, a suspect saves x, y and z on there. Yes, time isn't an issue here, but how likely is natural overwriting to occur? For example, if a file is saved and then deleted, will NTFS reuse that portion of the disk at the next opportunity, as opposed to writing to other areas of the disk? If so, even a small amount of usage will likely lead to natural overwriting - it doesn't need to be a lot?

Well, if you narrow it down to a removable non-system disk/device, then it may be reproducible.

On a system disk - where, besides the "normal" scheduled activities, there are the temporary folders, the (temporary) expanding of archives, the updates of the OS and of a zillion programs/apps, etc. - it is a "lost cause", in the sense that each single computer's usage may differ from any other's.

Again, i think in practical terms its not massively useful, but knowledge wise maybe interesting.

Yes, knowledge wise it would be interesting of course.

…Another point: often it's not possible to tell how long something has been deleted for. Could understanding deletion and overwriting processes maybe lead to estimates? Hey, I dunno, just throwing random thoughts out there.

This is IMHO actually "random" - no offence intended, of course - but would you be able to keep a straight face in Court when, as an expert witness, you state that "since the file was NOT found, it means that it was EITHER intentionally wiped OR deleted a long time ago"?

Seriously, often a "normal" (what I consider "normal", at least) user-initiated DEFRAG
https://www.forensicfocus.com/Forums/viewtopic/t=5410/
is enough to overwrite large parts of the disk (of course, and as above, the fuller and more fragmented it was, the more data will be lost).

jaclaz

Posted : 26/07/2017 11:50 pm
tootypeg
(@tootypeg)
Active Member

Yeah, I see your points. It's answered a lot of my questions and thoughts about this topic, to be honest. I was just after a little research project and wondered if there was anything I could dig around in with relation to deleted files. Shame!

Posted : 27/07/2017 12:31 am
jaclaz
(@jaclaz)
Community Legend

Yeah, I see your points. It's answered a lot of my questions and thoughts about this topic, to be honest. I was just after a little research project and wondered if there was anything I could dig around in with relation to deleted files. Shame!

BUT, all the above said, on modern NTFS a "what was deleted when" tool - possibly combining $MFT analysis with the $UsnJrnl and $LogFile - would most probably provide a (maybe time-limited) window on the past.

It won't be a quick triage method, but it will have some practical use; we are shifting from "what the OS/filesystem usually does (analyzed statistically, or evaluating the probabilities of events)" to "what actually happened and can be documented on this specific OS and filesystem".

This would be a good starting point:
http://www.forensicfocus.com/Forums/viewtopic/t=10560/
https://github.com/jschicht
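As a taste of what such a tool works from, here is a minimal sketch that walks USN_RECORD_V2 structures (the documented change-journal record layout) in a raw, already-extracted $UsnJrnl:$J blob and lists FILE_DELETE entries. It ignores v3/v4 records, sparse regions and other real-world complications, so treat it as an illustration only:

```python
import struct

USN_REASON_FILE_DELETE = 0x00000100  # documented reason-flag value

def deleted_names(journal_bytes):
    """Walk USN_RECORD_V2 structures in a raw $UsnJrnl:$J blob and
    return (usn, filename) for records carrying the FILE_DELETE reason."""
    out = []
    off = 0
    while off + 60 <= len(journal_bytes):
        rec_len, major = struct.unpack_from("<IH", journal_bytes, off)
        if rec_len < 60 or off + rec_len > len(journal_bytes):
            break  # truncated or corrupt record: stop
        if major == 2:
            usn, = struct.unpack_from("<q", journal_bytes, off + 24)
            reason, = struct.unpack_from("<I", journal_bytes, off + 40)
            nlen, noff = struct.unpack_from("<HH", journal_bytes, off + 56)
            if reason & USN_REASON_FILE_DELETE:
                name = journal_bytes[off + noff:off + noff + nlen]
                out.append((usn, name.decode("utf-16-le")))
        off += rec_len  # records are variable length; advance by RecordLength
    return out
```

Cross-referencing those USNs and file reference numbers against the $MFT and $LogFile is what would turn this into the "what was deleted when" window mentioned above.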

jaclaz

Posted : 27/07/2017 12:58 am
keydet89
(@keydet89)
Community Legend

a) wasn't sure if it was worth it - is there any forensic value in it?
b) wasn't sure how to test it - should it provide context to, say, browser cache turnover of cached content before it is overwritten?

What do people think?

I'm not at all clear on what the end result of such research would be. In fact, in conversations I've had with other analysts, some have had the expectation that deleted data would still be available to some degree, albeit without anything to really back that up other than a 'feeling'.

A couple of examples:

The Samas ransomware seen early in 2016 shipped with a copy of sdelete in one of its resource sections; later versions reduced the size of the .exe by providing sdelete separately. So, the question becomes: what constitutes "deleted"?

As others have said, I've seen where some artifacts have been pulled out of unallocated space months later, whereas I've also seen where zipped archives have been deleted from a system less than 12 hrs prior to the image being acquired, and we were not able to retrieve any of the archives.

I've also seen instances where the bad guy has used the WinInet API to pull artifacts down, via a user account that doesn't use IE (the user uses Chrome, or the bad guy elevated privs to System…); as such, the artifacts remain months or even years later, as the cache turnover mechanism is never activated.

On some recent NotPetya cases, one of our team members has done some fantastic work retrieving Windows Event Log records and assembling them into something that can be parsed. However, so far he's only been able to retrieve Security Event Log records; his methodology is agnostic, yet those are the only logs he has recovered. If the command to clear the logs was strung together with "&", logically you'd think he'd get a mix of records.

Posted : 29/07/2017 4:49 pm
tootypeg
(@tootypeg)
Active Member

There were a few things that were running through my head regarding this topic.

1) I just wondered if somehow (again, it's not something that I have fully researched) it was possible to infer where on the disk a file was when live, based on its position in unallocated space. But as the allocation process (on NTFS) seems to be very much first-available-position, it's unlikely this would be achievable. To provide an example: an unallocated cached image surrounded (physically, on disk) by other cached files would suggest the image was part of a cached web visit. But there are so many factors to consider here, including browser type etc., and in reality it seems that file allocation does not follow a process which would allow this.

2) I also wondered whether, by determining allocation patterns, the length of time a file has been deleted might be ascertainable. But no.

3) The likelihood of a file having been overwritten, based on the current setup, state and usage of the disk - to avoid lengthy recovery processes.

But to be honest, I'm now thinking this is something that just doesn't contribute to any of the above points.

Posted : 29/07/2017 4:58 pm
keydet89
(@keydet89)
Community Legend

But to be honest, I'm now thinking this is something that just doesn't contribute to any of the above points.

I'm not at all sure that I'm clear as to what that means.

Posted : 29/07/2017 5:40 pm