FYI…
From here:
http//
"The Last Access Time on disk is not always current because NTFS looks for a one-hour interval before forcing the Last Access Time updates to disk. NTFS also delays writing the Last Access Time to disk when users or programs perform read-only operations on a file or folder, such as listing the folder’s contents or reading (but not changing) a file in the folder. If the Last Access Time is kept current on disk for read operations, all read operations become write operations, which impacts NTFS performance."
This is from the Last Access Time section of the above linked page. Reading the rest of the section will give you a pretty good indication that last accessed times are not necessarily what one would refer to as "reliable".
HTH
Hi everyone. I'm Jonathan Grier, the author of the paper being discussed (Detecting Data Theft Using Stochastic Forensics). There's some great discussion here - thank you to everyone for their questions and challenges, and especially to Jelle for reaching out to me.
I'm going to try to address each point raised, one by one. I'm sure there will be more points, so let's keep the discussion flowing. So that we're all talking about the same version of the paper, my references will be to http//
The first question I saw was: Which operating systems does the method work on? There are a few subtleties here, so let me try to clarify:
1. The first OS-dependent factor is: Does copying something, without opening it, update the access timestamp? As Jaclaz pointed out, there's a difference here between Unix-based OSes and Windows. Unix updates the files' timestamps, whereas Windows only updates the folders'. The method will work either way, but you have to know which one you're dealing with.
2. The second question is: Is the system tracking access timestamps at all? All modern OSes can track access timestamps, and all can be configured to _not_ track them. This is system dependent. Windows tracks them unless the registry value NtfsDisableLastAccessUpdate is set, as cited in the paper; in Unix, it's the noatime mount option.
Windows Server 2003 generally tracks access timestamps out of the box, whereas Server 2008 generally has the disable value set out of the box. But this is always system dependent and very easy to check, as in the sketch below.
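If it helps, here is a minimal sketch of how one might check this programmatically. It's illustrative Python only: on Windows it reads the NtfsDisableLastAccessUpdate registry value cited in the paper, and on Unix it inspects /proc/mounts for the noatime option. The value semantics in the comments match the Server 2003/2008 behavior described above; verify on the system at hand.

```python
import sys

def atime_disabled_windows():
    """True if NtfsDisableLastAccessUpdate says last-access updates are off.

    On Server 2003 the value is typically absent or 0 (updates on); on
    Server 2008 it is typically 1 (updates off), as noted above.
    """
    import winreg
    key = winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE,
        r"SYSTEM\CurrentControlSet\Control\FileSystem",
    )
    try:
        value, _ = winreg.QueryValueEx(key, "NtfsDisableLastAccessUpdate")
    except FileNotFoundError:
        return False  # value absent: updates enabled by default
    return value == 1

def atime_disabled_unix(mountpoint="/"):
    """True if the given mountpoint is mounted with the 'noatime' option."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            if fields[1] == mountpoint:
                return "noatime" in fields[3].split(",")
    return False

if sys.platform == "win32":
    print("atime updates disabled:", atime_disabled_windows())
else:
    print("atime updates disabled on /:", atime_disabled_unix("/"))
```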
Of course, if the system in question isn't tracking access timestamps, the method described in the paper isn't directly applicable. Are there other methods, drawing on a similar idea, that can be used? That's a great question, deserving of more research and discussion, but certainly outside the scope of this thread.
I believe that's the scoop on operating systems: either Windows or Linux (but with a difference in approach), and of course only if atimes aren't disabled.
There are other very important points raised in this thread (especially the ones raised by Jaclaz, Bulldawg, and Jelle) which I hope to come back to later. In the meantime, I'm eager to keep on reading the great discussion unfolding here.
Also: if any of you will be at Black Hat (I'll be speaking about this there), I'd love to say hello.
Jonathan
Hello Jonathan, welcome. :)
This is the point that, IMHO, needs a "main" addition/clarification/specification:
1. The first OS-dependent factor is: Does copying something, without opening it, update the access timestamp? As Jaclaz pointed out, there's a difference here between Unix-based OSes and Windows. Unix updates the files' timestamps, whereas Windows only updates the folders'. The method will work either way, but you have to know which one you're dealing with.
Whilst on *nix systems it is the file that is "touched" when "accessed", on MS Windows the things that are "touched" are just the directories. (There is the need, IMHO, to draw a line between the "good" NT's like NT/2K/XP/2003 and the "bad" NT's like Vista 😯 and later, and possibly another one to separate the old DOS-based ones, though I expect that the list of affected filesystems will "narrow" the "scope" to NTFS and ext2/3/4.) This implies that:
- on *nix this statistical analysis will surely have more "data", and/or these "patterns" can be found even on very "flat" filesystem trees;
- on Windows it is ONLY applicable IF there has been a "bulk" copy of directories containing sub-directories, and/or only on very "nested" filesystem trees.
Additionally, at least on Windows XP (and similar) these "patterns" can be created by a number of other "common" operations, like a recursive DIR or a file search, so the method may be accurate only on systems used in a very "vertical" manner. Something like the sketch below can pull out the directory timestamps in question.
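(Illustrative Python only, with a hypothetical mount point; to be run against a read-only mounted image, since walking a "live" tree would itself "touch" the directories.)

```python
import os
from datetime import datetime, timezone

def directory_atimes(root):
    """Collect last-access times of the directories (not the files) under root.

    On Windows/NTFS it is the directories that get "touched" by a bulk
    copy, so their atimes carry the signal; on *nix you would stat the
    files instead. Run against a read-only image: walking a live tree
    can refresh the very timestamps being examined.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        atime = os.stat(dirpath).st_atime
        yield dirpath, datetime.fromtimestamp(atime, tz=timezone.utc)

# Hypothetical mount point of the evidence image.
for path, atime in sorted(directory_atimes("/mnt/evidence/share"), key=lambda r: r[1]):
    print(atime.isoformat(), path)
```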
jaclaz
Thanks for stopping by Jonathan. Your presence will add a lot to the conversation. As soon as I get some time, I plan to conduct my own research on this. Through all my CF training, everyone has said access times are not reliable, but I don't plan to take their word for it.
While all operating systems can track last accessed time, the most common modern operating systems have that feature turned off by default, which means that for this method to be useful going forward, we will somehow need to convince IT administrators to turn that feature on. For CF examiners working in a corporate environment, that's easier. For consultants and law enforcement, who are often brought in on a case only after the theft has occurred, communicating the need to turn on tracking of access times will be very difficult.
Thanks for stopping by Jonathan. Your presence will add a lot to the conversation. As soon as I get some time, I plan to conduct my own research on this. Through all my CF training, everyone has said access times are not reliable, but I don't plan to take their word for it.
Thanks, Bulldawg.
Re access timestamp reliability: It's certainly true that access timestamps are often imprecise, by minutes or even hours. And it's here where the beauty of using a stochastic method (as opposed to an artifact-based method) really comes through. Since we're working with a probabilistic model, we're fine with noisy data. For instance, you could skew each timestamp by a random noise value, and the cutoff cluster will still show up quite clearly in the histogram. This is because what we're looking at are trends, caused by probability distributions and the law of large numbers, not individual values.
(BTW, for the same reason, we can use the method with only partial data. For instance, if there was a process we knew about which altered the timestamps of 50% of the files, we could filter those out and still use the model.)
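To make that concrete, here's a toy simulation in plain Python. All the numbers (file counts, proportions, noise bounds) are hypothetical; the point is only that the cutoff cluster survives the noise in a per-day histogram:

```python
import random
from collections import Counter
from datetime import datetime, timedelta

random.seed(42)
copy_time = datetime(2011, 10, 15, 14, 0)   # hypothetical bulk-copy instant

# A bulk copy sets every file's atime to copy_time; afterwards, organic
# usage re-touches only a minority of files at random later times.
atimes = []
for _ in range(2000):
    if random.random() < 0.25:   # re-accessed organically after the copy
        atimes.append(copy_time + timedelta(hours=random.uniform(1, 24 * 60)))
    else:                        # still carries the copy's timestamp
        atimes.append(copy_time)

# Skew every timestamp by up to +/- 2 hours of noise, as described above.
noisy = [t + timedelta(hours=random.uniform(-2, 2)) for t in atimes]

# The spike on the copy date survives the noise; trends beat exact values.
histogram = Counter(t.date() for t in noisy)
for day in sorted(histogram):
    print(day, histogram[day])
```

The exact proportions don't matter: the cluster is a property of the distribution, not of any individual timestamp.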
Now, there's another reason why access timestamps have the reputation of being unreliable: experimenting with them is very hard. It's easy to accidentally overwrite them when you're just trying to look at them, and sometimes the OS reports different values from what's on the disk. In section 9 of the paper, I give some recommendations about experimenting.
Hello Jonathan, welcome. :)
This is the point that, IMHO, needs a "main" addition/clarification/specification:
On Windows XP (and similar) these "patterns" can be created by a number of other "common" operations, like recursive DIR or file search, so the method may be accurate only on systems used in a very "vertical" manner.
Thank you, jaclaz. And yes, I agree 100% that your question is the most important one. And it's one I focus on when presenting the method. It was a big part of my talk at Black Hat. (I touch on it a bit in the paper, at the end of section 7, but academic papers aren't the best forum for it.) As you point out, cutoff clusters aren't proof of anything. And they're not intended to be. What they are are clues, defining a window of opportunity and propelling an investigation forward.
When I teach people how to use cutoff clusters, the first thing I have them do is plot histograms of many other control folders, besides the folder that they suspect may have been stolen. If many of the control folders have cutoff clusters, we have to hypothesize a routine cause. But if only the suspect folder has a cutoff cluster, and the control folders don't, that means that something very unusual was done to the suspect folder; it may have been data exfiltration, it may have been authorized copying, or it may have been something else. We don't yet know. Cutoff clusters, by giving us the date and time of this activity, are a great clue to propel the investigation forward. Who was in the building then? What were they doing? Can we investigate their PC? Interview them? One clue leads to another until the case unravels.
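In practice, that control-folder comparison can be as simple as the following sketch (illustrative Python with hypothetical paths; as always, examine a read-only image so that the examination itself doesn't refresh the atimes):

```python
import os
from collections import Counter
from datetime import datetime

def atime_histogram(root):
    """Per-day counts of file last-access times under root."""
    counts = Counter()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable entry: skip rather than abort
            counts[datetime.fromtimestamp(st.st_atime).date()] += 1
    return counts

# Hypothetical layout: the suspect folder plus several controls.
for folder in ("/mnt/image/suspect", "/mnt/image/control_a", "/mnt/image/control_b"):
    hist = atime_histogram(folder)
    if not hist:
        continue
    peak_day, peak = max(hist.items(), key=lambda kv: kv[1])
    total = sum(hist.values())
    # A cutoff cluster shows up as one day holding an outsized share of atimes.
    print(f"{folder}: {peak / total:.0%} of {total} atimes fall on {peak_day}")
```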
I have a popular slide about this, which I call "Thinking like Sherlock, not Aristotle."
Don't stop with control folders, though. No matter what, look for other possible causes. Remember, you have the system in front of you: look at the services on it and see if any of them disturb the timestamp patterns. My experience is that most enterprise backup software doesn't. Antivirus software usually won't disturb them either, although in some cases it does. (Some great research has been done about this, but it's not for this thread.) Investigate recursive directory listing tools, like the ones jaclaz raises. If the insiders we're investigating are developers, it's possible there are people who use these tools a lot. Try to confirm this: we should see cutoff clusters everywhere, and other evidence of these commands running. However, if we're dealing with a typical office, most people haven't even heard of a command line, let alone know how to run a recursive dir or grep. Try to confirm that as well: show that control folders are free of cutoff clusters, and show that there's no evidence of anyone ever using these tools.
Windows XP updates those access times when it feels like it, as much as a full hour after the actual access of the file.
That is well-documented by Microsoft, so it should not be much of a surprise.
It is equally well documented that Windows does not, as you say, 'update those access times when it feels like it'. There is a statement in the documentation of 'fsutil behavior' that the one-hour delay applies to read-only access. And there's a strong statement in the article on 'File Times' (http://msdn.microsoft.com/en-us/library/windows/desktop/ms724290%28v=vs.85%29.aspx):
The only guarantee about a file timestamp is that the file time is correctly reflected when the handle that makes the change is closed.
which, together with the previous statements, suggests that the 'full hour' is only the worst-case read-only scenario for open files. Whether it affects only live images, or also a system that was shut down normally before post-mortem imaging, is not clear, however.
However, as Microsoft's statements on this topic are scattered across their Windows documentation, they can't be considered entirely reliable for forensic purposes; they must be tested critically and thoroughly.
I have a strange result from some non-thorough testing of Last Access on a Windows 7 system with Last Access updates enabled. My current results suggest that the Last Access timestamp may not update as expected on reads of MFT-resident files, while it does for non-resident files. This does not tally with the 'correctly reflected when the handle that makes the change is closed' statement cited above.
I'm almost entirely certain, however, that the result is due either to the testing platform (I need to repeat it on another system entirely, preferably one set up by another person), the rights of the tester (I'm admin on this system), or something similar (say, a non-standard write cache configuration, or the fact that I've only tested files of 1 and 1024 bytes so far), so it doesn't upset me too much. I only mention it because I think it illustrates the need to evaluate testing results and circumstances *very* critically.
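For anyone who wants to repeat or knock down my result, the core of the test is roughly the following Python sketch. The ~700-byte residency threshold is an approximation, the file names are arbitrary, and the outcome is subject to every caveat above (caching, delayed flushes, the one-hour heuristic, and whether last-access updates are enabled at all):

```python
import os
import time

def atime_read_test(path, size):
    """Create a file of the given size, read it, and report whether its
    last-access time moved. On NTFS, files below roughly ~700 bytes can
    stay resident in the MFT record; larger ones go non-resident. What
    os.stat reports may be a cached value, so treat results as indicative.
    """
    with open(path, "wb") as f:
        f.write(b"x" * size)
    before = os.stat(path).st_atime
    time.sleep(2)                 # make any update distinguishable
    with open(path, "rb") as f:   # read-only access
        f.read()
    after = os.stat(path).st_atime
    status = "updated" if after > before else "unchanged"
    print(f"size={size:>7}: atime {status}")
    os.remove(path)

for size in (1, 1024, 1024 * 1024):
    atime_read_test(f"atime_test_{size}.bin", size)
```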
I have a popular slide about this, which I call "Thinking like Sherlock, not Aristotle."
Exactly! :)
My remark was, MUCH between the lines, connected with threads like these (examples):
http://www.forensicfocus.com/Forums/viewtopic/t=5410/
http://www.forensicfocus.com/Forums/viewtopic/t=7764/
and particularly with this one:
http://www.forensicfocus.com/Forums/viewtopic/t=9275/
In a completely unrelated field, I do like Cesar Millan (the "Dog Whisperer") when he explains the meaning of things as seen by dogs, as seen by humans, and how they should be seen by both. Something like:
An open door means that the door is open. It does not mean "Ahh, good, then I can go out", or "Then I can bark at anyone going through the door"; it simply means that the door is open, and nothing more.
The method may evidence an anomaly; then one has to find further evidence that this apparent anomaly is actually connected to "something evil" and that it cannot be due to something "common" and "normal".
Still within Sherlock's attitude:
How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?
I find that often one needs to list and evaluate (and if the case exclude) beforehand the possible, the common, the probable.
jaclaz
It is easy to read this as meaning that the last access times have an accuracy of up to an hour,
i.e. that the last accessed time has a resolution of an hour
(if I am wording this correctly).
The only guarantee about a file timestamp is that the file time is correctly reflected when the handle that makes the change is closed.
Windows XP updates those access times when it feels like it, as much as a full hour after the actual access of the file.
From a closer reading of the second quote it seems that (1) the written last access time IS the time when the file was last accessed, BUT, as per the first quote, (2) it may take up to an hour for this to be updated on the HDD.
I.e. these writes are cached and could be flushed to disk at any time up to an hour after the file was accessed.
So the correct accessed time is always recorded, but its write to disk may be delayed for performance reasons.
Is this everyone else's understanding of this also ?
That's my takeaway.
"So, you are saying that he could have killed her, for performance reasons, any time during the preceding hour…"