I had some thoughts the other day about the relative confidence levels of data sources (i.e., how confident an analyst can be that a source is accurate), and about using multiple data sources both to increase that confidence level and to add context to the data.
A common intelligence practice is to rate your source and data according to an A1-F6 scale. I've applied this quite a few times in digital analysis work.
On the Wikipedia page for Intelligence Collection Management (http// ), scroll down to the section "Ratings by the Collection Department".
(and don't rate everything as 'F6'!)
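To make the A1-F6 system concrete, here is a minimal sketch of how the scale could be modelled in code. The rating descriptions follow the Admiralty/NATO system referenced above; the class and field names are illustrative, not part of any particular tool.

```python
from dataclasses import dataclass

# Source reliability (A-F) and information credibility (1-6),
# per the Admiralty/NATO rating system discussed above.
RELIABILITY = {
    "A": "Completely reliable",
    "B": "Usually reliable",
    "C": "Fairly reliable",
    "D": "Not usually reliable",
    "E": "Unreliable",
    "F": "Reliability cannot be judged",
}
CREDIBILITY = {
    "1": "Confirmed by other sources",
    "2": "Probably true",
    "3": "Possibly true",
    "4": "Doubtful",
    "5": "Improbable",
    "6": "Truth cannot be judged",
}

@dataclass
class RatedItem:
    description: str
    reliability: str  # "A".."F"
    credibility: str  # "1".."6"

    @property
    def rating(self) -> str:
        # Combined rating, e.g. "B2"
        return self.reliability + self.credibility

item = RatedItem("System event log entry", "B", "2")
print(item.rating)  # B2
```

Charting tools can then key line styles (dotted, dashed, bold) off the combined rating string.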
Software such as i2 Analyst Notebooks allows you to build this into the charting side of things, and shows the connecting links as dotted, dashed, straight, bold, etc, based on the rating given.
I am putting together a paper which goes through how timeline analysis works for intelligence practitioners, and how it can be applied in digital analysis, which includes information rating.
regards,
D
That's an interesting rating system. Sadly, I've seen reports where I'd have to rate some of the findings or conclusions F6, as there is little technical data to support them.
This is probably getting too far outside of the discipline at the moment, but it is something very interesting to consider.
I usually consider information rating as I work through data, and often ponder what rating to give different items. For example, what rating should entries in the Event Log get for reliability and accuracy? They are probably more reliable than file MAC times, which are easier to manipulate, but should they be rated A, B or C? And if they can be verified against other events, would that increase their rating?
The rating is used for data, rather than findings or conclusions. The rating has a bearing on the findings and conclusions, but the findings are not rated per se.
(and yes, lots of data is rated F6, hence my comment not to mark everything as F6, but to think about the data. In my experience too much is rated as F6 as it is an easy answer…)
Or perhaps this is more about premises, inferences and conclusions?
premise + premise + premise = inference
inference + inference + inference = conclusion
For example;
premise 1 - event log shows "an event" occurring at time 00:00:00
premise 2 - internet history shows "an event" occurring at 00:00:01
premise 3 - email data shows "an event" at 00:00:00
inference - "an event" occurred at about 00:00:00/1
This is useful when preparing and writing up reports to show the data and logic process used to get to findings and conclusions. (Or am I way off?)
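The premise-to-inference step above can be sketched in code: if the same event shows up in independent sources within a small time window, treat it as corroborated. The source names, window size, and threshold here are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta

# The three premises from the example above: the same "event" seen in
# three independent data sources within one second of each other.
premises = [
    ("event log",        datetime(2010, 2, 3, 0, 0, 0)),
    ("internet history", datetime(2010, 2, 3, 0, 0, 1)),
    ("email data",       datetime(2010, 2, 3, 0, 0, 0)),
]

def corroborated(observations, window=timedelta(seconds=2)):
    """True if the observations come from more than one source and
    all fall within the given time window."""
    times = [t for _, t in observations]
    sources = {s for s, _ in observations}
    return (max(times) - min(times) <= window) and len(sources) > 1

if corroborated(premises):
    # The inference: the event occurred at about this time.
    print("inference: event occurred at about", min(t for _, t in premises))
```

In rating terms, a corroborated item could arguably move from "probably true" (2) toward "confirmed by other sources" (1).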
Darren,
This is why I push timeline analysis…including multiple sources allows me to overlay different events and view them with context, increasing my confidence level in the data itself.
I agree,
Why rely on one source when you can verify (or disprove) it with others? Event logs are a great source, especially when you get an NTP update entry that shows the system clock being adjusted to the correct time, and it records the previous time. Confidence in the time stamps increases a lot in that case.
Pity there isn't an automatic way to prepare a timeline though. I have a current case with 30+ computers, and am not looking forward to manually preparing a timeline…but then again, dumping all the data into one database would be enormous, and would include a lot of superfluous information… oh well, back to the grindstone.
Pity there isn't an automatic way to prepare a timeline though.
Well, I think part of the issue is that while timeline analysis has been around since 2000-2001 (Rob Lee wrote mactime.pl to parse TSK bodyfiles), it really hasn't been something that has been discussed. This could be because many analysts pretty much follow what they're trained to do, and a good deal (albeit not all) of training comes from vendors…EnCE, ACE, etc. For the most part, timeline analysis has been relegated to those who see the value and have the confidence in their abilities to do part of it themselves, and ask others for help.
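For readers who haven't seen it, the mactime-style processing mentioned above is simple at its core: expand each bodyfile entry into one timeline row per timestamp, then sort. A minimal sketch, assuming the TSK v3 bodyfile layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime, with times as epoch seconds); the example line and values are made up.

```python
from datetime import datetime, timezone

# One hypothetical TSK v3 bodyfile line (pipe-delimited, epoch seconds).
line = ("0|/Windows/foo.exe|1234|r/rrwxrwxrwx|0|0|4096|"
        "1265155200|1265158800|1265158800|1265155200")

fields = line.split("|")
name = fields[1]
atime, mtime, ctime, crtime = (int(x) for x in fields[7:11])

# Expand into one row per timestamp type: (a)ccessed, (m)odified,
# (c)hanged, (b)orn/created.
rows = [(atime, "a", name), (mtime, "m", name),
        (ctime, "c", name), (crtime, "b", name)]

for ts, label, fname in sorted(rows):
    when = datetime.fromtimestamp(ts, timezone.utc)
    print(when.strftime("%Y-%m-%d %H:%M:%S"), label, fname)
```

Run across every file on a system (or 30 systems), this produces the flat, sortable event stream a timeline is built from.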
What's interesting, as well, are some of the comments I've seen regarding this topic. For example, some folks have asked for XML output, but when asked for style sheet recommendations, they generally don't respond. Another largely misunderstood aspect is having a GUI means of analysis on the front end…for a lot of reasons, it simply doesn't make sense.
With respect to the database, that should be pretty easy to do at this point, with the currently available tools…of course, it would be a somewhat manual process regardless, but still…once you have 30+ systems in a DB, you're good to go…right?
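The "30+ systems in one DB" idea can be sketched with nothing more than SQLite: a single events table keyed by host, source, and timestamp makes cross-system timeline queries a plain ORDER BY. The schema and sample rows here are purely illustrative.

```python
import sqlite3

# One table holding events from every system; ordering across all 30+
# hosts then becomes a single query. Schema is an assumption, not a
# standard.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE events (
    host TEXT, source TEXT, ts INTEGER, description TEXT)""")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [("PC01", "evtlog", 1265155200, "logon"),
     ("PC02", "mft",    1265155260, "file created")],
)

# The cross-system timeline: everything, in time order.
for host, ts, desc in con.execute(
        "SELECT host, ts, description FROM events ORDER BY ts"):
    print(host, ts, desc)
```

Filtering out the superfluous material then becomes a WHERE clause rather than a manual pass.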
One thing that occasionally gets under my skin is that many (actually, nearly all) forensic tools only report times to the nearest second, which is not necessarily the resolution at which they are recorded in the evidence. NTFS FILETIME stamps have a resolution of 100 nanoseconds. A FAT 'File Created' time is accurate to 10 milliseconds, whilst a FAT 'Last Accessed' time has a resolution of just one day.
Taking an extreme example, say you have a set of files on an NTFS system with times ranging from
2010-02-03 00:01:01.0000001
to
2010-02-03 23:59:59.9999999
You have a file on a USB stick that was connected to this machine and has a 'Last Accessed' date of 2010-02-03. Where does this 'event' fit in the timeline? The granularity in this case is so different as to render the results almost meaningless, yet most forensic tools will report all of these dates to the nearest second (the resolution of the programmer's favourite, time_t in the C programming language). To add insult to injury, when ordering by time, the FAT timestamp would be slotted in either at the start or at the end of the list, when the mathematical probability is that the file was accessed around 2010-02-03 12:00:00 and would perhaps be best placed in the list at that point.
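The resolution mismatch above can be made concrete in code. One hedged option, not a standard practice, is to represent each timestamp as an interval matching its native resolution and order events by interval midpoint, which is what places the day-resolution FAT date at noon.

```python
from datetime import datetime, timedelta, timezone

# NTFS FILETIME counts 100-nanosecond ticks since 1601-01-01 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_dt(ticks):
    # datetime only stores microseconds, so the final digit of the
    # 100 ns resolution is lost in this conversion.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

def fat_access_interval(year, month, day):
    # A FAT 'Last Accessed' date covers a whole day: return the
    # interval rather than pretending to second resolution.
    start = datetime(year, month, day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)

start, end = fat_access_interval(2010, 2, 3)
midpoint = start + (end - start) / 2
print(midpoint)  # 2010-02-03 12:00:00+00:00
```

Sorting by midpoint puts the USB stick access roughly where probability suggests it belongs, rather than artificially at the start or end of the day.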
I recently placed a time utility in the downloads section of this forum here that will report times at their maximum resolution. A function to build a timeline from a list of exported times was on my list of things to be done for this app - I just never got round to it.
Paul
Paul,
I'm not sure I see how that level of granularity is of significant value. More so than the level of granularity, I have seen how having a preponderance of data is of much more significant value.
In most instances involving numbers of files, the questions I've been asked included "how many files were downloaded", rather than, "what was the exact order in which the files were downloaded"…
I'm not sure I see how that level of granularity is of significant value. More so than the level of granularity, I have seen how having a preponderance of data is of much more significant value.
In most instances involving numbers of files, the questions I've been asked included "how many files were downloaded", rather than, "what was the exact order in which the files were downloaded"…
Whereas in the kind of analysis I generally do, I am looking at a wide variety of date-related data around the creation/uploading of a particular file, and determining what material is related to that event and what is not. I suspect that in these instances I am looking at much more finely grained time/date material across a smaller dataset.
Generally speaking, the nearest second is fine. It just irks me that so-called 'forensic software' is translating data incorrectly.
Paul