Some Pitfalls of Interpreting Forensic Artifacts in the Windows Registry

Presenter: Jacky Fox, student at UCD School of Computer Science and Informatics

Transcript

Pavel Gladyshev: Welcome to University College Dublin. My name is Pavel Gladyshev, and this is a video blog of the Digital Forensic Investigation Research Group. If you have ever performed computer forensics, you probably know that the Microsoft Windows Registry is an important source of forensic information. But, like all forensic artifacts, it has to be interpreted with caution. One of our students, Ms Jacky Fox, researched the standard ways of interpreting some common forensic artifacts in the Windows Registry, and she found some interesting results, which she is going to talk about today.

Jacky Fox: Thank you very much, Pavel, for having me in today to share some of my work with you. I’ve just completed a dissertation on Windows Registry reporting, where I focused on automating the correlation and interpretation of the data. So today I wanted to give you a brief overview of that project, and also a sample of some of the observations I made while doing it, particularly in the areas of Enum, MountPoints2, and UserAssist.

To start with, I looked at the way reporting is done on the registry today, and that is typically done on an operating-system-specific level, with both open-source and commercial tools. The reports tend to work on a hive-by-hive basis, so they report everything from the system hive, then everything from the software hive, and they report the artifacts serially as found in the hive, as opposed to in the order in which they would be read. So I wanted to investigate how far you could take correlating that data, and how far you could take interpreting it. For instance, if an artifact was recorded as a 0 or 1 but actually meant no or yes, I wanted the report to say no or yes rather than 0 or 1, so that the examiner did not have to do that higher-level interpretation themselves. I started by identifying some common areas that people reported on: USB information; system information, such as when the system was last switched on; user-specific information about user actions on the system; and network artifacts, such as which networks a user has connected to and when they last connected. I then did a thorough search through the registry to identify the specific artifacts, and I’m going to talk about one area, USB, just to give you a sample of what I did.
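The kind of higher-level interpretation described above can be sketched as a lookup table that maps a value name to a function rendering its raw data in plain words. This is a minimal Python sketch, not the author’s scripts: the table, function names, and label text are hypothetical, and only NtfsDisableLastAccessUpdate is a real registry value (under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem), read here with its classic XP/Windows 7 0/1 meaning.

```python
from typing import Callable, Dict, Tuple


def yes_no(raw: int) -> str:
    """Render a 0/1 flag as No/Yes; any other value is flagged as unknown."""
    return {0: "No", 1: "Yes"}.get(raw, f"Unknown ({raw})")


# Hypothetical interpretation table: value name -> (plain-English label, renderer).
# The 0/1 meaning assumed for NtfsDisableLastAccessUpdate is the XP/Windows 7 one.
INTERPRETERS: Dict[str, Tuple[str, Callable[[int], str]]] = {
    "NtfsDisableLastAccessUpdate": ("Last-access timestamp updates disabled", yes_no),
}


def interpret(value_name: str, raw: int) -> str:
    """Return a readable report line, falling back to the raw value."""
    entry = INTERPRETERS.get(value_name)
    if entry is None:
        return f"{value_name}: {raw}"
    label, render = entry
    return f"{label}: {render(raw)}"


print(interpret("NtfsDisableLastAccessUpdate", 1))
# prints "Last-access timestamp updates disabled: Yes"
```

Keeping the interpretation in a table rather than in the report code means new artifacts can be added without touching the reporting logic.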

In this area, I found 25 different artifacts, in the registry and in some closely related files. So I attempted to correlate, manipulate, and report all these artifacts together. Now I’m going to switch to a slide, which is rather detailed, and show you how these artifacts actually correlate with each other. This slide starts off by showing where you would typically begin looking, under the USBSTOR key. Underneath this key, you have subkeys showing the vendor and product of each device, and under those each device is listed by its serial number. This serial number is the hub that ties all the related information together. The artifacts come from different areas: some from the system hive, some from the software hive, some from the user hives, and some from closely related files, for instance setupapi.log, which holds the first install time of a device.
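Because the serial number is the hub, a natural first step is to split the USBSTOR subkey names into their parts. A minimal Python sketch, assuming the subkey names have already been read from an offline SYSTEM hive by some other tool; the example names are illustrative, treating the instance name minus its trailing "&N" suffix as the serial is an assumption, and the "second character is &" test is a widely cited heuristic rather than something stated in this talk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UsbStorDevice:
    vendor: Optional[str]
    product: Optional[str]
    revision: Optional[str]
    instance_id: str                # full instance subkey name, e.g. "4C53...&0"
    serial: str                     # instance ID without the trailing "&N" suffix
    windows_generated_serial: bool


def parse_usbstor(class_id: str, instance_id: str) -> UsbStorDevice:
    """Split a USBSTOR device-class subkey name and its instance subkey name."""
    fields: Dict[str, str] = {}
    for part in class_id.split("&"):
        prefix, sep, value = part.partition("_")
        if sep:  # skips the leading device type, e.g. "Disk"
            fields[prefix.lower()] = value
    serial = instance_id.rsplit("&", 1)[0] if "&" in instance_id else instance_id
    # Commonly cited heuristic: "&" as the second character means Windows
    # generated the ID because the device reported no unique serial number.
    generated = len(instance_id) > 1 and instance_id[1] == "&"
    return UsbStorDevice(
        vendor=fields.get("ven"),
        product=fields.get("prod"),
        revision=fields.get("rev"),
        instance_id=instance_id,
        serial=serial,
        windows_generated_serial=generated,
    )


device = parse_usbstor("Disk&Ven_SanDisk&Prod_Cruzer&Rev_1.26", "4C530001230518111464&0")
print(device.vendor, device.product, device.serial)
# prints "SanDisk Cruzer 4C530001230518111464"
```

A device flagged as having a Windows-generated serial cannot be tied to a physical device across systems as reliably, so it is worth surfacing in the report.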

I want to go through and show you some detail about how this correlation works, and also how it is automated. My next slide shows you the EMDMgmt key. When you insert a USB device into a system, the system attempts to discover whether the device could be used as a cache (this is the ReadyBoost feature), so it checks how much space is on the device and what speed it runs at, and as a by-product it records the serial number of that volume. This serial number is stored in decimal notation, and I convert it to hexadecimal so that it can be used to go further and link it up to the link files on the system.
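The decimal-to-hexadecimal step can be sketched in a few lines of Python. This is an illustration, not the author’s script: it assumes the EMDMgmt subkey name ends in "<volume label>_<decimal serial>", and the example subkey name is made up. The output uses the XXXX-XXXX form in which volume serial numbers are usually displayed.

```python
import re
from typing import Optional


def vsn_to_hex(decimal_vsn: int) -> str:
    """Format a 32-bit volume serial number as XXXX-XXXX."""
    value = decimal_vsn & 0xFFFFFFFF  # also copes with a value read as signed
    text = f"{value:08X}"
    return f"{text[:4]}-{text[4:]}"


def serial_from_emdmgmt_name(subkey_name: str) -> Optional[str]:
    """Convert the decimal number after the last '_' in an EMDMgmt subkey name.

    Assumption: the subkey name ends in "<volume label>_<decimal serial>".
    """
    tail = subkey_name.rsplit("_", 1)[-1]
    if not re.fullmatch(r"-?\d+", tail):
        return None
    return vsn_to_hex(int(tail))


name = ("_??_USBSTOR#Disk&Ven_SanDisk&Prod_Cruzer&Rev_1.26#4C530001230518111464&0#"
        "{53f56307-b6bf-11d0-94f2-00a0c91efb8b}CRUZER_305441741")
print(serial_from_emdmgmt_name(name))  # prints "1234-ABCD"
```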

Link files on the system record a volume serial number both for the birth volume and for the current volume. So the volume serial number, along with the last tested time recorded in the EMDMgmt key, can be used to find out whether any link files relate to a specific USB device. If you’ve got numerous devices and lots of link files on a system, this is a very difficult task to undertake manually, whereas if it’s automated, it can be done very quickly. I’d like to show you a sample report of this correlated information, and how much information about one device you can put together from the different hives.
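The automated matching described above amounts to a join on the volume serial number. A minimal Python sketch, assuming the link files have already been parsed (by some other tool) into records holding the two volume serial numbers the talk mentions; the LinkFile record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinkFile:
    path: str
    birth_vsn: str    # hypothetical field names; values formatted as "XXXX-XXXX"
    current_vsn: str


def link_files_for_devices(device_vsns: Dict[str, str],
                           links: List[LinkFile]) -> Dict[str, List[str]]:
    """Map each device serial number to the link files whose VSNs match its volume."""
    serials_by_vsn: Dict[str, List[str]] = defaultdict(list)
    for serial, vsn in device_vsns.items():
        serials_by_vsn[vsn.upper()].append(serial)  # two devices may share a VSN
    matches: Dict[str, List[str]] = {serial: [] for serial in device_vsns}
    for link in links:
        matched = set()
        for vsn in (link.birth_vsn, link.current_vsn):
            matched.update(serials_by_vsn.get(vsn.upper(), []))
        for serial in sorted(matched):
            matches[serial].append(link.path)
    return matches


devices = {"4C530001230518111464": "1234-ABCD", "AA000000000000000001": "0000-0001"}
links = [
    LinkFile("report.lnk", "1234-abcd", "9999-9999"),
    LinkFile("notes.lnk", "5555-5555", "0000-0001"),
    LinkFile("other.lnk", "7777-7777", "8888-8888"),
]
print(link_files_for_devices(devices, links))
```

Indexing the devices by serial number first keeps the work linear in the number of link files, which matters when a system holds thousands of them.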

Here we have one device, and the first thing we have is the serial number. Then we see the names that we have from the registry. You’ll notice that there are several timestamps shown here, and these are the first insertion since the last reboot. Rob Lee of SANS did a lot of work in this area, showing that after you reboot a system, the first insertion of a device will be recorded; however, if you hibernate or sleep the system, subsequent insertions and extractions will not be recorded. The value typically used by most software to report this is the one from the Enum tree, but I have reported the other values as well, because sometimes not all of them will actually be recorded. The vendor ID is also taken from the registry, and I interpret it further by looking it up in the USB ID list at linux-usb.org, which maps each number to the actual vendor, and I report the vendor name rather than relying on someone to look it up manually. We also report the drive letter, and you may also have volume names reported at this point, plus the volume GUID, the volume serial number, and the link files, which we went through on the previous slides. The first install time here is taken from the SetupAPI log file.
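The vendor lookup can be automated by parsing a local copy of the usb.ids list. A minimal Python sketch, assuming the vendor section’s format of a four-digit hex ID, whitespace, and the vendor name, with device lines indented by a tab; it only handles that vendor section, the two sample entries are included just for illustration, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Two vendor entries in usb.ids style: "VVVV  Name", device lines tab-indented.
SAMPLE_USB_IDS = "# comment line\n0781  SanDisk Corp.\n\t5530  Cruzer\n0951  Kingston Technology\n"

VENDOR_LINE = re.compile(r"([0-9a-fA-F]{4})\s+(.+)")
VID_IN_PATH = re.compile(r"VID_([0-9a-fA-F]{4})", re.IGNORECASE)


def parse_vendors(usb_ids_text: str) -> Dict[str, str]:
    """Build a lowercase-vendor-ID -> vendor-name map from usb.ids text."""
    vendors: Dict[str, str] = {}
    for line in usb_ids_text.splitlines():
        if not line or line.startswith(("#", "\t")):
            continue  # skip comments and indented device/interface lines
        m = VENDOR_LINE.fullmatch(line)
        if m:
            vendors[m.group(1).lower()] = m.group(2).strip()
    return vendors


def vendor_for_device(device_path: str, vendors: Dict[str, str]) -> Optional[str]:
    """Extract the VID_xxxx part of a USB device path and look up its vendor."""
    m = VID_IN_PATH.search(device_path)
    return vendors.get(m.group(1).lower()) if m else None


vendors = parse_vendors(SAMPLE_USB_IDS)
print(vendor_for_device(r"USB\VID_0781&PID_5530\4C530001230518111464", vendors))
# prints "SanDisk Corp."
```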

We then also report which users have actually used the device. Often this is only one user, but several can be reported, and I’d ask you to note two things about this slide: first, the side note I’ve put on here about the time possibly not being device-specific, and second, that these three users all have the same timestamp for the last usage of the device.

Now, on to some of the observations I made. As I was checking the scripts I was writing to see what was happening, on a couple of occasions my findings were not as expected. In relation to the Enum tree, it is generally accepted with XP that the timestamp here is the first insertion since the last reboot. However, when I looked at my data, I could see that this was not always the case. So I decided to investigate a little further, and I could see that the whole Enum tree seemed to be subject to some kind of periodic update, where all the keys were being updated with the same timestamp.

I should point out that I was also using some external test data while doing this, and this particular set of test data is from the Digital Corpora collection maintained by Simson Garfinkel. You can see two devices listed here, with their two different serial numbers shown in red, and you’ll note that the timestamps for the first insertion since last reboot are actually the same for both devices. So this phenomenon was not just a feature of my own generated test data or my own systems; it’s there in public data too. So I tried to find out a little more about this event: what its possible causes could be and how I could detect it happening. First of all, I used a product called Registry Decoder to go through and evaluate the Enum trees for the hive samples that I had. One of my hive samples had 17,000 different keys, and within the Enum tree these had all been updated with the same timestamp, plus or minus 20 seconds, all the way through.

So I then tried to watch, observe, or even trigger this event to see what was happening. I used a product called USBDeview to do this, and I set it monitoring on my own system while I was using it. At one stage I actually witnessed the Enum tree update occurring, and because I knew what I was doing on the system at the time, I was able to evaluate what else was happening. So I knew that it was not something like a power-saving event, a shutdown, a restart, or a hibernate; it wasn’t my antivirus software running, it wasn’t a volume shadow copy being taken, and it wasn’t the insertion or extraction of a device. I was also using a USB keyboard and mouse on the system at the time, so I knew that it wasn’t USB shutting off, or anything along those lines.

I observed further across several hives on different systems, and the event seemed to happen approximately every 24 hours of active usage. I couldn’t identify what was making the event happen, but I was able to code into my scripts the recognition that it had happened, by looking through the Enum tree and identifying when all the keys had last-write timestamps within plus or minus 20 seconds of each other. So all I was able to do was report when this Enum event had happened, as opposed to why it was happening on a system. I feel it’s relevant to do this because, particularly in the Windows Vista and 7 environment, it seems to be prevalent across all the hives that I looked at. And you could ask why I am still reporting that key in my scripts. Well, the key is still valid, and the timestamp is still valid, if the device was inserted after the Enum event happened.
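The recognition step described above can be sketched as a sliding window over the sorted last-write times of the Enum subkeys: find the largest group of keys that fits inside a 40-second window (plus or minus 20 seconds), and treat the tree as having had an Enum event if that group covers almost all the keys. A minimal Python sketch, not the author’s code; the 90% threshold, the function names, and the choice of the group’s latest timestamp as the event time are assumptions. The second function applies the rule from the talk that a device timestamp is still usable if it is later than the event.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def detect_enum_event(last_writes: Dict[str, datetime],
                      tolerance: timedelta = timedelta(seconds=20),
                      min_fraction: float = 0.9) -> Optional[datetime]:
    """Return the time of a tree-wide Enum update, or None if none is found.

    min_fraction is an assumed threshold for "nearly all keys share a timestamp".
    """
    times = sorted(last_writes.values())
    if not times:
        return None
    window = 2 * tolerance  # plus or minus 20 seconds around a centre point
    best_start, best_count, end = 0, 0, 0
    for start in range(len(times)):
        while end < len(times) and times[end] - times[start] <= window:
            end += 1
        if end - start > best_count:
            best_start, best_count = start, end - start
    if best_count / len(times) < min_fraction:
        return None
    # Assumption: use the latest timestamp in the cluster as the event time.
    return times[best_start + best_count - 1]


def insertion_time_usable(device_key_time: datetime,
                          event_time: Optional[datetime]) -> bool:
    """A device key's timestamp is trusted only if written after the Enum event."""
    return event_time is None or device_key_time > event_time
```

Returning the latest time in the cluster makes the "after the event" test conservative: any key rewritten by the event itself falls at or before that time and is excluded.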

The next observation I’d like to share with you is about MountPoints2. When I was doing my tests, I noticed that on occasion several users would have the same timestamp. It is traditionally reported, and commonly referenced, that if a user has an entry for a USB device under MountPoints2 in their user hive, then that device can be associated with that user, because the insertion of a USB device will update that user’s hive. What I found is that the insertion of a USB device will update the user hives of all users currently logged on to the system. So a user who happens to be logged on in the background but has never accessed a device can actually have an entry in MountPoints2 for a USB device that they have never used.

This will only happen on a system where Fast User Switching is used, where, instead of logging out before they go away, users switch to another account. This is very common on home systems. In a domain environment with Windows XP, Fast User Switching is disabled by default. In a Windows 7 domain environment it is enabled by default, although system administrators will often disable it, and some people will choose to enable Fast User Switching on their own devices.

If this has happened in the recent past, it is quite obvious when you look at several hives on the system that probably only one user inserted the device, because they all have the same timestamp. However, if it happened in the distant past, and one of the users has used that device subsequently, it’s not so obvious that the initial recording in one person’s user hive was related to somebody else using the device, not them. So my interpretation is that, when looking at a system, you must look at all the user hives, and if there is evidence of a specific USB device in multiple user hives, then you must find other corroborating evidence that a given user actually used the device, and that it wasn’t just recorded because that user was logged in while another user inserted it.
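The cross-check recommended above can be sketched in Python, assuming the subkey names and last-write times under Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2 have already been extracted from each user’s hive. The function flags every key that appears in more than one user hive and notes whether the timestamps coincide; the 2-second tolerance and all names are hypothetical. A flag is a prompt to look for corroborating evidence, not a conclusion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class SharedMountPoint:
    key_name: str
    users: List[str]
    same_time: bool  # all last-write times fall within the tolerance


def shared_mountpoints(hives: Dict[str, Dict[str, datetime]],
                       tolerance: timedelta = timedelta(seconds=2)) -> List[SharedMountPoint]:
    """hives maps user name -> {MountPoints2 subkey name: last-write time}."""
    by_key: Dict[str, Dict[str, datetime]] = {}
    for user, keys in hives.items():
        for key_name, written in keys.items():
            by_key.setdefault(key_name, {})[user] = written
    findings = []
    for key_name, per_user in sorted(by_key.items()):
        if len(per_user) < 2:
            continue  # only one user hive references this device
        times = list(per_user.values())
        findings.append(SharedMountPoint(
            key_name, sorted(per_user), max(times) - min(times) <= tolerance))
    return findings
```

Keys flagged with same_time set are the near-past case from the talk; keys shared across users with differing times are the harder distant-past case that needs corroboration.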

The last observation I’d like to share with you is about UserAssist in Windows 7. UserAssist is used by Microsoft to enhance the user experience by allowing the Start menu to include recently used applications, launched both from the desktop and from Explorer, and it’s useful from a forensic perspective because it tells us how often an application has been used. In an XP and Vista environment, the counter starts at 5, and anything below 5 is normally some kind of focus as opposed to an actual use of an application. I carried this knowledge through to Windows 7, where it’s known that the counter starts at 1. So when I looked at hives from a system that had been extracted chronologically over four separate months, I expected to see the usage count growing for applications. However, with one application I’m showing you here, Notepad, this was not the case. The count started at a number, went up, went down, and went down even further. Windows 7 still records the usage count, but it also records a focus count and the last time that something was used. It did not make sense that the usage count was going down to 0, so I looked at it in further detail, and I could see this happening across multiple hive sets. There appeared to be a recurring reset to 0 around a month end. You’ll notice that in the last value there, in the second table, the last usage time has been retained, so I knew that the value was being reset to 0 even though it was still recorded that the application had been used.
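To make the Windows 7 fields concrete, here is a minimal Python sketch that decodes a single value from a Count subkey under HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist. Value names are ROT13-encoded. The 72-byte data layout used here (run count at offset 4, focus count at 8, focus time in milliseconds at 12, last-run FILETIME at 60) is an assumption taken from public forensic write-ups rather than Microsoft documentation, and the sample value is made up.

```python
import codecs
import struct
from datetime import datetime, timedelta, timezone
from typing import Optional

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> Optional[datetime]:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC); 0 means never set."""
    if filetime == 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_win7_userassist(value_name: str, data: bytes) -> dict:
    """Decode one Windows 7 UserAssist Count value (assumed 72-byte layout)."""
    if len(data) != 72:
        raise ValueError(f"expected 72 bytes, got {len(data)}")
    run_count, focus_count, focus_ms = struct.unpack_from("<III", data, 4)
    (last_run,) = struct.unpack_from("<Q", data, 60)
    return {
        "name": codecs.decode(value_name, "rot_13"),  # value names are ROT13-encoded
        "run_count": run_count,
        "focus_count": focus_count,
        "focus_time_ms": focus_ms,
        "last_run": filetime_to_datetime(last_run),
    }


# Build a made-up value: 3 runs, 7 focus events, last run 2012-06-01 UTC.
sample = bytearray(72)
struct.pack_into("<III", sample, 4, 3, 7, 12000)
ticks = (datetime(2012, 6, 1, tzinfo=timezone.utc) - FILETIME_EPOCH) // timedelta(microseconds=1) * 10
struct.pack_into("<Q", sample, 60, ticks)
print(parse_win7_userassist("abgrcnq.rkr", bytes(sample)))
```

Decoding the run count and the last-run time side by side is what exposes the case described in the talk, where the count is 0 but a last-run time is still present.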

Studying this further, I could see it typically happened around a month end; but on closer study again, I could see that if an application was in persistent use at the rollover of a month end, it didn’t reset to 0 but kept climbing. I’m currently investigating this further, so that I can predict when the count will be reset to 0 and understand why that happens. I’ve set up a test user hive with two applications, one of which I’m going to keep using for a complete two-month period, and the other of which I’m going to stop using after a fortnight, and I’ll observe at what point the count gets set to 0. This is interesting because we can now look at application usage in more detail than just saying, "Somebody used that a hundred and fifty times," when it could all have been two years ago. Now you can see a pattern of usage, and particularly if you go back through volume shadow copies, you’ll be able to pull out hives that could show that somebody had a lot of usage of an application of interest to an examiner at a particular point in time, or none whatsoever.
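Comparing one application’s UserAssist entry across hives recovered from successive volume shadow copies makes these resets visible. A minimal Python sketch, assuming each hive copy has already been extracted and parsed; the Snapshot record and function name are hypothetical. It flags every pair of consecutive snapshots where the run count went down, and notes whether the later snapshot still carries a last-run time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Snapshot:
    taken: datetime               # when this copy of the hive was captured
    run_count: int                # UserAssist run count for one application
    last_run: Optional[datetime]  # last-run time from the same value


def find_resets(snapshots: List[Snapshot]) -> List[Tuple[datetime, datetime, bool]]:
    """Flag consecutive snapshots where the run count went down.

    Each result is (earlier snapshot time, later snapshot time, last_run_retained).
    """
    ordered = sorted(snapshots, key=lambda s: s.taken)
    resets = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.run_count < prev.run_count:
            resets.append((prev.taken, cur.taken, cur.last_run is not None))
    return resets
```

A drop between two snapshots means the application was used at least as many times as the later count shows during that interval, so counts read from a single hive should be treated as a lower bound rather than a lifetime total.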

So, anyway, that’s the end. Thank you very much for listening to me today. I hope this information was interesting to you, and if you have any comments, or if you’re interested in any of the scripts I’ve written or in reading the dissertation, please feel free to get in contact with me. Thank you very much.