by Mike Sheward, a contributor to InfoSec Resources.
Digital forensics is one of the most interesting and exciting fields of information security that you can be fortunate enough to work in, but not for the reasons you might expect. To those who have never been involved in an investigation: sorry to disappoint, but it’s nothing like the movies or TV. There are no magical programs that can unravel the world’s strongest encryption algorithms without a key, right before your eyes. Sure, there are some that will have a good go, and they are often successful, but they usually require a good dollop of time and as many hints from the investigator as possible. There are, however, a multitude of processes and procedures that you must follow to ensure digital evidence is handled and processed correctly before it can even be considered digital evidence at all. Then there’s documentation, which is usually followed by additional documentation.
Now this may sound neither interesting nor exciting, but it is, trust me. Personally, I find that the excitement comes from the many challenges you face during an investigation. One such challenge is making sure you are doing everything you can to look for evidence while sticking to forensically sound procedures. A good example is when you are asked to acquire volatile evidence. You know that you are breaking the golden rule of digital forensics by interacting directly with live evidence rather than a duplicate, but you have to; otherwise additional evidence could slip away and be lost forever.
New and evolving technologies also create new challenges for the investigator. Working with a new file system, or even just a new type of file, can require a change in approach or the development of a new technique. While these changes may require slight alterations to well-defined procedures, it is extremely rare to have to deal with a technology that is a complete “game changer”.
The digital forensics community is currently facing one of these rare situations: the rapidly increasing popularity of solid state drives (SSDs).
Forensics investigators and hard drives have developed something of a mutual understanding over the past couple of decades. That is, if we connect them to a write blocker, they’ll tell us everything they know, and there is zero chance of them changing their contents and invalidating our evidence. This understanding has come about because the conventional magnetic hard drives sold today are essentially the same as those sold twenty years ago. Okay, so capacities have increased dramatically, and the interfaces between motherboard and disk have been updated to improve performance, but the fundamental inner workings of the magnetic disk have remained unchanged. Manufacturers build magnetic disks in a largely standard fashion, so no matter which production line a disk rolled off, it’ll behave in a predictable and repeatable manner, and in digital forensics those are two extremely important qualities.
SSDs are a whole different animal. Like any new and evolving technology, manufacturers are still tinkering with their design and implementation, and compared to magnetic drives there is no standardized approach to producing them. With that, the ancient understanding between forensic investigators and hard drives has been torn to shreds. Suddenly our “bread and butter”, predictable evidence source has become a mysterious and secretive device.
So what makes them so different, and why does this affect digital forensics? To answer that question, let’s recap how magnetic drives work and compare them to SSDs. If you’ve never looked into the science of hard drives, it is a fascinating and remarkable process.
Magnetic drives store data by creating extremely small magnetized regions in a thin magnetic coating applied to a circular non-magnetic disk, known as a platter. Modern drives contain multiple platters. The direction of magnetization indicates whether the data stored is a “0” or a “1”. The surface is magnetized by a “write head”, a tiny electromagnet that floats just above the surface of the platter. To keep data in order, each platter is divided into tracks, which are concentric circles running from near the center of the platter out to its edge. Tracks are further divided into sectors, which are segments or “pie slices”. When the computer knows the track and sector address where a piece of data is stored, it can translate that into a physical location on the disk. A “read head”, which sits on the same actuator arm as the write head, then moves over that location and reads the data.
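To make the track-and-sector translation concrete, here is a minimal Python sketch of the classic cylinder/head/sector (CHS) to logical block address (LBA) conversion. It assumes the textbook scheme in which a cylinder is the set of tracks at the same position on every platter surface and each surface has its own head. The geometry values below are hypothetical, and modern drives expose LBAs directly while hiding their real internal geometry.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear logical block address.

    Cylinders and heads are numbered from 0; sectors within a track from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    # Skip whole cylinders, then whole tracks (one per head), then sectors.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> tuple[int, int, int]:
    """Inverse mapping: logical block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1
```

With a hypothetical geometry of 16 heads and 63 sectors per track, `chs_to_lba(2, 3, 4, 16, 63)` gives 2208, and `lba_to_chs(2208, 16, 63)` maps it back to `(2, 3, 4)`.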
For optimum performance, data is recorded starting at the first available sector and then continues on the next closest free sector. This ensures that the read head doesn’t have to jump around all over the place to access an entire file. However, through normal use it is common for chunks of files to become physically scattered across the disk. The cure for this is defragmenting the disk, which reduces the time required to access a file by moving its fragmented “blocks” closer together.
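The allocation behaviour described above, and the way fragmentation creeps in, can be sketched as a toy lowest-free-sector allocator. This is a simplified model for illustration only: real file systems use far more elaborate allocation strategies, and the function names here are hypothetical.

```python
def allocate(free_map: list[bool], count: int) -> list[int]:
    """Claim `count` sectors, always taking the lowest-numbered free sector next.

    free_map[i] is True while sector i is free; claimed sectors are set to False.
    """
    if sum(free_map) < count:
        raise OSError("not enough free sectors")
    chosen: list[int] = []
    for sector, is_free in enumerate(free_map):
        if len(chosen) == count:
            break
        if is_free:
            free_map[sector] = False
            chosen.append(sector)
    return chosen


def count_fragments(sectors: list[int]) -> int:
    """Number of contiguous runs in a file's sector list (1 means unfragmented)."""
    if not sectors:
        return 0
    return 1 + sum(1 for prev, cur in zip(sectors, sectors[1:]) if cur != prev + 1)
```

On a hypothetical 10-sector disk, writing a 3-sector file and a 2-sector file, deleting the first, and then writing a 5-sector file yields sectors `[0, 1, 2, 5, 6]`: two fragments, which is exactly what a defragmenter would later tidy up.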
It’s the job of the operating system’s file system to translate logical mappings into the physical locations on the disk that we have just discussed. This is why, when you format a drive, its contents appear to vanish. A quick format resets the file system’s logical mappings but doesn’t actually touch the data in the physical disk locations, which is of course why it is still possible to recover data even after a drive has been formatted.
The file system simply marks the sectors as unused and ready for new data. On a magnetic disk, if new data is written to an unused sector that physically still holds old data, it’s no big deal. The new data simply overwrites the previous data, and the write head only needs one pass across the platter to perform the operation.
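A toy model makes the point about formatting: the file system table and the raw sector contents are separate, so clearing the table leaves the data readable. This sketch assumes a simple quick format that only resets the mappings; the class and method names are hypothetical.

```python
class ToyMagneticDisk:
    """Raw sector contents plus a separate file-system table that points into them."""

    def __init__(self, num_sectors: int, sector_size: int = 512) -> None:
        self.sector_size = sector_size
        self.sectors = [bytes(sector_size) for _ in range(num_sectors)]
        self.file_table: dict[str, list[int]] = {}  # file name -> sector numbers

    def write_sector(self, n: int, data: bytes) -> None:
        # One pass of the write head: new data simply replaces whatever was there.
        self.sectors[n] = data[: self.sector_size].ljust(self.sector_size, b"\x00")

    def save(self, name: str, data: bytes, sector_numbers: list[int]) -> None:
        size = self.sector_size
        for i, n in enumerate(sector_numbers):
            self.write_sector(n, data[i * size:(i + 1) * size])
        self.file_table[name] = sector_numbers

    def quick_format(self) -> None:
        # Only the logical mappings are reset; sector contents are left untouched.
        self.file_table.clear()

    def raw_read(self, n: int) -> bytes:
        # What an imaging tool sees: the physical sector, regardless of the table.
        return self.sectors[n]
```

After `save("evidence.txt", b"incriminating", [2])` and `quick_format()`, the file is gone from `file_table`, yet `raw_read(2)` still starts with `b"incriminating"` until something is written over that sector.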
The biggest noticeable difference between an SSD and a magnetic disk is that there are no moving components, hence the “solid state”. This makes SSDs less susceptible to damage caused by shock. It also makes them lighter and means that they draw less power, all valid reasons why they are becoming popular choices for mobile computers.
SSDs are based on the same technology found in USB flash drives. They record data using microscopic transistors (flash memory cells), each of which can trap a tiny electric charge. The presence or absence of this charge determines whether the cell represents the familiar “0” or “1”. A charged cell will not allow current to flow through it; the drive recognizes this and returns a “0”. An uncharged cell, on the other hand, allows current to flow, resulting in a “1”. Erasing a cell removes its charge, so freshly erased flash reads back as all 1s. Charge can remain in a cell for years without any additional power, meaning that data will remain on the device for just as long. The main benefit of this approach is speed: a cell can be charged in microseconds, while those Stone Age magnetic drives spend milliseconds moving their heads and waiting for the platter to spin into position.
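The charge-to-bit convention described above fits in a few lines. This is purely illustrative and assumes single-level cells that store one bit each; real drives pack several bits per cell and apply error correction before any data reaches the host.

```python
def cell_to_bit(charged: bool) -> int:
    # A charged cell blocks current and reads as 0; an uncharged cell conducts and reads as 1.
    return 0 if charged else 1


def read_byte(cells: list[bool]) -> int:
    """Assemble eight cell charge states (most significant bit first) into one byte."""
    if len(cells) != 8:
        raise ValueError("expected exactly 8 cells")
    value = 0
    for charged in cells:
        value = (value << 1) | cell_to_bit(charged)
    return value
```

`read_byte([False] * 8)` returns 255 (0xFF), the value eight uncharged cells read as, while the pattern `[True, False, False, True, True, True, True, False]` reads back as 0x61, the character “a”.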
For all the advantages they bring, SSDs are not without their drawbacks. The methods SSD manufacturers use to compensate for these drawbacks should be of great concern to the digital forensics community.
Whereas a magnetic disk can, in theory, be rewritten an unlimited number of times, a flash cell has a comparatively short life expectancy. Depending on the type of flash, a cell can typically be written somewhere between a few thousand and about 100,000 times before it is likely to fail. So, unlike the magnetic hard disk, which tries to keep the blocks of a file close together, an SSD spreads the load across all the unused cells in the drive. This technique, known as wear leveling, avoids repeatedly storing charge in the same group of cells, which would make them wear out faster. The computer’s operating system is not aware of this process, thanks to the SSD’s onboard controller card, which presents the operating system with an abstracted list of sectors. For example, the OS may think it’s writing a file to sector 15, but it’s actually writing to sector 155. When the OS later reads sector 15, the controller receives that request and knows to return the data in sector 155. Later still, if the OS overwrites the contents of sector 15, the controller will create a new mapping and write the contents to sector 200, so that sector 155 gets to take a break. You can compare this process to virtual memory address translation, which performs a similar job between applications and RAM.
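The remapping described above behaves like a lookup table kept by the controller. The sketch below is a hypothetical flash translation layer; the least-worn-free-page policy is an assumption chosen for readability, since real wear-leveling algorithms are proprietary and vary between manufacturers.

```python
from typing import Optional


class ToyFTL:
    """Hypothetical flash translation layer mapping logical sectors to physical pages."""

    def __init__(self, num_pages: int) -> None:
        self.pages: list[Optional[bytes]] = [None] * num_pages  # None means erased and free
        self.wear = [0] * num_pages                             # times each page was programmed
        self.mapping: dict[int, int] = {}                       # logical sector -> physical page
        self.stale: set[int] = set()                            # superseded pages not yet erased

    def _pick_free_page(self) -> int:
        free = [p for p, data in enumerate(self.pages) if data is None]
        if not free:
            raise OSError("no free pages; garbage collection needed")
        # Assumed policy: use the least-worn free page (ties go to the lowest number).
        return min(free, key=lambda p: self.wear[p])

    def write(self, logical: int, data: bytes) -> int:
        page = self._pick_free_page()
        self.pages[page] = data
        self.wear[page] += 1
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)  # the old copy stays on the chip until it is erased
        self.mapping[logical] = page
        return page

    def read(self, logical: int) -> Optional[bytes]:
        page = self.mapping.get(logical)
        return None if page is None else self.pages[page]
```

Writing logical sector 15 twice lands the two versions on different physical pages: `read(15)` returns the newer one, while the older copy remains physically present and is only marked stale. In this model, losing `mapping` leaves the physical pages with no record of their logical order, which is the damaged-controller scenario in miniature.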
This could have implications for a forensic investigation where evidence is stored on an SSD with a damaged controller card. Faced with the same situation on a magnetic disk, an investigator could replace the controller card with one from the same model, fix the drive and recover the data. However, without knowing the specifics of the wear-leveling mechanisms used by SSD manufacturers, it is impossible to say for certain whether a replacement controller card would know how to translate the virtual-to-physical mappings correctly for the investigator’s machine or imaging device. If the process is truly random and only the damaged controller card knows how the mappings were set up, it wouldn’t. The contents of the drive would be presented as a jumbled mess, making data recovery an almost impossible task. Worse still, the integrity of the evidence could be called into question, because the image the investigator acquired would bear no resemblance to the original disk layout.
Another problem faced by SSD manufacturers is that you can’t actually overwrite flash memory, at least not in the conventional sense. Remember, a magnetic hard drive overwrites existing data in just one pass of the write head. SSDs, on the other hand, are forced to “erase” the contents of a cell before they can write new data to it. It’s like parking at the mall the weekend before the holidays: you often have to wait for someone to leave their space before you can put your car in it. Simply driving over the car already in the space would not work. The problem is compounded because an SSD’s storage space is divided into erase blocks, often around 512 KB in size. If just one byte of data has to be updated, the whole block has to be erased before the updated data can be written. This slows performance, because the SSD has to perform multiple “passes” to overwrite data.
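Erase-before-write can be modeled with two rules: programming can only turn 1 bits into 0 bits, and an erase resets an entire block at once. The sketch assumes the usual NAND convention in which erased cells read as 1s (0xFF bytes), and it uses a 16-byte block instead of a realistically sized one. `update_byte` is a hypothetical, deliberately naive read-modify-write.

```python
ERASED = 0xFF  # assumed NAND convention: an erased cell reads as 1, so an erased byte is 0xFF


class ToyFlashBlock:
    """A tiny erase block: bits can be programmed 1 -> 0, but only an erase restores them."""

    def __init__(self, size: int = 16) -> None:
        self.data = bytearray([ERASED] * size)
        self.erase_count = 0

    def erase(self) -> None:
        # Erase acts on the whole block at once.
        self.data[:] = bytes([ERASED] * len(self.data))
        self.erase_count += 1

    def program(self, offset: int, payload: bytes) -> None:
        for i, new in enumerate(payload):
            current = self.data[offset + i]
            # Any bit that is 0 now but 1 in `new` would need its charge removed: impossible.
            if new & ~current & 0xFF:
                raise ValueError("cannot turn a 0 bit back into 1 without erasing the block")
        self.data[offset:offset + len(payload)] = payload


def update_byte(block: ToyFlashBlock, offset: int, value: int) -> None:
    """Naive read-modify-write: changing one byte costs an erase of the whole block."""
    updated = bytearray(block.data)
    updated[offset] = value
    block.erase()
    block.program(0, bytes(updated))
```

Programming “b” (0x62) directly over “a” (0x61) is refused, because bit 1 would have to go from 0 back to 1, so `update_byte` has to erase all 16 bytes just to change one of them.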
To address this issue, manufacturers are believed to be implementing routines that preemptively erase old data from blocks that are no longer in use by the computer’s file system. These routines are managed by the SSD’s onboard controller. This is huge, and could represent the single greatest challenge to accepted digital forensics practice to date.
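A background clean-up routine of the kind described above might look like the following sketch. It is an assumption about the general idea, not any manufacturer’s actual firmware: the controller knows which physical pages still back a live logical sector and erases the rest on its own initiative, with no command from the host that a write blocker could intercept.

```python
from typing import Optional


def background_collect(pages: list[Optional[bytes]], live_pages: set[int]) -> list[int]:
    """Erase every programmed page that no longer backs a live logical sector.

    pages[p] is None when physical page p is already erased. This runs inside the
    drive, so no host command is involved and a write blocker never sees it.
    """
    erased = []
    for p, data in enumerate(pages):
        if data is not None and p not in live_pages:
            pages[p] = None  # the old contents are destroyed here
            erased.append(p)
    return erased
```

Once a format has left nothing mapped, `live_pages` is empty and every programmed page is wiped, whether or not the drive is sitting behind a write blocker.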
When we attach a magnetic disk to a write blocker, we are absolutely certain that no command that could alter evidence will reach the disk controller. But when the command comes from within, as is the case with an SSD, we have absolutely no control over it.
To explain why this is such a big deal, let’s run through a typical case study. An investigator seizes an SSD from a suspect machine. Nervous that he might be about to get caught, the suspect has formatted the drive to cover his tracks. In doing so, the OS marked all the sectors of the drive as unused, including some that still hold incriminating evidence. The investigator takes the seized drive back to the lab and connects it to a write blocker for imaging. Before imaging, the investigator produces a cryptographic hash of the source drive. During the imaging phase, the powered-on SSD starts to perform one of its onboard “clean-up” routines. Its controller erases sectors containing old data before they can be imaged, and the evidence is lost. When imaging is complete, the investigator creates a second cryptographic hash to verify the integrity of the image. Comparing the two hashes reveals a difference, caused by the clean-up routine removing old data. This makes it impossible for the investigator to confirm the integrity of the evidence, and with that, widely accepted forensics best practice is rendered useless.
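The case study’s hash mismatch can be reproduced in miniature with Python’s `hashlib`. The drive here is a hypothetical eight-sector byte array, and the loop that overwrites sectors 4 to 7 with 0xFF stands in for the controller’s clean-up routine; neither reflects a specific drive.

```python
import hashlib

SECTOR = 512


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical drive after the suspect's format: old "a" data still fills every sector.
drive = bytearray(b"a" * SECTOR * 8)
hash_before = sha256_hex(bytes(drive))  # taken before imaging

# Stand-in for the controller's clean-up routine erasing sectors 4-7 while powered on.
for s in range(4, 8):
    drive[s * SECTOR:(s + 1) * SECTOR] = b"\xff" * SECTOR

hash_after = sha256_hex(bytes(drive))   # taken after imaging
print(hash_before == hash_after)        # False: verification fails
```

Nothing outside the drive touched the data, yet the two hashes no longer agree, which is exactly the position the investigator is left in.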
It’s possible to see this happening by way of a simple experiment. Using WinHex, I wrote to every sector of a 64 GB Samsung SSD, filling the entire drive with “a” characters. This simulates normal data filling up a drive.
I then proceeded to format the drive in Windows; this simulates the suspect formatting the drive to cover their tracks, and should trigger the SSD controller to start erasing the unused space.
After formatting was complete, I powered down the Windows system and connected the SSD to a write blocker. Using FTK Imager, I generated a hash of the drive.
It took about an hour to generate the first hash, so once this had completed I restarted the process and generated a second hash. After another hour, the results are shown below.
This proves that despite the SSD being attached to the write blocker the whole time, the contents of the drive had still changed.
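To repeat the two-pass hash check outside FTK Imager, a short script can read a raw device or image file sequentially and compare several SHA-256 passes. This is a sketch, not a forensic tool: reading a raw block device usually requires administrator privileges, and on an SSD each pass keeps the drive powered, which is precisely when a clean-up routine may run.

```python
import hashlib


def hash_device(path: str, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a raw device or image file, read sequentially in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        for chunk in iter(lambda: source.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_stable(path: str, passes: int = 2) -> bool:
    """Hash the same source several times; True only if every pass produced the same digest."""
    digests = [hash_device(path) for _ in range(passes)]
    return len(set(digests)) == 1
```

On a Linux acquisition box, `hashes_stable("/dev/sdb")` (where `/dev/sdb` is a hypothetical device node for the drive behind the write blocker) returns `False` if the drive’s contents changed between passes.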
So how does forensics deal with this? In the short term, with no reliable way of obtaining the same hash twice, SSDs will have to be treated just like any other volatile evidence source. Investigators will have to rely on both documentation and demonstration skills to show exactly what steps have been taken while working on the evidence, and hope for an understanding jury. This is less than ideal and can’t go on forever. In the longer term, the onus is surely on the manufacturers of these drives. They will need to open up about, or standardize, the way these clean-up routines are implemented. Perhaps all controller cards should be able to receive a “no erase” command from a write blocker, effectively locking them. Even then, it would only be a matter of time before someone hacked the firmware of a drive and configured the controller to do just the opposite upon receipt of this command. We are just at the beginning of this challenging phase for digital forensics, and it’s a very interesting and exciting place to be.
This post was written by Mike Sheward, a contributor to InfoSec Resources. InfoSec Institute is a provider of high quality information security training.
Thanks for this great article. I am studying CF at university and you have given me some great ideas for a little project.
Really a good read. I must say it will definitely help me improve my knowledge of computer forensics.
Hmmmm, so the SSD is aware of the host file system and its free-space bitmap? Interesting: load leveling (picking where to write a block of data to avoid creating premature “wear” on a hot spot) happens at a level below the controller and is normally not visible to native commands running on the host. Time to spin up an SSD and do some more testing, as this sounds rather anomalous.
What I suspect is going on here has to do with the TRIM command, which is a way for the operating system to tell the SSD that disk blocks are no longer needed and do not need to be preserved when garbage collecting. So when a file is permanently deleted or the disk is formatted, the operating system can issue a TRIM command to tell the drive which LBAs are now free (as far as the system is concerned), and the drive does not need to preserve that data when doing its garbage collection. Note that TRIM has to be supported by both the drive AND the operating system for this behavior to occur.
Thanks for the post — I had to go learn something new. 🙂
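The TRIM behaviour described in the comment above can be sketched as a drive that simply forgets trimmed LBAs. This is a hypothetical model: what a drive returns when reading a trimmed LBA varies between drives, so the 0xFF fill value is an assumption, and a real drive erases the underlying flash later, during garbage collection.

```python
class ToyTrimDrive:
    """Hypothetical drive that keeps data only for mapped LBAs and forgets trimmed ones."""

    def __init__(self, sector_size: int = 512, fill: int = 0xFF) -> None:
        self.sector_size = sector_size
        self.fill = fill                    # assumed value returned for unmapped LBAs
        self.mapped: dict[int, bytes] = {}  # LBA -> stored sector contents

    def write(self, lba: int, payload: bytes) -> None:
        self.mapped[lba] = payload[: self.sector_size].ljust(self.sector_size, b"\x00")

    def trim(self, first_lba: int, count: int) -> None:
        # The host declares these LBAs unused; the drive need not preserve them.
        for lba in range(first_lba, first_lba + count):
            self.mapped.pop(lba, None)

    def read(self, lba: int) -> bytes:
        return self.mapped.get(lba, bytes([self.fill]) * self.sector_size)
```

Until `trim` is called, old data stays readable; this is why both the OS and the drive must support TRIM for the erasing behaviour to appear.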
Very good quality article, technical but easy to follow at the same time. Thank you.
Using an OCZ VertexPlus SSD, I could not reproduce the changed hashes. After filling the drive with 0x6161, doing a quick format as NTFS on a Windows 7 system (with the TRIM command enabled), and attaching it to a Linux box through a write blocker, two successive sha256sum hashes had the same value. What did change was that from around sector 2400 onward, the disk contents were 0xFFFF rather than the 0x6161 I had written. It appears the controller returns all F’s for data read from sectors that have not been written. You might find this article http://techgage.com/article/too_trim_when_ssd_data_recovery_is_impossible/ interesting as well.
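To locate where written data gives way to 0xFF in an image like the one described above, a small scan can group consecutive sectors by content. The function names are hypothetical, and the 512-byte sector size and the “a” (0x61) fill pattern are assumptions matching this experiment.

```python
def classify(sector: bytes) -> str:
    if sector == b"\xff" * len(sector):
        return "erased (0xFF)"
    if sector == b"a" * len(sector):
        return "pattern (0x61)"
    return "other"


def sector_runs(image: bytes, sector_size: int = 512) -> list[tuple[int, int, str]]:
    """Group consecutive sectors with the same classification into (first, last, label) runs."""
    runs: list[tuple[int, int, str]] = []
    for index in range(len(image) // sector_size):
        label = classify(image[index * sector_size:(index + 1) * sector_size])
        if runs and runs[-1][2] == label:
            runs[-1] = (runs[-1][0], index, label)
        else:
            runs.append((index, index, label))
    return runs
```

A boundary like “around sector 2400” shows up as the first sector of an `erased (0xFF)` run. For a full drive image you would feed the data in chunks rather than loading it into memory at once.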
Although your results are interesting, I conducted my own experiments using an SSD to check whether the contents changed when it was connected through a write blocker. My results showed that the data remained and the hashes matched every time.
The results seem to vary depending on the SSD itself and its controller, so be careful of this.
The best research that can be conducted at the moment is to test this on the majority of SSDs currently on the market. A list could then be compiled of the SSDs that overwrite data during imaging and those that don’t. I imagine this would be a great aid to investigators worldwide.
PS. I used an OCZ Agility 3 SSD