Asking A VPS To Image Itself

by Chris Cohen

There is a Linux Virtual Private Server (VPS) that you have been tasked with collecting in a forensically sound manner while ensuring confidentiality, integrity and availability. You have the password for a user who has ssh access to a shell account on that VPS, and that user is in the super user group. You do not have access to the VPS control panel, and the VPS is located in a country which does not respect any legal notices from the country you are in. You need to log into the VPS via ssh and ask it to image itself.

To ensure forensic soundness we must keep any changes we make to the VPS to a minimum; to this end we will not be installing additional software. This means we are limited to the applications installed by default, and that we will have to transmit the forensic image across the internet as it is being created. (1) To receive the image we require a Linux collection system that we control, with a public IP address and enough disk space to hold the image.

To ensure data confidentiality we will encrypt the exfiltrated data while it is in motion; note that this encryption makes the process slower than an unencrypted transfer would be. (2)

To ensure data integrity we will take a message digest hash of the image as it is being created, which can later be compared to a hash of the image received on our collection system to prove that it was transmitted without error. Integrity is also protected by the encryption of the data in transit; if it were not encrypted, a man-in-the-middle could view as well as interfere with the data as it flows.

Lastly, to ensure data availability we will keep the VPS running throughout this process. This means we get a ‘smear’ of the drive, and if we were to perform the imaging process again we would likely get a different hash value.

As is typical in Linux, there are multiple ways in which any given task can be performed; the following is just my take on this problem. I’m sure it is only one of many possible solutions, and possibly not even the best.

To create the image of the disk we will be using dd, the file and disk copying program that is installed by default. dd itself has no built-in hashing or encryption capabilities, so we will have to use other installed programs to perform those tasks. (3)

First we need to gather some information about the VPS we are on. To find what distribution of Linux is installed on the VPS, use the following command if the information wasn’t displayed as part of the initial login:

uname -a
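
The kernel details reported by uname will often hint at the distribution, but on most modern distributions the release is also recorded in /etc/os-release; assuming that file is present, it can be read without elevated privileges:

cat /etc/os-release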

To find the mounted devices, the file system on these devices, their mount point and how full they are:

df -T

To show file sizes in human-readable format, try df -Th.
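
If the device layout is less obvious than this, lsblk (part of util-linux and installed by default on most distributions) lists every block device, its size and where it is mounted:

lsblk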

For this article we will assume that there is a single partition, /dev/vda1, mounted from the disk /dev/vda.

The following command will create a disk image of /dev/vda (padding any unreadable sectors with zeros), create an MD5 hash of the image and save that hash to a file called vda.img.md5. The image data is compressed and sent via ssh to the collection system:

sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"

Where <user> is the username on the collection system and <IP> is its IP address. The image file can be decompressed on the collection system by running gzip -d vda.img.gz. Its hash can then be calculated with md5sum vda.img.

To view and then remove the file containing the md5 of the image transmitted from the VPS the following commands can be entered:

cat vda.img.md5

rm vda.img.md5

One of the problems of stringing multiple Linux commands together as above is that if several parts of the command require additional user input, such as a password, those requests can be presented to the user simultaneously, making it impossible to input the requested information successfully. In the command above, two elements may require additional user input: firstly, dd’s access to the physical disk requires a super user password; secondly, the ssh connection to the collection server requires a separate password to be entered. These two password prompts can conflict. One way around this is to supply the sudo password to a command run before the imaging command: before the dd command is entered, do a sudo ls and enter the password when prompted, and it may be cached for subsequent sudo commands. If that fails (perhaps the VPS has been configured not to cache sudo passwords) then the sudo password can be passed in-line as below, with the password being password:

echo 'password' | sudo -S dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
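
If you take the cache-priming route instead, either of the following, run on its own before the imaging command, will populate sudo’s cached credentials on a default configuration:

sudo ls

sudo -v     < validates and refreshes sudo’s cached credentials without running a command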

Outbound ssh connections to new locations also require further user input: the first time you connect to a host, ssh asks you to confirm its key fingerprint. Subsequent connection attempts do not require this additional input, so to avoid the conflict, create and break an ssh session to the collection server before running the dd command; this also confirms that the server is reachable.
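
A minimal way to do this is to run a remote command that exits immediately, for example:

ssh <user>@<IP> exit

You will be asked to confirm the host key fingerprint and to enter the collection system password once; the host key is then cached in ~/.ssh/known_hosts on the VPS, which is one more small change that connecting makes to the machine.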

The ssh command cannot take an in-line password, and other than installing the public key of our VPS (which may not even have been created) on the collection server, a password will have to be entered for the ssh connection. The calculated md5 cannot be sent via ssh, as a separate ssh connection would require an additional password to be entered, with the ensuing conflict in entering it. This is why, in the commands shown above, the calculated md5 was saved to a text file on disk. While creating a file is not ideal, the file is small and therefore unlikely to cause any issues. Let’s not forget that just connecting to the VPS makes changes to multiple files.
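
For completeness, if you did accept the extra footprint of key-based authentication (it creates new files under ~/.ssh on the VPS), the usual sketch is to generate a key pair and copy the public half to the collection system, after which the ssh hop in the imaging command no longer prompts for a password:

ssh-keygen     < creates a key pair under ~/.ssh on the VPS

ssh-copy-id <user>@<IP>     < appends the public key to authorized_keys on the collection system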

If you wish to avoid directly creating any files on the disk, one way to do so is to send the hash via netcat, which does not require a password to be entered but also sends the data unencrypted, with the following commands:

On collection system:

nc -l <port> > vda.img.md5
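
Note that the listener syntax varies between netcat implementations; with traditional netcat (the netcat-traditional package on Debian, for example) the port must be given with -p:

nc -l -p <port> > vda.img.md5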

On VPS:

sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum | nc <IP> <port>) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"

If you want to hash the data before it is compressed as well as after (so you have a hash of the disk itself as well as of the data actually sent, which to my mind is overkill), you can do so with this command, which saves both hashes to disk:

sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | tee >(md5sum > vda.img.gz.md5) | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"

Or this command which sends the hashes via netcat:

On collection system – enter these two commands into two separate terminal windows, so both run concurrently:

nc -l 9000 > vda.img.md5

nc -l 9001 > vda.img.gz.md5

On evidence VPS:

sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum | nc <IP> 9000) | gzip -c | tee >(md5sum | nc <IP> 9001) | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"

In all commands sha1sum can be used instead of md5sum but it will take slightly longer.

The block size does not have to be 128k; the best block size can be determined by running tests, although the results will be fairly specific to the device tested. I’ve therefore plumped for a sensible 128k.
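
As a rough sketch of such a test, assuming GNU dd and a bash shell on the VPS, the following reads the same 1 GiB of the disk at several block sizes, discards it, and lets dd’s summary line report the throughput of each pass (iflag=direct bypasses the page cache so each pass actually touches the disk):

for args in "bs=64k count=16384" "bs=128k count=8192" "bs=512k count=2048" "bs=1M count=1024"; do echo "testing $args"; sudo dd if=/dev/vda of=/dev/null $args iflag=direct 2>&1 | tail -n 1; done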

While these imaging commands are running we will not receive any information about their progress. dd itself has an option to report progress, but turning it on interferes with the ssh password entry, so it has to remain off. To determine how (and indeed whether) the command is progressing, you can use the following commands, which will report the progress of the dd command every 5 seconds:

ctrl-z     < this temporarily halts the command and returns the command prompt

bg     < this backgrounds and resumes the command

jobs -l     < this shows what tasks you have running and their Process IDs

while true; do sudo kill -USR1 <PID>; sleep 5; done

Where <PID> is the process ID of the running command as shown by the jobs command. To break out of the loop and return the backgrounded command to the foreground, do a ctrl-c and then a fg.

Note, running any of these disk imaging commands is likely to greatly change the memory of the VPS, so if you’re interested in taking a memory dump then do that first.

Tested on:

  • Ubuntu 16.04.1 LTS
  • Fedora 25 x64
  • Debian 8.6 x64
  • CentOS 7.3.1611

 

Chris Cohen

chris.w.cohen@gmail.com

 

(1) For the benefit of this article we will assume that no tools have been removed in an effort to harden the system and that we can trust the tools already installed. If this isn’t the case then we could transfer known-good, statically compiled tools to our VPS, although that is outside the scope of this article.

(2) Encryption could be omitted if the data is being transferred over a LAN, though in this scenario it is travelling the public internet and therefore encryption should be used.

(3) The Linux forensic imaging program dcfldd does have the ability to hash-on-the-fly but it is not installed as standard on any common distribution.
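
For comparison only, and assuming dcfldd had been installed (which would itself change the system being collected), the equivalent hash-on-the-fly acquisition would look something like:

sudo dcfldd if=/dev/vda bs=128k conv=sync,noerror hash=md5 hashlog=vda.img.md5 | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"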

Linux ‘dd’ basics

First published March 2008

Linux dd can be a powerful and flexible tool to have in your box. You will find it installed by default on the majority of Linux distributions available today and it can be used for a multitude of digital forensic tasks, not least of which is providing a simple means of obtaining a raw image of a file, folder, volume or physical drive. It has a simple, relatively intuitive syntax and a useful set of options to extend its basic capabilities.

On the negative side it does not give any feedback to the user when it is launched, has no error handling by default and, perhaps most importantly, can be very destructive if you get things wrong, earning it the nickname of “Data Destroyer” (dd) over the years.

As always, read the man pages before you use it [# man dd] and fully test the processes in a safe environment before letting it loose on a job that really matters.

The basic dd syntax is as follows:

# dd if=<source> of=<target> bs=<byte size>

(“if” being “input file” and “of” meaning “output file”).

(bs= is actually one of the options that I mentioned above. If you don’t include it dd will use a default byte size of 512. The byte size is usually some power of 2, not less than 512 bytes. For example: 512, 1024, 2048, 4096, 8192, 16384. It can, however, be any reasonable number.) Personally I always set the byte size manually so that I know exactly what is going on with the process that I am running.

It should be easy to work out from the basic command that “if=” is the data being read whilst “of=” is where the data is being written to. It should also be obvious that if you reverse the source and target entries by mistake, you can potentially overwrite your source with your target. In real terms this can mean filling the contents of your suspect drive with all of the zeros from your sanitized evidence drive. Of course, if you have your suspect drive attached through a write blocker as I previously suggested you should be protected to a certain extent from this kind of error. The main thing is to take care with your data entry and get the syntax right before you hit the return button.

If you are wondering what I mean by sanitized evidence drive, it is simply the process of wiping and formatting a drive prior to writing new evidence to it. You should always make sure that you start any investigation in this way so that the danger of residual data on your target drive corrupting your evidence is removed. You can use “dd” to do this using this command:

# dd if=/dev/zero of=/dev/<target device>

This process will basically fill your target drive with zeros, overwriting any data as it goes. One pass should be enough although you can of course run it as many times as you like before re-formatting the drive. The byte size used in the example will be the default 512. You are free to choose any size you wish and may see reductions in processing times as a result of using a larger number. Experiment with different byte size entries on a spare drive and see what difference it makes. If time is not an issue, then just stick with the default.
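
For example, the same wipe with a larger block size (64k here, chosen purely as an illustration) looks like this:

# dd if=/dev/zero of=/dev/<target device> bs=64k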

Now that we have the basic syntax (# dd if=<source> of=<target>) we can see that what dd is doing is copying chunks of data from the source, in this example in the default 512 byte blocks, and writing that data to the target, which can be a file or another block device. So we now have a choice as to where, and how, we store our forensic image. Let’s say that we have an 80 GB hard drive that we want to image. You could send the output straight to a wiped and formatted drive, like this:

# dd if=/dev/<source device> of=/dev/<target device> bs=512 conv=noerror,sync

which produces a straight copy of the original.

You can write the output to a file:

# dd if=/dev/<source device> of=/home/user/linux_image.dd bs=512 conv=noerror,sync

although in practical terms an 80 GB (uncompressed) file might be a little unwieldy to deal with, unless you then use dd again to write the file back to a clean disc (again a straight copy):

# dd if=/home/user/linux_image.dd of=/dev/<target device> conv=notrunc,noerror

Which simply writes the contents of linux_image.dd to your target device.

You will have no doubt noticed that I have introduced several new switches using the conv= (conversion) option on the back of the command. These are very important additions that I had already alluded to in paragraph 3 above. These switches control how dd handles read errors. By default dd will happily copy out data until it locates a sector or block on the source device that it can’t read; then it will just stop what it is doing and you won’t have a full image. Using conv=noerror,sync adjusts this behaviour so that dd pads the bad sectors with zero characters and then carries on copying the rest of the data that it can read. The second part of the switch, sync, provides the zero padding and also ensures that the sectors on the target device stay aligned with those from the source device, thus ensuring an accurate replication of the original media. notrunc simply tells dd not to truncate the output file before writing, so an existing image is overwritten in place rather than being cut short first.

There are a number of other useful switches within dd. Open up # man dd to see an explanation of them all.
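
As one small illustration (placeholder device and file names again), count= and skip= let you copy just part of a device, in this case 100 MB starting 1 GB into the source:

# dd if=/dev/<source device> of=partial.dd bs=1M skip=1024 count=100 conv=noerror,sync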

There is just one more area that I want to cover briefly before I move on, and that is splitting images into manageable-size files using dd and a unix tool appropriately called split. To do this on the fly you simply have to pipe the output of dd through the split command like this:

# dd if=/dev/<source device> | split -d -b 2000m - image.split.

I intend to talk about splitting images in a later post so won’t elaborate too much here. Suffice to say that the above command takes standard output from the dd command and pipes it as standard input to the split command. The result (in this case) is a series of 2 GB files, in the current directory, named ‘image.split.00’, ‘image.split.01’ and so on.
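
Should you later need to put the pieces back together, concatenating them in order recreates the single image file:

# cat image.split.* > linux_image.dd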

As I say, there will be a more detailed look at this technique in later posts. For now just get used to the difference in syntax from a standard dd operation (i.e. no of= string).

Well, that’s a brief overview of Linux dd; it should certainly be enough to get anyone started with the basics of using it as a forensic tool. As always I would advocate further reading (man dd), and of course a Google search will throw up a good amount of reference material.

Reprinted with permission from PC-Eye (Digital Forensics)