by Chris Cohen
There is a Linux Virtual Private Server (VPS) that you have been tasked with collecting in a forensically sound manner while ensuring confidentiality, integrity and availability. You have the password for a user with ssh access to a shell account on that VPS, and that user is in the superuser group. You do not have access to the VPS control panel, and the VPS is located in a country that does not respect legal notices from the country you are in. You will need to log into the VPS via ssh and ask it to image itself.
To ensure forensic soundness we must keep any changes we make to the VPS to a minimum; to this end we will not be installing additional software. This means we are limited to using only the applications installed by default, and that we will have to transmit the forensic image across the internet as it is being created. (1) To receive the image we require a Linux collection system under our control with a public IP address and enough disk space for the image.
To ensure data confidentiality we will encrypt the exfiltrated data while it is in motion; it should be noted that this encryption will make the process slower than it would be if it were not encrypted. (2)
To ensure data integrity we will take a message digest hash of the image as it is being created, which can later be compared to a hash of the image received on our collection system to prove that it was transmitted without error. Integrity will also be maintained by encrypting the transmitted data; if it were not encrypted, a man-in-the-middle could view as well as interfere with the data as it flows.
Lastly, to ensure data availability we will keep the VPS running throughout this process. This means that we get a ‘smear’ of the drive, and if we were to perform the imaging process again we would likely get a different hash value.
As is typical in Linux, there are usually multiple ways in which any given task can be performed; the following is just my take on this problem. I’m sure that this is just one of many possible solutions and possibly not even the best.
To create the image of the disk we will use dd, the file and disk copying program installed by default. dd itself has no built-in hashing or encryption capabilities, so we will have to use other installed programs to perform these actions. (3)
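The combination can be seen in miniature on a small throwaway file rather than a real disk (the file names here are purely illustrative): tee with process substitution lets md5sum hash the stream while gzip compresses it, exactly the pattern used against /dev/vda later.

```shell
# Create a small test file standing in for the disk (hypothetical name).
dd if=/dev/zero of=testdisk.raw bs=1k count=16 2>/dev/null

# Hash the stream in flight with md5sum while gzip compresses it.
# Note: >( ... ) is bash process substitution, not plain sh.
dd if=testdisk.raw bs=128k 2>/dev/null | tee >(md5sum > testdisk.raw.md5) | gzip -c > testdisk.raw.gz

sleep 1   # give the backgrounded md5sum a moment to finish writing its file

# Decompressing and re-hashing should print the same hash recorded in flight.
gzip -dc testdisk.raw.gz | md5sum
cat testdisk.raw.md5
```

The same race applies at full scale: the hash file is written by a background process, so check it only after the pipeline has finished.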
First we need to gather some information about the VPS we are on. To find which distribution of Linux is installed on the VPS, use the following command if the information wasn’t displayed as part of the initial login:
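On most current distributions (an assumption: systemd-era systems ship /etc/os-release) the distribution can be read like this, with uname as a fallback for older systems:

```shell
# /etc/os-release names the distribution and version on most modern
# systems; fall back to uname -a where the file is absent.
cat /etc/os-release 2>/dev/null || uname -a
```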
To find the mounted devices, the file system on each device, its mount point and how full it is:
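A df invocation such as the following (implied by the df -Th variant mentioned below) shows each mounted filesystem, its type, its mount point and its usage:

```shell
# -T adds a filesystem-type column; "Mounted on" gives the mount point,
# and Use% shows how full each filesystem is.
df -T
```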
To show sizes in human-readable format try df -Th.
For this article we will assume that there is a single partition, mounted as /dev/vda1, on the disk /dev/vda.
The following command will create a disk image of /dev/vda with read errors padded, create an MD5 hash of it and save that hash to a file called vda.img.md5. The image data will be compressed and sent via ssh to the collection system:
sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
Where <user> is the username on the collection system and <IP> is its IP address. The image file can be decompressed on the collection system by running gzip -d vda.img.gz. Its hash can then be calculated by md5sum vda.img.
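One detail worth knowing for the comparison: md5sum hashing a stream records its filename as "-", so compare only the hash field. A sketch of the verification step, using a dummy image so it can be tried anywhere (substitute the real vda.img files):

```shell
# Dummy stand-ins for the real image and the hash file made on the VPS.
echo "image data" > vda.img
md5sum < vda.img > vda.img.md5   # stream hash: the filename field is "-"

# Compare first fields only, since the filename fields will differ.
received=$(awk '{print $1}' vda.img.md5)
computed=$(md5sum vda.img | awk '{print $1}')
[ "$received" = "$computed" ] && echo "hashes match"
```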
To view, and then remove, the file containing the MD5 of the image on the VPS, the following commands can be entered:
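Presumably cat to view and rm to remove (the placeholder hash below exists only so this sketch runs standalone):

```shell
echo "d41d8cd98f00b204e9800998ecf8427e  -" > vda.img.md5   # placeholder only
cat vda.img.md5   # view and record the transmitted hash
rm vda.img.md5    # then remove the file from the evidence VPS
```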
One problem with stringing multiple Linux commands together as above is that if several parts of the command require additional user input, such as a password, those requests can be presented to the user simultaneously, making it impossible to successfully input the requested information. In the command above, two elements may require additional input: dd's access to the physical disk requires a super user password, and the ssh connection to the collection server requires a separate password. These two password prompts can conflict. One way around this is to supply the sudo password before running the imaging command: do a sudo ls first and enter the password when prompted, and it may be cached for subsequent sudo commands. If that fails (perhaps the VPS has been configured not to cache sudo credentials), the sudo password can be passed in-line as below, with the password being password:
echo 'password' | sudo -S dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
Outbound ssh connections to new hosts also require further user input, to confirm that a connection to that host is desired. Subsequent connection attempts do not require this additional input, so to avoid the conflict, create and then close an ssh session to the collection server before running the dd command; this will also confirm that the server is reachable.
The ssh command cannot take an in-line password, so short of installing the public key of our VPS (which may not even have been generated) on the collection server, a password will have to be entered for the ssh connection. The calculated MD5 cannot be sent via ssh either, as a separate ssh connection would require an additional password, with the ensuing conflict in entering it. This is why, in the commands shown above, the calculated MD5 was saved to a text file on disk. While creating a file is not ideal, the file is small and therefore unlikely to cause any issues. Let's not forget that merely connecting to the VPS will make changes to multiple files.
If you wish to avoid directly creating any files on the disk, one way is to send the hash via netcat (which does not require a password to be entered, but also sends the data unencrypted) with the following commands:
On collection system:
nc -l <port> > vda.img.md5
On evidence VPS:
sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum | nc <IP> <port>) | gzip -c | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
If you want to hash the data both before and after compression (so you have a hash of the disk itself as well as of the data actually sent, which to my mind is overkill), you can do so with this command, which saves the hashes to disk:
sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum > vda.img.md5) | gzip -c | tee >(md5sum > vda.img.gz.md5) | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
Or this command which sends the hashes via netcat:
On collection system – enter these two commands into two separate terminal windows, so both run concurrently:
nc -l 9000 > vda.img.md5
nc -l 9001 > vda.img.gz.md5
On evidence VPS:
sudo dd if=/dev/vda bs=128k conv=sync,noerror | tee >(md5sum | nc <IP> 9000) | gzip -c | tee >(md5sum | nc <IP> 9001) | ssh <user>@<IP> "dd bs=128k of=vda.img.gz"
In all commands sha1sum can be used instead of md5sum but it will take slightly longer.
The block size does not have to be 128k; the best block size can be determined by running tests, although the results will be specific to the device tested. I’ve therefore plumped for a nice sensible 128k.
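One way to run such a test, sketched here against /dev/zero so it is safe to try anywhere, is to time a fixed amount of data at several block sizes (GNU dd reports throughput on stderr):

```shell
# Copy 64 MiB at each block size; real disks will show clearer
# differences than this synthetic source does.
for bs in 4096 32768 131072 1048576; do
    echo "block size: $bs bytes"
    dd if=/dev/zero of=/dev/null bs=$bs count=$((67108864 / bs)) 2>&1 | tail -n 1
done
```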
While these imaging commands are running we will not receive any information about their progress. dd itself has an option to report progress, but turning it on interferes with the ssh password entry, so it has to remain off. To determine how (and indeed whether) the command is progressing, you can use the following commands, which will make dd report its progress every 5 seconds:
ctrl-z < this temporarily halts the command and returns the command prompt
bg < this backgrounds and resumes the command
jobs -l < this shows what tasks you have running and their Process IDs
while true; do sudo kill -USR1 <PID>; sleep 5; done
Where <PID> is the process ID of the running command, as shown by the jobs command. To break out of the loop and return the backgrounded command to the foreground, press ctrl-c and then enter fg.
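The mechanism can be seen in miniature with a harmless local dd. GNU dd prints its running counters to stderr when it receives SIGUSR1 (non-GNU dd implementations use SIGINFO instead):

```shell
# Start an endless, harmless dd in the background, capturing its stderr.
# (/dev/zero never hits end-of-file, so it runs until killed.)
dd if=/dev/zero of=/dev/null bs=64k 2>progress.log &
pid=$!

# Poke it with USR1 and give it a moment to report.
sleep 1
kill -USR1 $pid
sleep 1
cat progress.log   # shows "records in / records out / bytes ... copied"

# Tidy up the background copy.
kill $pid
wait $pid 2>/dev/null || true
```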
Note: running any of these disk imaging commands is likely to greatly change the memory of the VPS, so if you’re interested in taking a memory dump, do that first.
The commands in this article were tested on:
- Ubuntu 16.04.1 LTS
- Fedora 25 x64
- Debian 8.6 x64
- CentOS 7.3.1611
(1) For the benefit of this article we will assume that no tools have been removed in an effort to harden the system and that we can trust the tools already installed. If this isn’t the case then we could transfer known-good, statically-compiled tools to our VPS, although this is outside the scope of this article.
(2) Encryption could be omitted if the data is being transferred over a LAN, though in this scenario it is travelling the public internet and therefore encryption should be used.
(3) The Linux forensic imaging program dcfldd does have the ability to hash-on-the-fly but it is not installed as standard on any common distribution.