I was recently dealing with some DMG files on an HFS+ formatted drive. The goal was to mount the DMG files read-only, and extract certain files without altering the file metadata. Could rsync -rpt (recursive, preserve permissions, preserve times) [source] [destination] be considered forensic? Checking the metadata dates after performing the rsync, I did not see any evidence of the dates being altered, and both the original and the rsync copy of the files had the same md5 hash value.
Does anyone know of a reason that rsync would be considered improper (not "forensically sound") to use in this fashion?
Isn't it the other way around: rsync is assumed to be dirty until you show it isn't? What is your reason for believing it is sound? How did you test it? (What implementation of rsync is this?) As your question is so general, it seems to be not just a question of whether it works in your particular case, but whether it works in all cases (for HFS+, at least).
How does the md5 hash you used for testing relate to forked files? Does the tool you use hash only a single fork, or all of them?
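One way to make the fork question concrete is to hash each fork separately. The following is only a sketch: it assumes the check runs on macOS, where the resource fork of a file can be opened through the special path suffix `/..namedfork/rsrc`. The helper names `hash_file` and `hash_forks` are made up for illustration. On systems without that path the resource entry simply comes back as `None`.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the bytes read from `path`,
    or None if the path cannot be opened."""
    digest = hashlib.md5()
    try:
        with open(path, "rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def hash_forks(path):
    """Hash the data fork and, where the OS exposes it, the resource fork.

    Assumption: the resource fork is reachable as <path>/..namedfork/rsrc,
    which is a macOS convention. Elsewhere the open fails and the
    'resource' value is None.
    """
    return {
        "data": hash_file(path),
        "resource": hash_file(os.path.join(path, "..namedfork", "rsrc")),
    }
```

Running `hash_forks` on the original and on the copy and comparing both entries shows whether a matching hash covered only the data fork.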
How did you check metadata integrity – eyeballing, or a more detailed check?
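As an alternative to eyeballing, a field-by-field comparison can be scripted. This is a minimal sketch, not a complete forensic check: the function name `compare_metadata` and the chosen field list are assumptions for illustration, and it covers only what `lstat` reports (mode, owner, group, size, modification time). Creation dates, access times, ACLs, and extended attributes would need separate checks.

```python
import os
import stat

# Fields compared for every entry; st_size is only compared for regular
# files, because directory sizes legitimately differ between file systems.
FIELDS = ("st_mode", "st_uid", "st_gid", "st_mtime_ns")


def compare_metadata(src_root, dst_root, fields=FIELDS):
    """Walk src_root and return a list of mismatches against dst_root.

    Each mismatch is a tuple (relative_path, field, src_value, dst_value).
    An entry missing from the destination is reported with field 'missing'.
    """
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            src_stat = os.lstat(src)
            try:
                dst_stat = os.lstat(dst)
            except FileNotFoundError:
                mismatches.append((rel, "missing", None, None))
                continue
            wanted = list(fields)
            if stat.S_ISREG(src_stat.st_mode):
                wanted.append("st_size")
            for field in wanted:
                a = getattr(src_stat, field)
                b = getattr(dst_stat, field)
                if a != b:
                    mismatches.append((rel, field, a, b))
    return mismatches
```

An empty list means only that the compared fields agree. Since `rsync -rpt` does not ask for owner or group to be preserved, mismatches in `st_uid` or `st_gid` are to be expected when the copy is made by a different user.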
Some more questions
File names up to 255 characters: does that work? Full Unicode file names: do they survive? (Also the private use areas of Unicode?) n-forked files: did you test that? What about journaling: does that survive? Of course, it would be useful to verify that large files survive as well, at least well past the 4 GB boundary. Done? HFS+ has ACLs: do they survive? Extended attributes?
(I recently read an article on testing tar for file transfer, and the problems involved with some very special cases, like sparse files – if HFS+ has anything like that, you may want to consider if a 'holey' file, and a file where all holes have been replaced with 0's really are the same from a forensic point of view. At the read() level, they are, but … )
What error cases are there – if there is an error, will you learn of it? Or will rsync just quietly die, leaving you to believe that everything worked as planned?
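At a minimum, the exit status of the copy can be checked rather than assumed. rsync returns a non-zero exit status when it reports an error, so a wrapper along these lines makes a failure hard to overlook. It is a sketch only: `run_checked` is a hypothetical helper, and the paths in the usage comment are placeholders.

```python
import subprocess


def run_checked(command):
    """Run `command` (a list of arguments) and raise if it exits non-zero.

    Returns the CompletedProcess so that stdout and stderr can be logged.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with status {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return completed


# Hypothetical usage; the source and destination paths are placeholders:
# result = run_checked(["rsync", "-rpt", "/Volumes/image/", "/cases/extract/"])
# print(result.stdout)
```

A zero exit status only says that rsync reported no error. It does not show that forks, ACLs, or extended attributes made it across, so it complements the other checks rather than replacing them.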
In the general case, *all* information should make it over – in special cases, it is enough if only the important information does. If yours should be a special case, do you know what information doesn't make it?
But basically it's a question of who you want to convince that rsync can be used for forensic purposes, and how you will convince them that it is.
Thank you, athulin. Your post lists many different ways in which this method (rsync) should be tested. I'll perform more detailed testing and report the results here, unless someone else posts their experience ("I tried this method and discovered that rsync did not handle the resource fork correctly", etc.) before I have finished.