Define "On-the...
 
Notifications
Clear all

Define "On-the-fly Hashing"

6 Posts
4 Users
0 Likes
956 Views
n00bcfe
(@n00bcfe)
Posts: 26
Eminent Member
Topic starter
 

I'm trying to understand hashing on-the-fly. In this case, I'm referring to the hashing of the source drive. Can someone provide me a technical definition of hashing on-the-fly?

My understanding is that the amount of data being read (at any moment in time) is dependent on the set block size. In addition, the data read is being hashed at the same time (or..it is at least getting hashed before moving on to reading the next block of data from the source).

Please let me know if I have misinterpreted that.

Assuming I'm on the right track, let's say I did something stupid, like set a huge block size. That would mean reading a large amount of data at once and trying to hash it at the same time. Would it choke, or would it just queue up those read/hash operations and go as fast as possible?

Thanks for the insight. I like to understand the guts of the whole process.

 
Posted : 26/03/2013 4:18 am
jhup
(@jhup)
Posts: 1442
Noble Member
 

The cryptographic hash function's block size (if it is block-oriented) does not depend on the imaging block size.
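
A quick way to see this for yourself (a minimal sketch in Python; hashlib and SHA-256 are stand-ins here, since the point holds for any block-oriented hash): feed the same data to the hash function in wildly different chunk sizes, and the digest comes out the same.

import hashlib
import os

data = os.urandom(1024 * 1024)  # 1 MiB of stand-in "source drive" data

def hash_in_chunks(buf, chunk_size):
    # Feed the hash in chunks of an arbitrary "imaging" size.
    h = hashlib.sha256()
    for i in range(0, len(buf), chunk_size):
        h.update(buf[i:i + chunk_size])  # the hash buffers blocks internally
    return h.hexdigest()

# 512-byte reads and 1 MiB reads produce the identical digest:
assert hash_in_chunks(data, 512) == hash_in_chunks(data, 1024 * 1024)

The chunk size you feed to update() has nothing to do with the hash's internal block size (64 bytes for SHA-256).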

 
Posted : 26/03/2013 5:41 am
Patrick4n6
(@patrick4n6)
Posts: 650
Honorable Member
 

Hashing on the fly means computing the hash at the same time as you perform the forensic image or forensic copy, as compared to pre- and post-hashing, where you hash the source, copy it, then verify the hash of the copy.

I almost always hash on the fly.
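
Roughly, the difference looks like this (a minimal sketch in Python, not any particular tool's implementation; hashlib, the paths, and the chunk size are illustrative assumptions):

import hashlib

CHUNK = 65536  # read size per pass; the exact value is up to the imager

def image_on_the_fly(src_path, dst_path):
    # One pass over the source: each chunk is hashed and written as it is read.
    h = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            h.update(chunk)   # hash...
            dst.write(chunk)  # ...and copy, in the same pass
    return h.hexdigest()

def hash_file(path):
    # Separate hashing pass, as used by the pre/post approach.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

# Pre/post hashing makes three passes over the data instead of one:
#   pre = hash_file(src); copy src to dst; post = hash_file(dst); compare

That one-pass-versus-three is part of why hashing on the fly is usually faster.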

 
Posted : 26/03/2013 9:42 am
n00bcfe
(@n00bcfe)
Posts: 26
Eminent Member
Topic starter
 

Thanks Patrick4n6. That is helpful and makes sense.

I assume this is how we can get matching hashes for live images with FTK Imager: it only hashes the stream of data as it is read (versus hashing before and after, which would obviously differ in a live environment).

I'm a little confused by jhup's response. Perhaps he can clarify.

The cryptographic hash function's block size (if it is block-oriented) does not depend on the imaging block size.

Maybe I'm misinterpreting his response, but I would think the imaging block size would play a role in the amount of data being hashed (at any moment in time). If my imaging block size is 1024 bytes, I am only reading 1024 bytes at one time. Therefore, I would assume the hashing function being applied would only be able to hash that chunk of data.

I'd like your help breaking down an example, which may help me understand this better. Let's assume the imaging block size is 512 bytes. My understanding would be that we are only reading, hashing, and writing one sector at a time. Can you break down (step by step) how that sector is read and hashed with on-the-fly hashing?

 
Posted : 26/03/2013 7:16 pm
jhup
(@jhup)
Posts: 1442
Noble Member
 

[…]Let's assume the imaging block size is 512 bytes. My understanding would be that we are only reading, hashing, and writing one sector at a time. Can you break down (step by step) how that sector is read and hashed with on-the-fly hashing?

Let's use the HAVAL hash as an example, since it has one of the longest block sizes used in the calculation, at 1,024 bits.
Here is some quick pseudocode for on-the-fly hashing, and a very rudimentary explanation of how the bytes read are independent of the bytes being calculated.

You must understand that the common hash algos (MD5, SHA-1/SHA-2, Tiger, RIPEMD, HAVAL, etc.) are block oriented: they consume input in fixed-size blocks, much as block ciphers do (versus stream ciphers like MUGI, Rabbit, and A5, which process data bit by bit or byte by byte). They can queue up blocks to be processed, and they deal with a partial block at the end by padding it. So, despite the block size, I do not have to pass a whole block to the processing engine in one shot. I can build a temporary queue, holding place, or buffer until the calculator catches up with the reading process.

Does this help?

Presuming you are using a block-oriented hash function -

BEGIN #Select_hash_function
  $BLOCK_SIZE = 1,024 bits
  // the block size used by the algorithm; for HAVAL it is 1,024 bits
  $READ_SIZE = 4,096 bits
  // i.e. 512 bytes, the chunks we read from the target media
  Initialize $HAVAL_BUFFER
  // where we store partial chunks of data not yet processed
END #Select_hash_function.

BEGIN #HAVAL_processor
  // the mathematical engine which calculates the hash
  Repeat
    If $HAVAL_BUFFER >= $BLOCK_SIZE
      then pop $BLOCK_SIZE bits and "add" them to the calculation
  Until $HAVAL_BUFFER < $BLOCK_SIZE
  If closing the processor, pad the remaining partial block out to $BLOCK_SIZE
    then pop $BLOCK_SIZE bits and process them
  Return HAVAL hash value
END #HAVAL_processor.

MAIN
  #Select_hash_function
  While not end of media
    Read $READ_SIZE bits (512 bytes) from the target media
    Append the bits read to $HAVAL_BUFFER
    Pass $HAVAL_BUFFER to #HAVAL_processor
    Write $READ_SIZE bits to the destination media
  End while.

  Close #HAVAL_processor, padding any final partial block
  $Final_HAVAL_hash = value returned by #HAVAL_processor

  Display $Final_HAVAL_hash.
END MAIN.
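
For a concrete, runnable version of the same flow (a sketch in Python; hashlib has no HAVAL, so SHA-256 stands in, and the explicit buffering above is exactly what hashlib's update() does for you internally):

import hashlib

READ_SIZE = 512  # bytes read from the source media per pass

def image_and_hash(src_path, dst_path):
    # Read, hash, and write one chunk at a time: on-the-fly hashing.
    h = hashlib.sha256()  # stand-in for HAVAL; same block-buffering behavior
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(READ_SIZE)
            if not chunk:        # end of media
                break
            h.update(chunk)      # update() queues partial hash blocks itself
            dst.write(chunk)
    # hexdigest() finalizes the hash, padding any leftover partial block
    return h.hexdigest()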

 
Posted : 26/03/2013 8:37 pm
TuckerHST
(@tuckerhst)
Posts: 172
Estimable Member
 

Maybe a simpler way of thinking about this is to recognize that disk reads and hashing are not serial events. Data acquisition is IO bound, so there is plenty of processing capacity to do other things in parallel. Further, modern OSs are multi-tasking and modern hardware has plenty of RAM to buffer data. So the buffer can be filled by whatever block size is efficient for the disk read process, while the hashing algorithm can get data from the buffer in whatever block size is efficient for its process. Thus, the block sizes are independent of each other.
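
As a rough illustration of that buffering (a minimal sketch using Python's standard threading and queue modules; real imagers differ in the details): one thread reads the disk in whatever block size suits the device while another drains the buffer and hashes, and neither dictates the other's block size.

import hashlib
import queue
import threading

DISK_READ_SIZE = 1024 * 1024  # whatever is efficient for the disk

def acquire_and_hash(src_path):
    buf = queue.Queue(maxsize=8)   # bounded RAM buffer between the two tasks
    result = {}

    def reader():
        # IO-bound producer: fills the buffer at disk speed.
        with open(src_path, "rb") as f:
            while chunk := f.read(DISK_READ_SIZE):
                buf.put(chunk)     # blocks only if the hasher falls behind
        buf.put(None)              # end-of-media sentinel

    def hasher():
        # CPU-bound consumer: hashes at its own pace, in its own block size.
        h = hashlib.sha256()
        while (chunk := buf.get()) is not None:
            h.update(chunk)
        result["digest"] = h.hexdigest()

    threads = [threading.Thread(target=reader), threading.Thread(target=hasher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["digest"]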

 
Posted : 26/03/2013 9:13 pm