Define "On-the...
 
Notifications
Clear all

Define "On-the-fly Hashing"

6 Posts
4 Users
0 Likes
956 Views
n00bcfe
(@n00bcfe)
Posts: 26
Eminent Member
Topic starter
 

I'm trying to understand hashing on-the-fly. In this case, I'm referring to the hashing of the source drive. Can someone provide me a technical definition of hashing on-the-fly?

My understanding is that the amount of data being read (at any moment in time) is dependent on the set block size. In addition, the data read is being hashed at the same time (or..it is at least getting hashed before moving on to reading the next block of data from the source).

Please let me know if I have misinterpreted that.

Assuming I'm on the right track, let's say I did something stupid, like set a huge block size. That would mean reading a large amount of data at once and trying to hash it at the same time. Would it choke, or would it just queue up those read/hash operations and go as fast as possible?

Thanks for the insight. I like to understand the guts of the whole process.

 
Posted : 26/03/2013 4:18 am
jhup
(@jhup)
Posts: 1442
Noble Member
 

The cryptographic hash function's block size (if it is block-oriented) does not depend on the imaging block size.
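
A quick way to see this for yourself (a minimal sketch in Python; hashlib and SHA-256 are stand-ins here, since the point holds for any block-oriented hash): feed the same data to the hash function in wildly different chunk sizes, and the digest comes out the same.

import hashlib
import os

data = os.urandom(1024 * 1024)  # 1 MiB of stand-in "source drive" data

def hash_in_chunks(buf, chunk_size):
    # Feed the hash in chunks of an arbitrary "imaging" size.
    h = hashlib.sha256()
    for i in range(0, len(buf), chunk_size):
        h.update(buf[i:i + chunk_size])  # the hash buffers blocks internally
    return h.hexdigest()

# 512-byte reads and 1 MiB reads produce the identical digest:
assert hash_in_chunks(data, 512) == hash_in_chunks(data, 1024 * 1024)

The chunk size you feed to update() has nothing to do with the hash's internal block size (64 bytes for SHA-256).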

 
Posted : 26/03/2013 5:41 am
Patrick4n6
(@patrick4n6)
Posts: 650
Honorable Member
 

Hashing on the fly means computing the hash at the same time as you perform the forensic image or forensic copy, as compared to pre- and post-hashing, where you hash the source, copy it, then verify the hash of the copy.

I almost always hash on the fly.
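
Roughly, the difference looks like this (a minimal sketch in Python, not any particular tool's implementation; hashlib, the paths, and the chunk size are illustrative assumptions):

import hashlib

CHUNK = 65536  # read size per pass; the exact value is up to the imager

def image_on_the_fly(src_path, dst_path):
    # One pass over the source: each chunk is hashed and written as it is read.
    h = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            h.update(chunk)   # hash...
            dst.write(chunk)  # ...and copy, in the same pass
    return h.hexdigest()

def hash_file(path):
    # Separate hashing pass, as used by the pre/post approach.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

# Pre/post hashing makes three passes over the data instead of one:
#   pre = hash_file(src); copy src to dst; post = hash_file(dst); compare

That one-pass-versus-three is part of why hashing on the fly is usually faster.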

 
Posted : 26/03/2013 9:42 am
n00bcfe
(@n00bcfe)
Posts: 26
Eminent Member
Topic starter
 

Thanks Patrick4n6. That is helpful and makes sense.

I assume this is how we can get matching hashes for live images with FTK Imager: it only hashes the stream of data as it is read (versus hashing before and after, which would obviously differ in a live environment).

I'm a little confused by jhup's response. Perhaps he can clarify.

The cryptographic hash function's block size (if it is block-oriented) does not depend on the imaging block size.

Maybe I'm misinterpreting his response, but I would think the imaging block size would play a role in the amount of data being hashed (at any moment in time). If my imaging block size is 1024 bytes, I am only reading 1024 bytes at one time. Therefore, I would assume the hashing function being applied would only be able to hash that chunk of data.

I'd like your help breaking down an example, which may help me understand this better. Let's assume the imaging block size is 512 bytes. My understanding would be that we are only reading, hashing, and writing one sector at a time. Can you break down (step by step) how that sector is read and hashed with on-the-fly hashing?

 
Posted : 26/03/2013 7:16 pm
jhup
(@jhup)
Posts: 1442
Noble Member
 

[…]Let's assume the imaging block size is 512 bytes. My understanding would be that we are only reading, hashing, and writing one sector at a time. Can you break down (step by step) how that sector is read and hashed with on-the-fly hashing?

Let's use the HAVAL hash as an example, since it has one of the longest block sizes used in the calculation, at 1,024 bits.
Here is some quick pseudocode for on-the-fly hashing, and a very rudimentary explanation of how the bytes read are independent of the bytes being calculated.

You must understand that the common hash algos (MD5, SHA-1/SHA-2, Tiger, RIPEMD, HAVAL, etc.) are block oriented: they consume input in fixed-size blocks, much as block ciphers do (versus stream ciphers like MUGI, Rabbit, and A5, which process data bit by bit or byte by byte). They can queue up blocks to be processed, and they deal with a partial block at the end by padding it. So, despite the block size, I do not have to pass a whole block to the processing engine in one shot. I can build a temporary queue, holding place, or buffer until the calculator catches up with the reading process.

Does this help?

Presuming you are using a block-oriented hash function -

BEGIN #Select_hash_function
  $BLOCK_SIZE = 1,024 bits
  // the block size used by the algorithm; for HAVAL it is 1,024 bits
  $READ_SIZE = 4,096 bits
  // i.e. 512 bytes, the chunks we read from the target media
  Initialize $HAVAL_BUFFER
  // where we store partial chunks of data not yet processed
END #Select_hash_function.

BEGIN #HAVAL_processor
  // the mathematical engine which calculates the hash
  Repeat
    If $HAVAL_BUFFER >= $BLOCK_SIZE
      then pop $BLOCK_SIZE bits and "add" them to the calculation
  Until $HAVAL_BUFFER < $BLOCK_SIZE
  If closing the processor, pad the remaining partial block out to $BLOCK_SIZE
    then pop $BLOCK_SIZE bits and process them
  Return HAVAL hash value
END #HAVAL_processor.

MAIN
  #Select_hash_function
  While not end of media
    Read $READ_SIZE bits (512 bytes) from the target media
    Append the bits read to $HAVAL_BUFFER
    Pass $HAVAL_BUFFER to #HAVAL_processor
    Write $READ_SIZE bits to the destination media
  End while.

  Close #HAVAL_processor, padding any final partial block
  $Final_HAVAL_hash = value returned by #HAVAL_processor

  Display $Final_HAVAL_hash.
END MAIN.
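
For a concrete, runnable version of the same flow (a sketch in Python; hashlib has no HAVAL, so SHA-256 stands in, and the explicit buffering above is exactly what hashlib's update() does for you internally):

import hashlib

READ_SIZE = 512  # bytes read from the source media per pass

def image_and_hash(src_path, dst_path):
    # Read, hash, and write one chunk at a time: on-the-fly hashing.
    h = hashlib.sha256()  # stand-in for HAVAL; same block-buffering behavior
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(READ_SIZE)
            if not chunk:        # end of media
                break
            h.update(chunk)      # update() queues partial hash blocks itself
            dst.write(chunk)
    # hexdigest() finalizes the hash, padding any leftover partial block
    return h.hexdigest()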

 
Posted : 26/03/2013 8:37 pm
TuckerHST
(@tuckerhst)
Posts: 172
Estimable Member
 

Maybe a simpler way of thinking about this is to recognize that disk reads and hashing are not serial events. Data acquisition is IO bound, so there is plenty of processing capacity to do other things in parallel. Further, modern OSs are multi-tasking and modern hardware has plenty of RAM to buffer data. So the buffer can be filled by whatever block size is efficient for the disk read process, while the hashing algorithm can get data from the buffer in whatever block size is efficient for its process. Thus, the block sizes are independent of each other.
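
As a rough illustration of that buffering (a minimal sketch using Python's standard threading and queue modules; real imagers differ in the details): one thread reads the disk in whatever block size suits the device while another drains the buffer and hashes, and neither dictates the other's block size.

import hashlib
import queue
import threading

DISK_READ_SIZE = 1024 * 1024  # whatever is efficient for the disk

def acquire_and_hash(src_path):
    buf = queue.Queue(maxsize=8)   # bounded RAM buffer between the two tasks
    result = {}

    def reader():
        # IO-bound producer: fills the buffer at disk speed.
        with open(src_path, "rb") as f:
            while chunk := f.read(DISK_READ_SIZE):
                buf.put(chunk)     # blocks only if the hasher falls behind
        buf.put(None)              # end-of-media sentinel

    def hasher():
        # CPU-bound consumer: hashes at its own pace, in its own block size.
        h = hashlib.sha256()
        while (chunk := buf.get()) is not None:
            h.update(chunk)
        result["digest"] = h.hexdigest()

    threads = [threading.Thread(target=reader), threading.Thread(target=hasher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["digest"]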

 
Posted : 26/03/2013 9:13 pm