by Arman Gungor
My last article was about using the Content-Length header field in email forensics. While the Content-Length header is very useful, it has a couple of major shortcomings:
- Most email messages do not have the Content-Length header field populated
- If the suspect is aware of this data point, the integer value in the Content-Length header field is very easy to modify to make it match the length of the manipulated email payload
Wouldn't it be great if there were something more widely used and tamper-resistant? Enter DKIM.
What is DKIM?
DomainKeys Identified Mail (DKIM) is an internet standard that allows an entity to assert responsibility for a message in transit. The entity can be the organization of the author of the message, or a relay.
The signing entity hashes the body of the message, then digitally signs that body hash together with a subset of the message's header fields using its private key. The public key of the signing entity is published in a DNS TXT record under the signer's domain (at <selector>._domainkey.<domain>). The recipient can then retrieve the signer's public key with a DNS query and use it to determine whether the digital signature is valid.
TL;DR: DKIM gives us the message body and header hashes on a silver platter, digitally signed by the transmitting domain!
Let's work through an example and manually verify the DKIM signature. Here is a sample message sent via Gmail's web interface:
As with many other forensic artifacts, verifying DKIM signatures requires that the suspect message be preserved in its original form. Yahoo and Gmail allow end users to download the raw message via their web interface. In this case, I used Forensic Email Collector to acquire the message in MIME format, which is identical to what we get directly from the provider's web interface. For clarity, I then removed a couple of lengthy header fields before taking the screenshot above.
Let's start by dissecting the DKIM-Signature header field.
DKIM-Signature
The DKIM-Signature header field contains the signature of the message as well as information about how that signature was computed. Interestingly, the DKIM-Signature header field being created or verified is itself included in the signature calculation, with the exception of the value of its "b=" tag.
The tags of the above DKIM-Signature are as follows:
v= Indicates the version of the DKIM specification. You should expect to see the value "1" in this field as of this writing.
a= The algorithm that was used to create the signature. In this case, it is RSA-SHA256.
c= Indicates the canonicalization algorithms that were used for the header and the body. The canonicalization algorithm determines how the body and the header are prepared for hashing, especially as it relates to tolerance for in-transit modification. We will discuss this further below.
In this case, "relaxed/relaxed" indicates that the relaxed canonicalization algorithm was used for both the header and the body. A single value such as "c=relaxed" would have indicated that "relaxed" was used for the header and "simple" was used for the body, equivalent to "c=relaxed/simple". If the c= tag is absent altogether, the default is "simple/simple".
d= Indicates the domain claiming responsibility for transmitting the message. This is the domain whose DNS we query to get the public key. In this case, the domain is "gmail.com".
s= Indicates the selector for the domain. In this case, "s=20161025" indicates that we can query the TXT record for 20161025._domainkey.gmail.com to get the public key.
h= This tells us which header fields were included in the signature. In this case, the list is Mime-Version, From, Date, Message-ID, Subject, and To. We will use the same list of header fields, in the same order, when verifying the signature.
bh= The hash of the canonicalized message body, in Base64 form.
b= The signature data in Base64 form.
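To make the tag list above concrete, here is a minimal Python sketch (standard library only) that splits a DKIM-Signature value into its tags, resolves the c= defaults described above, and builds the DNS name to query for the key. The function names are my own, and dropping all whitespace from tag values is a simplifying assumption that suits the tags covered here (b= and bh= are often folded across several lines).

```python
import re


def parse_dkim_tags(value: str) -> dict:
    """Split a DKIM-Signature value ("v=1; a=rsa-sha256; ...") into a dict."""
    tags = {}
    for item in value.split(";"):
        name, sep, val = item.partition("=")
        if not sep:
            continue  # skips the empty item left by a trailing ";"
        # Assumption: whitespace inside a value is folding and can be dropped.
        tags[name.strip()] = re.sub(r"\s+", "", val)
    return tags


def canonicalization_pair(tags: dict) -> tuple:
    """Return (header_algorithm, body_algorithm) from the c= tag."""
    header_alg, _, body_alg = tags.get("c", "simple/simple").partition("/")
    # A single value such as "c=relaxed" means relaxed header, simple body.
    return header_alg, body_alg or "simple"


def key_record_name(tags: dict) -> str:
    """DNS name holding the public key, e.g. 20161025._domainkey.gmail.com."""
    return f"{tags['s']}._domainkey.{tags['d']}"
```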
Canonicalization
Let's go over the two canonicalization algorithms so that we can prepare the header and the body correctly for manual DKIM verification.
Simple Canonicalization
The simple algorithm tolerates almost no modification. For the header, the simple algorithm presents the header fields exactly as they are, without changing their case, altering whitespace, etc. For the body, the simple algorithm removes any empty lines at the end of the message body, so that the body ends with exactly one CRLF (an empty body becomes a single CRLF).
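A minimal Python sketch of the simple body rule, assuming the body is available as raw bytes with CRLF line endings (as it is in the MIME source). Simple header canonicalization needs no code, since the header fields are used exactly as they appear. The function name is my own.

```python
def simple_body(body: bytes) -> bytes:
    """'Simple' body canonicalization (RFC 6376, Section 3.4.3), as a sketch."""
    # Strip every trailing CRLF, i.e. all empty lines at the end of the body...
    while body.endswith(b"\r\n"):
        body = body[:-2]
    # ...then put exactly one back. An empty body therefore becomes a lone CRLF.
    return body + b"\r\n"
```

Note that a line containing only spaces is not empty, so simple canonicalization keeps it, unlike the relaxed algorithm below.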
Relaxed Canonicalization
The relaxed algorithm provides better tolerance for in-transit modification. For the header, the relaxed algorithm converts all header field names to lowercase (e.g., "Subject:" -> "subject:"), unfolds all header field continuation lines, converts all sequences of one or more whitespace characters to a single space, removes all whitespace at the end of each header field value, and removes any whitespace before and after the colon that separates the header field name from the value (e.g., "subject : test" -> "subject:test").
For the body, the relaxed algorithm removes all whitespace at line endings and reduces each sequence of whitespace within a line to a single space. Extra empty lines at the end of the message body are also removed.
You can find the authoritative documentation with all of the details in RFC 6376, Section 3.4 (Canonicalization).
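The relaxed rules translate into a few regular-expression substitutions. This is a Python sketch under two assumptions: header fields arrive as (name, value) pairs with the original folding CRLFs still in the value, and the body is raw bytes with CRLF line endings. "Whitespace" here means spaces and tabs (WSP in the RFC), and the function names are my own.

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Relaxed canonicalization of one header field; returns "name:value\r\n"."""
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)         # collapse runs of whitespace
    value = value.strip(" \t")                    # whitespace around the value
    return f"{name.strip().lower()}:{value}\r\n"  # also drops WSP before colon


def relaxed_body(body: bytes) -> bytes:
    """Relaxed canonicalization of the message body."""
    lines = [
        re.sub(rb"[ \t]+", b" ", line).rstrip(b" \t")  # collapse, trim line end
        for line in body.split(b"\r\n")
    ]
    while lines and lines[-1] == b"":
        lines.pop()                                    # drop trailing empty lines
    if not lines:
        return b""                                     # an empty body stays empty
    return b"\r\n".join(lines) + b"\r\n"
```

For example, relaxed_body(b"Hi  there \r\n\r\n") and relaxed_body(b"Hi there\r\n") both return b"Hi there\r\n", whereas changing a single visible character produces different bytes and therefore a different hash.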
DKIM Verification
Now that we know how to interpret the DKIM-Signature field and how to prepare the body and the header for hashing, we can attempt to verify the DKIM signature manually.
Step 1: Body Hash
The first step is to canonicalize the message body, hash it, and compare the result to the value reported in the "bh=" tag. A mismatch here means an instant fail; we needn't proceed further.
The canonicalized version of the message body, using the relaxed algorithm, looks as follows:
I set my text editor up to show line breaks and spaces for the above screenshot. Note that the CRLF at the very end remains.
When we hash the above text using SHA-256 (based on the value of the "a=" tag) and convert the raw 32-byte digest (not its hexadecimal representation) to Base64, we find NuUVBkHAblnFrMSNaWdGtwpjr9poc3wM2sXMhd25sPE=. This matches the body hash that was included in the "bh=" tag of the DKIM-Signature header field.
We can already see how powerful this is. The DKIM-Signature header field contains a hash of the message body, which we can verify ourselves very easily without even fetching the public key of the signing entity.
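The body hash step can be sketched in a few lines of Python, assuming you already have the canonicalized body as bytes. One detail worth highlighting: "bh=" is the Base64 encoding of the raw 32-byte SHA-256 digest, which is why it is always 44 characters long. Base64-encoding the 64-character hexadecimal digest instead yields an 88-character string that will never match. The function names are my own.

```python
import base64
import hashlib


def body_hash_b64(canonical_body: bytes) -> str:
    """SHA-256 of the canonicalized body, Base64-encoded as in the bh= tag."""
    digest = hashlib.sha256(canonical_body).digest()  # 32 raw bytes, not hex
    return base64.b64encode(digest).decode("ascii")


def body_hash_matches(canonical_body: bytes, bh_tag: str) -> bool:
    """Compare against bh=, ignoring any folding whitespace in the tag value."""
    return body_hash_b64(canonical_body) == "".join(bh_tag.split())
```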
Step 2: Signer's Public Key
Next, we query the signer's domain and fetch its public key. We will need the d=gmail.com and s=20161025 values for this.
A good resource to use here is the DKIM Record Lookup tool from MxToolbox. When we populate the domain name with "gmail.com" and the selector with "20161025", we get the following key:
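Python's standard library has no DNS TXT lookup, so this sketch starts from the record text you get from a tool such as MxToolbox or from `dig +short TXT 20161025._domainkey.gmail.com` (long records come back as several quoted strings that must be joined first). It extracts the p= tag and pulls the RSA modulus and exponent out of it. It assumes the key is a DER-encoded SubjectPublicKeyInfo, the form commonly published for RSA keys (its Base64 typically begins with MIIB or MIGf); the helper names are hypothetical.

```python
import base64


def parse_key_record(txt: str) -> dict:
    """Split a key record such as "v=DKIM1; k=rsa; p=MIIB..." into tags."""
    tags = {}
    for item in txt.split(";"):
        name, sep, val = item.partition("=")
        if sep:
            tags[name.strip()] = "".join(val.split())
    return tags


def read_der(data: bytes, pos: int) -> tuple:
    """Read one DER element at pos; return (tag, content, next_pos)."""
    tag, length = data[pos], data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, data[pos:pos + length], pos + length


def rsa_key_from_p(p_tag: str) -> tuple:
    """Return (modulus n, public exponent e) from a Base64 p= value."""
    if not p_tag:
        raise ValueError("empty p= tag: the key has been revoked")
    der = base64.b64decode(p_tag)
    _, spki, _ = read_der(der, 0)           # SubjectPublicKeyInfo SEQUENCE
    _, _, pos = read_der(spki, 0)           # skip the AlgorithmIdentifier
    _, bits, _ = read_der(spki, pos)        # BIT STRING wrapping the key
    _, rsa_key, _ = read_der(bits, 1)       # skip the "unused bits" byte
    _, n_bytes, pos = read_der(rsa_key, 0)  # INTEGER modulus
    _, e_bytes, _ = read_der(rsa_key, pos)  # INTEGER public exponent
    return int.from_bytes(n_bytes, "big"), int.from_bytes(e_bytes, "big")
```

An empty p= value means the key has been revoked (RFC 6376), which is one of the failure causes worth distinguishing from a hash mismatch.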
Step 3: Canonicalize The Message Header
In order to prepare the message header for verification, we need to select the header fields listed in the "h=" tag, in that order, append the DKIM-Signature header field to the end of that list (with the value of its "b=" tag removed), and run the result through the canonicalization algorithm (relaxed in this case).
Once the above steps are complete, the canonicalized message header looks as follows:
There are a few things to note here:
- The DKIM-Signature header field includes the body hash (the "bh=" tag). So, although we are not including the body itself in the sign/verify process, we are including the hash of the canonicalized body.
- The header field names have been converted to lowercase, and whitespace has been adjusted according to the relaxed canonicalization algorithm.
- There is no trailing CRLF at the very end of the text, after the "b=" tag.
- The value of the "b=" tag is excluded. This makes sense because the signing entity had no way of knowing the value of the "b=" tag (i.e., the signature data) until after the signature was calculated, so it could not have been used in the calculation of the signature.
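Putting Step 3 together, here is a Python sketch that builds the exact bytes to be verified when the header canonicalization is relaxed. It assumes header fields are given as (name, value) pairs in message order, and it follows RFC 6376's rule that when a name appears more than once, fields are picked from the bottom of the header upward; names listed in h= but absent from the message contribute nothing. The function names are my own, and only the relaxed header algorithm is handled.

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Relaxed header canonicalization of one field (RFC 6376, Section 3.4.2)."""
    value = re.sub(r"\r\n(?=[ \t])", "", value)
    value = re.sub(r"[ \t]+", " ", value).strip(" \t")
    return f"{name.strip().lower()}:{value}\r\n"


def header_hash_input(headers: list, h_tag: str, dkim_value: str) -> bytes:
    """Bytes covered by the b= signature, for c=relaxed/... signatures."""
    used, out = set(), []
    for wanted in h_tag.split(":"):
        wanted = wanted.strip().lower()
        # Walk from the bottom of the header up, skipping instances already used.
        for i in range(len(headers) - 1, -1, -1):
            name, value = headers[i]
            if i not in used and name.strip().lower() == wanted:
                used.add(i)
                out.append(relaxed_header(name, value))
                break
    # Empty the b= value (but not bh=), then append DKIM-Signature without CRLF.
    unsigned = re.sub(r"(^|;)(\s*b\s*=)[^;]*", r"\1\2", dkim_value)
    out.append(relaxed_header("DKIM-Signature", unsigned)[:-2])
    return "".join(out).encode("utf-8")
```

Anchoring the b= match to the start of a tag matters: a naive search for "b=" can also hit the end of a Base64 "bh=" value.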
We now pass the canonicalized header above, the public key we obtained in step 2, and the signature provided in the "b=" tag of the DKIM-Signature field to our signature verification function. I used the RSACryptoServiceProvider.VerifyData() method available in .NET for signature verification. You can use an equivalent in your programming language of choice.
The signature verification process uses the public key to recover the hash value embedded in the signature and compares it to the SHA-256 hash of the canonicalized message header. In this case, the two hashes match and the signature is verified.
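For readers not using .NET, this is a self-contained Python sketch of the same check: RSASSA-PKCS1-v1_5 with SHA-256 (the scheme behind "a=rsa-sha256"), following RFC 8017 and using only integer arithmetic. It is a textbook illustration for understanding the process rather than a vetted cryptographic library, and the function name is my own.

```python
import hashlib
import hmac

# DER-encoded DigestInfo prefix for SHA-256, from RFC 8017 (Section 9.2, Note 1)
SHA256_DIGESTINFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def rsa_sha256_verify(n: int, e: int, signature: bytes, message: bytes) -> bool:
    """Textbook RSASSA-PKCS1-v1_5 verification with SHA-256."""
    k = (n.bit_length() + 7) // 8                 # modulus length in bytes
    digest_info = SHA256_DIGESTINFO_PREFIX + hashlib.sha256(message).digest()
    if len(signature) != k or k < len(digest_info) + 11:
        return False
    s = int.from_bytes(signature, "big")
    if s >= n:
        return False
    # Applying the public key recovers the padded, encoded hash.
    recovered = pow(s, e, n).to_bytes(k, "big")
    # Rebuild what a valid encoding of this message's hash must look like.
    padding = b"\xff" * (k - len(digest_info) - 3)
    expected = b"\x00\x01" + padding + b"\x00" + digest_info
    return hmac.compare_digest(recovered, expected)
```

Here n and e come from the public key fetched in Step 2, signature is the Base64-decoded "b=" value (with whitespace removed), and message is the canonicalized header from Step 3.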
Automation, Anyone?
Although it is great to know how to do so, verifying DKIM signatures manually can get tedious. You can use a number of open-source tools to add some automation to your DKIM verification workflow. If you use Perl, you can check out Mail::DKIM::Verifier. If Python is more your thing, dkimpy is also a good option; be mindful of how multiple DKIM-Signature headers are handled.
What Is The Forensic Relevance?
DKIM signatures give us some very powerful information to work with: the cryptographic hash of the message body and a subset of the header fields, signed by the sending domain. Even if one non-whitespace character changes after the message was signed, the DKIM signature would fail verification. When forensically authenticating an email message, a valid DKIM-Signature header and a verified signature indicate that the message body and signed parts of the message header were not modified after the signing entity calculated the DKIM signature.
Could a suspect work around this? A few ways come to mind: gaining access to the signing entity's private key to sign on its behalf, manipulating the message body and/or header without changing their hashes (not currently feasible for SHA-256), and removing the DKIM signatures from the manipulated message. The latter is simple to do, but would also be fairly easy to detect when the suspect message is compared to other messages from the same sender within the same time period.
Another security consideration is the use of the "l=" tag. This optional tag in the DKIM-Signature header field indicates the body length that was included in the cryptographic hash. Absence of this tag indicates that the entire body was hashed. If the "l=" tag was used, the suspect could append fraudulent content beyond the hashed subset of the message body without failing DKIM verification. RFC 6376 has a section on Security Considerations which is an interesting read.
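The effect of "l=" is easy to demonstrate. In this Python sketch (function names hypothetical), only the first l octets of the canonicalized body are hashed, so anything appended after them leaves the body hash unchanged; the second helper returns that unsigned tail, which is the part an examiner would want to inspect. Treating an "l=" value larger than the body as an error is my own assumption here.

```python
import base64
import hashlib


def limited_body_hash(canonical_body: bytes, l_tag=None) -> str:
    """Base64 SHA-256 body hash, honoring an optional l= octet count."""
    if l_tag is not None:
        limit = int(l_tag)
        # Assumption: an l= larger than the body is treated as an error.
        if limit > len(canonical_body):
            raise ValueError("l= exceeds the canonicalized body length")
        canonical_body = canonical_body[:limit]
    digest = hashlib.sha256(canonical_body).digest()
    return base64.b64encode(digest).decode("ascii")


def unsigned_tail(canonical_body: bytes, l_tag) -> bytes:
    """Content after the first l= octets, i.e. not covered by the signature."""
    return canonical_body[int(l_tag):]
```

With a hypothetical signed body b"Please wire $100.\r\n" and l= set to its length, appending b"Actually, make it $10,000.\r\n" leaves limited_body_hash() unchanged, while unsigned_tail() returns exactly the appended text.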
Finally, it is important to note the Authentication-Results: header field that Gmail inserted into the message upon receipt (see RFC 7601). This header field indicates, among other things, that DKIM verification was successful at the time Gmail received the message, and it contains the first eight characters of the Base64-encoded signature data (the "b=" value) in its "header.b=" property. If the message was manipulated, this can be helpful in determining when the manipulation took place.
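The comparison described above can be scripted. This Python sketch (hypothetical helper name) extracts every "header.b=" value from an Authentication-Results field and tests whether the DKIM-Signature "b=" value begins with it. The sample header in the usage lines is made up for illustration.

```python
import re


def header_b_matches(auth_results: str, b_tag: str) -> bool:
    """True if any header.b= prefix in Authentication-Results starts b_tag."""
    signature = "".join(b_tag.split())  # drop folding whitespace from b=
    prefixes = re.findall(r'header\.b="?([A-Za-z0-9+/=]+)', auth_results)
    return any(signature.startswith(prefix) for prefix in prefixes)


# Hypothetical example values, not taken from the message discussed above.
ar = ("mx.google.com; dkim=pass header.i=@gmail.com "
      "header.s=20161025 header.b=AbCd+/12; spf=pass")
print(header_b_matches(ar, "AbCd+/12XyZ\r\n 789=="))  # True
```

A mismatch would suggest that the DKIM-Signature in the preserved copy is not the one Gmail evaluated on receipt.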
Conclusions
Forensic examiners should pay close attention to DKIM signatures when authenticating emails. Adding automated DKIM signature verification to your workflow would be a good starting point. If DKIM verification fails, it is important to know why. Was it because the signing entity's public key is no longer available in its DNS records, or because the body hash or the header hash did not match?
The DKIM specification has quite a bit of detail, and most tools I have encountered do not appear to have implemented all aspects of it. When working with DKIM, it is important to know the details and be able to perform manual verification as needed, especially to cover edge cases.
References:
RFC 6376: DomainKeys Identified Mail (DKIM) Signatures, https://tools.ietf.org/html/rfc6376
RFC 5863: DomainKeys Identified Mail (DKIM) Development, Deployment, and Operations, https://tools.ietf.org/html/rfc5863
RFC 7601: Message Header Field for Indicating Message Authentication Status, https://tools.ietf.org/html/rfc7601
About The Author
Arman Gungor, CCE, is a digital forensics and eDiscovery expert and the founder of Metaspike. He has over 21 years' computer and technology experience and has been appointed by courts as a neutral computer forensics expert as well as a neutral eDiscovery consultant.