Tiffanie: Hi, my name is Tiffanie Edwards, and I’m gonna be presenting the paper “BlockQuery: Toward Forensically Sound Cryptocurrency Investigation”.
So this is a little bit of background on the authors of this paper. Tyler Thomas is the primary author. He couldn’t be here today. He holds a bachelor’s and master’s degree from the University of New Haven in cybersecurity and networks.
My name is Tiffanie again, and I have a bachelor’s degree in computer science from Southern Connecticut State University, and I’m currently working on my master’s in cybersecurity and networks. And our mentor, Ibrahim Baggili, leads our research team, and he helped us come up with the idea for this paper.
Before I begin, I’d like to acknowledge the NSF for funding this research project. And I’d also like to say that any opinions, findings and conclusions are not views of the NSF.
So, I will be introducing this topic and talking about the motivation behind the topic, giving some background information and then going over what we actually did to create this BlockQuery framework, and then discuss the limitations that we found as well as future directions of this work.
So to begin, we decided that like every other forensic investigation, cryptocurrency investigations need to be forensically sound. And this includes completeness: given a public key or wallet address, all transactions conducted using the key or wallet address are recovered. Integrity: the blockchain ledger being queried is identical to that which is currently accepted by the consensus network. And confidentiality: information regarding which transactions are relevant to the examination is not unintentionally disclosed to any third party indexers or third party organizations.
We did find that the vast majority of publicly available tools that are currently on the market for querying cryptocurrency blockchains fail to meet one or more of the requirements that we have set forth here.
So, many tools take the form of websites, which query a server running a blockchain indexer that is often managed by a third party. This violates the confidentiality of forensic examinations by unnecessarily disclosing which accounts are subject to ongoing investigations, which investigators would not want to happen.
Also without control of the indexing server, an examiner cannot guarantee the integrity of the query response, so the results may be inaccurate, incomplete or out of date due to program error or malice, and you can only confirm this by observing a local copy of the ledger.
So cryptocurrencies like Bitcoin have many address derivation schemes, and many of the publicly available tools we tested fail to account for these differences in address derivation, which left out a lot of forensic information that investigators would find important.
So, because of these reasons, our motivation was to create a forensically sound cryptocurrency lookup platform that must consist of a trusted full node running directly on the blockchain network. This will allow forensic examiners to guarantee the integrity of the data that they are searching through by maintaining a complete copy of the ledger that is updated in real time.
Also the full node would preserve the confidentiality of the investigation by eliminating the need for a third party to handle the queries. And then instead of broadcasting the specific wallet address and transaction IDs that you’re interested in, the full node would blend into the network and passively collect blocks as they are broadcasted.
Our specific contributions are providing a primary discussion on what it means for a cryptocurrency investigation to be forensically sound, presenting BlockQuery as an open source proof of concept blockchain query system for Bitcoin, and we also show that our approach is capable of detecting transactions generated by the hierarchical deterministic or HD wallets that many publicly available tools cannot find due to failures in their address derivation methods.
So, some background to go over are the HD wallets. They use extended keys to compartmentalize addresses under logical accounts. They are very valuable artifacts in any cryptocurrency investigation because you can use them to derive the wallet addresses that are associated with the keys.
Each account has an associated key pair and the accounts are organized in a hierarchy. At the lowest level in the hierarchy the keys are used to deterministically derive ephemeral wallet addresses which allow users to maintain relative anonymity across transactions by not reusing the addresses while also being able to maintain only one master key pair. The forensic examiner could then use the extended public keys that they have to de-anonymize portions of the subject’s transaction history for any wallet that they’re interested in in an investigation.
The Bitcoin blockchain uses three valid address representations, and the corresponding extended keys are the xPub, yPub and zPub. Each address type has a respective extended key representation that is used to derive addresses of that type. So, these extended key representations can easily be converted from one to another, and one can deterministically derive all possible wallet addresses, given any non-hardened extended public key representation.
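[Editor's illustration] The conversion between extended key representations mentioned above can be sketched as follows. This is a minimal sketch, not BlockQuery's code: it assumes mainnet keys and the version bytes registered in SLIP-0132, and the helper names (`convert_extended_key`, `all_representations`) are hypothetical. Only the 4-byte version prefix changes; the depth, fingerprint, child number, chain code and public key are copied unchanged.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Mainnet version bytes for extended public keys (BIP32 / SLIP-0132).
VERSIONS = {
    "xpub": bytes.fromhex("0488b21e"),
    "ypub": bytes.fromhex("049d7cb2"),
    "zpub": bytes.fromhex("04b24746"),
}


def _checksum(payload: bytes) -> bytes:
    """First four bytes of double SHA-256, as used by Base58Check."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58check_encode(payload: bytes) -> str:
    data = payload + _checksum(payload)
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58check_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # ValueError on a bad character
    pad = len(text) - len(text.lstrip("1"))
    data = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    payload, check = data[:-4], data[-4:]
    if _checksum(payload) != check:
        raise ValueError("bad Base58Check checksum")
    return payload


def convert_extended_key(key: str, target: str) -> str:
    """Re-encode an extended public key under another version prefix."""
    payload = b58check_decode(key)
    if len(payload) != 78 or payload[:4] not in VERSIONS.values():
        raise ValueError("not a recognised mainnet extended public key")
    return b58check_encode(VERSIONS[target] + payload[4:])


def all_representations(key: str) -> dict:
    """Return the xpub, ypub and zpub encodings of the same key material."""
    return {name: convert_extended_key(key, name) for name in VERSIONS}
```

Given any one of the three encodings, `all_representations` returns all three, so a lookup tool does not need to assume that the format a key was recovered in is the format its addresses were derived in.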
When performing memory forensic analysis of the applications using all three extended public key representations, we found out that not everyone used all three representations. For example, they would only use the xPub because they didn’t know that there were three valid Bitcoin address representations, which left out information.
So, given that most query tools do not account for the fact that wallet addresses can use all three valid Bitcoin address representations, a lot of information that is relevant to any forensic investigation was left out. So, to ensure the recovery of all relevant transactions, a forensically sound blockchain lookup service cannot assume that addresses should be derived using the same format in which their keys are represented: it should use all three address representations just to be sure it has all the information.
So, this slide just shows the xPub, yPub and zPub. And the example column shows their address representations. As you can see, they’re all in different formats. And we did find that most query tools only refer to the xPub when they are trying to derive wallet addresses from the xPub extended key.
So, this slide shows what our BlockQuery looks like. We have a Bitcoin node, which is a standard Bitcoin JSON-RPC API server that’s fully synced with the current state of the blockchain. For this, we chose bitcoind as our protocol implementation for BlockQuery, and we chose it because of its ease of use, customization and the integration into a variety of open source indexers.
An indexer is a service that processes and indexes the raw block data from the node for quick and easy querying. It was necessary to use an indexer because we chose bitcoind, and bitcoind does not provide an API call to retrieve the complete list of transactions a given wallet address participated in.
For our indexer, we chose to use electrs. And it was designed with privacy in mind, which just makes it better for the forensic examinations, because it does not communicate with any third party services.
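[Editor's illustration] To show what an address-history lookup against a local indexer can look like, here is a minimal sketch. It assumes the indexer is reached through the Electrum protocol, where histories are requested by script hash with the `blockchain.scripthash.get_history` method. How BlockQuery itself talks to electrs is not stated in the talk, and the function names here are hypothetical. The sketch only builds the request; it does not open a connection.

```python
import hashlib
import json


def p2pkh_script(pubkey_hash: bytes) -> bytes:
    """Output script of a legacy (P2PKH) address from its 20-byte HASH160."""
    if len(pubkey_hash) != 20:
        raise ValueError("expected a 20-byte HASH160")
    # OP_DUP OP_HASH160 <push 20 bytes> <hash> OP_EQUALVERIFY OP_CHECKSIG
    return bytes.fromhex("76a914") + pubkey_hash + bytes.fromhex("88ac")


def electrum_scripthash(script: bytes) -> str:
    """Electrum script hash: SHA-256 of the script, byte-reversed, as hex."""
    return hashlib.sha256(script).digest()[::-1].hex()


def history_request(script: bytes, request_id: int = 0) -> str:
    """Newline-terminated JSON-RPC request for one script's history."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "blockchain.scripthash.get_history",
        "params": [electrum_scripthash(script)],
    }) + "\n"
```

Because the request goes to an indexer the examiner runs locally, the addresses of interest never leave the examiner's machine.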
The web application is simply the user interface that is used for making queries and exploring and discovering transactions. This was a custom application that was built to accept user queries, compute address derivations and cache discovered transactions, and query electrs at the same time. And we also allow users to input their derivation depth.
So, this slide has our algorithm for the address derivation scheme. The public key (either the xPub, yPub or zPub) is taken as a parameter as well as the desired derivation depth.
Once the algorithm has that, a set of derived addresses is returned. Bitcoin wallets use change addresses in an attempt to obfuscate outgoing transactions by generating new addresses on a separate derivation path. They do this because they have an internal and an external key chain.
So, addresses derived along the external key chain are used for receiving transactions, and addresses derived along the internal key chain are used as change addresses and for other wallet functions that are not meant to be public.
So, for examiners to have a complete understanding of how Bitcoin moves in and out of the wallet over the course of the transaction history, you need to derive addresses along both the internal and external key chain. And you need to compute all the addresses in the xPub, yPub and zPub formats.
So, after the set of possible child addresses is generated, electrs is queried with each address to retrieve the associated transaction history. This is done by brute forcing all 2^31 possible child addresses on the derivation path.
It’s necessary to brute force all possible child addresses because you want to make sure that all of the possibilities are found and are queried against the blockchain. So, our tool allows the user to specify the depth of each query. So if they do not want to go through all 2^31 possible child addresses, they don’t have to.
I also want to stress that hardened keys will stop the derivation. So, the public keys are non-hardened keys, which is why they can be used in this algorithm and checked against the blockchain. At a certain level the keys and their address representations start to become hardened, and you would then need the private key in order to get any more information from that. So, this algorithm only works when you’re given the extended public keys.
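[Editor's illustration] The search space described above can be sketched as a simple enumeration. This is not the algorithm from the slide, only a sketch of its shape under stated assumptions: the mapping of xPub, yPub and zPub to the P2PKH, P2SH-P2WPKH and P2WPKH address types follows SLIP-0132, the external and internal chains are children 0 and 1 of the account key as in BIP44, and `candidate_paths` is a hypothetical name. Child indices from 2^31 upward are hardened in BIP32 and cannot be derived from a public key, which is why the depth is capped there.

```python
from typing import Iterator, Tuple

HARDENED_START = 2**31  # BIP32: indices at or above this value are hardened

# Extended key format -> address type derived from it (SLIP-0132).
ADDRESS_TYPES = {"xpub": "p2pkh", "ypub": "p2sh-p2wpkh", "zpub": "p2wpkh"}

# Key chains below the account key: 0 receives payments, 1 holds change.
CHAINS = {"external": 0, "internal": 1}


def candidate_paths(depth: int) -> Iterator[Tuple[str, str, str]]:
    """Yield (address type, chain name, relative path) for every candidate.

    `depth` is the number of non-hardened child indices to try per chain.
    """
    if not 0 < depth <= HARDENED_START:
        raise ValueError("depth must be between 1 and 2**31")
    for address_type in ADDRESS_TYPES.values():
        for chain_name, chain in CHAINS.items():
            for index in range(depth):
                yield address_type, chain_name, f"{chain}/{index}"
```

With a depth of 20, this yields 120 candidates: 20 indices on each of 2 chains for each of 3 address types. Each candidate would then be turned into an address and looked up in the local indexer.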
Now I’m going to talk about our findings and our evaluation criteria. So, we surveyed the publicly available Bitcoin lookup platforms that were out at the time of writing this paper. We only considered services that allowed users to search for transactions by extended public key, because that’s what BlockQuery was made to do.
Then we performed memory and file system forensics against each system to obtain forensic artifacts, including the extended public keys and the addresses. But then we also created a second wallet which allowed the users to manually set the index of the derivation path.
So, all of the platforms were assessed for forensic soundness. Bitcoin transactions were made using a Ledger Nano X cryptocurrency hardware wallet with the Ledger Live wallet software. I would also like to note that the BIP44 standard of Bitcoin defines the address gap limit for HD wallets at 20.
It’s possible to not use the standard of 20 for the address gap limit and change that number, and you would then be able to hide addresses, and the transactions associated with those wallet addresses, on the blockchain. So, to account for that, we tested each platform (including BlockQuery) with address gap limits of 100 instead of 20.
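[Editor's illustration] The effect of the address gap limit can be shown with a small sketch. The scan below stops after `gap_limit` consecutive unused addresses, which is the usual wallet discovery behaviour described in BIP44. The `has_history` callback is a hypothetical stand-in for an indexer lookup, and the used indices in the example are made up.

```python
from typing import Callable, List


def scan_chain(has_history: Callable[[int], bool],
               gap_limit: int = 20,
               max_index: int = 2**31) -> List[int]:
    """Return the used child indices found before the gap limit is hit."""
    found = []
    unused_run = 0  # consecutive indices with no transaction history
    index = 0
    while index < max_index and unused_run < gap_limit:
        if has_history(index):
            found.append(index)
            unused_run = 0
        else:
            unused_run += 1
        index += 1
    return found


# Made-up wallet: the owner skipped ahead and received funds at index 50.
used = {0, 1, 50}
default_scan = scan_chain(lambda i: i in used, gap_limit=20)
wider_scan = scan_chain(lambda i: i in used, gap_limit=100)
```

With the default limit of 20 the scan gives up after index 21 and never reaches index 50, while the wider limit of 100 finds it. A gap larger than whatever limit the tool uses would still be missed, which is the case the brute-force depth setting is meant to cover.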
So, we were looking at each tool to determine if it was open source or not, if the tool queried a third party server and thereby compromised the confidentiality of the investigation, if the tool automatically converted the key to every possible representation to cover the entire address space, and if the tool allowed the user to manually adjust the address gap limit or derivation depth.
So, this was a table of our findings. We surveyed 7 platforms plus BlockQuery, so there’s 8 all together. From this table, we found that Ledger’s xPub scan utility was the only tool besides BlockQuery capable of finding all the transactions generated with extended public keys provided in the query, but it did lack confidentiality, which is very important for forensic soundness.
And it lacks confidentiality because it utilizes Ledger service to search the blockchain, it doesn’t do it locally like BlockQuery does. And there is an option to use a different tool to search the blockchain, but that still doesn’t prevent information disclosure to any third party source.
We also noted that the 7 platforms that we tested were not developed with forensics in mind, only BlockQuery was. So they had a lot of security issues, and 6 out of the 7 tools failed to automatically derive the SegWit addresses when provided an xPub extended key.
And several of them were not able to discover the SegWit transactions, even when provided all 3 xPub, yPub and zPub extended keys. So that left out a lot of forensic information. Our proof of concept was successfully able to discover all transactions while maintaining the confidentiality of the searches by not calling any third party APIs, and instead indexing the blockchain locally.
A limitation we did find in our process was that if you did need to compute all 2^31 possible addresses for a given extended key, it would require significant parallel computing power. At the time of writing this, the Bitcoin blockchain was approximately 350GB and the Ethereum blockchain was approaching 1TB.
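[Editor's illustration] The size of the full search can be worked out with simple arithmetic. The counts below follow from the talk (2^31 non-hardened indices, two key chains, three address formats). The lookup rate used in the example is purely an assumption for illustration and is not a measurement of BlockQuery or electrs.

```python
INDICES_PER_CHAIN = 2**31   # non-hardened child indices
KEY_CHAINS = 2              # external and internal
ADDRESS_FORMATS = 3         # xPub, yPub and zPub style addresses
SECONDS_PER_DAY = 86_400


def total_candidates() -> int:
    """Number of addresses in an exhaustive search of one extended key."""
    return INDICES_PER_CHAIN * KEY_CHAINS * ADDRESS_FORMATS


def days_needed(lookups_per_second: float) -> float:
    """Wall-clock days for the full search at a given sustained rate."""
    return total_candidates() / lookups_per_second / SECONDS_PER_DAY


# Hypothetical rate: 10,000 derive-and-lookup operations per second.
estimate_days = days_needed(10_000)
```

An exhaustive search covers 12,884,901,888 candidate addresses per extended key, so the run time is dominated by how many derivations and lookups the hardware can sustain per second, and it shrinks roughly in proportion as work is spread across more cores or machines.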
So, we noted that smaller law enforcement agencies with limited resources may not be able to fully conduct cryptocurrency investigations with this platform. A solution that we did offer was outsourcing the responsibility to larger agencies that have the capabilities, or to trusted universities who can perform this step for them.
Some future work that…noted was that we had to create an extended key data set ourselves in order to conduct this investigation and experimentation. And we thought that if one was already developed, it would make it easier to develop more forensic tools that can help in cryptocurrency investigations.
We also noted that since this tool, BlockQuery, was created for Bitcoin and was only tested on Bitcoin, the tool should then be tested with other cryptocurrencies that implement address derivation schemes compatible with HD wallets.
And from this, we also noted that the other 7 tools that we looked at and other tools besides those 7, they don’t really do…they do the same thing that we do. They really only apply to Bitcoin and they don’t really apply to other cryptocurrencies, so we felt like that was missing in this research.
And to streamline the investigative process even further, we considered making BlockQuery a plugin integration for a certain software that already collects forensic artifacts, such as Autopsy or Volatility.
So, these are our emails. I’m open to questions now, and if I don’t get to your question, you can contact any of us with questions that you may have in the future.
Host: Thank you very much, Tiffanie. Do we have any questions in the room or online? Don’t see any on here. So I have 2 or 3: how long would it take to do actually the brute force computations? Did you actually run them, or was it only, like, “okay, we would have to do them and it’s, like, 2^31.”
Like, how complex is this computation that you actually run it and can, are we talking of days? Are we talking of months? Did you do any more testing, like on the actual run time?
Tiffanie: Yes. That was handled by Tyler and I’m not exactly sure the amount of time it took, but if I had to guess, I would guess definitely longer than a day, so maybe like a couple days. I’m not sure what system he used to do that part of the experiment, or if he had some tool that he was using to just make it go faster.
Host: Okay. And just for clarification: the confidentiality aspect that you raised as part of your criteria, this pretty much equals that they did not operate in a local copy, is that correct? Because they didn’t use, or these tools, do not use a local copy, but they use, like, online systems.
Tiffanie: Yeah, they use online platforms that store blockchain information, instead of having the blockchain locally on their system. So, that was the biggest difference when it came to confidentiality between our BlockQuery and the other platforms.
Host: Okay. Thank you. And I have one more: is this publicly available, or freely available? Is it, like, released on some kind of repository online? Or is this some proprietary software?
Tiffanie: Yes, it’s on GitHub. The link to the GitHub is in the paper itself, so you can find it there.
Host: Great. Are there any other questions in here? No, let me check online real quick. There’s nothing online then again. Thank you very much, Tiffanie.
Tiffanie: I do see a question.
Host: There is one online.
Tiffanie: We didn’t cross validate the results with any other tools, but when we used those 7 tools, we noted that forensic information was missing. There was a previous paper that conducted memory forensics and analysis on blockchain itself. So, when we did that paper, we noted that a lot of the other tools did not have forensics in mind and they might have been missing information.
So that’s why we created this one to gather as much information as possible, and we believe that we were able to get all the information that was necessary, especially since we created the extended key data set ourselves, so we knew what we were looking for.
Host: Okay. Oh, now we have…
Tiffanie: You’re welcome.
Audience member 1: …completely talking, let’s say, bullshit, but that’s the problem always with it. So, I wondered, can you do the calculation without knowing the seed? So, could you use it based on just the information on the blockchain itself, or do you need to have the wallet information as well?
Tiffanie: Do I need to have the what information?
Audience member 1: Well, my understanding of these hierarchical things is that you need to have the seed to be able to calculate all the other keys. And I think the seed itself is not stored on the blockchain, so can you do this calculation without having the seed?
Tiffanie: Well, you need at least one of the extended public keys. They’re public, so you should be able to access them on the blockchain, but you won’t have the private keys. So that’s why you can only do this calculation up to a certain level, because anything past that would need the private key.
So there will be some information missing, but anything that’s publicly available that you can get from…any transaction history that’s publicly available that you can get from knowing all the wallet addresses, you should be able to calculate them once you have at least one of the public keys. But it’s better to have all 3 public keys.
Host: Okay. I think then we got all questions. And again then, thank you, Tiffanie, for your talk today.
Tiffanie: You’re welcome.