University of Adelaide's Dr. Matthew Sorell on Evidentiary Health Data at DFRWS-APAC 2022

Christa Miller: Welcome back to the Forensic Focus Podcast. I’m Christa Miller and this week Desi, Si and I are welcoming Dr. Matthew Sorell, senior lecturer at the University of Adelaide School of Electrical and Electronic Engineering in Australia. The university is hosting the upcoming Digital Forensics Research Workshop for the Asia-Pacific region and we’re excited for Matthew to be joining us to talk about it.

Matthew: Hey, g’day, Christa.

Christa: Welcome. We’re so happy to have you here.

Matthew: My pleasure.

Christa: Yeah. So, I think most of our listeners are probably familiar with DFRWS, and with the Rodeo challenge in particular, which I think is going to be a major topic of conversation today. Could you describe for us what the workshop is and what’s involved with the workshop and the Rodeo?

Matthew: Sure. So, health data is becoming increasingly important as a source of evidence. And it’s an interesting space, because in digital forensics we typically think about the analysis of logs and files and RAM, effectively the purely digital space. Health data is really interesting because it brings our physical world into the digital realm.

So these are sensors on your watch and on your phone that are tracking your movements and your behaviors and bringing that into a summarized log. It’s quite important, then, that we understand the limitations of that.

To give you just one example, Apple Health records things like the fact that you climbed a flight of stairs, but that’s not quite what it measures. What it actually does is use the barometer to say, “You’ve climbed about 10 feet and you’ve walked about 18 steps, and so therefore I think that you’ve climbed a flight of stairs.” Well, if you walk through a pressure door into an air-conditioned building, you can trigger a false alarm. So it’s important to understand that there are limitations in how we can interpret that information.
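The stairs example can be made concrete with a toy version of the rule. This is a sketch, not Apple’s algorithm (which isn’t public): the 3 m (“about 10 feet”) and 18-step thresholds are taken from the spoken example, and the figure of roughly 8.3 m of altitude per hPa of pressure change is a near-sea-level approximation. The rule only ever sees a pressure change and a step count, so a doorway pressure step of the right size and direction during ordinary walking is indistinguishable from a real climb.

```python
# Toy "flights climbed" rule. NOT Apple's algorithm: the thresholds are
# assumptions taken from the spoken example (about 10 feet, about 18 steps).
METRES_PER_HPA = 8.3    # rough near-sea-level conversion
FLIGHT_HEIGHT_M = 3.0   # "about 10 feet"
FLIGHT_STEPS = 18       # "about 18 steps"


def apparent_climb_m(pressure_change_hpa: float) -> float:
    """Apparent altitude change implied by a barometric pressure change.

    Pressure falls as altitude rises, so a negative change reads as a climb.
    """
    return -pressure_change_hpa * METRES_PER_HPA


def looks_like_flight(pressure_change_hpa: float, steps: int) -> bool:
    """Flag a flight of stairs when enough apparent climb coincides with enough steps."""
    return (apparent_climb_m(pressure_change_hpa) >= FLIGHT_HEIGHT_M
            and steps >= FLIGHT_STEPS)


# The rule sees only these two numbers, so real stairs and a same-sized
# pressure step while walking on the flat produce the same answer.
print(looks_like_flight(-0.4, 20))  # True  (about 3.3 m apparent climb)
print(looks_like_flight(-0.3, 20))  # False (about 2.5 m apparent climb)
```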

So, the Rodeo this year is being set up by my PhD student, Luke Jennings. He’s using my personal health data, which I started collecting in 2017, and I’ll tell you why that date’s important in a moment. Basically, what I’ve got is five years of longitudinal data across a range of iPhones and a range of Apple Watches as I’ve upgraded over the years, and of course big changes in iOS and watchOS along the way.

So that data set gives us a really rich source of diversity. We can see when Apple has fundamentally changed the way in which they’ve operated the Health app and so on. But it also highlights just some of the rare anomalies that we sometimes find in that data set.

So, the Rodeo is essentially a CTF based on that data set, and we will also bring in some of the associated data sets, because there are some things you can get out of an export file that aren’t actually captured in that particular database. And there are some really strange anomalies in the exports as well.

So really there are two points here, right? The first is to find things, and the second is to understand some of those limitations. We are really excited about that. We’re writing it at the moment, so it’s a bit of a mad scramble, but there’s some really interesting stuff.

Before that, we’re going to be running a workshop on the Wednesday afternoon, the 28th of September. For many of our participants, this might be the first time they’ve actually dived into that database. We’re going to assume, of course, that you can work with an SQL database, but the workshop is about understanding its structure, how you can interpret it, and how it relates to other data as well.

One of the really interesting data sets we found in that health database is a record listing every phone, every version of iOS, every watch and every version of watchOS. That’s interesting because if you correlate it with your health measurements, you get a very accurate feel for when you upgraded your phone and when you upgraded your operating system.
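A minimal sketch of that correlation using Python’s built-in sqlite3 module. The two tables and their columns (`provenance` holding device and OS version, `samples` holding a timestamp and a provenance reference) are hypothetical simplifications, not the real Health database schema, which differs between iOS versions. The idea is only to group samples by device and OS version and take the first and last timestamp of each group.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real Health
# database uses different tables and columns, which vary between iOS versions.
SCHEMA = """
CREATE TABLE provenance (id INTEGER PRIMARY KEY, device TEXT, os_version TEXT);
CREATE TABLE samples (id INTEGER PRIMARY KEY, provenance_id INTEGER, start_date REAL);
"""


def version_timeline(conn: sqlite3.Connection) -> list:
    """Return (device, os_version, first_seen, last_seen) rows, ordered by first use."""
    return conn.execute(
        """
        SELECT p.device, p.os_version, MIN(s.start_date), MAX(s.start_date)
        FROM samples AS s
        JOIN provenance AS p ON s.provenance_id = p.id
        GROUP BY p.device, p.os_version
        ORDER BY MIN(s.start_date)
        """
    ).fetchall()


# Made-up data: one phone upgraded from one OS version to another, then a new phone.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO provenance VALUES (?, ?, ?)",
                 [(1, "iPhone A", "10.3"), (2, "iPhone A", "11.0"), (3, "iPhone B", "11.0")])
conn.executemany("INSERT INTO samples (provenance_id, start_date) VALUES (?, ?)",
                 [(1, 100.0), (1, 200.0), (2, 300.0), (2, 400.0), (3, 500.0)])
timeline = version_timeline(conn)
for row in timeline:
    print(row)
```

The first timestamp for each device and OS pair only bounds the upgrade from above: it is when data from that combination first appears, not necessarily the moment of the upgrade.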

We also found that time zones are recorded there. As far as I’m aware, that’s the only place where you get a record of every time zone the phone records itself as being in. So recently, when I was reviewing all of my travel over the last five years, that was the best place to start, because I could see it all: taking off in Adelaide, landing in Doha, landing in the UK, coming back. I could see all of those locations.

What’s really interesting about that is that it also works when you swap SIMs as you move: if you buy a prepaid SIM to put into your phone, all of that information is captured. So it’s potentially a very rich source of evidence that we really haven’t tapped into. So it’s going to be fun.
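The travel reconstruction described here boils down to collapsing a long run of per-record time zone names into the points where the name changes. A minimal sketch, assuming you have already pulled (timestamp, time zone name) pairs out of the database and sorted them by time; that input format and the trip below are made up for illustration.

```python
from typing import Iterable


def tz_transitions(records: Iterable[tuple]) -> list:
    """Keep only the records whose time zone differs from the previous record's."""
    transitions = []
    for timestamp, tz_name in records:
        if not transitions or transitions[-1][1] != tz_name:
            transitions.append((timestamp, tz_name))
    return transitions


# Hypothetical trip: Adelaide -> Doha -> London -> Adelaide
trip = [
    (0, "Australia/Adelaide"), (10, "Australia/Adelaide"),
    (20, "Asia/Qatar"), (30, "Europe/London"), (40, "Europe/London"),
    (50, "Australia/Adelaide"),
]
for ts, tz in tz_transitions(trip):
    print(ts, tz)
```

Each transition time is when a record in the new zone was first written, so it marks the change no earlier than it actually happened, not the exact moment of arrival.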

Desi: I was just going to say, it sounds really interesting. It was actually the first time I’d heard what the DFRWS acronym stood for when we set this up. It’s definitely something new for me. I’ve been in cyber security for a while, but I had never heard of this conference until I started talking to Christa about it. It seems really cool, but how long has it been around in APAC?

Matthew: So this is actually the second time we’ve run it here in the Asia-Pacific region, although the conference itself has been around for quite a while. We were going to run the first one, the 2020 conference, alongside the International Association of Forensic Sciences meeting in Sydney. Everything was set up to go, and then of course it got deferred, and then it got canceled.

So we ended up running the first conference, the 2020 one, virtually in January 2021, all online. It ended up being a good conference, but not a great one, because I think that despite the fact we’re talking on Zoom right now, we’re all a bit sick of Zoom.

So I’m really, really looking forward to seeing people in person for the first time in a long time. We expect to have around 50 to 60 people in person here at the University of Adelaide, but we’re also going to have 200 to 250, maybe more, online, so that’s going to change the vibe, and I’m really looking forward to that.

Desi: No, that sounds really cool. And so the people that are online will still get a chance to do the workshop on the Wednesday afternoon? That’ll be run virtually?

Matthew: Absolutely. So funnily enough, as a university, we kind of got used to teaching online and also teaching hybrid. So we’re pretty good at that now.

So yes, this is certainly going to be accessible for everybody. Those who are there in person will be able to do this on their laptops in the room. Those of you at home will probably have a better setup for the Rodeo, but you don’t get the pizza. Although, you know, you can supply your own beer, however you want to do that.

Si: So you said there’s going to be a workshop, which sounds fantastic, and there’s a CTF. What’s the nature of the CTF? Is it figuring out when Mike went for a jog, or?

Matthew: There’ll be a little bit of that. Now, just to be very clear, it’s my personal health data, so there’s an ethics question right at the start. I can’t hand over a phone to a student and say, “Go and explore, go and find out where you went,” because not only does that inadvertently capture a lot of their personal information, it may potentially capture information that we don’t know about, because it’s hidden somewhere in the data set.

So I made an ethical decision in 2017 that, in capturing the data I needed to capture, I was going to keep it and share it with the research community. That’s basically what we are using. So it will literally be questions like: Where did I travel? When did I jog? When did I stop doing any exercise? It’s about exploring that data set.

We have a lot of tools for breaking into these databases and pulling them out: there’s XRY and XAMN, there’s Magnet AXIOM, Cellebrite, et cetera. That’s great, but I’m a little bit concerned sometimes that they take a few shortcuts in what they’re interpreting, and particularly in how they visualize that data.

What got my interest in this particular data set, in fact, was a murder that occurred here in Adelaide in 2016. Now, you can find it on Google; I’m not going to name the case, but it’s public enough. The victim was wearing an Apple Watch when she drove her car home, parked the car and turned off the ignition. Bluetooth disconnected, 45 seconds later she walked 20 steps, then there was a big flurry of activity, and seven minutes later we see the last recorded heartbeat.

This was the first example we’re aware of, in late 2016, where the assault on the victim and her subsequent death were actually recorded on one of these devices in a form that was admissible.

One of the key reasons it was admissible is that we got no further recordings, except that the phone kept logging basal energy every minute. It kept doing that until 1:00 AM, and the police arrived on the scene at about 10:30 PM. So we have continuity showing that the phone and the watch continued to work and to communicate with each other.
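The continuity argument can be checked mechanically: if a value is expected roughly every minute, look for gaps larger than some tolerance and note when the series stops. A sketch, assuming record timestamps in seconds sorted ascending; the 60-second interval matches the one-minute basal logging described here, while the 1.5× tolerance and the series below are arbitrary, made-up choices.

```python
NOMINAL_INTERVAL_S = 60.0               # one-minute basal energy logging (2016-era watch)
TOLERANCE_S = 1.5 * NOMINAL_INTERVAL_S  # arbitrary slack for timing jitter


def continuity_report(timestamps: list) -> dict:
    """Summarize a sorted series of record timestamps (seconds).

    Returns the first and last record times and any gaps longer than the tolerance.
    """
    if not timestamps:
        return {"first": None, "last": None, "gaps": []}
    gaps = [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > TOLERANCE_S]
    return {"first": timestamps[0], "last": timestamps[-1], "gaps": gaps}


# Hypothetical series: ten one-minute records, a six-minute silence, then two more.
series = [i * 60.0 for i in range(10)] + [900.0, 960.0]
print(continuity_report(series))
```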

So that was really, really important, but it also highlighted that nobody had thought about this. Around this time, in 2017, I presented that information at an Interpol meeting, and a week later Cellebrite and MSAB, XRY’s parent company, made sure that they were pulling the health database into their acquisition software and started to do that interpretation.

So it’s nice to have that sort of impact. You know, it’s a tragic murder, but what we discovered in fact is this is a rich source. And up until that time, there hadn’t really been a serious push to analyze that health data from a criminal investigation perspective.

Si: So, forgive me for being slightly morbid, because that’s the nature of the case, but one would assume that Apple has refined the way it monitors movement to reflect things that we normally do, like jog, swim, walk up a flight of stairs, you know, with the pressure exceptions that you mentioned.

How did it capture and reflect what one would suspect was a set of movements that are not normally part of one’s day-to-day routine? Were you able to draw any inference from the way it had recorded movement, or was there just not enough granularity to do it? It just looked like… go on. Sorry.

Matthew: Yeah, no. So it’s an interesting question. It has changed quite a lot since then. In fact, partly to answer that question, because it was very much front of mind, Luke’s PhD work focuses on repeatable experiments where you put a watch or some other wearable device, or even a phone, onto a rig and make it walk. We know how fast it walks, what the pace is, for how long and how many steps there were, and then we look at the accuracy.

This turns out to be a really, really hard problem, because one of the things that happens in the more recent versions of the health data is that steps are counted in 10-minute intervals. And to get that 10-minute interval started, you basically have to have a gate that says, “Okay, you’re walking, I’m going to start a walking session.”

There’s usually about an eight- to ten-second delay before it says, “Okay, you’re walking now.” But if you stop walking and are just ambling, shifting around on your feet, there may still be enough walking-like movement for that to count as steps.
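A toy model of that gating behaviour shows why short bursts of movement vanish and why shuffling can keep a session alive. It assumes one walking-like/not-walking flag per second and a nine-second gate, within the “eight to ten seconds” mentioned above; Apple’s actual logic is not public, so both the input format and the gate length are assumptions.

```python
from typing import Optional


def session_start(motion: list, gate_seconds: int = 9) -> Optional[int]:
    """Return the second at which a walking session would be declared.

    A session starts once there have been gate_seconds consecutive seconds of
    walking-like motion (any walking-like second counts, including ambling).
    Returns None if the gate is never reached.
    """
    run = 0
    for second, moving in enumerate(motion):
        run = run + 1 if moving else 0
        if run >= gate_seconds:
            return second
    return None


print(session_start([True] * 5))                        # None: too short to open a session
print(session_start([True] * 12))                       # 8: ninth consecutive second
print(session_start([True] * 5 + [False] + [True] * 9))  # 14: the pause resets the gate
```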

It’s really, really hard to create a repeatable experiment. The reason, which you alluded to, is that the watch is built to be fit for a particular purpose, on the assumption that you are keeping track of your health over the day. So if it says you are walking, there’s usually an assumption that you are walking.

If there’s a distance measurement, that’s essentially the number of steps × the average stride length. That was updated with, I think, iOS 13, so that it now keeps track of your stride length during the day, and that can vary a bit.
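Because the distance estimate is just a product, its sensitivity to the stride estimate is easy to see. A sketch with illustrative numbers only (the 0.75 m and 0.80 m strides are assumptions, not measured values): a 5 cm difference per stride becomes 50 m over a thousand steps.

```python
def estimated_distance_m(steps: int, stride_m: float) -> float:
    """Distance as described: number of steps times average stride length."""
    return steps * stride_m


# Illustrative numbers only: a 5 cm change in assumed stride over 1,000 steps.
base = estimated_distance_m(1000, 0.75)
longer = estimated_distance_m(1000, 0.80)
print(base, longer, longer - base)  # 750.0 800.0 50.0
```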

So, there are all sorts of changes. So, young Luke was up in Queensland with our collaborators and had spent two days collecting data and then the team decided to go out for lunch and just leave the system running, and before they did that, they just updated everything.

And of course they updated at one of those critical junctures where Apple had completely changed the system. So we had two days of “one minute 30 seconds, 30 seconds of nothing, one minute 30 seconds, 30 seconds of nothing,” and you could see it: boom, boom, boom. They came back and the whole data set was just 10-minute intervals. 10-minute intervals. 10-minute intervals. That was frustrating, but it was also very insightful, because you don’t get great granularity with step counts anymore.

Back with the original-generation Apple Watch in late 2016, you did. Basal energy is an estimate of how much energy your body is burning just to stay alive: breathing, heartbeat and so on. That’s around a calorie per minute, or about four kilojoules per minute if you want to work in proper units.

That data was recorded in one-minute intervals, and the result was that it absolutely dominated the data set. It’s now recorded in 15-minute intervals, so it’s not quite as dominant. But it’s still useful: because it’s calculated on the watch, its presence tells you that the watch and the phone are talking to each other.
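Some quick arithmetic behind those figures, as a Python sketch. It uses the standard 4.184 kJ per kilocalorie (the “calorie” of food labels) and simply counts how many records one day of logging produces at each interval, which is why one-minute basal energy swamped everything else in the database.

```python
KJ_PER_KCAL = 4.184          # the food "calorie" is a kilocalorie
MINUTES_PER_DAY = 24 * 60


def records_per_day(interval_min: int) -> int:
    """How many basal energy records one day of logging produces at a given interval."""
    return MINUTES_PER_DAY // interval_min


print(1 * KJ_PER_KCAL)                       # 4.184 kJ per minute at about 1 kcal/min
print(records_per_day(1))                    # 1440 records/day at one-minute intervals
print(records_per_day(15))                   # 96 records/day at 15-minute intervals
print(round(MINUTES_PER_DAY * KJ_PER_KCAL))  # 6025 kJ/day at 1 kcal/min
```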

So there are all sorts of interesting inferences that you can make from this data, but you can’t necessarily say, “Well, you know, from this particular instance to that particular instance, you took this sort of stride.”
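The loss of step-count granularity is easy to reproduce: sum fine-grained counts into fixed 10-minute bins. A sketch, assuming (seconds, steps) samples every 30 s; reading the rig pattern described earlier as 90 s of walking then 30 s idle, and the 50 steps per sample, are assumptions for illustration. After binning, the on/off rhythm is gone and only the totals remain.

```python
from collections import defaultdict

BIN_S = 600  # 10-minute intervals


def bin_steps(samples, bin_s: int = BIN_S) -> dict:
    """Sum (time_s, steps) samples into fixed-width bins keyed by bin start time."""
    bins = defaultdict(int)
    for t, steps in samples:
        bins[int(t // bin_s) * bin_s] += steps
    return dict(sorted(bins.items()))


# Hypothetical rig: 90 s walking, 30 s idle, repeated for 20 minutes, sampled every 30 s.
rig = []
for t in range(0, 1200, 30):
    walking = (t % 120) < 90
    rig.append((t, 50 if walking else 0))

print(bin_steps(rig))  # {0: 750, 600: 750}
```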

So yeah, it’s fascinating stuff. As it turned out with this particular person, and I’ve done this since with several other cases, you can actually start to see a pattern of life and a normal day. A normal day is not quite as obvious as you think though, because it’s not even a seven-day cycle.

But you can see when the watch is off the wrist and being charged, you can see when somebody gets up, you can see when they’ve taken it off in the afternoon. And if that becomes a pattern, then you start to see anomalies in that pattern, particularly when you’ve got suspicious events.
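One simple way to turn “a normal day” into something testable: take the first time each day that the watch records activity, compute the median across days, and flag days that deviate by more than a threshold. A sketch with made-up times; the two-hour threshold is arbitrary, and since real routines aren’t a clean seven-day cycle, a single median baseline is a simplification.

```python
from statistics import median


def flag_anomalous_days(first_activity_hour: dict, threshold_h: float = 2.0) -> list:
    """Return days whose first recorded activity is far from the median hour."""
    typical = median(first_activity_hour.values())
    return [day for day, hour in first_activity_hour.items()
            if abs(hour - typical) > threshold_h]


# Made-up first-activity times (local decimal hours) for six days.
days = {"d1": 6.5, "d2": 6.75, "d3": 6.5, "d4": 7.0, "d5": 6.25, "d6": 11.0}
print(flag_anomalous_days(days))  # ['d6']
```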

Desi: Diving into the technical aspect of the watch, have you seen it be more accurate with a watch that has an eSIM, versus one that only has wifi and Bluetooth and has to pair with a phone, because the eSIM watch has that connectivity the whole time?

Matthew: We haven’t really examined that aspect in a great deal of detail, but one of the things that does happen as a result of being on an eSIM is that your health data can be uploaded from the watch to iCloud. So if you enable iCloud in that mode, then rather than transferring straight from watch to phone, it goes from watch to iCloud to phone.

So it’s not so much a question of accuracy as a question of latency back into the phone system. What’s really interesting, of course, is that the Apple ecosystem for health data is very much contained on the phone and the watch. The eSIM-enabled watches, as you know, are connected to the cellular network and have GPS enabled. I’m not particularly concerned about the GPS on the watch at this stage; I want to talk about GPS in a minute in a different context.

What we do find is that all it really does is bring iCloud into the ecosystem. So when you’re exercising, wearing a watch and not carrying a phone, the data is of course buffered on your watch, but it could also be relayed via iCloud to the phone, which is useful if you go missing and we’ve just got your phone left behind. So there are some useful aspects to that.

What’s really interesting from my perspective with the Apple Health ecosystem is that by default it’s just on the phone and the watch. There’s iCloud backup if you want it, and certainly you can do an encrypted backup and recover it from there as well. But it’s not being pushed onto somebody else’s server unless you want it to be.

Compare that with something like Fitbit. Fitbit will log for a lot longer, I think seven days of buffer, but when you upload, your phone is just a relay. It relays the data into the cloud somewhere, that comes back as a summary, and all you see on your phone is the summary.

And I don’t know what Fitbit’s doing with my data, but I guarantee you that it’s probably on the edge of privacy, because they’re looking for patterns and are looking to improve their product. So that bit’s interesting.

Now, I do want to talk about GPS, because whether the GPS is in your phone or in your watch, Core Location is not GPS. GPS is just one of the inputs your phone uses for location, right? If you happen to be outdoors and you can lock on to four or five satellites or more, then you might get a GPS location, which gives you an accuracy in the order of about 10 feet, or three meters if you want to use real units. In practice, let’s call that 10, 20, 30 meters.

That’s pretty much what you get with the jitter you find in a GPS or GNSS system. And of course it could be Galileo or one of the other systems as well, depending on the chipset and the version of the phone.

But Apple does a couple of other things as well. If you are indoors, it will look for wifi beacons. If you are outdoors and it can’t get a lock on GPS, it may be able to use crowdsourced wifi data, something like WiGLE, and decide that you are halfway down the road, because somebody drove up your street, saw your wifi access point and decided that was a good location to log, saying, “All right, I can see that particular wifi access point.” And failing that, it’ll just say, “Well, I can see this cell, so I’m just going to place you where I believe that cell is.”

That’s great. A lot of the tools will give you the latitude and longitude, but they won’t give you the estimated accuracy. And even if they do give you the estimate, the geolocation cache doesn’t tell you the source; it just gives you the answer. One of the issues there is that once you hit 500 feet, 165 meters, it just says, “I don’t care anymore. 165 meters is close enough. You’re there, or you’re further out.”

And you can see that because if you look at that data in an urban area, a conurbation area, houses are just down the street and most houses have wifi, and that’s accurate enough. If you’re in a semi-rural area, right, and you’ve got a wifi access point, but you can see it across the field, you might be 400, 500 meters away, and it just says “165 meters.” It’s a little bit crazy, right?

To understand that, you have to understand radio propagation, and that’s not in the usual toolkit for most digital forensic investigators. For me, it’s the stuff I’ve been teaching for over 20 years, so that’s kind of in the bag.

So it concerns me when you’ve got, yeah, let’s name a product: Cellebrite goes and says, “It’s a cached database, therefore that must be a GPS fix.” No. By labeling it that, you’ve made it really misleading, and you’ve misled investigators, because you haven’t incorporated broader knowledge of what that data is.
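A quick triage check for that accuracy ceiling: flag location records whose reported accuracy sits at the cap, since those could be anywhere from the cap outwards. A sketch, assuming records with hypothetical `lat`, `lon` and `accuracy_m` fields; the 165 m value is the figure quoted in this interview and should be verified against the data set and iOS version in question.

```python
from dataclasses import dataclass

ACCURACY_CAP_M = 165.0  # ceiling quoted in the interview; verify for your data set


@dataclass
class LocationRecord:
    """Hypothetical simplified geolocation-cache row (field names are illustrative)."""
    timestamp: float
    lat: float
    lon: float
    accuracy_m: float


def at_accuracy_cap(records: list, cap_m: float = ACCURACY_CAP_M,
                    tol_m: float = 0.5) -> list:
    """Return records whose reported accuracy sits at the cap.

    For these, the true error may be much larger than the reported figure.
    """
    return [r for r in records if r.accuracy_m >= cap_m - tol_m]


# Made-up rows: two ordinary fixes and one at the ceiling.
records = [
    LocationRecord(0.0, -34.92, 138.60, 10.0),
    LocationRecord(60.0, -34.92, 138.61, 65.0),
    LocationRecord(120.0, -34.80, 138.55, 165.0),
]
for r in at_accuracy_cap(records):
    print(r)
```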

Christa: So I think the big question coming up for me as you talk about this is: it’s a lot of data, and you’ve mentioned homicide and missing persons. What are the main categories of cases this is going to be applied to? Because for, I want to say, everyday cases, but I guess lesser crimes?

Si: The hypothesis that I suggested was, “Can we tell if somebody’s running away from the scene of a robbery?” Actually, I think so.

Christa: No, right, right, yeah.

Matthew: Okay, all right. So the answer to your question is: have a look at mobile phone forensics from, say, 20 years ago, which was to pull out the SIM, which contained your address list, your phone contact list and your last 20 SMS messages, right? And you’d go, “Okay, great. Extracting the SIM is really useful, and the phone is basically just a shell.”

And now what have you got? You’ve got basically a battery-powered, very powerful handheld computer, full of sensors, all of which are being logged to some extent, and it’s a pattern of life recorder, and you’re just carrying it around, right? Now, for many cases, it is very straightforward because you’ve got text messages and photos talking about the drug deal. I don’t get involved in those cases. I don’t need to, that’s really straightforward.

And as a consequence, I have a bias, and the bias is that I only see the cases where law enforcement basically say, “We’re not quite sure what this means and how to deal with it.” So, on the one hand, that means that yes, I have a biased view. On the other hand, it means I get all the interesting cases.

So that does tend to be on the major crime side: homicides and missing persons. It’s also in some of the organized crime space. And there are just other weird and bizarre ones, things like putting trackers on vehicles, for example, where you want to be able to show that the suspect at some point traveled with the tracker, and to be able to correlate that.

So, to give you another really weird anomaly: mobile networks, old school, will tell you the start time and end time of a phone call, and along with that they identify the start cell and end cell. In a plain old telephone system (GSM, 3G voice communication) that’s pretty robust, because it’s all captured in the one place.

But now what have we got? We’ve got Voice over LTE, and we’ve got data sessions that are not three-minute phone calls but four-and-a-half-hour data sessions. And one of the things we realized was that what you’ve actually got is a start time and an end time for a data session, and then a cell location that says, “This is the first cell in this session where data was transferred, and this is the last cell in this session where data was transferred.”

And that’s not necessarily at the start time and the end time, right? There are even bigger anomalies than that, which I won’t go into; they’re fascinating, but they would take too long.
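The difference between a naive and a careful reading of such a record can be made explicit. A sketch using a hypothetical record layout (the field names and cell identifiers are illustrative, not any carrier’s format): the careful reading only places each cell somewhere inside the session window, rather than pinning it to the start or end time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataSession:
    """Hypothetical layout of a data-session record (field names are illustrative)."""
    start: datetime
    end: datetime
    first_cell: str  # first cell in the session where data was transferred
    last_cell: str   # last cell in the session where data was transferred


def naive_reading(s: DataSession) -> dict:
    """Tempting but unsafe: pins the first cell to the start and the last to the end."""
    return {s.first_cell: s.start, s.last_cell: s.end}


def careful_reading(s: DataSession) -> dict:
    """Supported by the record: each cell was used at some unknown time in [start, end]."""
    return {s.first_cell: (s.start, s.end), s.last_cell: (s.start, s.end)}


start = datetime(2022, 9, 1, 18, 0)
session = DataSession(start, start + timedelta(hours=4, minutes=30), "CELL-A", "CELL-B")
print(naive_reading(session))
print(careful_reading(session))
```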

Being able to understand that matters. I had a recent case, for example (and Desi, I know you’re here in Adelaide) across Gulf St Vincent. I had suspects in and around Port Wakefield, and then 50 kilometers away in Ardrossan, on the other side of a body of water, I suddenly got this phone connecting and then dropping out again.

You go, “That’s really weird,” unless you’re a radio engineer, in which case you know that’s actually a really common occurrence when you’ve had a hot day and a cold night, you’ve got a temperature inversion, and the radio signal just bounces up and down on the warm air. At which point you go, “Oh, we know this. That’s easy.” Now you’ve got to explain it to a court. So that’s the fun space.
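The radio engineer’s intuition can be put into numbers with the standard radio-horizon approximation, d ≈ √(2kRh) for each antenna, using an effective Earth-radius factor k = 4/3 for a normal atmosphere. The antenna heights below (a 30 m site and a handset at 1.5 m) are assumptions for illustration. Under normal conditions the line-of-sight range comes out under 30 km, so a link of about 50 km over water is well beyond it, which is why an inversion-related duct is the natural explanation.

```python
import math

EARTH_RADIUS_KM = 6371.0
K_STANDARD = 4 / 3  # effective Earth-radius factor for a standard atmosphere


def radio_horizon_km(h1_m: float, h2_m: float, k: float = K_STANDARD) -> float:
    """Line-of-sight radio horizon between antennas at heights h1 and h2 (metres)."""
    r_eff_km = k * EARTH_RADIUS_KM
    return sum(math.sqrt(2 * r_eff_km * h / 1000) for h in (h1_m, h2_m))


d = radio_horizon_km(30, 1.5)  # hypothetical site and handset heights
print(f"{d:.1f} km")           # 27.6 km
```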

So I guess where I’m coming from is that I didn’t come into digital forensics from a computer science or cybersecurity background, although I have that background too. I’ve been programming since 1978. My first computer is still in my office and it still works: it’s an Exidy Sorcerer, and I’ve still got the tapes and the tape deck. It’s just so much magic, right?

So don’t dump any Z80 machine code on me, because I’ll probably be able to execute it in my head. It’s that sort of era. So I kind of have that background, but what I really have is the electronics background: signal processing, radio frequency engineering, sensors. If you understand that really well, it brings a different dimension to the forensic investigation of digital evidence, because that evidence comes from the real world.

Si: So what was it that actually drew you into Apple Watches as a subject of analysis, then? I mean, if you’re a radio specialist… geek, I was going to say, but I didn’t want to be offensive.

Matthew: ‘Geek’ is fine. ‘Geek’ is a badge of honor. Let me ask you a question, actually. So, I’ve been an academic now for 20 years. I came from a background first in radar and then in telecommunications. I found radar bored me silly; telecommunications was the same toolkit with a slightly different question. I did a lot of commercial work, then entered academia, and in about 2006 I was asked to comment on some digital photographs in a fairly unpleasant case.

We realized there was no science behind the examination of EXIF metadata or of digital cameras, both of which were new. That started a fascination, recognizing there was a gap here that wasn’t being addressed properly, because it was coming from a computer science perspective when in fact we were dealing with sensors.

This was about the era of sensor pattern noise, with Jessica Fridrich’s work in that space and Hany Farid getting into that space, which is terrific work, but there’s a certain level of naivety about it. I don’t say that unkindly; it’s because they don’t have a background in signal processing and statistical signal processing, which is necessary to work in that space.

So that started an interest in looking at devices, cameras in this case, as evidence. One of the early things I did was get all of my colleagues with their newfangled digital cameras to give me a whole bunch of photos. I just wanted to strip the metadata out and see how different manufacturers implemented JPEG.

And one of the things I found was that two camera manufacturers had identical JPEG structures: the same quantization matrices, the same structure. I raised this with one of the manufacturers and said, “You realize that your competitors are doing the same thing as you,” and they went, “Oh, that’s very strange. We don’t understand why.”

And six months later the competitor actually admitted, “Yeah, we were so far behind the eight ball that we kind of stole some intellectual property. Sorry.”

So that was kind of interesting. I’m not sure that my work triggered that admission, but it certainly identified it before it was public. So that was kind of interesting, as well.
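Comparing manufacturers this way starts with pulling out the quantization tables themselves. A minimal, stdlib-only Python sketch that walks the JPEG marker segments up to the start of scan and collects the DQT (define quantization table) segments; it does no validation beyond what it needs and is not a full JPEG parser.

```python
import struct


def dqt_tables(data: bytes) -> dict:
    """Return {table_id: 64 quantization values} from a JPEG's DQT segments.

    Values are returned in the zigzag order in which they are stored in the file.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan or end of image: headers done
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes its own 2 bytes
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:            # DQT: one or more quantization tables
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    tables[table_id] = tuple(segment[i:i + 64])
                    i += 64
                else:                 # 16-bit big-endian entries
                    tables[table_id] = struct.unpack(">64H", segment[i:i + 128])
                    i += 128
        pos += 2 + length
    return tables


def same_quantization(jpeg_a: bytes, jpeg_b: bytes) -> bool:
    """True when two files carry identical quantization tables."""
    return dqt_tables(jpeg_a) == dqt_tables(jpeg_b)
```

Identical tables are a hint of a shared encoder rather than proof, since many devices start from the example tables in the JPEG standard scaled by a quality setting.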

Fast forward to 2016, and South Australia Police came to the university and said, “So, we’ve got this problem. We’ve had this murder, we’ve identified a watch, we’ve got exported data from it, and we want to understand how it’s calibrated.” And one of the key issues here is that the data that had been exported was off by an hour. It was an hour out.

Now, here’s another little secret from our workshop in a couple of weeks’ time. You’ve got the SQLite database, healthdb_secure.sqlite. All of the timestamps are recorded as Apple Cocoa Core Data timestamps, so they’re all UTC-based. So far so good. There’s a detailed list of the time zones that the phone is working in. So far so good.

However, if you do an export from the Health app, the export takes a shortcut and says, “Whatever time zone you are in right now is the time zone for all of the data that I’m exporting.”

This particular murder happened on Friday, the 30th of September. On the Sunday morning, daylight saving started. This data was extracted the following Tuesday. So recognizing why something might be off mattered, and the result was that I kept this watch and phone switched off for a long time while we got another watch and phone and asked, “Right, how does this behave?”

That was really critical to do. When I did finally turn it on, we discovered that there’s actually a whole bunch more data in there than the export file tells you.
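A minimal Python sketch of that one-hour discrepancy. The UTC-based Cocoa timestamp stays correct; the error comes from rendering every record with the offset in force at export time. Assumptions: South Australia’s offsets of UTC+9:30 (standard) and UTC+10:30 (daylight saving, which began on Sunday 2 October 2016), written as fixed offsets so the sketch needs no time zone database; the 21:00 event time is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Cocoa / Core Data timestamps count seconds from 2001-01-01 00:00:00 UTC.
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
# Fixed offsets keep the sketch self-contained (no tz database needed).
ACST = timezone(timedelta(hours=9, minutes=30), "ACST")   # Adelaide standard time
ACDT = timezone(timedelta(hours=10, minutes=30), "ACDT")  # Adelaide daylight time


def cocoa_to_utc(seconds: float) -> datetime:
    """Convert a Cocoa/Core Data timestamp to an aware UTC datetime."""
    return COCOA_EPOCH + timedelta(seconds=seconds)


# Hypothetical event at 21:00 on Friday 30 September 2016, before DST began.
event = datetime(2016, 9, 30, 21, 0, tzinfo=ACST)
stored = (event - COCOA_EPOCH).total_seconds()  # the UTC-based value in the database

correct_local = cocoa_to_utc(stored).astimezone(ACST)  # offset in force at the time
export_local = cocoa_to_utc(stored).astimezone(ACDT)   # export-time offset applied to everything
print(correct_local.strftime("%Y-%m-%d %H:%M %Z"))  # 2016-09-30 21:00 ACST
print(export_local.strftime("%Y-%m-%d %H:%M %Z"))   # 2016-09-30 22:00 ACDT
```

Both values describe the same instant; only the wall-clock rendering differs, which is exactly why the exported times were an hour out.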

So that was really interesting. One of the things an export now does, however, is show, for example, the route of an exercise routine. So if you’ve gone for a walk or a jog or a run around your area, that map will show up there.

We haven’t really found that inside the health data, but it’s probably there. We do know where the start location is recorded in the health data, but we can’t easily pull a map out. It can be done, I guess, but we don’t need to, because part of our routine now is to do a data export before you image. That way, you’ve got that data easily accessible if it’s needed.

That then raises a really interesting question around evidence handling, right? So if you think about conventional computer forensics, which is essentially, “All right, I’ve got a hard drive and I’ve got a write blocker and I’m going to make a bit image copy before I do anything, right?”

Now, a couple of problems with that: first is that if you’ve got a stack of two-terabyte drives, you don’t know which one to do first, and you spend all of your time, just basically imaging before you even start doing analysis. So some sort of triage is necessary.

You can’t do that with a phone. With the phone, you have to go in through the front door. Sometimes you have to break down the front door to go through the front door. So, you know, some sort of side loader. But still, you are requesting files out of the file system.

As a consequence, it’s a live unit. So what do you do at a crime scene? If you pick up an iPhone the accelerometers will trigger, right? You know, there’s a whole bunch of evidence handling here, which is necessarily going to add to the data set.

So a really interesting research question then is, well, what’s acceptable and what’s not, right? What sort of actions can you take at the crime scene that are visible so that you can actually say, “That layer of dirt there is triage at the crime scene and everything under here is still wholesome and uncorrupted.” And you have to think about it in that way.

Even with the most perfectly trained crime scene examiners, mistakes in evidence handling will happen. Deliberate things will happen too. Or a decision will be made that accessing this phone will save a life, even at the cost of the evidence.

Being able to quantify and qualify what that means is a really, really uncomfortable conversation to have with classically trained digital investigators, because they very much see it as “Can’t corrupt the evidence, can’t corrupt the evidence.” In actual fact, what you have to be able to do is contain the evidence.

Si: Are you involved in training first responders in that regard?

Matthew: Not so much first responders at this stage. I very much focus my attention on analytics, so essentially once we’ve got the data. I consider extraction to be the job of the digital forensics technicians, the e-crime units. I have a kit and I can do extraction, but I’m a civilian, so that would corrupt the chain of evidence and police procedures.

So I’d much rather the expertise be built and stay active among the first responders. Certainly I can contribute to that discussion, and I’ve done that through a European project called FORMOBILE, which we’ve released as a standard. I’m sure I can give you a link to that if you want to post it as part of the blog.

But where I do a lot of training now is in analytics. So what does this database mean? How do we interpret it? How do we use it? The way I run that at the university, there’s a university version, which is a little bit sensitive, but not really sensitive. And then there’s the version that I teach into law enforcement, where we can use much more recent data and have more of a quiet conversation.

So the student version, the way we teach that analytics is we start with a case using cellular network data, and we track movements of suspects. So we have a real case, we’ve sanitized the data and got approval to use it. Part of that then is the students have to write an expert witness report that’s of the standard that it can be used in court.

Now for technical people, that’s a real challenge, right? Because you have to step back and say, “This is what the evidence says,” as opposed to, “This is what I think happened,” right? So it’s hard. So we do that.

The second part of that is we had some fun. I handed out a bunch of phones to some students and some family members. And over a period of months, people had various conversations about cats, which, as it turns out, is an NSFW discussion, right?

And then there’s an incident, as a result of which, for some magic reason, all the phones are brought in and imaged. And in the process of analyzing that data, we discover that there’s a group of people selling, trading, growing catnip. Okay, that’s kind of interesting.

And then along the way we discover someone who has a little bit of a puerile interest in ducks, right? So it’s a little bit of fun, but it’s also a serious message, right?

And when you’ve got a multidisciplinary team taking that course, you have different specializations coming to the fore as to what you look for. So some of that is about messages and images, some people are deep-diving into the health data, or in one case I had a student who knew a little bit more about jailbreaking iPhones, and since it was his iPhone, he decided to jailbreak it and go deeper, which was a great thing to do.

We also had a cheap feature phone that I bought at my local petrol station, and I’m not going to give you the brand, because it’s a great burner phone: you can’t get in, you can’t break the Android front end. One of my students did figure out how to do a backup, then ran binwalk over it and managed to extract the data.

Now, it took four or five days to do that, but it was great because no one had done it before. And there was a little bit of social engineering associated with that too, I believe, because he found somebody in New Zealand, where that brand of phone is also sold, posed as a girl, chatted this particular hacker up and got the data. So, you know, we did a bit of social engineering as well.

So what we do with that report though, is the students then get to present their evidence as a group in moot court. There is so much data there that they have to break it up, work as a team, answer different questions and then present the evidence.

And for university students, that’s enormously insightful to see this is what your work means, right? So what we’re now doing is we’re building a curriculum that is a little bit more than the conventional digital forensics, right? Because conventional digital forensics really comes down to very competent technical skills, right? Investigation of digital evidence requires much deeper analytical skills. And so being able to teach that is really hard to do, but really, really necessary.

You know, digital evidence is now a significant 30 to 40% of the total forensic evidence that you’ll pick up at a crime scene, whether that’s phones, CCTV, wearable devices, vehicle engine management units, entertainment head units, et cetera. You know, we leave so much data behind from the digital aura that we have around us. And it tells us a lot, and, you know, for people who think, “Okay, that’s a bit Big Brother,” keep in mind that just as often, that will be exculpatory.

My favorite example of that in fact, was a parking ticket that I got last year from the city council who said, “You parked here for too long.” Right? I said, “But I’ve got a valid ticket.” And they said, “Yes, but you were here at 10:30 in the morning.” And I went, “Well, hang on, I’ve got my Google Maps, I’ve got my credit card receipts, and I’ve got CCTV at my home.”

Now, we haven’t mentioned this, but one of the little jobs that I have is that I’m the Honorary Consul of the Republic of Estonia here in Adelaide. So my house is a Consulate. So we have CCTV. It is very carefully maintained and I don’t muck about.

Now, it took 11 weeks for the council to admit that maybe they got the identity of the car wrong, but all of that work, you know, the hours of work that got me out of a $76 fine. So I was really, really pleased with that return on that investment.

Desi: In all honesty, the greatest achievement there is that you got the council to admit that they were wrong.

Matthew: Oh yeah, absolutely, yeah. And I managed to do it without outing them publicly.

Desi: Oh, damn!

Christa: So, I want to bring this all back to DFRWS. There’s a tremendous amount of work here, and it sounds like quality work. How is that going into the workshop and the Rodeo in particular? What can participants of the Rodeo expect from the scenarios that you’re building?

Matthew: Okay. So, funnily enough, it’s not really a specific scenario. You’re not solving a case. What you’re really doing is diving into my pattern of life, right? So it’s a little bit different in that regard. As part of the training for this, we’ll just give you a short burst of data that you can handle, because, you know, my health database right now is, I think, around 570MB.

So, it’s a non-trivial SQLite database to deal with. I mean, you can still deal with it on a desktop computer, so we’re not talking huge data, but it’s enough. So, part of that is finding little artifacts, and then as we progress, we want you to start telling the story about what I do, right? And so that gets a little bit more involved: it requires you to start cross-linking things, looking through and drawing some inferences from that data.

But really, we want our participants to dive into that database and the associated databases linking into the health data and say, “Actually, there’s more in here if you care to look. You have to think about accuracy and so on.” So, the workshop really is an introduction to some of those tools, and then there’ll be a practice data set.

The Rodeo itself is a little bit more of a longer CTF. And of course, because we’re working across time zones, we’ll make that available for 24 hours so that, you know, people have a chance to play with it. And of course I’m making that data set available to the research community.

All I ask of people is that if you are going to use it and publish a paper about it, one: acknowledge where it’s come from; and secondly, don’t publish my address, right? It’s in the data. It’s not that hard to find, but I don’t really want it, you know, in an IEEE Transactions journal.

So, we’ve got four workshops happening: Claude Roux from Sydney is talking about the Sydney Declaration. So this is really about revisiting the science behind forensic science. And we want to address that in digital forensics, because it concerns me that it’s missing, right?

We put a lot of technical effort into things like ensuring integrity of files, you know, so signing a hash and processes for chain of evidence and write blocking and so on. Great. I totally respect that. We need to do that, but where’s the science that says, “When this log occurs, these are the circumstances under which it occurs, and these are the circumstances under which it might also occur,” right?
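The integrity step mentioned here usually starts with a cryptographic hash recorded at acquisition and recomputed later. A minimal Python sketch of that step, assuming SHA-256 and an evidence file on local disk; the function names `sha256_file` and `verify` are hypothetical, and signing the recorded hash is not shown:

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large images never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's current SHA-256 matches the hash recorded at acquisition."""
    return sha256_file(path) == expected_hex.strip().lower()
```

A matching hash shows the bytes are unchanged since acquisition; as the argument here goes, it says nothing about the circumstances under which a log entry inside those bytes was written.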

And that’s a really non-trivial space to think about, but we have to think about it, because if we’re going to convict people on the basis of digital evidence, we have to put a hand on the heart and say, “We know this because we did the science,” as opposed to, “I reckon it’s this ’cause I’m an expert.” I don’t think that’s a good enough response.

So that’s the Sydney Declaration. Michael Cohen’s going to talk about Velociraptor, and Harm van Beek from the Netherlands Forensic Institute is going to talk about digital forensics as a service. So we’ve got a great lineup there.

Amongst our keynotes, we’ve got Harm again, but we’ve also got Yuval Yarom, who’s a professor here at the University of Adelaide. He’s one of the guys that basically broke Intel a couple of years ago with side-channel attacks. So he’s going to talk about how that’s come together.

So, some really, really interesting work there, a bunch of other papers, quite a diverse range of topics from, you know, cloud and network forensics through to computers and live RAM extraction and so on.

Through to, you know, a couple of papers that I’ve produced with my young colleague, Richard Matthews, my PhD student, who took cameras into the seamless world. He’s done a whole bunch of stuff, scraping Snap Map.

And so, a really interesting topic for him was back in 2020, you might remember the riots in Minneapolis-St. Paul. What he did is he used his Snap Map scraper, collated whatever video and images he could find, curated them and published them on Twitter.

So he had a tool that just went and grabbed it, right? We got an email from somebody living in Minneapolis-St. Paul saying, “Thank you, because at three o’clock in the morning, you were the only news source that told us whether it was safe to leave the house.”

So it’s nice to have that impact, right? And, you know, a terrific achievement for young Richard to be able to develop that sort of tool. And yeah, we’re automating that now so that we can actually scrape that. And of course, if you stick it on Snap Map, it is public. So it’s actually quite ethical; it’s not really breaking anything. It is open source, so, you know, a really impressive little tool. So we’re going to talk about that as well.

Desi: Nice.

Christa: Excellent.

Si: Fantastic.

Desi: For something where I’ve only just learned what the acronym means, I’m very excited for all these talks. Don’t ask me what the acronym means, ’cause I’ve forgotten it already.

Matthew: No, it’s the Digital Forensics Research Workshop, but we just call it DFRWS. Because, you know, we’ve moved on. This conference is designed to bring academics and practitioners and law enforcement together and actually talk about real problems. It’s very focused on the real and that’s absolutely critical because that’s how we translate research into action.

Si: The one that we did in Oxford fairly, well, fairly recently, it was this year, which is nice. But yeah, you’re right. We had a really good mix of practitioners and law enforcement and academics and it was great to actually sit down at lunch and speak with different people from the Netherlands and the UK and various other academics.

And yeah, no, it was fascinating and it is a really good sort of group of people to get together. So I’m really excited for you and I’ll be looking up some of these things afterwards, depending upon what time zone I’m actually sitting in when it’s all going on.

Matthew: Fantastic. So it’s not too late to register, dfrws.org. Just look up the APAC 2022 Program. Desi, since you’re local, we may even see you there, right?

Desi: Definitely, yeah. I’m planning on being there. Yeah.

Matthew: Good, fantastic. So that’s going to be great.

Christa: And the sessions will be recorded as well. Sorry, Matthew, but I want to note that as well: because it’s a hybrid event, the sessions are being recorded. So anybody who is unable to make it will still be able to view them online later.

Matthew: That’s right. So look, if you’re registered, you can either watch it live or you can watch it on playback because, you know, time zones. But it will be recorded, and we will also make some of those recordings available after the fact. So yeah, absolutely. Yeah, the time zone issue is always the tricky one, and just, you know, to add insult to injury, South Australia is in a half-hour time zone.

Si: Oh, wow.

Matthew: That’s really unusual. It’s like, well, so is India, right? So, you know, in terms of total world population, it’s not that unusual, but it is unusual because if you fly out of Adelaide wearing an Apple Watch, and it’s looking at your, you know, number of active stand hours, for example, and then you land in, say, I don’t know, Qatar, your watch will get very confused, because it’s doing everything on the half hour, and then it does it on the hour. And then you end up with a day that has like 36 hours in it. There’s weird stuff that happens.

Desi: Even within Australia, I reckon the weird one would be if you flew from South Australia to Brisbane, which doesn’t have daylight saving, just as daylight saving was turning off, because you flip the full hour over Brisbane. We just shift either side of them because of that half-hour difference.

Matthew: Oh, I get this all the time, right? In my class, I have two whole lectures on this: one of them’s called “It’s About Time” and the other one’s called “The Order of Things.” Mobile networks in Australia, right, operate off UTC, right? Absolutely. Of course they do, right? Except for one of our networks, which decided when they launched in Australia that they were going to run everything off Eastern Standard Time.
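The arithmetic behind a lopsided “day” can be sketched with Python’s standard `zoneinfo` module. This illustrates the time zone arithmetic only and is not how the Apple Watch actually computes stand hours: it measures the real elapsed time between local midnight in Adelaide (UTC+9:30 outside daylight saving) and the next local midnight in Doha, Qatar (UTC+3). The helper name `local_day_length` is hypothetical, and the IANA time zone database must be available on the system.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ADELAIDE = ZoneInfo("Australia/Adelaide")  # UTC+9:30 standard, UTC+10:30 daylight saving
DOHA = ZoneInfo("Asia/Qatar")              # UTC+3, no daylight saving


def local_day_length(day: date, start_zone: ZoneInfo, end_zone: ZoneInfo) -> timedelta:
    """Elapsed time from local midnight in start_zone to the next local midnight in end_zone."""
    start = datetime(day.year, day.month, day.day, tzinfo=start_zone)
    nxt = day + timedelta(days=1)
    end = datetime(nxt.year, nxt.month, nxt.day, tzinfo=end_zone)
    # Convert both to UTC first: Python subtracts two datetimes sharing the same
    # tzinfo by wall clock, which would hide a daylight-saving change.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
```

For 1 September 2022, `local_day_length(date(2022, 9, 1), ADELAIDE, DOHA)` gives 30 hours 30 minutes, and every local hour boundary in Adelaide falls at half past the hour in UTC, which is the half-hour mismatch described above. Staying in Adelaide on 2 October 2022, the day daylight saving began, the same helper gives 23 hours.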

Desi: Is that Vodafone? It feels like Vodafone.

Matthew: I couldn’t possibly comment. You may not be incorrect, right? The result of that is precisely that when you’ve got a data set that crosses a boundary of daylight savings, you’ve got to remember to add instead of subtract half an hour, right?

And, you know, it’s a subtlety, but you’ve got to know, right? And then sometimes of course what’s occurring is at 2:00 AM on a Sunday morning. And so then you’ve got to be really careful. Now, interestingly enough, all of that data comes in Excel spreadsheets, right? Do you know how to do calculations around daylight savings in an Excel spreadsheet?
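Both points can be sketched in Python with the standard `zoneinfo` module. The sketch assumes, for illustration only, that the network in question writes wall-clock times at a fixed UTC+10:00 with no daylight-saving adjustment; the names `NETWORK_TZ`, `network_to_adelaide` and `classify_local` are hypothetical. The first function shows why the correction to Adelaide time flips from subtracting to adding half an hour; the second flags local times around the 2:00 AM Sunday changeover that happened twice or never happened at all.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption for illustration: the network stamps records in fixed UTC+10:00
# (AEST) all year round, with no daylight-saving adjustment.
NETWORK_TZ = timezone(timedelta(hours=10), "AEST")
ADELAIDE = ZoneInfo("Australia/Adelaide")  # UTC+9:30 standard, UTC+10:30 daylight saving


def network_to_adelaide(naive_ts: datetime) -> datetime:
    """Interpret a naive network timestamp as fixed AEST and convert it to Adelaide time."""
    return naive_ts.replace(tzinfo=NETWORK_TZ).astimezone(ADELAIDE)


def classify_local(naive_local: datetime) -> str:
    """Classify a naive Adelaide wall-clock time as 'unique', 'ambiguous' or 'nonexistent'.

    Uses PEP 495 'fold': fold=0 and fold=1 only give different UTC offsets
    around a daylight-saving transition.
    """
    first = naive_local.replace(tzinfo=ADELAIDE, fold=0)
    second = naive_local.replace(tzinfo=ADELAIDE, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "unique"
    # A repeated wall time survives a round trip through UTC; a skipped one does not.
    round_trip = first.astimezone(timezone.utc).astimezone(ADELAIDE).replace(tzinfo=None)
    return "ambiguous" if round_trip == naive_local else "nonexistent"
```

Under that assumption, a network time of 12:00 on 1 June 2022 becomes 11:30 in Adelaide, while 12:00 on 1 December 2022 becomes 12:30. In 2022, Adelaide’s clocks went back on Sunday 3 April (so 2:30 AM happened twice) and forward on Sunday 2 October (so 2:30 AM never happened); a record stamped 2:30 AM local on either date cannot be placed on a timeline without more information.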

Desi: Unfortunately, I do, ’cause I had to do it a lot. I used to get spreadsheets all the time with no time zone indicated, and then it would be a challenge to figure out which time zone they were in.

Matthew: Yes.

Desi: Unfortunately I’ve dealt with the pain.

Matthew: I can’t wait to have you as a student in my classroom. That’s exactly right. You know, I have a big spit about Excel, right? Unfortunately it’s the least worst tool to use. The least worst, meaning that investigators who are not digital investigators have Windows set up on their desktop. So they’ve got Excel, and the only thing they look at the data in is Excel.

Desi: Yeah. Excel is the greatest DFIR tool ever made, isn’t it? Microsoft didn’t even plan on making it.

Matthew: Well, what I love is that they deliberately introduced a time bug in its first inception so that it would be compatible with Lotus 1-2-3 back in the early eighties. So Lotus made an assumption that 1900 was a leap year.

Desi: Ah.

Matthew: And it wasn’t, so the dates are off by one.

Desi: Right.

Matthew: And Microsoft deliberately... Now, try this at home, kids: get a list of numbers, 1 to 90, right, and then format them as dates. And if that goes 0 to 90, then you’ll get some really, really weird results around about 59 to 60. I can see Si’s doing that right now.

Si: Yeah, I’m doing it right now.

Desi: I’m going to do it as soon as this is over.
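The experiment can be reproduced outside Excel. The minimal Python sketch below assumes the spreadsheet uses Excel’s default 1900 date system (not the alternative 1904 system) and converts serial numbers to dates while honouring the phantom 29 February 1900: serials 1 to 59 count from 1 January 1900, serial 60 is the day that never existed, and everything from 61 onward is one day ahead of a naive count. The function name is hypothetical.

```python
from datetime import date, timedelta


def excel1900_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-date-system serial number to a real calendar date.

    Excel (for Lotus 1-2-3 compatibility) treats 1900 as a leap year, so
    serial 60 is the nonexistent 29 February 1900.
    """
    if serial < 1:
        raise ValueError("serials below 1 have no real calendar date")
    if serial == 60:
        raise ValueError("serial 60 is Excel's phantom 29 February 1900")
    if serial < 60:
        # Before the phantom day: serial 1 is 1 January 1900.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, the extra day shifts the origin back by one.
    return date(1899, 12, 30) + timedelta(days=serial)
```

So serial 59 is 28 February 1900, serial 61 is 1 March 1900, and serial 44927 is 1 January 2023.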

Matthew: Right. Timestamps are an absolute bugbear for me, right? You know, if you dive into some of the iOS databases, you’ve got databases that have Cocoa Core and Unix Epoch timestamps, right? Now, they’re both counted in seconds from their epoch. As everyone on this podcast knows, Unix Epoch: 1st of January, 1970; Apple Cocoa Core: 1st of January, 2001. And they’re in the same table. Different columns, same table. Pick one, guys. Pick one. Honestly.
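When a table mixes the two epochs, a small helper that renders a raw value both ways makes the choice explicit. A minimal Python sketch, assuming the values are plain seconds (possibly fractional) since each epoch in UTC; the constant and function names are hypothetical, though the 978,307,200-second gap between 1 January 1970 and 1 January 2001 is fixed arithmetic.

```python
from datetime import datetime, timezone

# Seconds from the Unix epoch (1970-01-01 UTC) to Apple's Cocoa reference
# date (2001-01-01 UTC): 11,323 days * 86,400 seconds.
COCOA_TO_UNIX_OFFSET = 978_307_200


def from_unix(seconds: float) -> datetime:
    """Interpret a value as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def from_cocoa(seconds: float) -> datetime:
    """Interpret a value as seconds since 2001-01-01 UTC."""
    return datetime.fromtimestamp(seconds + COCOA_TO_UNIX_OFFSET, tz=timezone.utc)


def both_readings(seconds: float) -> dict:
    """Show both interpretations side by side; the examiner decides which is plausible."""
    return {"unix": from_unix(seconds), "cocoa": from_cocoa(seconds)}
```

For example, 662688000 reads as 1 January 2022 in Cocoa time but as 1 January 1991 if mistaken for Unix time: a 31-year error that is easy to miss in a column of numbers.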

Desi: They can’t even pick their charging cables for their phones. They’re not going to be able to pick the timestamps, that’s for sure.

Matthew: I love that the EU has decided to go with USB-C. My advice to everybody is go and invest in the tools for picking lint out of USB-C sockets, because it’s going to be a huge market. If you’ve got kids with a MacBook, you already know this problem because you’ve spent hours picking that lint out of the socket. Technologically, it’s great. Mechanically, not so much.

Christa: Well, I think we’re going to wrap it there. Matthew, thank you so much again for making the time joining us on the Forensic Focus Podcast today.

Si: Yeah. Thank you, Matthew.

Desi: Thanks, Matthew.

Matthew: My pleasure.

Christa: Thanks also to our listeners. You’ll be able to find this recording and transcription, along with more articles, information and forums, at www.forensicfocus.com. Stay safe and well.
