Jan Peter van Zandwijk discusses his research at DFRWS EU 2019.
Hello, my name is Jan Peter van Zandwijk. I work with the Netherlands Forensic Institute. And in this half-hour I will tell you about some research that we did on the Health app on iOS, and then you can go to lunch.
I did this research not long ago, together with my colleague Abdul Boztas, who helped me with a lot of the experiments.
And the most important thing that I want you to remember from this presentation is that the app registers both steps and distances, and we have found that the accuracy of the steps registered is much better than the accuracy of the distances registered.
So what are we talking about? The iPhone Health app is a system app that has shipped with the iPhone since iOS version 8. And if you look at it, then it’s this one with the heart. And if you open it then what you get to see is this menu. So it stores a lot of information, and it also falls in a broader category of all kinds of health-related applications which are available on iPhones but also on other kinds of telephones.
So the part we are interested in is in this area, it registers information about all kinds of daily activities we do. And it does so automatically. It also stores other things, even things like reproductive health — I don’t know what you want to register about that — but things from a forensic viewpoint that we are interested in: this daily activity.
When you click on that one, it opens, you see it stores three different kinds of things. It stores number of steps; it stores distances that you travel; and it stores also the number of flights, the number of stairs that you go up and down. So these kinds of things.
And if you for instance click on number of steps, then what you get to see is some detailed information. So you get the aggregated number of steps over periods of days, in this case. So you can see on which day we did the experiments: must be here for this one. And if you click on one of these days, you get even more detailed information. So this is the number of steps, in time periods in the order of one minute, that you get. And if you do that with distances as well, then you can also get the same detailed information.
Now of course this is, from a forensic perspective, very interesting information. But if you want to use it, then you must have some idea about the reliability of this kind of information.
First, a quick word about the forensic use. Yesterday we also had an interesting workshop about probabilistic reasoning in forensics. For instance, you can try to make a probability statement about different distances. So in one case, you have three scenarios: one scenario in which the distance… [tech issues]
We have a scenario that the distance is 100 metres, or that the distance might be 250 metres, you can try to make a probability statement about the likelihood of the traces that you find in the telephone under the assumption that the distance was 100 metres, and also the traces that you find in the telephone under the assumption that the distance was 250 metres.
And another one is, if we also have several scenarios, things that might have happened in a case, then you can also make similarly a probability statement: how likely are the traces that you find in the Health app under the first scenario or the second scenario is true?
So those kinds of probability statements can be made if you have some kind of idea about the reliability of this information which is present on the iPhone.
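To make the idea of such a probability statement concrete, here is a minimal sketch of a likelihood ratio for two distance scenarios. It is not the NFI's model: the normal error model, the relative spread `rel_sd`, and the registered value of 110 m are all assumptions chosen purely for illustration; only the 100 m and 250 m scenarios come from the talk.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(registered_m, scenario_a_m, scenario_b_m, rel_sd):
    """Likelihood of the registered distance under scenario A divided by that under B.

    Assumption (illustrative only): the registered distance is normally
    distributed around the true distance, with a standard deviation equal
    to rel_sd times the true distance.
    """
    like_a = normal_pdf(registered_m, scenario_a_m, rel_sd * scenario_a_m)
    like_b = normal_pdf(registered_m, scenario_b_m, rel_sd * scenario_b_m)
    return like_a / like_b


# Hypothetical trace: the phone registered 110 m.
# Scenario A: the true distance was 100 m. Scenario B: it was 250 m.
lr = likelihood_ratio(registered_m=110.0, scenario_a_m=100.0,
                      scenario_b_m=250.0, rel_sd=0.2)
print(f"Likelihood ratio, 100 m versus 250 m: {lr:.1f}")
```

A ratio well above 1 means the trace is more likely under the first scenario, and a ratio close to 1 means the trace hardly discriminates between the two. The value depends strongly on the assumed spread, which is exactly why the reliability of the registrations has to be measured first.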
And finally, of course, it also can be used as an indication that the user of the phone was walking, and hadn’t died already. So it can also be used as an indication of physical activity by the user.
Now of course, we are not the first ones to think about it. At least in the Netherlands, there have been at least two high-profile cases in which this iPhone app was used as evidence. And also, as I noticed, in Germany there was a high-profile case in which data from the Health app was used. In that case, they used the number of flights, because the question was whether somebody travelled up some stairs; this was a kind of rape/murder case in Germany.
So if you have been in the Netherlands, you will recognise this one, it has also been a high-profile case. In that case, the suspect made some detailed statements about what had happened in a time period, so they could be checked against data from the Health app to see whether they match or not. So that was one.
And finally, this was a case in America. It was not the iPhone Health app, but a Fitbit, so another kind of health-related application. This is a device that you wear on the wrist, and in this case it was also a murder case. The husband had a murdered wife, and he claimed that somebody raided the house and she was killed, but data from the Fitbit indicated that during the period in which he said his wife was killed, there were still some steps or distances registered, so this disproved his story.
OK. So the main question I will address in this presentation is about the reliability of the number of steps and the distances which are registered in the iPhone. And our approach is to do experiments, and then compare what we find in the iPhone to the ground truth. So we used fixed routes, so we know what the real distance is, and we compared it to what was found in the telephone. And we also manually recorded the steps (I will show how we did it) and then we compared that to the things that we found in the telephone itself.
So in that way, we get a feeling about reliability, and also the factors that influence the accuracy of the device.
We used three different telephones: an iPhone 6, 7 and 8. One subject. We had the telephones at different locations: so we put them in his trouser pockets; we put them in a backpack; we put one in a jacket pocket; and put one in the hand; and also put one in a low jacket pocket; and finally, we gave him one of those things that flight attendants also have, that enabled him to manually count the number of steps. And we also had the experimenter walk together with him, so we also had a second manual count of the number of steps.
So this is what we did. Finally, we had… so this is a part of the premises of the NFI in the Netherlands. So we had a forensic archaeologist, who has very good, specialised equipment to determine the locations of certain points, so we had him trace out the route, so we knew exactly the location of these numbered points. And on the basis of that we found some walking routes.
We always started on the first point, and then we had a route of about one hundred metres which ended at a clear end mark, so you could always see that you ended at the same position. So we had one at about one hundred metres; we had one at about 250 metres; and finally, we had one about 450 metres. So we had three different distances that all the subjects had to walk.
So to sum up, I got a number of colleagues who were willing to participate. So we had five subjects walking with all these telephones. We tried two different walking speeds. First we tried normal walking, and then we tried some freely-chosen running paces, to see how that influences the accuracy of the device.
As I said, we had three different distances, to see if that was also a factor influencing. As I showed you, the telephones were carried at five different locations.
Now we wanted to have some good statistics, so for each combination of walking speed, distance and carrying location, we asked them to do the distance twice, to see whether the registrations actually matched to one another. So that was really quite some work! So in total the subjects, they walked 600 trials, which amounts to 130 km, so a lot of very good walking conditions. And we counted about 450,000 steps between them.
Now during the trials, we also recorded the start and the end time of each specific trial, so you can see it here. We recorded the start time and the end time, and also the number of steps that were recorded by the subject using the clicker. And we did that because the digital part of this research was limited, but nevertheless I was surprised there is some database on the device itself which stores all the information about the Health app. It says the Health app is secure but it’s not secure, it’s just a SQLite database, which you can pull off using commercial tools, and then you can pull the data out using standard SQL commands.
And if you look in the database, this is the information we extract from the database, what you see is, you get the information about the start time of the registration and the end time of the registration, and this is the database ID. And you’ll get information about the number of steps, and the distances. But what you see is that it can happen that…
So these are the timestamps, and these are the number of steps registered and the distances registered. But what you see is that for a trial, there can be several registrations corresponding to the same trial. So the only thing that we have to do afterwards, and that’s why we used this form which has the start time and the end time of the trial, is to match the registrations from the database to the actual trial.
So in this case, we have to compute a total number of these registrations, which are spread out over several points in the database. That is the only thing that we had to do offline. And then we got the total number of steps and the total distance, and we could compare it to the actual values that we measured ourselves.
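The matching step can be sketched as a single SQL aggregation over a time window. This is only an illustration under stated assumptions: the table `registrations` and its columns are hypothetical and do not reproduce the real schema of the Health database, the timestamps are plain seconds, and a registration is counted only when it lies fully inside the trial window.

```python
import sqlite3

# Illustrative in-memory database. The table and column names are
# hypothetical; they do NOT reproduce the real Health app schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrations ("
    "id INTEGER PRIMARY KEY, start_time REAL, end_time REAL, "
    "steps INTEGER, distance_m REAL)"
)
conn.executemany(
    "INSERT INTO registrations (start_time, end_time, steps, distance_m) "
    "VALUES (?, ?, ?, ?)",
    [
        (1000.0, 1030.0, 40, 30.5),  # inside the trial window
        (1030.0, 1075.0, 62, 47.0),  # inside the trial window
        (1075.0, 1090.0, 21, 16.5),  # inside the trial window
        (1300.0, 1340.0, 55, 41.0),  # belongs to a later trial
    ],
)


def totals_for_trial(conn, trial_start, trial_end):
    """Sum steps and distance over registrations fully inside the trial window."""
    row = conn.execute(
        "SELECT COALESCE(SUM(steps), 0), COALESCE(SUM(distance_m), 0.0), COUNT(*) "
        "FROM registrations WHERE start_time >= ? AND end_time <= ?",
        (trial_start, trial_end),
    ).fetchone()
    return {"steps": row[0], "distance_m": row[1], "registrations": row[2]}


# Start and end time as they would be noted on the trial form (made-up values).
trial = totals_for_trial(conn, trial_start=995.0, trial_end=1095.0)
print(trial)
```

Registrations that straddle the start or end of a trial would need an explicit decision (drop them, or count them in proportion to the overlap); the sketch simply leaves them out.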
OK. So concerning the number of steps, then, we get this graph. So the data for all the three telephones that we used; for all walking distances; for all walking speeds; and for all carrying locations together.
On this axis on this side, it is the number of manually measured steps, and this is the number of steps that are measured by the telephone. And the black line indicates where both are equal: if they agree exactly, then all the data points will fall on the black line. So in this case you see there is a very strong correlation between the actual number of steps, and the number of steps which are registered by the telephone itself.
So we tried to quantify this a bit. We computed what is called the mean absolute percentage error, which sounds very complicated but it is only the percentage deviation of the number of registered steps with respect to the true number of steps.
So for instance, if you take 100 steps and on the telephone there are 110 steps, then the difference is 10 steps. You divide the difference by the actual number of steps, so in that case the error would be 10%. That’s all. And then you average over all the different trials, you average over all conditions, and then what we find is that this average error is about 2%.
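As a small sketch of that calculation: the function below computes the mean absolute percentage error over a list of trials. The 100 versus 110 steps example is the one from the talk; the three-trial list is made up for illustration and is not data from the experiments.

```python
def mean_absolute_percentage_error(true_counts, registered_counts):
    """Average of |registered - true| / true over all trials, in percent."""
    if len(true_counts) != len(registered_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    errors = [
        100.0 * abs(registered - true) / true
        for true, registered in zip(true_counts, registered_counts)
    ]
    return sum(errors) / len(errors)


# The example from the talk: 100 true steps, 110 registered steps.
single_trial = mean_absolute_percentage_error([100], [110])
print(f"single trial: {single_trial:.1f}%")

# Hypothetical trials (made-up numbers): errors of 10%, 2% and 0%.
several_trials = mean_absolute_percentage_error([100, 200, 400], [110, 196, 400])
print(f"three trials: {several_trials:.1f}%")
```

Because the absolute value is taken per trial, overcounts and undercounts do not cancel each other out in the average.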
So here for the three different telephones that were studied, it was always about 2%. So that’s good.
So to come back to the main message, which I showed you at the start of the presentation: on the basis of the data I hope you can now believe that the number of steps registered by the telephone is accurate.
Unfortunately, for the number of distances registered by the telephone, the situation is somewhat more complicated. Because in that case, it does depend on several other factors, like walking speed and the style of walking of the subject, for instance. In order to show that, I will show you some background.
In this case, this was the iPhone 7 that we investigated. The iPhone was worn in the trouser pocket. On this side here we have our five subjects, and there are two groups with two different distances that we had the subjects walk. And the green line indicates the true distance that they covered.
So walking is blue, which means the distance registered by the telephone when the subject is walking. And the red ones is the distance which is registered by the telephone when the subject is running the same distance. So in this case, you can clearly see that for all our subjects and for all our distances, the telephone registers a larger distance when the subject is running, compared to the distance when the subject is walking.
Now another example is this one. So this is the same telephone, but in this case the telephone was held in the hand while walking. So this is the same. So over here, the blue one is walking and the red one is running. So what you see in this case is that there’s one subject for whom, when running, an enormously larger distance is registered by the telephone. We attributed this to the walking style, because this was somebody who used to do athletics, so when he was running, he was swinging his arms very much, and we can clearly see that that affects the distance registered by the telephone.
OK, so this summarises what I have just said. So we see there is a significant difference in the distance registered by the telephone between walking and running: sometimes the distance is larger when you are running, like the first example I showed you; and sometimes the distance is larger when you are walking. And also, we see that there can be large differences between subjects, which we attribute to the walking style of the subject. So if you walk in a different way, then you can also get different registrations.
Unfortunately, I only showed you some data from the iPhone 7, but for the other two telephones we got very similar results.
OK. We also computed, like for the number of steps, the average percentage error of the distances. And what we noticed is that the distances registered by the telephones are more often too low than too high, but there is a large discrepancy which can go up to 30-40% of the true value, which is significantly larger than the error that we found in the number of steps, which was about 2%, as I have just shown you.
OK. So we were really puzzled about this large spread in distances registered by the telephone, and we came up with some hypotheses which might explain the reason why these distances are sometimes larger. For instance, we had it in the trouser pocket.
And our idea was, when the telephone performs a larger forwards-backwards motion, then for some reason the distance registered by the telephone is larger, which would explain, if you had the telephone in your trouser pocket, if you run then there is a larger forward-backward motion than if you are walking. And also, this subject who ran had a larger forward-backward motion. So that was one of the hypotheses that we liked.
And in order to test it, we had our subjects perform some additional experiments with the telephone in the hand. Initially we didn’t give any instructions about the way that you should keep your hands during walking, they could do it how they liked. But now we specifically asked them: “Go walking, and keep your hands as closely as possible, like this.” And after that we asked them: “Go, swing as much as possible, like this.” And then we compared data from those experiments to the ones in which there was no instruction given concerning arm movement.
And from that, we get this picture. So this is the same. So the subjects are here; the blue ones are the distances registered in walking, and the red ones are the distances registered in running. And the ones with the triangles are the single ones that we got when the subject was swinging his arms, like this. And the black dots are the distances which are registered when the subject was instructed to keep his arms as still as possible.
So you see in this case, that the distances cover the same trend. So if you are covering a larger distance, and you are keeping your arms as still as possible, then you get about the same or sometimes less distance. So we take this as another confirmation that indeed the forward-backward motion of the telephone is an important factor influencing the distance.
So this summarises. But of course we are the Netherlands Forensic Institute, so we did it carefully. There is only one additional trial that we had the subjects make, but it can be taken as an indication. And it seems to point in the same direction, as I just said before.
So, to come back to our main message again: I hope I have convinced you a little bit that the number of steps is accurate to about 2%, and that the distances registered by the telephone are influenced by a number of factors and can deviate by about 30-40%. If you take that into account, it shouldn’t be a problem to use it for forensics, but you should be aware of the peculiarities of the Health app.
So if, as I showed you before at the beginning, if you have a scenario in which one distance is 100 metres and the other is 200 metres then, on the basis of this data, you can make some probability statements. But if the distance is 100 metres for the first scenario and 110 for the second one, then it’s not so large a difference, it will not be very meaningful.
OK. So once again, we’re the NFI. One very important point is that we investigated the situation if, when you are walking or running, how accurate are the steps that you get? So it doesn’t imply that if you get steps and distances registered on the telephone, it doesn’t need to imply that some walking or running has taken place; you can also have some other reasons why you get registered distances and steps in the telephone. For instance, if you’re driving a car over some traffic bumps you can also get some registration; if you shake your telephone, you can also get it. If you know there has been walking, then you can use this data in order to estimate the accuracy.
Now we really did our best to get accurate data, but it’s still a bit of a limited dataset. We had five subjects, and we also saw there were individual differences between the subjects which affects the accuracy of the distances, for instance. So you have to be a bit careful.
OK, so this was our main part of research. At the moment we are working with our forensic statistics group in order to see when we can develop some statistical models, so use our data to develop a statistical model, which can be used to estimate steps and distances on the basis of traces that you find in a telephone, and also get some estimate of the spread that you can expect in that case.
Of course, we are hoping to keep our dataset up to date, because continuously there are new versions of iOS and also new versions of iPhones coming to the market, so we really have to put some effort in to keep our dataset current. And also we will soon start analysing a case involving telephones that we haven’t analysed, so we need to do that all over again.
It will also be very interesting, too, to look at other kinds of health-related technology which contain other kinds of information. For instance we did not do any experiments with Apple Watch or with the Samsung or Android telephones or any third-party app. It might also be very interesting to have that.
But at least I hope I was able to convince you that this health-related app, or at least the Health app on the iPhone, has potential to be used as evidence in forensic casework.
Thank you very much.