Welcome to this presentation on how to evaluate the results of automated systems in forensic science. I’m Timothy Bollé, a PhD student at the University of Lausanne, and Eoghan Casey is my supervisor.
Automated systems, including rule-based, machine learning, and deep learning systems, are increasingly used in forensic activities and in digital forensics. One of the questions this raises is how to critically evaluate the output of these automated systems. Last year, we published a paper presenting three levels of evaluation for this purpose.
- The first level, the performance evaluation, aims to ensure that the system is fit for purpose with regard to the question at hand and the data.
- The second level, the understandability evaluation, aims to ensure that the user understands the meaning of the output.
- And finally, the forensic evaluation aims at drawing conclusions from the output.
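As an illustrative sketch (not taken from the paper), a first-level performance evaluation might compare a system’s outputs against ground-truth labels on a validation set representative of the question and data at hand. The data and metrics below are hypothetical:

```python
# Illustrative sketch: performance evaluation of a binary classifier
# against a labelled validation set (all data hypothetical).

def precision_recall(predicted, actual):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical system outputs vs. ground truth on validation cases.
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 0, 1, 1, 0]

p, r = precision_recall(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Whether these numbers indicate fitness for purpose depends on the forensic question: a high-stakes triage task may tolerate false positives but not false negatives, for example.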
In the extended abstract, we present some use-case scenarios and give some details on how to perform these levels of evaluation, but here I will jump directly to the recommendations.
The first recommendation is that transparency is important: not only transparency about how the system works, but also about how it was tested. We also realized that users have different levels of expertise, meaning that their understanding of a system will be subjective. There needs to be communication between the developers of the system and the users, to make sure that the users’ needs can be addressed by the system.
Then we want to insist on the importance of the forensic evaluation, because it helps formalize the reasoning that leads from the output of the system to the conclusion. And if this reasoning rests on assumptions, it helps check those assumptions. In the end, formalizing the reasoning and checking the assumptions results in more robust conclusions.
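One way to make such reasoning explicit (an illustrative sketch, not a method prescribed by the abstract) is the likelihood-ratio framework commonly used in forensic interpretation, where the system’s output updates the prior odds of a hypothesis. All numbers below are hypothetical:

```python
# Illustrative sketch: updating prior odds with a likelihood ratio (LR),
# as in standard Bayesian forensic interpretation. Numbers hypothetical.

def posterior_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

# Hypothetical: the system's output is 100x more probable if the
# hypothesis is true than if it is false (LR = 100),
# starting from prior odds of 1:1000.
prior = 1 / 1000
lr = 100
post = posterior_odds(prior, lr)
print(f"posterior odds = {post:.2f}")
```

Writing the reasoning this way exposes the assumptions behind the LR and the prior (for example, the relevant population considered), which is exactly what makes them checkable.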
Finally (and this may be the most important point of this presentation), the responsibility for this critical evaluation of outputs currently rests with the user. If the user does not have a sufficient level of expertise, the consequence could be that this critical evaluation is not performed at all, or is performed incorrectly. This could lead to conclusions that are not robust enough and are easily challenged by decision-makers, lawyers, etc. And in the long term, it could lead to a loss of trust and confidence in these automated systems.
Now, if systems support the levels of evaluation that we presented, this will take some of the responsibility from the user and put it on the system. It will ensure that some critical evaluation is performed systematically, and in the long term it will lead to more robust conclusions and more robust decisions.
Thank you for your attention and feel free to contact us if you have any questions.