The following transcript was generated by AI and may contain inaccuracies.
Paul Gullon-Scott: Hello and welcome to Forensic Focus Podcast. I’m your host, Paul Gullon-Scott, and today we’re diving into a conversation that places both innovation and investigator wellbeing at the heart of digital forensics. In this episode, we’re going to showcase the capabilities of Semantics 21, a platform that is transforming the landscape of digital forensics.
While cutting-edge technology often takes center stage, today we’re placing special emphasis on something just as critical—the protective features built into the platform that support the mental health and wellbeing of digital forensic investigators.
Joining me to showcase this is Tom Oldroyd, who is the Director of Strategy and Sales at Semantics 21. Together we’ll be discussing how the platform enhances efficiency, reduces unnecessary exposure to harmful material, and introduces forward-thinking tools designed with the welfare of investigators in mind. Whether you’re a practitioner, a manager, or simply interested in the evolving world of digital forensics, this is an episode you’re not going to want to miss.
Paul Gullon-Scott: Thanks for joining me, Tom.
Tom Oldroyd: Hi Paul, good to see you again. Thank you for having us on the podcast. We’re delighted to showcase what the team at Semantics 21 have been designing. I’m quite lucky because my background was in policing for 17 years, where I ran a digital forensic unit as a sergeant.
Prior to joining the police, I was always a geek and worked in the pharmaceuticals industry, so when I went into policing I naturally fell into digital forensics. The subject we’re talking about is digital crime, and sadly, most of that digital crime revolves around the protection of children and CSAM investigations.
I’ve had years of working with all the different forensic tools and I’ve been lucky to work with various companies. Now I’m the Director of Sales and Strategy for Semantics 21, a UK-based business with the sole mission to help law enforcement rescue as many kids as possible and design cutting-edge technology that does that job efficiently and cost-effectively.
We have a good understanding of the dangers that digital forensic investigators face doing their job. It’s probably one of the worst jobs in policing—the most challenging, but also one of the most rewarding, which is why I know so many people want to do it. But it does come with significant dangers, so we’re really glad to show what we have on offer.
Paul Gullon-Scott: It’s fair to say that there is no other job quite like a digital forensic investigator’s role, is there? They are exposed to this material on a daily basis.
Tom Oldroyd: Definitely. Having worked in various specialized units within the police, I remember that when you worked on a firearms team, you had a special priority bonus of a thousand pounds. If you worked in a body recovery unit, you had another thousand pounds, and if you worked on a surveillance team, another thousand pounds. All those jobs had those bonuses for a reason—because they were difficult to do.
When I turned up in the digital forensics team dealing with CSAM investigations, I wondered where the bonus was, where the extra perks were for us having to do a job that I personally believe was far more damaging and dangerous than the other roles that were getting those extra payments.
I think we don’t respect digital forensics enough and the types of work that people do. I think it’s very taboo—historically, it’s been “keep those people that deal with computers out of the way and locked away in a cupboard.” But the job is now all digital, and sadly a lot of that involves CSAM or extremist content media review. It’s a tricky challenge to face today.
Paul Gullon-Scott: I think even right now, the mental health effects and the dangers of working as a DFI are hugely underrated. Do you agree with that?
Tom Oldroyd: Yeah, definitely. I think it’s one of those areas where we’ve never had the statistics or data to show the volumes of material that people are having to review. We know there are backlogs, and it’s one of those things where we expect there to be a backlog of work within digital forensics.
That just means there’s never really downtime for those people doing the work—it’s job after job without that decompression or relaxing stage, without the time to take a break. That doesn’t really happen in the digital field.
Whereas if you look at football violence or other sorts of crime, which are very dangerous and hands-on, they do have times to take a break and step away from those tasks. We don’t see that in the digital world—we just see the demands getting more and more.
Paul Gullon-Scott: That’s backed up by research which has been done in the field, showing that the number of grooming cases and the sharing of indecent material have grown exponentially. I read a report not so long back which identified that the distribution of indecent material over the past four years has grown by over 300%. That’s a frightening statistic.
Tom Oldroyd: It is, and I think the scary thing is we’ve all probably thought we’d gotten over the hump—that we were at the peak of demand for this type of crime. I don’t think we are. I think we’re still seeing the world become more digital, and we’re now seeing the introduction of AI, which allows people to do things a lot quicker and on a larger scale.
I don’t think we’re actually anywhere near the peak—I think those numbers are still going up and they’re alarming. I think we’re going to continue to see this trend even with the amount of effort the police are putting in today.
Paul Gullon-Scott: I agree. With the introduction of the internet and everything being so accessible, especially across boundaries and borders internationally, it just makes it so much easier for individuals with an interest in children to share that kind of material, doesn’t it?
Tom Oldroyd: It does. As a company, when we work with big tech firms, we’re quite lucky. When you work in law enforcement, you’re pretty much gatekept away from talking with people like Meta, Microsoft, and Google—they’re very nervous around dealing with law enforcement directly.
But having left and now working for the private sector, we have open relationships with those companies because they’re doing the right thing. They want to protect their networks and customers; they don’t want to have criminal activity on their platforms.
When we talk to them and see how they’re dealing with the scale, we see it’s increasing and how people are commercializing CSAM offenses because they know there’s revenue to be made from offending and attacking a child abroad. Someone’s making money out of that, and that’s where they’re exploiting these platforms.
The platforms are doing the best they can to chase the money to see who’s paying for this abuse to happen and obviously put their rescue missions in place. We normally find those countries aren’t as advanced as the western world, and obviously criminals exploit that.
Paul Gullon-Scott: So can we have a look at Semantics 21?
Tom Oldroyd: Of course we can. Let me show you what we’ve been busy working on over the last few years. Semantics 21 was really founded with the mission of taking artificial intelligence and empowering government departments with reviewing big-scale media.
As we’ve grown the company, new technologies have come on board, and we’ve seen new technology advances that we can take and benefit our users with. We’ve looked at what’s been done historically, and while we would never criticize any of the efforts anyone’s ever done previously, we want to advance and make things even better.
When we look at databases and when we look at PhotoDNA and how AI can be used for good, we need to use technology and we just can’t stay still. It’s really nice working with a company that’s got that ethical approach, where we can keep building solutions and work with big tech providers to help us.
As a small British business, there’s only so much we can afford to do, but when we ask for help from teams at Amazon or Google and say we could use some advice on how to process things more efficiently, because they know we’re here on a mission to rescue kids, we do get help from those big companies.
The main product we have is called S21 LASERi-X. The idea is that we can take all digital media from as many platforms as possible. If you’re using MSAB or Cellebrite to download mobile phones, we can take your UFDRs. We can take your GrayKey extractions, and if you’re using Axiom for computer forensics, you can do your computer analysis on those devices, export the media into this program, and we can utilize the AI to start doing that heavy lifting.
If you’ve got 5 million pictures to look at, which is probably an average-sized case these days, you really don’t want to be going from page to page—we need to apply intelligence so we can find the key evidence as fast as possible and reduce exposure levels.
One of the key things we find is that everyone internationally is doing the same type of investigations, so we know that databasing is the quickest way of finding indecent images—it always has been. The hash value of a file is really important, and in our software, we have the ability for you to create as many databases as you want.
They’re not encrypted, and that’s done on purpose so you can connect other forensic tools to what we use—MySQL or SQL as a backend database system. That allows you to add your Project VIC databases, your CAID, NITRA database, and all your own local databases run normally by your project or country.
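To make the hash-matching step concrete, here is a minimal sketch of the kind of lookup being described, assuming SHA-256 file hashes and a plain Python set standing in for the MySQL-backed database (all names and values here are illustrative, not S21’s actual schema):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large media never has to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for hashes imported from Project VIC, CAID or a local database.
# The value below is just the SHA-256 of an empty file, used as a placeholder.
known_hashes = {"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}

def triage(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Split files into 'already labelled somewhere' and 'needs human review'."""
    pre_labelled, to_review = [], []
    for p in paths:
        (pre_labelled if sha256_of(p) in known_hashes else to_review).append(p)
    return pre_labelled, to_review
```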
When you’re bringing digital evidence in, we can do that comparison. Those databases have been a really good idea to have people collectively join those hashes together in one set. Now at Semantics 21, we realize it’s a good idea, but we should be doing something better because we know everyone globally is trying to tackle the same thing.
Although an indecent image may be created slightly differently abroad, it really makes no difference—we can still utilize that data. What Semantics 21 has created is the Global Alliance Database. It probably doesn’t get enough credit for how clever and intelligent the system is and how easy it is to use.
Simply, you would have your local database here—your MySQL database—and you would export to the Alliance. You agree to the terms and conditions, everyone has to have terms and conditions, you choose what data you want to share, provide an organization name and details, and then you have optional export options.
If your agency only wants to share the hash values and the labels that you’ve applied for your country, that’s fine. But if you are prepared to apply flags, metadata, GPS data, device make and model information, or notes, then that can also come into the database that we control.
It’s called the Global Alliance Database. You would generate an export—it takes a few seconds and is really quick. That would create an exported file which is encrypted and remains encrypted, so even when it’s transferred to us, it remains encrypted, and this is all offline.
You send us that file, we send you the master file back, and again, you agree to the terms and conditions. You then navigate to where that file’s kept—I can navigate to the Global Alliance file, click okay, and that’s now added 3.1 billion hashes that are community member hash values.
We’ve now got this deployed in numerous countries across the world. This is legally held content, this is CSAM material, this is AI hashes, this is revenge pornography flagged files. This allows us to create a truly global offline, end-to-end encrypted system.
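The actual export format and cipher used by the Global Alliance Database aren’t public, so purely as an illustration of an offline, symmetric-encrypted hash export, a sketch using Fernet from the cryptography package might look like this (file names and fields are hypothetical):

```python
import json
from cryptography.fernet import Fernet

def export_hashes(records: list[dict], key: bytes, out_path: str) -> None:
    """Serialise labelled hash records and write them as one encrypted blob."""
    token = Fernet(key).encrypt(json.dumps(records).encode("utf-8"))
    with open(out_path, "wb") as f:
        f.write(token)

def import_hashes(in_path: str, key: bytes) -> list[dict]:
    """Decrypt and parse an export produced by export_hashes()."""
    with open(in_path, "rb") as f:
        return json.loads(Fernet(key).decrypt(f.read()))

key = Fernet.generate_key()  # in practice the key would never travel with the file
export_hashes([{"sha256": "0" * 64, "label": "legally held"}], key, "alliance_export.bin")
print(import_hashes("alliance_export.bin", key))
```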
Speed and efficiency are exactly what databases are good at, and this is where we see that reduction in exposure levels, because when we’re looking at files that have been labeled internationally as legally held, they probably are legally held materials.
You still have to verify them as a human and double-check, but it means you’ve now got access to a database of 3.1 billion hashes—all law enforcement data, contribution-led, and it doesn’t cost anyone anything to access it. Our database is limited to our software at the moment, but we have reached out to the big forensic providers to say there’s going to be an API connectivity into the database.
Ultimately, we’re just doing this on behalf of the community so this will become tool-agnostic, because we know everyone’s on the same mission, but no one historically has joined all the dots together until now through Semantics 21.
Paul Gullon-Scott: Tom, can I just ask—you say there are 3.1 billion hashes stored in your international database. How does that compare to what’s held in CAID currently?
Tom Oldroyd: If we look at UK Home Office CAID, it’s probably one of the biggest databases available, and I think they stand at around 120 million. So we’re considerably bigger. As I say, we aren’t a company that’s criticizing or saying don’t use CAID—I’m not saying don’t use Project VIC.
What we’re saying is this is evolution, this is next generation, this is secure and encrypted. There’s no risk of losing a file, which could have happened historically, because back when those databases were created, privacy controls weren’t as tight as they are today and data protection legislation wasn’t as strong as it is now.
That’s why we had to evolve with time. When we look at the import of 3.1 billion hashes, it takes less than a second. If we look at CAID today, if we try to import the 120 million, or if we take Project VIC America which is about 90 million, it will normally take you 10, 12, 15 hours to ingest that data as a JSON file.
That’s not good—we don’t want investigators having to wait and waste that time having to import data. There are better ways of working today, so that’s where we’ve evolved to.
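One plausible reason a prebuilt database can be attached in under a second while a JSON feed takes hours is that a sorted, fixed-width binary hash file can be memory-mapped and searched in place, instead of being parsed and inserted record by record. That is an assumption about the general technique rather than S21’s actual format, sketched below:

```python
import mmap

RECORD = 32  # one raw SHA-256 digest per fixed-width record, sorted ascending

def contains(mm: mmap.mmap, digest: bytes) -> bool:
    """Binary search a sorted file of fixed-width digests without loading it."""
    lo, hi = 0, len(mm) // RECORD
    while lo < hi:
        mid = (lo + hi) // 2
        current = mm[mid * RECORD:(mid + 1) * RECORD]
        if current < digest:
            lo = mid + 1
        elif current > digest:
            hi = mid
        else:
            return True
    return False

# Usage sketch (assumes a prebuilt, sorted hash file already exists on disk):
# with open("alliance_hashes.bin", "rb") as f:
#     mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
#     hit = contains(mm, bytes.fromhex("e3b0c442" * 8))  # dummy digest
```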
Paul Gullon-Scott: So Semantics offers a global database which contains 3.1 billion hashes and it’s time-saving when it comes to the import of those hash values.
Tom Oldroyd: Exactly. What that means is when we now bring in mobile phones or computer evidence or cloud data and we bring it into the software—no matter what the source is, doesn’t matter if it’s Cellebrite, Magnet, whoever you’ve downloaded the device with—we now add that media. Its first job is to compare against the database because it can check the hash values and see if they’ve been seen anywhere in the world before.
If they have, it’ll put them in a filter and say this file’s been seen in Australia and it was graded as a legally held image, or this image here has been seen before and it’s been graded by a Canadian investigator as a Category 1 image.
Category 1 in Canada would mean that it’s a CSAM image. In America and Canada they normally have a very different category numbering system, where Category 1 would be sexual abuse of a child, whereas in the UK we would say that could be an A, B, or C image because we break down the types of CSAM images differently.
All I would need to do is say this is a Level 1 in Canada, so I can look on the screen and say okay, in the UK that would be classed as a C and then categorize it as a C. But time to evidence is incredibly quick because we’re now utilizing the data and the efforts that somebody else has done to benefit our cases.
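A tiny sketch of that grade-translation step, with a hypothetical lookup table; the mappings are illustrative only, each agency would maintain its own, and a human still verifies every file:

```python
# Hypothetical mapping from foreign grading labels to UK A/B/C categories.
# Canadian "Level 1" covers sexual abuse of a child, which in the UK may be
# A, B or C depending on what the image actually shows, so this is only a
# suggestion presented to the reviewer.
FOREIGN_TO_UK = {
    ("CA", "Level 1"): "C",   # the example given in the interview
}

def suggest_uk_grade(country: str, foreign_label: str) -> str | None:
    """Return a suggested UK grade, or None if there is no mapping."""
    return FOREIGN_TO_UK.get((country, foreign_label))
```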
We’re seeing with big agencies across the world that when they add a mobile phone or computer into the software today, between 15% and 70% of the data is pre-labeled. We’ve actually had customers contact our support team to say they think the software’s broken because there aren’t many images on the grid—they’re not seeing the whole 300,000 pictures that they expected to see.
When we’ve said “just tell us what’s in the filters,” everything there is labeled—the database has been so efficient that they only have 50,000 files to actually review. The rest have been labeled already, and that is a huge time saver. This matters when we look at overworking and burning out our staff: part of that comes from working inefficiently, because historically we haven’t had these sorts of setups.
Paul Gullon-Scott: So in theory, it not only expedites cases, but because of the vast hash database that Semantics provides—and you did say they provide that free, didn’t you?
Tom Oldroyd: Yeah, it’s completely free. All you have to do is contribute—you have to be part of the community. The reason being is when you contribute, you agree to the terms and conditions of the usage of the system just to make sure we comply with GDPR rules across Europe, and the system is also compatible with US and Canadian laws for data handling.
As I say, we can’t see inside the database, so it’s an end-to-end encrypted system. That’s not a “get out of jail free” system—it’s what’s internationally accepted now when we’re doing police-to-police data sharing.
Everyone that has access is from an approved country, and the database even has an expiry—after 90 days, the database will self-destruct. If we find that there are issues with a particular country or their political stance may change, it means that the database only has a lifespan of a short period of time.
Then all you would need to do is redownload the latest version; it’s all privately controlled access and fully audited to make sure the database is going to the right people. It’s encrypted within the software so that people can’t break in and steal those hashes, because we appreciate some of that data could be deemed sensitive.
It’s gone through its full audits—I’d probably say it is the most secure CSAM hash intelligence database in the world, and it’s also the largest.
Paul Gullon-Scott: So it not only expedites the case, but because of the vast hash database that it actually holds, it very quickly identifies and automatically categorizes the indecent imagery. From a psychological point of view, that reduces the exposure to DFIs immeasurably.
Tom Oldroyd: Yeah, definitely. I’ll show you a case and workflow that we would normally expect people to do for a small computer export, and we can prove exactly how fast this can run. We fill in normal case information just so we can track what the software is doing.
If the investigation isn’t CSAM, you just tell the software this isn’t a CSAM investigation. The reason we have that is some of our customers aren’t in the sex offenses—they might be tax office or border force—and they don’t want references to child abuse showing in their computers.
So we actually pivot the software to remove any reference to child abuse so that if you are dealing with an investigation that is non-child abuse, you are not going to get references to looking for indecent images or the CSAM detectors.
Even the design of the software takes into consideration the end user to make sure we’re building things as efficiently as we can. In here, we’re going to select an X-Ways export—we think X-Ways is a very good carver from all of our testing, so we always maximize evidence recovery.
We’ve got support for your UFDRs, your GrayKeys, zips, and there’s a processor where you ultimately just build up your evidence. If we have multiple exhibits—phones and computers—we put them into one product because we can then deduplicate and use those same databases.
You’re not having to review the iPhone in Cellebrite Physical Analyzer and then look at the iPad in Axiom where you’re probably seeing duplication of evidence, which means you’re getting double the amount of exposure. There is method in our madness for the way we work.
In this case, we would run the database—we simply say turn on the database. We can double-check to make sure which databases are enabled, and we could say this is a CSAM investigation from NCMEC. I could very simply navigate to my NCMEC cyber tip and say okay, and that’s now passed all the relevant data out of that NCMEC cyber tip.
That’s going to be one of the early searches that the system’s going to run against, so I’m not having to manually look for the data from the cyber tip that’s come from NCMEC—the system’s going to do that automatically for me. This reduces the demand on you as the investigator.
You’ve got Project Arachnid, which is run by the Canadian Centre for Child Protection. They trawl the web looking for indecent material, and they’ve worked with us so we now have access to their asset database—it’s encrypted and locked into the software.
We want to make things as efficient and quick as possible. We can go through these options for some of the AIs—color analysis, object detection. We work with the sex industry, the adult pornography industry, to look at the types of content that they’re seeing.
We’re looking at sex positions, sex toys, sex objects because when we have crossover child abuse with adult imagery, we need to know what they all look like and what sort of objects you may need to search for to speed up your analysis. We have a dedicated CSAM and adult porn detector, an AI designed by us where we’ve worked with international agencies to get representative samples of genuine CSAM that we can teach the AI to learn.
During the research, the team found very early on that CSAM in the UK looked one particular way, but when we worked with Latin American customers, or in the Asia Pacific region, the CSAM they were exposed to looked different. Teaching an AI model purely on Western CSAM would introduce bias, so when you try to use that AI somewhere else in the world, or on CSAM being shared across borders that we’re not used to seeing, the AI would have difficulty detecting it.
The team here—their expertise is AI engineering—can take all these things into consideration. By simply saying CSAM detector, we can see there’s a graphics card on my laptop, and this software works on just basic normal hardware—it doesn’t need to be anything that’s a rocket ship.
We’ve got junk filter technology so we can filter all that low-level risk material—things that may come from social media, videos that are very short that have probably been carved, little tiny one-second videos, anything that the operating system may have identified. Everything that’s low level, which we call junk.
We do that on purpose because sometimes we don’t need the AI to run on the junk—we want to be efficient with your energy and time. Think about a Friday afternoon when you’re working Monday to Friday in a digital forensics lab: you really shouldn’t be looking at CSAM unless it’s operationally required that you do it urgently.
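A minimal sketch of what such a junk pre-filter could look like, assuming the extraction exposes metadata fields like these; the field names and thresholds are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class MediaItem:
    path: str
    duration_s: float | None   # None for still images
    from_app_cache: bool       # e.g. browser or social-media cache
    is_os_resource: bool       # icons, themes, bundled OS imagery
    carved: bool               # recovered by carving rather than a live file

def is_junk(item: MediaItem) -> bool:
    """Low-risk material that can be reviewed later (e.g. Friday afternoons)."""
    if item.is_os_resource or item.from_app_cache:
        return True
    if item.carved and item.duration_s is not None and item.duration_s < 2.0:
        return True   # tiny carved video fragments of a second or two
    return False
```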
What we would advise people to do is on that Friday afternoon, look at the junk. We know you’ve got to look at the low-level, non-risk material or low-risk material, so pivot the search instead of looking at CSAM instantly—sweep the search because then at least you can have a bit of decompression time.
You can get through the junk files, still grade them as legally held, and get them out of the way, but you’re not going to see that awful video or picture right before you leave work and go home, when it would be the last thing you remember and you haven’t got your peers around you to speak to or that safety net. It’s probably not the sort of material you want to talk to your family about, because it’s not something people outside the job really need to know about.
Paul Gullon-Scott: What you’ve just said about you’re just about to leave work on a Friday and the images and videos that you’re normally exposed to aren’t something that you want to take home and think about over the weekend—that’s reflected quite strongly in research that’s been done which suggests that at the end of the day, DFIs shouldn’t be grading CSAM so they don’t actually take those images home with them.
Tom Oldroyd: Yeah, definitely. Everything we do in our software we track, and that goes into the database—it’s optional for users to have, and it’s not a performance management system, it’s an exposure management system. We want investigators to have that transparency about what they’re exposed to.
As a unit manager, having been there, I never knew when I had 50 CSAM cases and I divided them out to each of the staff—I never really knew who had a negative case or who had the world’s worst case. We talk about it, but I would never know the volumes of data.
When I came to Semantics, I was like we need to track that—we need to know what the exposure level is and what the numbers look like so that managers have good oversight. That’s what we do with the software.
Everything we’re doing here, adding the media and grading, gets tracked in the database. We have a separate application called the Wellbeing Monitor, which I’ll show you, where as a manager I can see every investigation, every exhibit, I can see who’s graded it, how long it’s taken them, and I can see the demand and the risk.
Like you say, it’s that risk that we’re now identifying to say Friday afternoon, don’t grade outside the operational hours of 9 to 5 because when people are grading late at night on their own without peers around, that could be quite damaging. You are concentrating too much on your work and you do need to sometimes just pull away from the screen a bit.
Paul Gullon-Scott: When you’re talking about cases, cases don’t mean individual items that have been seized—many cases contain multiple items. So the exposure potentially can be huge per case, can’t it?
Tom Oldroyd: It is, and that’s one of the things we’ve noticed when an organization or police forces have downloaded a phone using one product and then a computer with another. If we look at the iPhone, the Apple infrastructure—if you download my devices within the house, I’ve got two phones, three iPads (not that I’ve got lots of money, but I’ve got old iPads), I’ve got the mini iPad, I’ve got two Macs in the house.
You’re probably going to find exactly the same photos because photo sharing is on. If you downloaded them and looked at them in individual forensic products, you’re going to get that duplication.
Whereas bringing it in together, you’re still going to be looking at well over 150,000 files that are privately owned family photos, then you’re going to have all the internet cache. It’s normal to now be hitting cases into the millions, and we have had cases where people have had to come to us and use our software because they’ve used another product that couldn’t deal with the volumes of 20, 30, 40 million images.
For us, we’ve designed the software to deal with scale, and we’re blown away with the numbers of pictures that people are having to go through.
One of the features here on the screen is to set the threshold. This is something the UK government decided to bring in—when you graded to a certain threshold, it made no difference to the charging decision. When you had that large collection, the person wasn’t going to get any longer term in prison because they hit the top threshold.
This was added into the product to say we can do that calculation for you and show you how far you are through the grading to hit that percentage. Then the software will say you’ve hit your thresholds, you can stop.
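As a rough sketch of the progress calculation being described; the threshold counts below are placeholders, not the actual charging-standard figures:

```python
# Hypothetical per-category thresholds; real values come from charging guidance.
THRESHOLDS = {"A": 100, "B": 200, "C": 300}

def threshold_progress(counts: dict[str, int]) -> dict[str, float]:
    """Return how far through each threshold the grading currently is (0.0-1.0)."""
    return {cat: min(counts.get(cat, 0) / target, 1.0)
            for cat, target in THRESHOLDS.items()}

def thresholds_met(counts: dict[str, int]) -> bool:
    """True once every category has reached its threshold and grading can stop."""
    return all(v >= 1.0 for v in threshold_progress(counts).values())
```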
Obviously, that comes with a big risk, and there’s a CARAT system here where you would carry out that risk assessment to say does the person have access to children, are they in a position of trust, do they have previous offenses? Really, I think we would say you can have your spider senses on here—if you think you’re coming across first-generation material, don’t do the thresholding.
But where we’re dealing with what people would now call volume crime in CSAM investigations, we built in the threshold system for that particular purpose, just because of the sheer volume.
We’ll go over this very quickly, mindful of time, but we prioritize victim rescue—that’s the difference between our software and other solutions on the market. We can categorize, but we have a lot of clever capability to find those victims.
Obviously, we know that with the scale and the numbers of crimes going up, there are going to be more victims out there who need to be rescued. We ask very simple, human-readable questions—photos of your victims, your suspects, and the devices you expect to see, even the areas you think they may be offending in and the type of offender they are.
When you’ve answered those questions, you click the go button and the software then takes all that intelligence you’ve applied to your case, and that’s how we rescue victims a lot quicker than we historically have. As I said before, this is just a very small case we’re showing you—it’s 42,000 files.
On my laptop, the import speed is about two and a half million pictures a minute, so it’s incredibly fast at bringing data in. In the background, you’ll see that the images are loaded onto the grid, so I can now already start to go through page after page, or if I wanted to look at the videos, I can even hover over the videos and see every frame of the videos because the software’s been designed to be quick and efficient.
We don’t want you to have to wait for delays. You can see we’ve compared to some of the databases already, so we’re identifying files that are legally held, low risk from the NSRL. We’re now calculating the PhotoDNA to look for files that are visually similar based on the PhotoDNA scores, and that’s how we can stack images on top of each other.
We’re not having to see all those images that are so similar on the grid. In the background, you can see the images are automatically being graded. This is a test demo case, but the database is fast—this is comparing not only against my small half-million files in my local MySQL database, but that is also now comparing against the 3.1 billion hashes that we have in the Global Alliance Database.
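PhotoDNA itself is a licensed Microsoft technology, so here is a sketch of the same stacking idea using an open perceptual hash (pHash from the imagehash package) as a stand-in; the distance threshold and grouping logic are illustrative:

```python
from PIL import Image
import imagehash

def stack_similar(paths: list[str], max_distance: int = 6) -> list[list[str]]:
    """Group images whose perceptual hashes fall within a Hamming distance,
    so only one representative of each stack needs to appear on the grid."""
    stacks: list[tuple[imagehash.ImageHash, list[str]]] = []
    for p in paths:
        h = imagehash.phash(Image.open(p))
        for rep_hash, members in stacks:
            if h - rep_hash <= max_distance:   # Hamming distance between hashes
                members.append(p)
                break
        else:
            stacks.append((h, [p]))
    return [members for _, members in stacks]
```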
This is where we see a huge benefit, and this is where people really are excited by the development and the speed that the database is actually growing to allow us to find those files.
All I need to do is simply expand out the box here and we could see a breakdown of my Global Alliance scores. We added an NCMEC cyber tip, and these are the results from that NCMEC cyber tip. Time to evidence is far quicker than we’ve had before—this is where we can then say that we know these are CSAM materials.
We could then go and look in the same directory, we could look at the same time and dates to see that sex offenders have a type of OCD—we know they’re probably going to store data in the same location. We can get straight there and make that decision of this person needs to be interviewed now, they need to answer why this is on the machine.
Hopefully get that quick conviction so we’re not having to drag out an investigation. On the left, this is the grades—this is where we would now see instantly a breakdown. This is a test case, but this is where we would see a breakdown of those international grades.
If I wanted to only see legally held content that has been identified by Canadians, I simply put a tick in the box, and if I wanted to see who graded it, I click on the world icon. This will now give me a breakdown of who has seen that file and when.
We’ve got a popup here that tells me I’ve got some project hits as well. Instantly, within a few seconds, we’re finding the key evidence and drawing the connections globally to where the image has previously been seen, without really any stress and hardly any work.
My mindset is when I’m clicking on to show me Category A, B, and Cs, I know what I’m going to see. Yes, the database may have errors—we fully understand that humans make mistakes. If you come across an error in the database, you just simply correct it and the database will update the next time around.
This is where we see big data can make a huge amount of difference. As we said before, Paul, the Friday afternoon when you only want to look at the legally held data, just simply click what you want to see and get yourself in the mindset that you are only looking at the data that you’re selecting from what everyone else has come across.
When you are adding those labels and grades on, it’s all been stored here in the wellbeing statistics so that we could see a breakdown of cases I’ve been involved in, my previous cases in the last 30 days or last 100 days, and that is what’s being tracked. That obviously allows us to add an extra level of security to the end user as well.
Paul Gullon-Scott: Can we see the wellbeing monitor again?
Tom Oldroyd: Yeah. In the bottom, we’ve got—this is where we want to always be transparent with people. There are a lot of safety mechanisms built into the solution, whether it be you may want to have limited distractions, so we can remove all the buttons from the screen—you just get the grids.
We can scroll up the grids if we need to, we can put auto scroll on. This is one of the features that really we probably don’t take into consideration enough, but just to be able to put things on like auto scroll—the stress of not having to go through page after page. We can now sit there with a cup of coffee, we can go through the images, and when we see the relevant image, we can then stop.
It’s little things like that we add into the product, but when we look at what we’re tracking, we can look at the statistics of the grades you’ve done—your As, Bs, and Cs. We also have the ability to play wellbeing videos after so many minutes of grading, and it’s an intelligent counter, not just a counter that fires every 30 minutes.
We know when you’ve been grading material and we then say the recommended time is 30 minutes—take a break. As part of that process, we also say because we know you have been seeing indecent images, to cleanse your mind is to actually watch three separate videos of different subject matters.
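A minimal sketch of an exposure-based break counter of the sort described, one that accumulates only the time actually spent viewing graded material rather than wall-clock time; the 30-minute figure comes from the interview, while the class and method names are invented:

```python
import time

class ExposureTimer:
    BREAK_AFTER_S = 30 * 60  # recommended exposure before a break

    def __init__(self) -> None:
        self.exposed_s = 0.0
        self._started: float | None = None

    def start_viewing(self) -> None:
        """Call when explicit material is actually on screen."""
        self._started = time.monotonic()

    def stop_viewing(self) -> None:
        """Call when the grid shows only benign or junk files."""
        if self._started is not None:
            self.exposed_s += time.monotonic() - self._started
            self._started = None

    def break_due(self) -> bool:
        return self.exposed_s >= self.BREAK_AFTER_S
```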
Watching those three videos has been proven to help you decompress and forget what you’ve been looking at, because you’re concentrating on the videos—kind of like clearing the RAM in a computer system is how we put it. There are also overlays for how you want your images displayed—would you want a glass overlay, do you want images pixelated, would you want a border outline applied, would you like blocks, or would you want edge mode so you only see the edges of the files.
We can also turn on grayscale mode, so if we only wanted to see images in grayscale, we could do that as well. It just means we’ve got a lot more flexibility in the security of the user.
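The display modes just described could be approximated like this with Pillow, purely as an illustration; the real product applies its overlays at render time rather than modifying files on disk:

```python
from PIL import Image

def to_grayscale(img: Image.Image) -> Image.Image:
    """Render an image in grayscale to reduce its emotional impact on recall."""
    return img.convert("L")

def pixelate(img: Image.Image, factor: int = 16) -> Image.Image:
    """Downscale then upscale with nearest-neighbour to hide fine detail."""
    small = img.resize((max(1, img.width // factor), max(1, img.height // factor)))
    return small.resize(img.size, Image.Resampling.NEAREST)
```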
The reason why we look at grayscale—we always tell people don’t grade with grayscale turned on, do that when you’re creating your reports—is because the human brain can detect things incredibly quickly without really fully understanding why. One of the things we always talk about is if we see someone with a large cut on their arm and lots of blood, if we see it in color, the hairs on our arms stick up and we realize that’s danger.
We see that same image in grayscale, and we don’t actually react to it in the same way. It’s the same with CSAM material—if you’re grading, you need to see it and you need to see it quickly so you can apply the label and understand the context of that picture. Is that a live victim?
But when you’ve already graded that and you are looking to create your reports, you can put privacy mode on so you don’t see any of the images—you just see the privacy mode message when you’re doing your reports—or just put grayscale mode on. That means you are not going to have that memory and recall of the picture as quickly.
Everything we’re doing here is in little bits to help you and protect you. When you are grading and doing all your work, you don’t really notice this, but everything’s getting saved back to the local database. The local database allows investigators or managers to have a separate application—this is what we call the Semantics 21 Wellbeing Monitor.
This would allow you as a team manager to track that exposure level of all your investigators. We could see a label breakdown, we can see label breakdown per user, we can see average scores per grading per hour. This data is in its infancy—we’ve got the data, we’re now trying to really work with some agencies to track what we can learn and where we can identify risk patterns and how you can use it.
This is where it really is useful—this isn’t performance management software, that’s what we stress. It’s there for agencies and customers to be able to say to their SIOs, to the senior management, “Look how much material the team are going through,” and maybe that’s there to justify additional equipment.
Maybe it’s there for justification of additional staff or for decompression days. I think police forces are realizing it’s important the job gets done, but if your staff aren’t prepared to do the work anymore, or you are starting to see a drop-off in performance because you are burning your staff out, or they’re leaving and you’ve now got to go and employ new staff and train them, that’s where there’s that hidden cost.
You’ve got to put the protection mechanisms in place there to protect your organization and your staff. I know Paul, we’ve spoken previously, and we suspect this will happen more—organizations will be fined for not protecting their staff. It’s going to happen more and more often.
Liability levels are going to be very high, and I think sadly, we’ll end up seeing that happen more often before people really do take a stand and realize that there is a real danger when we’re dealing with CSAM investigations to staff exposure.
We’ve never had the figures until now. This is where we can start to create a breakdown per member of staff: the times and dates they examine material, and which exhibits yield the highest numbers of images containing CSAM and dangerous material. What we’re hoping to do is work with agencies and ask whether the investigator wellbeing monitors can start driving safety mechanisms that organizations put in place—stopping the grading of indecent images at high-risk times, stopping on the Friday afternoons when we start to see performance dipping.
Not to say “Hey, you’re not working fast enough,” but to actually look and say is there a problem there? Are we identifying a trend that this person’s probably overexposed and getting burnout? I know that’s where your passion is—I know you’ve put a lot of effort into that and will continue to do so—but this is the idea of having the data that we can start to make those informed decisions with.
Paul Gullon-Scott: I think the development that you guys have put into the wellbeing monitor is phenomenal. I haven’t seen another piece of software on the market which has put this amount of development into this side of things. Having that in the background where the managers can then very quickly search to see what level of exposure a DFI or a team of DFIs have had gives them an early indication of DFIs who may be becoming susceptible to the known stresses and do something about it at the earliest opportunity.
I really like that wellbeing monitor—I think it’s a fantastic development.
Tom Oldroyd: I think, like I say, we want to do more. We’re trying to prioritize how do we tease out more material that we can get from the end users—that performance feedback. We need their input on how they actually feel while they’re doing their investigations.
But also, this doesn’t need to be limited to our tool. One of the things we’ve always said as a company is we don’t like this idea of companies having a monopoly and closing their ecosystem. I think when we look at CSAM, we all have to put our big boy pants on and work nicely as a community and have the export and import capabilities across the whole infrastructure.
For us, when we look at the wellbeing side of it, yes, we’re collecting the data for our product when you are grading with our tool. I don’t see a problem that if you are using GrayKey or if you are using Cellebrite or any of the tools, why can they not actually provide us the data that they see when you are labeling with their product? Could that not feed into the database that we have?
Could we now actually start to have that cross-company acceptance that we all need to work together to protect the users? That’s where I’d like to see this go—I’d like to see all of the products feed into a solution that, to be honest, we don’t make money off. We give the Wellbeing Monitor to managers for free—it’s part of our offering because the data is useful for identifying at-risk patterns, and we’d love to see other vendors say people can categorize and grade with their solutions and they’re happy to have a connection to our database.
There’s nothing valuable there, but it’s the exposure levels that hopefully managers can then identify from all of the products, not just ours.
Paul Gullon-Scott: I’m going to ask you the big question before we call it a night. What is the price point for Semantics 21?
Tom Oldroyd: We are open and fully transparent in our pricing. I normally advertise the price in dollars because the biggest market in the world is the US. For S21 LASERi-X, the main product, the highest price you’ll ever pay is $2,000.
If you pay for that, you’ll get all the AI, you’ll get the databases, you get access to the Global Alliance Database, the wellbeing solution is provided within the solution. The data and the wellbeing monitor for your managers is included within that price as well.
You install the software—there are no servers and no cloud requirement—you just install it on a standard forensic machine and you’re good to go. Normally it’s around the £1,500 or $2,000 mark. But again, some site licenses are a lot cheaper.
We’ve just donated over a million—over this year we’ll have donated a million and a half pounds’ worth of software to developing nations that don’t have the finances, and to special project teams that we work with. One example is Homeland Security in America with the HERO Project, where injured ex-military veterans have gone into policing to help rescue kids and are dealing with CSAM investigations—we’ve always said they will never pay.
They get our software completely free of charge, and they always will do. That’s because we like the mission—it’s people that are coming from horrible environments of war and they still want to help do a public service. It’s only right that we help them out as well.
We try and keep our prices as low as we possibly can. We are a business, we do have to operate and pay our salaries and keep the development team going. We are lucky—we do get a lot of help from big companies, big tech firms that see what we’re trying to do and they do try and help us out as well.
Links to academia are definitely one of the things we’re really pleased with and happy to work on, because they’ve got some very bright, talented minds that we want to help, and they want to help us look at where we go in the future.
Paul Gullon-Scott: Just before we go, one other question that’s just popped into my mind. If the users have a particular requirement for the software, how flexible are the developers around including that requirement?
Tom Oldroyd: Unbelievably. We rely on the feedback of users to say what do you see that you like, what technology trends are you seeing, how do the offenders operate, and let us know. For instance, QR codes—we were told by one of the tech firms that QR codes are being used to hide messages on online platforms.
We were like, that’s quite a clever, nifty way of trying to obfuscate your conversations. We all know text messages are being read by the big platforms, but QR codes weren’t. Literally within minutes of telling the developers, “This is what bad guys are doing,” we already had a proof of concept for reading and detecting QR codes and translating them, and probably within a day it was in the product and available.
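For a sense of how quickly a QR-code proof of concept can be put together, here is a sketch using OpenCV’s built-in detector as an open-source stand-in; this is not the product’s actual implementation:

```python
import cv2

def read_qr(path: str) -> str | None:
    """Return the decoded payload of a QR code in an image, or None if absent."""
    img = cv2.imread(path)
    if img is None:
        return None
    data, points, _ = cv2.QRCodeDetector().detectAndDecode(img)
    return data or None   # empty string means no decodable QR code was found
```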
We have Swiss customers that have just come on board that said our labels don’t look like that anymore—we’ve changed our grading labels. Within half an hour, we had a version of the software with all their brand new labels so they could continue to use and grade using their label system.
I don’t think you will get that level of support or customization from many companies in the world within digital forensics, but we have to react as fast as we can. We’re always open to new ideas, new suggestions.
We’ve just finished the school badge lookup for the whole of the United States—we’ve now added 130,000 schools with all their logos and badges. We’re now getting other countries approaching us saying they love the idea and have the same problem. Would we be prepared to research all the school badges for their country as well? Of course we will.
We’ll follow the demand, and we’ll add things in that customers need.
Paul Gullon-Scott: Actually, on the subject of the school badges system that Semantics has developed, for those who are interested, Forensic Focus has just released a great article specifically about that. If you go to the Forensic Focus website, you’ll be able to read all about that in depth.
Tom Oldroyd: One thing we missed off the article: if you’re not one of our customers and you’re seeing a logo or badge as part of your investigation—maybe you use one of our competitors’ solutions—message us with a description of the badge, or find an agency that has our software and they’ll do it for you. Just describe the badge to us and we’ll tell you which school it is.
This isn’t a lock-in where you have to buy our product. We’ve done the research, we’re sitting on it, and it literally takes our team a couple of minutes and we can help you locate a school badge even if you’re not one of our customers. Don’t be shy about approaching us even if you’re not one of our active customers—not a problem at all.
Paul Gullon-Scott: I know there are several AI applications built into Semantics 21. Sadly, we haven’t got time to show them all tonight, but one that particularly springs to mind is the AI application which helps identify locations.
Tom Oldroyd: Yeah, so the location lookup is amazing. When I was first shown that technology, I was very skeptical—most cops are. I was like, “How the hell does that do that?” It’s completely offline where you would expect the system to be online.
If anyone’s seen it, it’s called AI Location Prediction—it’s a completely offline system, it’s part of S21 LASERi-X, the same program. You can either select an image in your case, or if you have an external image from social media, maybe outside in a field or a town center, and all you do is provide the image to the system and say “Where is it?”
The AI has five models—the five models then examine the content of the picture, not the EXIF data. We don’t need the EXIF data at all. If you think we’re cheating, take a screenshot and literally it will look at the detail of the picture and try and work out where it’s from.
As long as the five AI models are quite close together, it’s a strong probability it’s in that position. Last week in America, we had some American investigators do exactly the same thing as I would have done—prove it. They took a photograph inside the foyer of a hotel and said, “Okay, whereabouts is this?”
Literally, the AI detected it was about a mile and a half down the road from where the actual picture was taken. The accuracy was amazing, in the middle of Florida. This sort of technology was historically designed for military intelligence-type investigations, but it’s now available to all domestic police.
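A rough sketch of the “models agree” idea: each model outputs a latitude and longitude, and the spread between predictions acts as a confidence signal. The agreement radius and the simple averaging here are assumptions for illustration, not the product’s actual method:

```python
import math

def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def ensemble_location(predictions: list[tuple[float, float]], radius_km: float = 25.0):
    """Average the model predictions and report whether they all fall close together."""
    lat = sum(p[0] for p in predictions) / len(predictions)
    lon = sum(p[1] for p in predictions) / len(predictions)
    agree = all(haversine_km((lat, lon), p) <= radius_km for p in predictions)
    return (lat, lon), agree
```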
There are child abuse cases where we need to know: is the child in the UK, or are they abroad in Europe somewhere? If we’ve got a series of pictures, we can start to build a picture of where that child was potentially being injured or attacked. Or if you’ve got a suspect who’s just been stupid, who’s photographed themselves on social media while on the run from the police, you can probably work out where they are and then go and get them.
It’s incredibly clever technology. We set the challenge to people—we don’t believe us, trial the software, give it a go. It’s just part of the package. We won’t tell you it’s guaranteed every time, but so far I think we are probably up to 60 cases now where we’ve had successes in locating suspects or victims that we’ve been made aware of.
The technology was released at the beginning of the year and we get customer feedback of “it worked, we were really impressed, it found the suspect, it found the victim,” which is great for us to hear.
Paul Gullon-Scott: That’s fantastic, Tom. Thanks very much for joining me tonight on the Forensic Focus podcast. The software is obviously available through Semantics and people can contact you how?
Tom Oldroyd: Yeah, so contact either sa***@*********21.com or to*@*********21.com—I’ll probably get bombarded with lots of spam now—or just go onto our website and we can communicate with you straight away.
For any of our customers, we now offer WhatsApp support, so it is just a text message and we communicate via text message. We’ve gone the way of teenagers—we communicate on WhatsApp. Rather than making people deal with support portals, let’s get rid of that and make it frictionless and as easy as possible for everyone.
Paul Gullon-Scott: That’s fantastic, Tom. And long may the development of the wellbeing aspect of the software continue. Thanks very much for joining us tonight.
Tom Oldroyd: Yeah, lovely. Thanks for having us, Paul.
Paul Gullon-Scott: Thanks Tom.