
The following transcript was generated by AI and may contain inaccuracies.
Massimo Iuliani: Welcome everybody. Just a few words about me before starting. I’m not an engineer, but I studied mathematics and have a different theoretical background. I’ve worked extensively in image and video authentication topics. I worked with the University of Florence for several years on projects funded by the European Commission and DARPA, all related to authentication and reverse engineering of multimedia content.
While at the university, I worked extensively in research, and I’m a co-author of several papers in peer-reviewed journals related to multimedia forensics. I also have experience as an expert witness in multimedia forensics in Italy. I think this was really useful for me, because I was able to look at things from different points of view: this topic brings together scientific research and very technical issues, along with the need for people to understand these tools, use them, and explain them in an easy way.
Before starting, I’d like to explain why we’re making this webinar. The main issue is the spread of new technologies that allow people to create good manipulations even if they’re not experts. You can use text-to-image tools, ask for anything, create convincing fakes, and edit specific pieces of images to produce very realistic manipulations. On the other side, we still rely on our eyes and our senses to determine what is real and what is not.
Based on our experience and some tests we’ve been conducting, we’ve noticed that people are not very good at determining what is real and what is fake. Just to give you an example, here we have a couple of images. If you want to guess which is real and which is fake, you can try answering in the chat to share your thoughts. We can now create highly realistic images, and sometimes we confuse what is synthetic with what simply has high resolution or high detail.
Take 10 seconds to try to guess, and then we’ll see another example to build the case for why we need several tools available to determine what is real and what is fake. I see there are some answers in chat. One is real, one is fake, and don’t worry if you made a mistake because we don’t expect people to be able to guess correctly.
Let me give another example to have fun and understand how our brain works when trying to determine what is fake and what is real. We are Italian, so we like to eat good food and drink good coffee. We also need to take lots of pictures to invite you to drink and eat in our restaurants. Again, you can try to determine if they’re real or fake. I understand they look good.
We don’t really need to understand now if we are good at this or not, because these are just samples. But I think it’s good to understand how our brain works when we try to say, “Is this coffee real or is it fake?” How do we decide? Do we decide based on quality, details, resolution, contrast, or something else? As you can see, we can have very different opinions and we take time to decide. In this case, unfortunately, they are both fake. I’m sorry because they look very good.
These examples are just to show you that it’s very hard. We conducted a big test on this with 20 images, asking people—we don’t care if they are naive or extremely expert in multimedia forensics—to choose whether the image is real or not, just based on visual analysis. Here’s the result we achieved so far. As you can see, we have results from zero to 20 correct guesses, and the distribution shows that on average people get 10 out of 20, meaning half the results are correct. So it’s like flipping a coin.
This means that unfortunately, we are not very good at determining what is real and what is fake. If you think you can do something better, we would really appreciate if you can try the test and see if you can achieve better results. You can scan this QR code, or maybe Michelle can share the link in the chat. You can take the test—it’s completely anonymous and will only store the score. Of course, we’ll provide you with your score.
It’s good to understand how good we are, and the truth is that we are not good at this task. That’s why this is our starting point: we are not good at determining what is real and what is fake. This webinar will show some new techniques that can be used to authenticate images. Since this topic is very complex, we also need some background information, so we need to show at least the basics and principles of how we work in image authentication for different cases. Then we’ll open Amped Authenticate and see examples in practice.
Of course, we will not go into all the details. I will try to find the balance between explaining the basics and principles, but we cannot go too much into detail because there’s a full week-long course on these topics. We’ll try to find the balance. Furthermore, we will see how to combine different tools to provide lots of information related to the image lifecycle, and we’ll also see how to describe and properly interpret some results that we can achieve from our forensic analysis.
Please feel free—as Michelle was saying—if something is not clear, if you’re curious, or if you want to share your experience on some topics like this, you can use the chat. Don’t worry, I will try to answer at the end.
Massimo Iuliani: What is the main forensic principle when we take a look at an image? If we are naive, we just look at the image content—the visual content of the image. But if we are forensic experts, we need to remember that behind the image there is a huge amount of information. The image, first of all, is a file—a digital file that has a container. It’s a package of data, and this package contains data in a structured way depending on the type of file.
Furthermore, within this package, which we can imagine as a box containing things, we find a list of information related to the data: the metadata. This is textual information related to the image content. For instance, if we acquire the image with a specific camera, we expect to find in this textual information the brand, the model, the acquisition date, maybe GPS coordinates, technical specifications of the lens, the firmware—lots of information that can be used to analyze the image and check the consistency between the metadata and the image content.
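To make this concrete, here is a minimal sketch of how such metadata can be listed programmatically. It uses the Pillow library rather than Amped Authenticate’s internal implementation, and the file name is a hypothetical placeholder:

```python
# Minimal EXIF inspection sketch using Pillow (pip install Pillow).
# "questioned.jpg" is a hypothetical file name.
from PIL import Image
from PIL.ExifTags import TAGS

with Image.open("questioned.jpg") as img:
    exif = img.getexif()
    for tag_id, value in exif.items():
        name = TAGS.get(tag_id, tag_id)  # map numeric EXIF tag IDs to readable names
        print(f"{name}: {value}")  # e.g. Make, Model, DateTime, Software
```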
Then we have some more advanced information called coding information, which is what our computer needs to decode the image and read it. Image pixels are generally not stored as raw data, but we encode them in a specific way. When we double-click with our mouse, the data are decoded based on the coding information to show the image. This coding information can be customized based on the brand, model, or specific software used, so they can leave huge and relevant traces to understand the image lifecycle.
Finally, last but not least, we have the pixels—not the decoded pixels, but the encoded pixels. So it’s a data stream that is encoded based on the coding information. We have lots of information available to analyze the image. So how can we use them when we have a malevolent user that tries to edit or synthetically create a digital image?
What happens is that all this data can be modified, can be partially created, or can be partially deleted. For instance, if you modify an image with software, some metadata will be changed. If you create an image with an AI-based system, some metadata can be created based on the system you’re using. If you manipulate an image with standard software or AI-based software, afterward you need to save the image again, so you are altering the coding properties and compression parameters of the image.
Furthermore, if you modify some specific portion of the image—for instance, if you want to remove a part or an object, or you want to modify a face—you are altering the pixel statistics. When you acquire an image, the process of acquisition introduces several correlations among pixels. When you modify a portion of the image, you are altering, removing, or adding some specific artifacts or patterns. You can add correlations, remove correlations, remove sensor patterns, and so on.
Furthermore, even if we exclude digitalization and focus on the digital domain, we have to remember that an image is a 2D representation of a 3D scene. When we acquire a 3D scene onto an image, we apply some projective transformations that are subject to geometrical rules. We expect that some shapes must satisfy some rules. Shadows must be consistent according to specific rules that link the light source with the cast shadows, reflections must be consistent, perspectives must be consistent. Similarly to the previous examples we made, we are not very good at determining if the perspective is correct or if the lighting is consistent.
So we have a big world of things that we can analyze. Rather than taking a look at all these things, we’ll start from some quick cases. We will touch on some advanced tools that we released in Amped Authenticate, together with other available tools that can be combined with them to provide much more reliable information and findings on the image.
Massimo Iuliani: Let’s start. Let’s suppose that this image is provided, and this image is supposed to show a famous person at a specific moment in time. We want to use this image as evidence that this specific person was there at that specific moment. Let’s try to analyze it and determine which tools we need to use in this case.
We’d like to understand if the image is native and reliable in content—meaning it hasn’t been modified. We want to check if the metadata are reliable: for instance, if the image is supposed to have been captured with a specific camera, we should find that specific camera in the metadata. Then, if the image is native, we expect a specific resolution, specific metadata, creation and modification dates, compression schemes consistent with the specific brand and model, and a specific file structure from that manufacturer.
How can we analyze this? Of course, we cannot have the experience of knowing all these details for all models. In Amped Authenticate, we can exploit two tools. One is to use an internal database—we have more than 14,000 images belonging to different models and reference cameras that can be compared with the questioned images. We can also use external services to download images and filter those images based on some parameters of our questioned image, to find images that should have the same structure as the questioned image, and then we can make decisions.
Let’s try. We’ll open Amped Authenticate and load our first image. Let’s immediately check the metadata. I go to the metadata and I can see clear traces of the expected original metadata. For instance, I can find the specifications of the lens, which indicate this image comes from an iPhone 13, and we see that this image is full of metadata.
We can also check, for instance, if the modified date and the created date are the same. As you can see here, the modify date is the same as the create date. The image is supposed to belong to an iPhone 13, so metadata seems to be consistent with a native image. But let’s go deeper.
For instance, let’s take a look at the thumbnail. We know that when an image is saved, smaller copies of the image are stored within the file itself; these are useful when, for instance, our computer needs to show a preview of the image, so it doesn’t need to decode the whole image. Let’s check the thumbnail of this image. If I do, I can see that the content of the thumbnail, which is supposed to match the image content, is very different.
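As an aside, the embedded thumbnail can also be pulled out programmatically for a side-by-side comparison. Here is a minimal sketch using the piexif library (not Authenticate’s implementation; the file name is a placeholder):

```python
# Sketch: extract the embedded EXIF thumbnail so it can be compared
# visually with the main image content. Uses piexif (pip install piexif);
# "questioned.jpg" is a hypothetical file name.
import piexif

exif_dict = piexif.load("questioned.jpg")
thumb = exif_dict.get("thumbnail")  # JPEG bytes of the embedded preview, or None
if thumb:
    with open("embedded_thumbnail.jpg", "wb") as f:
        f.write(thumb)
    print("Thumbnail saved: compare it with the full-resolution content.")
else:
    print("No embedded thumbnail found.")
```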
I think this is a huge problem for the integrity of this image. The only check I would still do is to verify whether this could be a software bug. In Amped Authenticate we can download similar images from Flickr and CameraForensics belonging to the same model, to see if this may happen. I already did this and found a few of these images, so I can load them as a reference.
I can see, for instance, that in this case I have a thumbnail, and the thumbnail of course has the same content. Then I can compare the metadata again, but I don’t find something really strange. There are a few differences that should be explained, but the problem is not clear. I can do something else. If I check the EXIF data, I also notice that we have GPS coordinates, so I can try to see where this image has been taken.
It’s easy with Amped Authenticate because I can go to Tools and show the image location on Google Maps. I can see that this image was captured here—we are in Budapest—so we can go there with Street View. Now this suggests something to me, because if I look at the thumbnail, it seems that the content is very similar.
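For reference, EXIF GPS coordinates are stored as degree/minute/second rationals; here is a small sketch of the conversion to the decimal degrees a map expects (the coordinate values below are hypothetical, chosen to land roughly in central Budapest):

```python
# Sketch: convert EXIF GPS rationals (degrees, minutes, seconds) to
# decimal degrees that can be pasted into Google Maps / Street View.
def dms_to_decimal(dms, ref):
    degrees = dms[0][0] / dms[0][1]
    minutes = dms[1][0] / dms[1][1]
    seconds = dms[2][0] / dms[2][1]
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal  # south/west are negative

# Hypothetical values in EXIF's (numerator, denominator) form
lat = dms_to_decimal(((47, 1), (29, 1), (5299, 100)), "N")
lon = dms_to_decimal(((19, 1), (2, 1), (4032, 100)), "E")
print(f"{lat:.6f}, {lon:.6f}")
```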
Let’s open the thumbnail again. In Street View, I can see that here I have a river, here a bridge, and here some stone object—I don’t know what, maybe a statue. If I check the thumbnail again, I can see the river, the bridge, and possibly the statue. So I have verified that the GPS coordinates in the metadata are strongly consistent with the thumbnail—this hidden content—while being very different from the visible image content.
This result, in my opinion, at least based on my expertise, strongly supports the hypothesis that the image is not reliable at all. One explanation could be that we took an image, then we copied the metadata of another original image onto the questioned image, but by mistake we also copied the thumbnail. This would justify the fact that the GPS coordinates of the metadata are strongly consistent with the thumbnail rather than with the image content.
This is just an example of how we can combine several tools, even available tools, to determine if an image has been modified or not. In this case, we noticed that the model was compatible with some metadata that we found, we saw that the thumbnail content is different from the image content, and we verified that the GPS location was consistent with the thumbnail rather than with the image content. So we can say that the image is unreliable.
Massimo Iuliani: Let’s move to another case in which we try to combine available Amped Authenticate technologies with new interesting tools. Let’s say that now the question is very clear and very trendy: Is this image synthetic or is it authentic? How can we work? The dream of everybody is to have a tool where you can drag and drop the image and the tool will say, “Don’t worry, it’s synthetically generated. Period.” And you’re done. Unfortunately, it can’t work like that for several reasons.
But we can certainly use a deepfake detector to determine if there are some compatibilities in the low-level traces of the image. We can determine if we can find the footprint of some synthetic generation models. This is one key point, but there is also another point. We have reflections. Reflections are a very peculiar thing to find in an image, and based on what we were saying, this reflection must satisfy some geometric rules.
I’ll try to explain the principle in one slide. This is an original image with an original reflection. What happens? Let’s take a look at this drawing here. We have an object that is reflected in a mirror. When you look at something through a mirror, the object appears to be on the other side of the mirror, at the same distance from it. If you connect any point of the object with its mirrored counterpart, you get lines that are all parallel, because they are all orthogonal to the mirror.
When you acquire these parallel lines through your camera sensor, the parallel lines will converge in the image. For those of you who are familiar with vanishing points, this is very trivial. For those who are not, think about driving down a street: the lane markings are parallel in the 3D world, but if you look at them with your eyes or in an image, you will see the lines converge. It’s exactly the same principle.
So you have parallel lines in the 3D world, but you have a vanishing point—converging lines in the image. You expect that in this image, for instance, if you connect the ear of the girl with the reflected ear, the line connecting these two points will intersect the reflection vanishing point. This means that if I connect several points with their reflections, all these lines will intersect at the vanishing point.
That is the theory in the analog world. In the digital world, we might select a point slightly up or down, and different experts may choose slightly different points. How can we solve this issue in the forensic world to guarantee repeatability? We say, “Let’s consider the edge of the hair here. We’d like to connect this point with the corresponding reflected point of the hair—but which one exactly?” So we select an interval of candidate points, and this determines a wedge in which we should find the reflection vanishing point.
If we do this with several points with their reflections and we intersect all the wedges, in the end we expect that the intersection of all the wedges will contain the reflection vanishing point. Yes, this image is real—sorry, I’m reading the question. It’s real, it’s just to show how it works. Indeed, in this real image, the reflection vanishing point is contained in this yellow region.
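The wedge test can be framed as a simple geometric feasibility problem: each wedge is the intersection of two half-planes, and all wedges are mutually consistent only if the combined system of linear inequalities has a solution. Here is a minimal sketch of that idea with hypothetical coordinates; it illustrates the principle, not Amped Authenticate’s implementation:

```python
# Sketch: pose the wedge-intersection test as a linear feasibility problem.
# Each wedge (apex + two edge directions) is the intersection of two
# half-planes; all wedges are mutually consistent only if A @ x <= b is
# satisfiable. All coordinates below are hypothetical.
import numpy as np
from scipy.optimize import linprog

def wedge_halfplanes(apex, d1, d2):
    """Rows (A, b) of A @ x <= b for the wedge with the given apex and
    edge directions d1, d2 (opening angle < 180 degrees)."""
    rows, rhs = [], []
    for d, other in ((d1, d2), (d2, d1)):
        n = np.array([-d[1], d[0]])      # a normal to this edge ray
        if np.dot(n, other) < 0:         # orient it toward the wedge interior
            n = -n
        rows.append(-n)                  # n @ (x - apex) >= 0  ->  -n @ x <= -n @ apex
        rhs.append(-np.dot(n, apex))
    return rows, rhs

# Hypothetical wedges (these three do intersect, so the test reports consistency)
wedges = [
    (np.array([0.0, 0.0]), np.array([1.0, 0.2]), np.array([1.0, 0.8])),
    (np.array([0.0, 3.0]), np.array([1.0, -0.6]), np.array([1.0, 0.1])),
    (np.array([2.0, -2.0]), np.array([1.0, 1.0]), np.array([1.0, 2.0])),
]
A, b = [], []
for apex, d1, d2 in wedges:
    rows, rhs = wedge_halfplanes(apex, d1, d2)
    A.extend(rows)
    b.extend(rhs)

# Zero objective: we only ask whether any point satisfies all constraints.
res = linprog(c=[0, 0], A_ub=np.array(A), b_ub=np.array(b),
              bounds=[(None, None), (None, None)])
print("Consistent: wedges intersect" if res.success
      else "Infeasible: reflections inconsistent")
```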
If we find a wedge that is incompatible with the intersection of all the other wedges, the image is unreliable because it does not satisfy the projective requirements. That was a long explanation—now let’s try it on this image. Here we have a couple of tools: one very advanced tool for diffusion-model detection, and a second, model-based one for checking the consistency of reflections in an image. Then we can combine their results.
Let’s open the second image. First, let’s open the diffusion model deepfake detector. This tool is designed to detect images that are compatible with any of the most famous diffusion models, which is the most important technology to create deepfakes. We use it, and as you can see, the system is trained on some of the most famous diffusion model-based tools for generating images like Stable Diffusion, DALL-E, Midjourney, and Flux.
Here you have a compatibility result for each of these models, plus of course “not compatible with a diffusion model.” Since these tools are data-driven, we must remember that if a new generator comes on the market and the detector is not yet trained on it, it may not be able to detect that specific model. In that case, we expect the image to fall into the “not compatible with a diffusion model” category, meaning the image could be AI-generated by some unknown model, or could be real.
But in this case, fortunately, we see a very strong compatibility with Midjourney—0.97. We shouldn’t read this as a probability—it doesn’t mean a 97% chance. But we can say that the tool is strongly confident that there is a high compatibility between the questioned image and Midjourney. This is a huge clue. We’ll try to combine it with the reflection analysis, so let’s open the reflection analysis.
I already started it, but we can do it together. We can restart the project. How does it work? Let’s say that we have some points that we can connect. I would say the lips, maybe the corner here of the lip. Let’s say from here, we can connect this—I don’t know—here, or maybe here. Let’s create a wedge like this.
Now we know that the reflection vanishing point should be somewhere in this wedge. Let’s do the same with other points. Maybe we can go with the eyes and start from the corner of the eye here. We can link it—I don’t know—here, or here. There is a question: “Is there a certain numerical range that we can trust?” Not exactly. This technology lets you decide the range yourself. If you’re unsure, you can widen the wedge until you can say, “With this, I’m sure.”
Maybe we can delete this. That’s why we have wedges. So now I would say something here. I can show how to make this line because now we can choose another point. Let’s use this connection point here. I’m looking in the middle of the mouth, let’s say here. This point is the reflection of this point, but which exact point? To avoid misinterpretation, it is good to decide a wider wedge to be sure that the corresponding point is contained within it.
Now, as you can see, the system became infeasible. Furthermore, I can keep adding points. For instance, if I select this point here, it should be connected to some point on the external side of the ear—so maybe between here and here—but the system is already infeasible.
With only a couple of wedges like this, we can see the feasible region, meaning we expect the reflection vanishing point to be somewhere inside it. As we add more constraints, if the image is real, we still expect a non-empty intersection; otherwise we don’t. In this case, we can see that the reflection is, technically speaking, inconsistent.
This is a huge clue. The deepfake detector is sure, or almost sure, that this image is synthetically generated, and that tool is very effective because it gives you an answer directly. But if I have to go to court, the inconsistency of the reflection is extremely powerful, because it is model-based and easily explainable. On that side, it is the stronger weapon for arguing that the image is inconsistent, precisely because it is not data-driven but model-based.
It’s very useful to see how we can combine these new tools: one focused on AI detection, the other more general—it analyzes physical inconsistencies, is model-based, and has a strong theoretical background. For this specific tool we ran many tests, and a formal description of the reflection inconsistency analysis is published in a paper that you can find in the slides, where we also discovered that different synthetic generation models have different capabilities in generating consistent shadows and reflections. So the answer is: yes, it’s fake.
Just to summarize what we learned from this example: AI-based tools, like physics-based tools, cannot really provide a probability. It’s up to our expertise to combine the results and decide how to weight them in reaching an answer. Furthermore, especially for the data-driven tools, we have to remember that if we take an image from the wild, the AI-based tool—the data-driven tool in general—may be missing relevant training data.
We should be careful when using those results in general, while the physics-based tool is extremely effective, independent of compression and independent of the image lifecycle. If you find an inconsistency, it’s a physical inconsistency—it’s model-based, so boom, you are done. Even if it’s more generic—you cannot say the image was generated with Midjourney—different tools provide different information that can be combined into an overall result.
This is just a note that we’d like to share, because it’s important to be informed when we use any data-driven tool. Remember that when a tool outputs numbers that describe a confidence or a compatibility, they cannot generally be linked to probability. If a tool outputs 0.99, it doesn’t mean the probability of something is almost one. The short answer is that we can never link this output to probability.
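A toy example of why such scores are not probabilities: a classifier’s softmax output always sums to one over the classes it knows, so it can assign a high score even to an input from a generator it has never seen. All logit values here are invented purely for illustration:

```python
# Toy illustration: softmax scores always sum to 1 over the known classes,
# so a high score is a ranking of compatibility, not the probability that
# the image is synthetic. All logit values below are invented.
import math

def softmax(logits):
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["StableDiffusion", "DALL-E", "Midjourney", "Flux", "NotDiffusion"]

# Hypothetical logits for an image from a generator the detector never saw:
# one known class can still dominate with a high score.
logits = [1.2, 0.3, 4.8, 0.9, 1.5]
for name, score in zip(classes, softmax(logits)):
    print(f"{name}: {score:.2f}")
```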
If you’re curious, we had a long webinar on this, so you can scan this QR code and take a look at it. It is generally recommended when we use any tools to use compatibility-related words rather than probability terms to avoid misleading results and sometimes also big mistakes. The tools work well—we have to work well also in the interpretation.
Massimo Iuliani: Let’s look at another example. In this case, I would like to start from the ground truth, so we see how the manipulation is built, and then we see how to detect it. I think this is really interesting, because it’s good to also see how images can be built up step by step into a really realistic manipulation.
Let’s use a text-to-image tool, which means that you have a prompt, you put a prompt and say “please imagine this,” and the tool provides you the image. I want to imagine a group of refugees facing the sun, and here’s the result. But now let’s say I want to modify the information within this image, so I’m not satisfied, and I add the prompt: “Please add a tent in the upper left corner of the image.” And the tool adds the tent.
I’m not satisfied yet—maybe with the identity of the person in the center—so I ask, “Please add a woman in the middle with a different dress.” And boom. Now this is the fake image. It doesn’t matter exactly what story this image is meant to tell. How can we analyze it? Of course, by using a deepfake detector, but in this case I can also notice that we have shadows. Here you can see these people with cast shadows, and the woman has a cast shadow here too.
Can we again combine deepfake detectors with shadow analysis? Let’s go and check this out. Again, I will try in one slide to explain how this works. The main principle is that if we have a light source and then we have objects, the cast shadows are built in a specific way. If you connect the light source with any point, the connecting line will also intersect the corresponding point of the cast shadow.
Why is this very useful? Because if you link a point with the corresponding point of its cast shadow, the line will also pass through the light source. If you do this with several points and their corresponding cast shadows, in a real image you expect all these lines to intersect at the light source.
This is a very trivial case in which we connect cast shadows with the corresponding points, and as you can see, we almost exactly recover the light source. Similarly to reflections, this is useful because it is explainable and model-based, and it’s also immune to compression: even with a very low-resolution or strongly compressed image, this is a geometric property, so it still holds after compression.
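As an illustration of the principle (not Amped Authenticate’s implementation), here is a small sketch that estimates where the object-to-shadow lines meet and how far they are from truly agreeing. The point coordinates are hypothetical:

```python
# Sketch: least-squares intersection of the lines joining object points to
# their cast-shadow points. In a real image these lines should (nearly) meet
# at the projection of the light source; a large residual suggests
# inconsistent shadows. Point pairs below are hypothetical.
import numpy as np

def light_source_fit(object_pts, shadow_pts):
    """Fit the point minimizing squared perpendicular distance to all
    object-to-shadow lines; return the point and the RMS distance."""
    A = np.zeros((2, 2))
    rhs = np.zeros(2)
    lines = []
    for p, s in zip(object_pts, shadow_pts):
        d = (np.asarray(p, dtype=float) - np.asarray(s, dtype=float))
        d /= np.linalg.norm(d)               # unit direction of the line
        P = np.eye(2) - np.outer(d, d)       # projector onto the line's normal space
        A += P
        rhs += P @ np.asarray(s, dtype=float)
        lines.append((np.asarray(s, dtype=float), d))
    x = np.linalg.solve(A, rhs)
    dists = [np.linalg.norm((np.eye(2) - np.outer(d, d)) @ (x - a)) for a, d in lines]
    return x, float(np.sqrt(np.mean(np.square(dists))))

# Hypothetical pixel coordinates of object points and their shadow tips
objects = [(120, 80), (200, 95), (310, 70)]
shadows = [(150, 260), (215, 270), (295, 255)]
point, rms = light_source_fit(objects, shadows)
print(f"Estimated light-source projection: {point}, RMS deviation: {rms:.1f} px")
```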
Let’s make an example of how it can be used to detect the fakes. For instance, if you have a real image, again, similarly to reflection, we’re not using lines, but we are using wedges to avoid misinterpretation or disagreement among experts. Here you can see that the region contains the light source.
Let’s take this Midjourney image. If we consider the people and their corresponding cast shadows and intersect the wedges, we can find an intersection. So, in some way, Midjourney is creating a sort of consistency among the shadows in this image. Fortunately, Midjourney still doesn’t know that the light source should lie inside this intersection—and here it doesn’t—so we can conclude that the image is not reliable.
In this case, I’m not opening Amped Authenticate, to save a bit of time, but if you analyze the cast shadows and create the wedges here and here, you will find that there is no region compatible with the light source. Again, we have two different tools: the physics-based tool saying the image is not reliable, plus the deepfake detection tool saying the image is a deepfake.
Maybe we can check what Amped Authenticate says about this image. As you can see, the image is strongly compressed here. I think this is not the original image, but it’s a recompressed version. Let’s see if we are lucky, meaning that Amped Authenticate still finds traces of synthetically generated images. As you can see, it finds high compatibility with Flux.
So on one side we have a strong hint that the image is synthetically generated or manipulated, together with a model-based tool that is strongly effective in giving reliability to your results. I see a long question—maybe I will read it later. Is that okay for you? A lot of questions. Please give me some minutes so I will close and then we’ll try to answer them.
Since this webinar is designed more to show how we can use tools, including advanced ones, you can check the blog for other tricks to spot deepfakes. You can scan this QR code, because generators have several recurring defects that can allow us to detect that an image was produced by a specific architecture.
Massimo Iuliani: Let’s make one last example that is related to checking integrity of an image. We have this image and we are asked to determine if the image is camera native or it has been processed in some way. Is the content reliable or not? Again, here I would like to show you another advanced but very generic tool.
We need another piece of theory that we’ll try to summarize in one slide. When you acquire an image—when you click on your smartphone—several things happen in your camera. The 3D world passes through the lenses, the optical filter, a color selection pattern. Then the image is digitalized, so we have sampling, we have interpolation of the colors, we have in-camera software processing, we have a specific compression pattern, we add specific metadata and so on. In the end we have our image.
Why is this important? Because all these pieces leave specific clues in the digital image that can be used to verify if the image is native or not. If the image is native and it has never been touched, we expect to find exactly those traces. If you modify the image in any way, you will apply some additional processing. For instance, you remove something, so you’re changing the statistics there, or you are resampling the image, or you are adding things, or you’re cloning things in the image, or you’re changing a face.
So you’re processing a part of the image, changing the statistics of the pixels, changing maybe the perspective if you’re changing the position of some objects. Then you need to save the image again, meaning that you’re applying further compression to the image. This means that you are partially deleting the traces of the original image and you are adding some specific clues that are related to your processing and to the manipulation itself.
During this process, we delete original traces and we add other traces. This is extremely useful for checking image integrity, especially because if you manipulate an image and then compress it again, the image has undergone compression at acquisition, then manipulation, then compression again. When a manipulation sits between two compressions, we don’t care if you used the most reliable and effective AI-based tools—there are techniques that allow you to spot it independently of the technology used.
Let’s check how it works. Let’s open Amped Authenticate and load this last example. Of course, we can try to analyze this image visually. I see something strange here, but I don’t rely on my eyes. First of all, we go to the metadata: the file format analysis in Amped Authenticate quickly gives me some hints on where to start.
The first thing to be noticed is that the create date is different from the modified date. This is immediately a hint that the image is not native, but we know metadata can be unreliable. But let’s start from here. Then we see that the image is expected to belong to a Google Pixel 7A. We have an unexpected compression scheme.
Let’s open the compression schemes. Here, Amped Authenticate shows, based on the database we have available, which compression schemes are compatible with that of the questioned image. We have thousands of images belonging to several models, and we find compatibility between the compression scheme of the questioned image and these models here.
As you can see, we can also find a Google Pixel here—a Google Pixel 4 XL. We don’t have the Google Pixel 7A in the database yet, but it seems that the compression scheme is compatible with the Google Pixel family. So the compression is compatible with a Google Pixel, but the image has been processed after acquisition. What could be the reason?
Let’s take a look at the metadata. Of course, the analysis could take longer if you don’t know where to look. Looking through the metadata—oh, here I see “Edited with Google AI.” So this is a hint that, either within the camera or afterwards, the image has been processed in some way. We don’t know exactly what happened, but we now have two hints that the image has been processed.
Maybe it has been processed with Google AI, we don’t know, but if it has been processed after the acquisition, it means that the image has been compressed twice. If it has been compressed twice, we expect to find some anomalies. Of course, we would need the full course to understand which anomalies we should find, but I will show you and quickly explain.
When we look at an image in the frequency domain—the histogram of its DCT coefficients—we see this kind of plot. When the image has been compressed twice, we expect this plot to expose peaks and valleys, like here. As you can see, we have peaks, valleys, peaks, valleys—exactly what I expect from an image that has been compressed twice. This is another strong hint of double compression.
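Here is a simplified sketch of that frequency-domain check. It is an illustration of the idea, not Authenticate’s implementation—a rigorous version would read the quantized coefficients directly from the JPEG stream—and the file name is a placeholder:

```python
# Sketch: inspect the histogram of a low-frequency DCT coefficient for the
# periodic peaks-and-valleys pattern typical of double JPEG compression.
# "questioned.jpg" is a hypothetical file name.
import numpy as np
from scipy.fft import dctn
from PIL import Image

img = np.asarray(Image.open("questioned.jpg").convert("L"), dtype=float) - 128
h, w = (d - d % 8 for d in img.shape)  # crop to a multiple of the 8x8 grid

coeffs = []
for y in range(0, h, 8):
    for x in range(0, w, 8):
        block = dctn(img[y:y+8, x:x+8], norm="ortho")
        coeffs.append(block[0, 1])  # one low-frequency (AC) coefficient

hist, edges = np.histogram(np.round(coeffs), bins=np.arange(-60.5, 61.5))
# A singly compressed image decays smoothly; double compression tends to
# produce a comb-like pattern, visible as a strong periodic component in
# the Fourier transform of the histogram.
spectrum = np.abs(np.fft.rfft(hist - hist.mean()))
print("Dominant periodic component strength:",
      spectrum[1:].max() / (spectrum.mean() + 1e-9))
```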
So we are building the story: the image has been acquired, then since the modified date is after the create date, it means that something happened, maybe with Google AI because we’re reading the metadata, and then the image has been compressed again. So we have two compressions, and this is confirmed by the analysis of this DCT plot.
If it has been manipulated in the middle, we can use a tool that localizes manipulation when an image has been compressed twice. Remember, I don’t care if you used AI or not, because this tool analyzes the inconsistency between the two compressions. It is robust to any AI technology in this sense. Let’s try to use it—it’s the ADQ JPEG tool.
Boom. Very clear. We can find a region of the image with a compression scheme that is completely inconsistent with the rest of the image. Now that I have this hint, maybe I can go back and look for something visually strange. I have to say that, visually speaking, I can maybe see some traces of compression here, some strange edges, but I couldn’t conclude from those edges alone that the image is manipulated.
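For intuition, here is a sketch of a related but simpler localization idea, the “JPEG ghost” (Farid, 2009)—explicitly not the ADQ JPEG algorithm used above: recompress the image at a range of qualities and watch how the per-block difference behaves; a region with a different compression history stands out. The file name is hypothetical:

```python
# Sketch of the "JPEG ghost" idea: recompress at several qualities and
# compute a coarse per-block difference map. A region inserted from an
# image with a different compression history shows a distinct dip/pattern.
# "questioned.jpg" is a hypothetical file name.
import io
import numpy as np
from PIL import Image

original = Image.open("questioned.jpg").convert("L")
ref = np.asarray(original, dtype=float)

for quality in range(50, 100, 10):
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    recompressed = np.asarray(Image.open(buf), dtype=float)
    diff = (ref - recompressed) ** 2
    # Average over 16x16 blocks to get a coarse difference map
    hb, wb = (d - d % 16 for d in diff.shape)
    blocks = diff[:hb, :wb].reshape(hb // 16, 16, wb // 16, 16).mean(axis=(1, 3))
    print(f"quality={quality}: block difference "
          f"min={blocks.min():.1f}, max={blocks.max():.1f}")
    # Plotting these per-quality maps reveals regions whose difference
    # behaves differently from the rest of the image.
```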
But fortunately, this tool that is very effective, especially when the compression quality is high, exposed that the image has been likely processed in this part. Again, here we combine things to reach an explanation of the lifecycle of the image.
Just to summarize, we analyzed metadata and saw that Google AI was used and that the image was modified because we have a modified date after creation date. The frequency analysis revealed traces of double compression, and then the manipulation was detected through the ADQ JPEG tool. We also found the original image, and as you can see, we have this thing here.
This case is very interesting because we were able to determine even a small manipulation. This is not always the case—of course, it depends on several factors. In this case, the manipulation was made after the acquisition through some internal software within the smartphone. We didn’t go outside of the smartphone, but please remember that we’re starting to see that AI processing sometimes happens inside the camera.
Within a single click, you may also get some AI-based processing. That’s why we shouldn’t rely only on tools: we also have to use our knowledge to understand the background information and know what to expect.
I’ll just give you a couple of examples. One is this Samsung phone that uses super resolution and scene optimizer where you get an image like this that you see here, but then the optimizer provides a very detailed image. Are these details real? Who knows? It’s not the topic of this webinar, but it’s something that we should consider with attention.
Another clear example is again Samsung, with the Galaxy S21 and S23 and their remastering feature. We took this picture, and the tongue was changed into teeth. Of course this changes the meaning of the content, but technically speaking the image is camera-native. So what is the boundary between integrity and authenticity? It’s something we should consider, because this boundary is getting thinner and thinner.
Massimo Iuliani: Just to summarize what we did, we analyzed a number of cases, and we saw that each case required some specific weapons. Fortunately, Amped Authenticate is well equipped and we keep updating it with the most advanced tools. We saw cases in which, for instance, we found inconsistencies between the metadata and the image content because we had hidden content with GPS coordinates that were consistent with the hidden content but not related to the image content.
We saw cases in which deepfake detectors can be used and combined with physics-based detectors to show that an image is completely unreliable because it was synthetically generated and manipulated. We also saw, in the last case, that format analysis could reveal AI traces and let us localize the manipulation. I think the last case is particularly important because we used a tool that is not explicitly designed for AI, yet it can detect any manipulation—AI-based or not—whenever the manipulation happens between two compressions.
So it can be used in many cases. And if you are thinking, “Do I really need all this technology?”—if you haven’t done it yet, take the deepfake detection test and try to spot the deepfakes visually. If you get 20 out of 20, then yes, you are our expert. Thank you very much.
The following transcript was generated by AI and may contain inaccuracies.
Today I’m talking about Remote Mobile Discovery, but specifically the way I want to address this is through our Exterro forensic agent. As many of you may know, this agent is our solution for forensic endpoint analysis, investigation, and collection.
We have been updating, working on, and adding features to our forensic agent for many years. It came over during the AccessData acquisition, and since coming to Exterro, we have not slowed down on updates to the agent. Hopefully, you’ve seen that over the last few years. The agent can be installed on Windows, macOS, and Linux machines, and within the last few months we have added the ability to collect remotely from iOS devices as well, utilizing our Windows agent on an endpoint.
The cool thing about utilizing the preexisting agent is that all the features we’ve built and upgraded over the last few years—such as zero trust compliance, on and off network collections, analysis, and preview—all come along with it and add a lot of power to our iOS capabilities now. We’re calling the iOS portion “Remote Mobile Discovery.”
So what are some of the characteristics of Remote Mobile Discovery that we’re releasing right now? It’s agentless—and you’re thinking, “Wait a minute, you just spent an entire slide talking about the agent.” What we mean is that our solution with Remote Mobile Discovery, which is now available in both FTK Central/Enterprise and our EDDM platform for the e-discovery side of the house, doesn’t install any agent on the phone itself. Your custodians, employees, or users don’t have to have anything installed on the mobile device for you to be able to collect that information. So it’s super handy there.
It’s a single platform, meaning you don’t have to use one solution to collect and another solution to review. You can keep it all in-house, all in one review platform and collection platform synced together. You should schedule a demo if you haven’t seen it yet. We’re running a bunch of those, and we have some webinars coming up later this month and in April as well that show an actual run-through dedicated just to this feature.
Once the device is collected, FTK or EDDM will automatically kick off and start processing that data, so you don’t have another step where you have to hand it off to someone else. It’s just going to process so it’s ready for review.
One of the things we’ve worked on as well is that while, yes, we of course support plugging the device into the custodian’s endpoint (their computer, laptop, whatever that may be) to collect that way, we can also collect wirelessly if the phone and the endpoint computer are on the same network. That allows you to initiate collection without the user having to plug into the computer. This is especially useful if you’ve disabled USB ports and other similar restrictions.
This is available for on-premises installations such as FTK, FTK Central, FTK Enterprise, or our SaaS solution with EDDM and also FTK as well.
So how does it work? How would it look for your user? How does this play out?
First off, you will need an agent on the endpoint. That’s our Exterro forensic Windows agent sitting on your employee’s or custodian’s laptop. No agent is ever installed on the iOS device. One time—whether that’s when you’ve hired someone and need to get them their laptop and company phone, or when you pair the phone and laptop back at your IT lab before deployment—the devices will need to be trusted together. After that trust is established, you can collect as many times as you want without having to replug in and trust the device to the laptop.
To initiate a collection, let’s say I want to collect certain information, like SMS messages. I initiate that, and we only do consent-based collection. What will happen on the user’s phone is they will see a prompt saying, “Please enter your device PIN to authorize this collection.” The user puts in the PIN, and then the collection begins.
That information is collected to the laptop and then, using our agent, sent back to your legal department, IT department, or forensic department for analysis. As I mentioned earlier, the information is automatically processed, so there’s no middle step. You just have to wait for the collection and processing to finish, and you’re ready to review the chats or whatever you’re looking for. It’s super simple and smooth.
The thing we’re going for here is to make it easy so you’re not moving information between platforms. That’s where things get messy—you introduce risk and additional costs by moving things between different infrastructures. This way, you can keep it all in-house.
As I’ve gone through, this is not just about collections. It’s a big part of it—you need to be able to get to that data, react to it, and review it. We have a very purpose-built review system built into FTK and EDDM for the various types of artifacts that you would need to review and bring together for your report or deliverables.
You can export this information if you need to look at it in a different software suite, deliver it to counsel, or if you’re just doing a preservation. We support that. And of course, we have various reporting formats and capabilities to create and edit your own custom reports. So it’s not just a collection system—it’s the full package of collection, review, reporting, and export.
So that’s Remote Mobile Discovery—our built-in ability to reach out, preserve, and review that data. But let’s say you already have some other solution for mobile devices in-house. We can ingest its data as well and combine it with computer data you may already have. We can ingest Cellebrite data, GrayKey data from Magnet, XRY XML from MSAB, and of course we can load in Android backups as well. You don’t need to run it through their software first for parsing or anything like that.
We’ve focused a lot of effort on our own native app parsing lately, building that up so you’re able to use our timeline feature, filter features, and all the different tools to isolate information from one or all of these sources. That’s one of the cool things about FTK especially—you can bring in data sources whether from a computer, phone, cloud resource, or whatever, and view all that data side by side.
So don’t forget about our mobile data ingestion. We’ve continued to support that, and we continue to add more capabilities. I believe Android physical backups were added in our last update as well, so we’re not giving up on supporting all sources of information. Again, we want to enable you to review that data in one spot.
If you have any questions on Remote Mobile Discovery or mobile capabilities in general, by all means set up a call for a demo. We’re happy to walk through it.
The following transcript was generated by AI and may contain inaccuracies.
Martino Jerian: I’m Martino Jerian, the CEO and founder of Amped Software. I’m an electronic engineer—worth mentioning, because this is a fairly legal presentation—but I also have past experience as a forensic expert, of course in cases related to images and videos. I’ve been a contract professor at various universities, but now I’m fully focused on Amped Software, as you probably know.
And yeah, about us: we founded the company in 2008 in Italy, and for a few years now we have also had an office in the US. Our software is used by law enforcement, government agencies, and private labs all over the world working on image and video forensics. The vision behind everything we do is the concept of “justice through science,” which I think is very important and related to the content of this webinar. In this beautiful picture, you can see the entire team on top of a mountain at our AMLO meeting in January. It pretty much represents our mood.
Okay. Why this presentation? As you probably know, unless you are living under a rock, AI is everywhere—or almost everywhere. It’s not very present in our software yet, and for a reason. Law enforcement applications are a very big part of the AI Act, and we as software vendors develop software, so from that point of view we are subject to the Act. But you too, as I assume most of you are our users, are subject to the AI Act, and you should be aware of the potential risks of using non-compliant technologies—and, when you are using our technologies, of the things to watch out for.
It’s also important for non-European organizations. I see among the participants a lot of names of people I know from outside the European Union. This is important because the AI Act is a European Union law, but, just like the GDPR privacy regulation, if you are from outside Europe and you work with customers in Europe, or you handle the data of European citizens, you need to comply with it.
The fact that you are not in Europe doesn’t exempt you from respecting it in those instances. And just as the GDPR—the privacy regulation—has served as an inspiration in many other states and countries, we can probably expect something similar to happen with the AI Act.
As you’ll see in a few minutes, non-compliance fines are huge. So what’s the objective of this webinar? First of all, it’s based on a big study you may have seen on our blog; I will share the link at the end of this presentation. I did a lot of this work for our own use as a software company, to understand which of the activities our users carry out are subject, in some way, to some of the regulations in the Act.
And again, a big disclaimer: I am not a lawyer, and this is not legal advice. It’s my reflection on a very long and complex law, and as such, this webinar will be a bit different from our typical ones, with their nice enhancements, license plate examples, and other hands-on software. It’s quite dense, I would say. But of course, you can watch a one-hour webinar or read 150 pages of the law multiple times, as I did—you can choose.
Okay, the big marketing line of the European Union is that this is the first AI regulation worldwide, and it has been advertised a lot like this. It’s commonly said that Europe is an innovator in regulation and a regulator of innovation, and I think both definitions are pretty much on the spot—we have indeed been the first here.
Okay. First of all, let’s start with a very brief overview of the AI Act in a nutshell. So what is it? It’s a European law of about 150 pages, so there’s a lot of stuff. It was published in July 2024. If you’ve been following my LinkedIn account, you’ve seen me share the news multiple times, because the approval actually happened in multiple steps. So it was the fourth or fifth time we saw news about the approval, but the final one, the real one, was July 2024.
Most parts will be compulsory by 2026 and 2027. It happens in steps, and some parts are already applicable, as we’ll see later. There are some exceptions: it does not apply to use within national security, the military, research, and partially open source software. This is pretty interesting from our point of view, because some of our users are borderline with some of these—sometimes it’s a bit difficult to distinguish where law enforcement and public safety end and national security begins.
It probably depends on the kind of organization and activities, but sometimes the lines are blurred. And the penalties are very big because the penalties for non-compliance can be up to 7 percent of the global turnover of an organization or 35 million euros, whichever is the greater of the two.
And it’s important that this is not profit but turnover, and it’s global—not only of the European headquarters of a company, but of all the offices around the world. So this can potentially make a company default. The AI Act defines some categories at a very high level. First of all, there are the prohibited AI practices, which are, of course, prohibited—they can’t be done. Then there are what they call high-risk AI systems, which can cause risk from different points of view. They can be used, let’s say, but subject to some compliance requirements that we’ll see later.
Then there are what are usually called low-risk AI systems. Interestingly, they are not explicitly defined—low-risk AI systems are not even mentioned in the AI Act—but they are implied by difference: anything excluded from the other categories is low risk. There is one exception, what the law calls “certain AI systems,” with specific definitions; they can roughly be approximated with generative AI tools, like those used to create text, images, video, or audio with AI, and they have some transparency obligations that we’ll see later. And finally, there are the general-purpose AI systems, which are at the core of many popular applications that can do many different things, and they also need to adhere to some rules.
Going through the law, we will look at some important definitions. First of all, the first article: the purpose. Throughout the presentation, italic font means the text is copied and pasted from the law, and I’ve highlighted some important words. Essentially, this defines the overall idea behind the law, and you can see it is very much in line with the European Union’s fundamental values.
Here, the objective is to have human-centric and trustworthy AI, and above all to preserve fundamental rights, democracy, and the rule of law. Then there are many other important things—safety, health, environmental protection—but essentially, a good part of this is why the law matters for our field: law enforcement use is a relevant part of it.
The second big definition is “AI system.” It’s pretty difficult, and I’ve split it into separate points just to make it clear; it’s actually a single sentence in the law: “AI system” means a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. It’s pretty dense.
At the beginning, when I was trying to study this, it seemed that from this definition even normal software could fall in scope. But then, luckily, they released some guidelines that specify better what is considered an AI system.
It’s a multi-page document that goes very much in depth, with examples, on the points we’ve seen before. Essentially, the important thing is that it defines more clearly what we normally consider AI in general—and there are many different kinds of AI, not only generative AI or deep learning, which are the most popular nowadays.
So a critical aspect is right here: “AI systems should be distinguished from simpler traditional software systems or programming approaches” and the Act “should not cover systems that are based on the rules defined solely by natural persons to automatically execute operations.” This means that software where a human programs the rules is not AI. To put it very informally, it is AI when the system, usually with a data set, learns the right parameters and rules—basically learning from data.
Then there are definitions of the various subjects involved in the AI Act. Normally I’d call us vendors; in the AI Act we are called providers—in our case, developers of the technology. Then there are the end users, which the law calls deployers. And then there are the operators, a broader term covering other entities: the provider and the deployer, but also the product manufacturer, the authorized representative, the importer, and the distributor.
So all of these together are called operators, but the vast majority of what we’ll focus on concerns the provider and the deployer. We are providers, you are deployers. Then there is a very precise definition of a law enforcement authority: any authority competent for the prevention, investigation, detection, or prosecution of criminal offenses or the execution of criminal penalties, but also any other body or entity that has been assigned these duties.
What does this mean? According to my interpretation, of course, what we will see in the law about law enforcement is applicable in exactly the same way to private forensic labs, because we assume these private labs have been assigned this kind of job by the public authority. Another definition that I quite like is “deepfake.”
I usually write “deepfake” as one word, without the space, but they prefer this form, and it is the way it is. They define it like this: “deep fake” means AI-generated or manipulated image, audio or video content that resembles existing persons, objects, places, entities or events and would falsely appear to a person to be authentic or truthful.
Interestingly, we see here that it’s not so much about the technology—of course it must be AI-generated or manipulated—but also about the context in which we evaluate the image. I think it’s pretty much in line with the SWGDE (Scientific Working Group on Digital Evidence) definition of authentication: the process of substantiating that the data is an accurate representation of what it purports to be. Again, there is not much about the technology here—what matters is that it doesn’t really represent the truth.
So we can make a couple of examples. Midjourney is one of the many applications where you can do text-to-image. I asked Midjourney to create a realistic night-time photo of a cop looking at himself in the mirror, seen from the side, and I got this image, which is pretty much in a realistic photographic style.
So if I present this image as a picture—a real photo of a real person—this is a deepfake. Okay. But then I used the same technology for this: a drawing, in the style of a seven-year-old, of a dinosaur riding a motorbike, with some other technical features, okay?
It’s clear that this is a drawing. I’m not pretending that a real dinosaur is riding a real motorbike, so it’s not a deepfake. But if I pretend that this drawing was made by my seven-year-old son, for example, then maybe it is a deepfake, because the context is different. We could discuss that—it’s a very philosophical point. But this is just an example of the idea behind it.
Throughout this presentation, we will evaluate typical image and video forensic activities and see what the AI Act has to say about them. The first is a very generic application: image and video search and content analysis. So, a video summary; find all the people with a blue t-shirt, or the red cars in a video; find all explicit videos, or pictures with drugs or guns – content analysis. This is one possible application that we will study.
Then, face recognition on recorded video. Typical question: is this the same person? This is me, believe it or not, many years ago. Yes, it is the same person. This is a typical question that can probably be solved by AI – but should it be?
Then, license plate recognition on recorded video. You may have already played with our tool DeepPlate, which, from a very low-quality image, estimates possible character combinations – recommended for investigative use only, of course. So this is another topic we are investigating.
Then image and video redaction: pixelating, blurring, putting a black rectangle over sensitive details – another very common practice.
And then there is image and video authentication. Those of you who are familiar with Authenticate know that we provide many different tools inside it. There are traditional tools, based on classical image forensics papers and totally unrelated to AI, but in the last few years we have also added some AI tools to complement the traditional ones, because it’s pretty hard, even though not impossible, to fight AI without AI.
So this is our GAN image detection tool. On the left you see my colleague Marco, who is clearly not an image generated with a GAN (Generative Adversarial Network), while the other person was created with the website thispersondoesnotexist.com, and of course is detected as such. And then we have our classical image and video enhancement – the typical example of a blurred license plate; you probably already know everything about it. This, of course, is a very important topic to investigate in the light of the AI Act.
And then let’s go into the depths of the AI Act. Now we’ll compare this list of typical activities with what the AI Act says is prohibited. Will any of these activities be prohibited when done with AI? Let’s see.
So what are the prohibited practices? I’ll simplify them a bit – of course, there is the law itself if you want the nitty-gritty details. Behavioral manipulation: using AI to unconsciously change people’s attitudes. Social scoring: evaluating people’s behavior and characteristics. Predictive policing: partially prohibited – not in all instances, but some parts are prohibited. Emotional recognition in work and education: prohibited. And biometric categorization for some specific purposes, which we are not going to go into in depth here.
This one is interesting: scraping of facial images for face recognition from CCTV or the Internet. This part of the law, I think, was written with a specific use case in mind. I’ll leave out the name of the company, but there is a pretty well-known company that created a database of faces by scraping Facebook and other sources, and it has been fined millions of dollars and euros by several European countries, because that is totally against our privacy regulations.
So they put this in the AI Act too. And then: law enforcement use of real-time biometric identification in public. The keyword here is “real time.” Doing forensics, we are not much interested in real time, but in recorded video.
There are some exceptions to this prohibition. Oh, and by the way, these prohibitions have already been in force since this February. This stuff is already forbidden; you cannot do it in the European Union. What are the exceptions for real-time biometric identification? The most common application is, of course, face recognition: cases of abduction, trafficking, sexual exploitation or missing persons; imminent risk to life or a terrorist attack; and identification of suspects in serious crimes punishable with detention of at least four years.
This was a big point of contention between member states in the final negotiation, because some wanted more power to investigate and some were more protective of privacy. So it was a big discussion in the last stretch of the negotiation of the law.
So, interestingly, let’s go very quickly over our topics: image and video search and content analysis; face recognition on recorded video (on recorded video, because the real-time version, as we’ve seen, is prohibited); license plate recognition on recorded video; image and video redaction; image and video authentication; image and video enhancement.
These are not prohibited, so the first step is okay. Now let’s see if some of these activities fall under the high-risk category. These are defined in Article 6 and Annex III. Putting them together is a bit complicated – it took some time, but I did it. So what are high-risk AI systems? Safety components for some products like cars; biometric identification – not verification, only identification; biometric categorization in general (some specific instances, as we’ve seen, are prohibited).
We’ve also seen that emotional recognition is prohibited in work and education, but in general it is high-risk. Then: critical infrastructure; education; employment and worker management; medical devices; access to services like insurance, banks and so on; law enforcement and border control; justice; and elections and voting. I’ve highlighted the parts of potential interest for us.
So what I did was go and study those specific articles. The first is biometrics. The first thing, which is written everywhere, is “insofar as their use is permitted under relevant Union or national law.” This means the AI Act is not the only law we have in Europe or in the member states: maybe something is allowed according to the AI Act, but there are many other laws to consider. That’s important – the AI Act does not supersede other laws. And in general, these are remote biometric identification systems.
So what does it mean? That face recognition on recorded video is considered a high-risk activity. Again: on recorded video it is high-risk; in real time, it’s prohibited. And remember, it’s not just face recognition – it’s any remote biometric identification system; face recognition is simply the most common and most critical from this point of view.
Then we have law enforcement as a category – again, insofar as the use is permitted under other laws. The first point, which is very interesting, is “AI systems intended to be used by or on behalf of law enforcement authorities.” Then another item is “polygraphs or similar tools.”
I studied this a lot, because I think we probably need a more precise definition of whether it relates only to polygraphs or to something more, because if we interpret it literally it could be much wider. But it seems to be about lie-detector-style tools. And then this is also interesting: “AI systems intended to be used to evaluate the reliability of evidence.”
And what is an example of this? Image authentication, of course. We will check this later, but we already have a hunch that it covers deepfake detection, or in general image and video authentication done with AI, since it is used to evaluate the reliability of evidence. And then we have another section: migration, asylum and border control management.
Again, there is the same “polygraphs or similar tools” item. And then there is “AI systems used … for the purpose of detecting, recognising or identifying natural persons.” This is very wide. And it’s interesting that border control – which one might assume is related to law enforcement – actually has stricter rules. So this is pretty important for our analysis.
Then there are justice and voting. I looked here, since what we do is related to justice, but essentially the only part that could be somewhat related is this one: “to assist a judicial authority in researching and interpreting facts and the law and in applying the law to a concrete set of facts.” This is what we call the robo-judge, or AI judge. So it is not related to what we do with video, essentially.
So, after this deep dive into the high-risk activities – oh yes, there is the derogation: even if you fall into one of those cases, you may not be subject to the obligations if the AI system is intended to perform a narrow procedural task (so it’s not doing the entire job, just a small part); if it’s used to improve the result of an activity previously completed by a human; and/or if it’s a preparatory task to an assessment that is done in other ways.
If you think you’re subject to the derogation, you need to document it and do an assessment before placing the system on the market. And this is something I discovered pretty recently – on first analysis I didn’t notice it. This is interesting: there are systems that are so-called “grandfathered.”
What does it mean? That if a high-risk AI system was put on the market before the 2nd of August, it does not need to be compliant unless big changes are made to it. And if it’s being used by public authorities, you have until 2030 to become compliant. This is pretty interesting.
Okay, so our typical image and video forensics activities: are they high-risk? Let’s see, one by one. Image and video search and content analysis – this is very wide and could include various activities, but in general (imagine a video summary, or finding all the cars, and so on) it is not a high-risk activity. There may be an exception in the context of border control, but that would likely be derogated as a preparatory task: search for all the cars, and then the human investigates further. And this is for recorded video; real-time analysis can be a bit more problematic, especially if it’s done for profiling, which is an entirely different matter.
Face recognition on recorded video is very clearly a high-risk activity, and other biometric identification is high-risk as well. And let’s remember that real-time biometrics and scraping face recognition databases from the Internet are prohibited outright. Very important.
Then license plate recognition on recorded video: it’s not a high-risk activity – there is nothing written around it in the Act. But there is a nice paper by Larget et al. from 2022. They worked on a draft of the AI Act and came to conclusions similar to ours in many respects, but they had a different view on this point: they think that systems like license plate recognition, and other kinds of photographic comparison, should also become high-risk, because they can be used for identification.
Essentially, we can tie a license plate to the individual who owns the car. So they expect this to be classified as high-risk, but as far as I’ve seen there is nothing written in the law that does this. Of course, things can change.
Image and video redaction, as expected, is not a high-risk activity. Image and video authentication: yes – as we’ve already seen, it’s probably a high-risk activity when done with AI (not, of course, with other techniques), because it’s used to evaluate the reliability of evidence. And the authors of the paper I mentioned agree on this.
This is moderated a bit if you think about our use case, because in Authenticate we have many other tools, and in any case the final analysis is put together by the analyst. It’s not that an image is automatically classified as fake or not by an AI and that’s it – it’s a tool being used by a human. So maybe some derogation may apply in this case.
Then we have – and I’m pretty big on this – image and video enhancement. According to this list, it’s not a high-risk activity, even when done with AI. And traditional algorithms, for example those we have in Amped FIVE, are not based on AI, so for those we are not even remotely having this discussion. Maybe it could become a problem in contexts related to border control, though perhaps derogated – but that’s a longer discussion.
But one thing about AI enhancement I do want to mention is that it’s pretty risky. I’ve been discussing this over and over; this is an example I show in other webinars that I think is pretty striking. It’s on the homepage of the tool linked here, and it gives impressive results – amazing for your vacation photos, definitely. But if you think about this as evidence: you can very clearly see that it changes the brand of the car when enhancing it, and it’s making up the numbers and letters of the license plate out of nothing.
If we are working with evidence, this is very risky, because it can create something that looks like a legitimate, high-quality image we can put our trust in – but it’s actually not, because it’s an image that has been AI-generated and AI-manipulated. It’s very risky and very interesting. And there was a case, somewhat recent, about one year ago: the first big case, in March 2024, where some videos were ruled inadmissible in court in the US because they had been enhanced with an AI tool.
Essentially, since US law is built on precedent from previous cases, this sets a pretty strong one: AI enhancement was not acceptable. It was, if I remember correctly, a Frye hearing, and several experts were called to testify. Afterwards there were quite a few interviews with experts and discussion in legal journals. And it was pretty clear that, for various reasons – mostly legal ones at this point, about the acceptability of the science, not preconceptions about AI – this was not deemed acceptable. And I pretty much agree with that, from my position.
So, what are the requirements for the high-risk AI systems we’ve now identified? There are a lot of articles in the law; we’ll go through the main points, but of course, if you need to make your software compliant, you need to do a lot of work. The first part is data and data governance: essentially, keeping the datasets used for training, validation and testing under strict control.
You need to track very carefully the process of data collection, the origin of the data, and the purpose of the data collection – because maybe, for privacy reasons, you are authorized to use those images for marketing, for example, but not to train an AI. Then there is the examination of possible biases that could have a negative impact on fundamental rights. The dataset should be built in a way that minimizes bias, as written in the next point.
And then, of course, the dataset should be representative. Think about face recognition: if I don’t train the system on a dataset that is relevant and has more or less the same proportions of people of different ethnicities as, for example, the country where I’m using it, then the results will be wrong. It already happens.
And it should be free of errors and complete. Datasets are huge, so being complete and free of errors is quite a challenge, and this puts a bigger responsibility on the vendor, because for AI the datasets are the biggest and most complicated thing we have to create, and they should be kept under control. Which is fair.
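As a purely illustrative sketch of what such data governance could look like in practice (the field names are my own assumptions, not terminology prescribed by the Act), a provider might track something like this for every dataset:

```python
# Hypothetical provenance record for dataset governance; the fields
# are my own illustration, not prescribed by the AI Act.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    name: str
    origin: str               # where the data came from
    collection_purpose: str   # what it was originally collected for
    legal_basis: str          # why training use is permitted
    split: str                # "training", "validation" or "test"
    bias_checks: list = field(default_factory=list)  # documented examinations

faces = DatasetRecord(
    name="face-id-v3",
    origin="consented enrolment photos",
    collection_purpose="biometric identification research",
    legal_basis="explicit consent covering model training",
    split="training",
)
faces.bias_checks.append(
    "ethnicity distribution compared against the deployment country's census"
)
print(faces)
```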
Then, record-keeping. The system should have logging capabilities: for example, recording the period of use; the reference database (because the database in use now may be different from that of five years ago); the input data that led to a match; and the natural persons involved in verifying the results, because results must be verified. I see a lot of privacy complications here in saving all this data – again, satisfying one requirement without straining the other is not always easy.
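As a hypothetical sketch of the kind of log entry this implies (the fields are illustrative assumptions, not prescribed by the Act):

```python
# Hypothetical sketch of record-keeping for a biometric match event.
import json
from datetime import datetime, timezone

def log_match_event(reference_db: str, input_file: str,
                    result: str, verified_by: str) -> str:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # period of use
        "reference_db": reference_db,   # which database version was searched
        "input": input_file,            # the input data that led to the match
        "result": result,
        "verified_by": verified_by,     # natural person who checked the output
    }
    return json.dumps(entry)

print(log_match_event("faces-2025-01", "cctv_frame_0413.png",
                      "candidate match: subject #27", "analyst J. Doe"))
```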
Then another big thing about AI is transparency. There should be enough information given to deployers to understand the output of the system and to use it properly: for example, what the purpose of the system is, when and where it works, how robust it is, in which situations it can give wrong results, where it can be misused, or under which conditions it shouldn’t be used at all. And there are many other items, like providing information relevant to explaining the output, performance on specific groups of persons, specifications for the input data, and information needed to interpret the output.
All of this is not easy. In fact, they put in escape hatches – “where applicable,” “when appropriate” – because this is the big problem with AI, especially the ability to explain its output. Very rarely are we able to explain the result given by an AI; they are like black boxes. That’s the main issue for AI. And then we have human oversight. The big point is that these systems should be effectively overseen by persons, and those persons should be able to detect anomalies and unexpected performance. They should be aware, so there should be education.
There is what is called automation bias: our tendency to instinctively trust, maybe too much, the result given by a machine, because normally machines work better than humans – maybe. The overseers should also be able to interpret the outcome. And of course, the human should be able to override the system: not use it, reverse its output, or interrupt it with a stop button. Maybe that doesn’t make sense for some applications, but for anything with a physical impact, a stop button is pretty important.
Okay, so we’ve seen, very quickly, the main compliance topics for high-risk systems. But there is another category, what we earlier called “certain AI systems,” which we can oversimplify as AI image and video generation tools. What are their obligations? They are right here, in Article 50.
It says that providers of AI systems, including general-purpose AI systems, generating synthetic audio, image, video or text content, shall ensure that the outputs of the AI system are marked in a machine-readable format and detectable as artificially generated or manipulated. Essentially, any output created by generative AI should be digitally signed or watermarked.
This is already done by certain mobile phones, which embed some information, and some of the image generation tools also add these watermarks or metadata – which, of course, are not always foolproof. What I’m mostly worried about, even though it’s not our topic, is text content, because there are some ways to watermark text, but it’s not as easy.
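As a toy illustration of machine-readable marking (my own minimal sketch; real compliance schemes such as C2PA content credentials are far more robust than a plain metadata tag, which any re-save can strip):

```python
# Toy example of machine-readable marking via PNG metadata.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.new("RGB", (256, 256))      # stand-in for a generated image
meta = PngInfo()
meta.add_text("ai_generated", "true")   # the machine-readable flag
meta.add_text("generator", "example-model v1")  # hypothetical tool name
img.save("output.png", pnginfo=meta)

# A downstream tool can read the flag back:
print(Image.open("output.png").text.get("ai_generated"))  # -> "true"
```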
So, have we solved the problem? Everything is watermarked, so we can easily find AI content – no problems, right? Of course not. First of all, this is a European law, and not all providers are in Europe. And of course not everybody respects the law – otherwise we wouldn’t be here. And then there are a lot of open-source tools that make this unenforceable: even if an open-source library or tool has the watermark feature, being open source, a programmer can easily remove it.
There are also some exceptions. Even with AI-generated or manipulated images, we don’t need this kind of transparency marking if the AI is just an assistive function for standard editing – for example, I apply a traditional brightness and contrast adjustment, but the optimal settings have been found by AI; that is an assistive function. Or where authorized by law to detect, prevent, investigate or prosecute criminal offenses – the most obvious example I can think of is a fake social media profile made with AI-generated images, and maybe AI-generated text too.
Of course, if I’m an undercover agent trying to investigate something with a fake bot or whatever, I cannot label it “I’m a fake.” And then there is the exception most important for us: when the system does “not substantially alter the input data provided by the deployer or the semantics thereof.” And here we can have a big discussion.
So this is one of the tools we tested for AI enhancement. We took a picture of Tommy Lee Jones, made it smaller by downscaling, and then made it bigger again with an AI tool. If you look at the picture on the right, you can see it’s a pretty high-quality picture, and I can also say it resembles the picture on the left quite a lot.
But on the other hand, if I need to use this for the identification of a person, it is completely unreliable – completely dangerous, because it looks like something I can trust. From the picture in the middle I can’t say much, but at least I wouldn’t over-rely on it.
In the picture on the right, I seem to have some good material for an investigation, but actually the nose shape is wrong; the pretty peculiar eyes of Tommy Lee Jones are completely changed; and there was a distinctive trace, a kind of red mark on his forehead, which is gone. So: does this “substantially alter” the input? It’s a very vague definition. So this is important.
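You can reproduce the core of this experiment yourself. Here is a minimal sketch, using plain bicubic resampling as a stand-in for the AI upscaler we tested and a hypothetical file name: once detail has been thrown away by downscaling, no enlargement can bring back the true pixels, so whatever a generative upscaler adds is invented.

```python
# Downscale-then-upscale: the enlarged result cannot equal the original.
from PIL import Image, ImageChops

original = Image.open("portrait.jpg").convert("RGB")  # any test portrait
small = original.resize((original.width // 8, original.height // 8))
upscaled = small.resize(original.size, Image.Resampling.BICUBIC)

# The difference image shows how far the enlarged result drifts from the
# truth; a generative upscaler would look sharper but invent even more.
diff = ImageChops.difference(original, upscaled)
print(diff.getbbox())  # non-empty box: the "enhanced" image is not the original
```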
So that was for those who create the tools. This part is for deployers: you should disclose that the content has been artificially generated or manipulated. There are exceptions here too. For example, if you do something artistic or satirical, where disclosure would hamper the enjoyment of the work, then the requirement is attenuated. And again, when authorized by law you can avoid it – but some responsibility still remains on your end.
Okay, so let’s come to the conclusions. I want to compare this with the position on AI that we already had. I’ve been studying this topic for years – the post you see linked here is from 2021. If you’ve been following it, you know that I divided uses into categories: evidential use, where the result is used as evidence, versus investigative use – a lead, intelligence, things like that. And then I separated enhancement, where I produce another image, from analysis, where from an image I get a decision, like face recognition.
In general, my big no has always been AI-based enhancement for evidential use, for the reasons we saw on the previous slide: it’s not explainable, and there is bias from the training data. For investigative use we could probably use it, but I need to be sure that intelligence doesn’t become evidence, because then it’s a big deal.
So, for example, putting a watermark or writing “not for evidential use” or something like that on the image. And I need to educate operators about the risks, because if you over-rely on such an image – you go looking for this guy, when the real Tommy Lee Jones actually looks different – you can head in the wrong direction.
As for analysis, you can probably use it both for evidence and intelligence, with safeguards: only for decision support, so the human must always have the last word; I should know the reliability – when it works, when it doesn’t, what the success rate is – and mitigate the bias of the user. How does this compare with the AI Act?
Of course, mine are minor rules of thumb, very specific to our field. The Act is huge and much more detailed – it’s a law, not a blog post – but you can see the key points are more or less in line. “Only for decision support” maps to the article on human oversight. “Know the reliability” maps to transparency and the provision of information to deployers.
“User bias mitigation” is covered by the article on AI literacy, which we didn’t go through but is another part of the AI Act, and by the risk management system. One thing I didn’t write about that I think is pretty important is data governance – of course, that’s not so much about how the user uses the system, but about how I train the system with the proper datasets.
So let’s summarize what we’ve discussed. Probably – I say probably because, again, this is not legal advice, this is my interpretation – probably high-risk when performed with AI: image and video authentication (not just deepfake detection, but also the detection with AI of traditional manipulations), and face recognition as post-analysis of recorded video. Again, in real time it’s prohibited, save for some specific situations. So these are the two things to be aware of.
Probably not high-risk when performed with AI: image and video search and content analysis; license plate recognition (for now – maybe it will change, but for now it’s okay); image and video redaction; and image and video enhancement. But, as you have seen in the last part, according to the AI Act image and video enhancement is okay only if the image is marked as AI-generated or manipulated. This is very important. And again, the fact that the AI Act allows it doesn’t mean it should be used for evidence – this is just one part of the law, and it’s not specific to investigations.
I want to leave you with some final notes and thoughts – maybe we can also have a small discussion in the chat; I didn’t manage to keep up with it, of course, because I was speaking. Again, I’m not a lawyer, so for official legal advice, talk to your legal counsel. Things can and will change, because the technology is changing so fast and there are so many things the law still needs to define for actual application, as noted right here.
Even though some parts are already in effect, especially the prohibited practices, to actually be compliant there are many more guidelines that we expect to come, because right now things are still too vague – and, of course, some deadlines haven’t arrived yet. And here is one of the questions I was asking myself.
Let’s take deepfake detection, for example. There are lots of websites and tools that claim to do deepfake detection, with different rates of success. Let’s say I’m developing one of these tools and I put it out in the wild, maybe just for people to test stuff on social media, and you, as law enforcement, use this tool – which was not developed explicitly for law enforcement use, but can still be used that way.
We have identified that deepfake detection is potentially a high-risk activity, but this developer didn’t follow the AI Act, because they’re in some other part of the world and never intended the tool to be used by law enforcement. So who is responsible for its use? Probably you – but also the developer? I don’t know, but it’s something that left me thinking.
Then, this is a big one. We’ve seen over and over that, compared to many other countries that are very aggressive about adoption, in Europe we have this Act, and you have seen it’s pretty stringent. So, is it a risk for innovation? I think so. But different countries have different values and focus, and if you go and read the fundamental values of the European Union – the focus on privacy, the attention to human rights and fundamental rights, things like that – I think the Act is very coherent with them.
So you may not like it, because you would like this group of countries to be more aggressive and not limit the technology. We can have different opinions – I can be more conservative, you can be more aggressive; it doesn’t matter. I think it is coherent with the fundamental values of the European Union.
And again, yes, it probably is a risk for innovation. And related to this, and to the adoption of AI in general, here is the big thought I want to leave you with. Let’s imagine AI becomes a kind of oracle – an oracle which is a black box, and which is almost always right. Let’s say it’s right 99.99 percent of the time.
For legal decisions – putting a person in jail for life – let’s say it is even more often right than people are, for now. Still: you trust it, but you have no way to verify when it actually works, and you don’t know how it works inside. Would you trust it for critical decisions or not? It’s almost always right, but you don’t know why or how. Would you trust it or not? This is a big question I leave you to answer, because everybody can have a different opinion on it. And that’s it.
Here are the QR code and the link to my big blog post, with more or less the same content as this webinar. I hope you enjoyed it – I know it’s a heavy topic; it was heavy for me to study, but very interesting. I hope I made it a bit clearer. Thanks for being with us. Let me check the comments.
Oh, I see, there are a lot. We have a question from Carrie.
From three minutes ago, one of the last ones – let me check. Thank you. “Can you speak a bit about the AI Act implications for DeepPlate?” As I said, according to the AI Act, for now it is neither prohibited nor a high-risk activity. On the other hand – let me go back to the slides so it’s clearer.
It’s a tool for analysis; we don’t do enhancement with DeepPlate. To be transparent: internally, DeepPlate does also create an image, and in early development versions we even showed that image. What is the problem? Once you show the image to a user – and it was a very nice image – they put too much trust in it. Whereas if I just give you the characters with a level of confidence, you end up more skeptical, as you should be.
We implemented DeepPlate with all the safeguards. First of all, there are disclaimers everywhere that it is only for decision support. To minimize bias, we also tell you to analyze the license plate yourself first and only then use the tool, so you are not biased by its results. And regarding reliability, we did a lot of testing with real license plates – we are seeing whether we can publish a paper on that as well. So we put in safeguards; but again, under the AI Act, it is not an issue at all.
Matt Finnegan: Hello, everybody. My name is Matt from Oxygen Forensics. Thank you for taking the time to listen to this webinar today. The topic we will be talking about is Oxygen Analytics Center, or OAC for short. OAC is one of our newer products, so you may not have seen or heard about it before, and the aim today is really to provide a general overview of what the system actually is, what it can be used for, and where we think it fits within that digital forensics ecosystem of tools.
To give you a brief idea of what I’m going to talk about today: I’ll talk about what OAC is in a nutshell, and a little bit about the architecture of the system – how and where it can be deployed. I’ll move on to logging into the system, and the first thing I’ll talk about there is the system of user roles and permissions, which is really integral to OAC.
I’ll then talk a bit about how you can load data into OAC and a few different ways to do that. And finally, I’ll give you an idea of what the data actually looks like within OAC and some of the functions you have there. So, to go back to the start and sum up what OAC is in a nutshell: I think the easiest way is to say that it’s a client-server-based application.
And the idea is that you can upload data from digital forensic sources into this central server, and then you can access that data through a web browser, which has a number of advantages that I’ll come on to talk about in a moment.
So you can see on the screen in front of you that I have the OAC login page, and I’m going to log into OAC in just a moment. But before I do that, I want to talk briefly about the architecture of the OAC system. In terms of deployment, the end user can really deploy OAC wherever they like: it can be deployed on premise, on bare metal or in a virtualized environment, or, if people want to deploy OAC into a cloud environment or some other virtualized environment, they can do that as well.
The installation process is quite simple. It’s just a single executable installer file. And once you have that, you can install it wherever you like. So we want people to be able to install it on premise if they want to because that can be quite important. But equally it can be installed absolutely anywhere.
So I’m just going to log into OAC, and I’m actually going to log in as this administrator user that comes as default with OAC. And you’ll see the reason for that in a moment. When I log in as the administrator, it gives me access to this administration page. And by default it will actually take me to this users page within the administration section.
So this is the users page that you can see here. This concept of having multiple users is really quite core to OAC and its role: one of the primary use cases for OAC is to give other people access to digital forensics data easily and quickly.
If we think about how digital forensic examiners work, they do extractions, then they process, decode and parse data from those extractions, and that is generally done in quite specialized digital forensics tools such as Oxygen Forensic Detective. But once the digital forensics team has done that work, there’s usually a requirement to share the results with other people who are not necessarily digital forensics specialists. For example, they could be police personnel assigned to review information from digital forensics data – images, messages, web browsing history, etc. – who are not digital forensics people.
Their day job might be completely different, but they’ve been pulled in, just by the fact that they were working on a particular case, to review some of the digital forensics data from that case. But there are a lot of other people who might need to look at digital forensics data as well.
For example, that could be more specialized analysts, if we’re talking about a law-enforcement-specific example – analysts whose day job is to work with cell phone records and CDRs, who might just need access to a list of the contacts found in those extractions. Or more general analysts who are looking at data from a number of different sources, including digital forensics data.
And it could be people outside of that law enforcement sphere: in legal proceedings, quite often third parties require access to digital forensics data. So there’s a whole raft of people who act as customers of the digital forensics teams producing this data, but they’re not necessarily all technical.
They’re not necessarily all digital forensics specialists. And the way we would traditionally share data with those people is through the medium of things like PDF reports, spreadsheet reports, or possibly something like Oxygen Viewer, which is a free, standalone, portable tool that can be used to view digital forensics data.
And those methods all work, but there are disadvantages to them. Probably the biggest is the time and overhead involved in generating those reports and getting them to people – particularly if it’s something like the viewer program, where the end user then has to set up the viewer on their machine and import the data into it just to be able to look at it.
And all the while, with those methods, everybody is looking at that data in isolation. That brings me to one of the key features of OAC: you can have all of these different users created, and if those users are logged in at the same time, they will see the work that the other users are performing.
So it is a multi-user, collaborative environment, and that is becoming more and more important, actually, with the sizes of cases and even just the amount of data that can be on a single device. You could be talking about a case with 20, 30, 40 different devices or extractions from different sources, and each one of those extractions could be huge.
That could be thousands of images and millions of messages in each one of those extractions. And increasingly we see the task of going through all of that data being assigned to more than one person – you quite often have review teams of ten, 20, 50 people. So one of the ideas is that, instead of everybody working on that data in isolation, you can create all of those people as users within the OAC system.
Then, as long as they have access to the server, they can log in and work on the case from any machine capable of running a web browser – which is pretty much everything, right? Every computer, but also phones, tablets, laptops. So there are real advantages to having a system you can log into through a web browser.
So, to talk about the way users, roles and permissions to data work: you can see that I’ve got a number of users already created in the OAC instance I’m running here. And I want to talk about two separate things: the first is roles, and the second is permissions to access data.
If I talk about permissions first: we appreciate that, especially in a system like this where data is held centrally, you won’t necessarily want to give every single user access to every single device or case in the system. So you can assign particular devices or cases to particular users. If I go in and edit this particular user and go to the permissions page – sorry, the data access page – you can see an example of that.
So there’s a concept of departments, cases and individual devices here. You can create departments and put people into those departments, and you can also put devices, or cases (which can contain multiple devices), into a department. People in that department will then inherit the permissions for the cases and devices that fall within it.
As an example, if I select this one department that I’ve created for counter-smuggling, within it we have this particular case and these particular devices. But you can also give people access to data on a one-off, case-by-case or device-by-device basis.
So if I just wanted to give somebody access to a single extraction, I could do that as well – and I’m going to give this user access to just this one extraction here. The other concept we have within this user section is the idea of roles, and to demonstrate that, if I go to add a new user, there are a number of different roles I can select.
Now, some of these are predefined roles – the first three are predefined when the system is installed – and the fourth is actually a custom role that I’ve created. So I’m just going to go back into the role manager, which allows you to create these custom roles, and you can name them whatever you like.
This is not really about defining what data people have access to; it’s more about defining what functionality and actions are available to those users within the OAC system. There’s quite a detailed matrix here which allows you to give people access to particular sections of the system. As an example, if you had an analyst whose job was purely to look at communications data, you might only want to give them the views for contacts and communications, as I’ve done with this user.
And also actions: the ability to create, view or edit tags are things that can be defined within this role manager. I’ve only given this user permission to view and create tags, but not to edit or delete already existing tags. So there’s a lot of flexibility in terms of what data people have available, but also what actions they can take and what views they have within the system.
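To make the two separate concepts concrete – a role matrix governing views and actions, and data-access assignments governing which devices a user may see – here is a hypothetical sketch of how such checks could be modelled. This is my own simplification with invented names, not OAC’s actual schema:

```python
# Hypothetical sketch: a role matrix plus per-user data-access assignments.
ROLES = {
    "administrator":  {"views": {"all"}, "actions": {"all"}},
    "comms_reviewer": {"views": {"contacts", "communications"},
                       "actions": {"view_tags", "create_tags"}},  # no edit/delete
}

DATA_ACCESS = {
    "comms_reviewer": {"device-0042"},  # a single extraction, assigned one-off
}

def allowed(role_name: str, action: str, device: str) -> bool:
    role = ROLES[role_name]
    can_act = "all" in role["actions"] or action in role["actions"]
    can_see = role_name == "administrator" or device in DATA_ACCESS.get(role_name, set())
    return can_act and can_see

print(allowed("comms_reviewer", "create_tags", "device-0042"))  # True
print(allowed("comms_reviewer", "delete_tags", "device-0042"))  # False
```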
There could be different reasons for that. It could be that, for legal or compliance reasons, certain people shouldn’t be allowed to do broad searches, so you can remove those search abilities – that’s one kind of reason. But it may also be that the users of the system are not that technical – looking at digital forensics data is not their day job – so you may just want to simplify the view inside the system so they don’t get distracted or overwhelmed by the number of different options available.
It also simplifies their workflow, because there are only a few sections they can actually go to. As an example: I’m logged in as this administrator user, and I have access to all of the different views, analytics and search functions within OAC – you can see I’ve got a number of different views, a number of options for searching, a number of analytics, and so on. Now, if I log out and log in as the user whose role we defined more tightly around communications and contacts...
Bear with me – when I log in as this user, we have a much, much more simplified view. We still have the ability to look at the cases and devices we’ve been assigned to, but within that view section we really only have the ability to look at contacts and communications. So I think this is quite important: being able to control and restrict what users can see and do, but also to make things simpler for users. If somebody just needs to look at communications, you can make their life much easier.
Particularly if they’re not that familiar with the tool, by just giving them this communications view; then, when they log in, it will be quite easy for them to see where they need to go in order to review or analyze that data. I’m going to log back in as the administrator user, because there are a few more things I want to show you inside the full view of OAC.
So the next thing I’d like to talk about is how you actually get data into OAC – how do you load data into the tool, and what does it look like once you’ve done that? There are really two main methods to load data into OAC, and the first one is through Detective. If I open up a copy of Oxygen Forensic Detective here: you may have noticed, if you’ve used the tool over the past year or so, that if you right-click on a device, there is a new option here, “Export to Analytics Center server.”
Within the settings in Detective, you can enter the IP address, username and password for an OAC instance and link the two together, so that you can push data directly from Detective if you want to. Once you have an extraction loaded into Detective, you can right-click it – or you can do this on an entire case basis as well – and export the data directly to the Analytics Center server.
And it allows you to export data selectively as well. So if you have somebody who just needs access to the contacts, you could send only those, or you can send everything from the device. Having that selective export to OAC is quite important, I think, because with the volumes of data we’re seeing in cases and devices nowadays, if you can send just the data that needs to be looked at – the most important or most relevant data for the person you’re giving access to – it significantly cuts down the time taken. It’s also important to note that all of the tags, key evidence markers and notes from Detective will transfer across into OAC.
You can even do the export selectively based on particular tags. The second option to get data into OAC is to do it through the web browser: if I go to the data page and into the load data view, you can load data in a few different formats through the web browser.
It is, I think, worth saying that anything that will go into Detective can also go into OAC. So this is not just a tool for analyzing mobile phone data: everything you can load into Detective – mobile phone data, but also computer data, cloud data, downloaded account data, warrant returns, vehicle data, drone data – will also go into OAC. It’s digital forensics data generally, much broader than just mobile. Now, in terms of how you can view the data: there are a lot of different options here, and I’m not going to talk about all of them.
I’m just going to give a general idea. If I go into this devices page, I can see all of the devices I have access to as my current user. If we open the statistics, it looks quite similar to the dashboard you get in Detective, and as I click around these different devices I get statistics about the top ten contacts, top ten apps, top ten groups, and so on. I also get the device information that I would have in Detective.
So if I want to look at the model, the IMEI, or potentially the SIM card history, I can find all that data on this page. If I then want to start looking at the data on a device, there are a few different options – and, as I said before, these can be restricted based on the user’s roles.
One of the views we added in the last release is a really quite useful one, actually: a general communications view. This is a much simplified view of the communications data on the device, and I just want to show it to you as an example. The idea is that you can look at the different accounts for the messaging applications that might be on that device.
For Telegram, for example, there are two private chats within this account, and if I click on one of them, I get a really simplified chat-bubble view of that particular chat thread, just as the user would see it on the device. Really, really easy to review.
The general UI layout within OAC quite closely mirrors Detective as well: we usually have filters on the left-hand side, content in the middle, and details over on the right-hand side. Everything you can see within OAC is an item you can tag, so I can add tags to things, and, as I said before, other users will be able to see those tags as well.
So as I start to tag things, or even add notes to things, other users looking at that data will see those tags and be able to filter on them as well. As an example, I could go through one day or one particular shift and tag or mark things as key evidence, and then say to the next person, “I’ve worked up to here; if you filter on my tags, you’ll see everything I’ve marked as interesting and where I’ve gotten to in my current work.”
I just want to show maybe one or two other views. If I go back to devices and look at a different extraction that has a bit more data in it, you can see there are lots of different views. If I want to go through the files to do an image review, I can do that; I can look at a list of the contacts; and I have a mapping tool as well.
This is maybe the last view I’ll show in here – we’re not really going to have time to go into the search functions and analytics within OAC. This is the workspace view, and it’s an aggregation of all the parsed, decoded data on the device.
You can also view communications in here, although I think it’s easier to view those in the communications section. But what you’ll see in this workspace view is a really thorough set of filters for the different categories that we have. So it’s not just the messages: it’s also the web browsing history, user searches, and so on. It can really be thought of as all of the parsed data on the device.
We can then start to filter on those particular things. We could filter on just one category, and we can add a lot of additional filters as well. Just as an example, I’m going to apply a filter to the web history, and we can see the web history from that device at a quick glance.
And again, as we go through and review this, we can mark things as key evidence or add tags. The purpose of this session is really to give you a very quick, general overview of the system; I’m not going to go into the particular analytic views or search functions that we have. Hopefully this has been enough of an intro to give you an idea of what the system is and where we see it fitting – particularly for that review use case, giving other people access to data easily. The idea is that you can just send somebody a username, a password, and an IP address or web address for your OAC server, and within seconds they can be looking at that data.
And you could even have pre-tagged the things you want them to look at, so all they have to do is filter on those tags. That’s a very different and much easier workflow than the traditional one: export to a PDF, spreadsheet or viewer format; email it back over to them or Dropbox it (and it could be quite a big file); then try to explain how to load that data into the viewer and point them at the right thing. This is a much, much easier process – it’s really as simple as giving somebody a username and password, with their roles and permissions already predefined: what data they have access to and what functions they can use within the tool. So I’m going to round off the webinar there.
There’s much, much more to this tool than I’ve had time to show you. So if you have any questions, or you’d like to see it in a bit more detail, please feel free to get in touch with us at Oxygen Forensics, and we’ll be happy to do a more detailed demo, or chat in more detail about your particular use case and see if OAC can fit it.
Thank you for taking the time to listen to and watch this webinar today.
Rick Barnard: My name is Rick Barnard. I’m a member of the leadership team here in North America, and I’m hosting and moderating today’s global webinar, focused on the introduction of Nuix Neo version 1.3.
First, I want to address a few housekeeping items. There is a questions-and-answers facility available: you can submit your questions via the “Ask a Question” tab, and we will answer them at the end of the presentation. We welcome all your questions, so please submit them throughout.
At the end of the webinar, we hope you will take a moment to rate this presentation and provide feedback using the “Survey” tab. A recording of this presentation will be made available and shared with you, and we welcome you to share it with your colleagues after the conclusion of the webinar.
As for today’s agenda, we’ll start with quick welcome introductions. So welcome, and thank you for joining us. I’m joined by two of my colleagues, who are going to lead the presentation.
First, James Sillman, who is the Director of Product Management here at Nuix, and Stephen Stewart, who is the Americas Field Chief Technology Officer. They will be discussing Nuix Neo, going into detail on the version 1.3 release – the specific features, capabilities and enhancements that are available – and how those relate specifically to investigations use cases.
Then we will cover the pathway to Nuix Neo for our existing customers who currently use Nuix, and the options that are available. We will conclude with questions and answers. At this time, I’d like to turn it over to James Sillman.
James Sillman: Thanks, Rick. As Rick said, my name is James Sillman and I work here at Nuix on the product team. To give you a little background about myself: my background is in computer forensics, and I primarily worked in the government and corporate space doing investigations. I was a Nuix customer before joining Nuix about eight years ago now.
So as Rick said, I just want to briefly introduce you to Nuix Neo, talk about what it is, and then I’ll talk to you about some of the exciting enhancements we have as part of this release.
So, what is Nuix Neo? Nuix Neo is our unified platform that helps organizations solve their most challenging data problems: a true end-to-end platform with, at its heart, the world’s most powerful processing engine, plus enterprise automation and AI built in, allowing you to work faster, easier and smarter. So, when we say faster, what do we really mean by that?
Customers today are limited by the number of workers they have, which causes constant trade-offs in priorities and delays in delivery times. With Nuix Neo, we’ve eliminated this restriction with unlimited workers, so you can now process more data faster, reducing time to results.
Then there are manual workflows, which are time-consuming and error-prone. I remember, as a customer, having to log on late at night or on my weekends to start processing new data, start OCR, or run other activities – all so that we could maximize machine time and our workers.
Nuix Neo’s enterprise automation capabilities allow you to develop workflows that not only automate Nuix, but your entire enterprise. This ensures consistent, repeatable, defensible processing, minimizing machine downtime, and maximizing data throughput.
So, when we say easier, what do we really mean? Traditional tools are very siloed in nature: only one user can access them at a time. Our web-first approach and the collaboration tools built into the platform make it even easier for investigators to work together, collaborate, and access Neo from anywhere.
Additionally, artificial intelligence is huge right now – rightly so. It has the power to transform our workflows and make our lives easier. The challenge is that it’s prohibitively expensive to hire data scientists, and the tools available today aren’t customizable and are a bit of a black box, so you don’t even know how they’re making their decisions.
With Neo, we’ve democratized artificial intelligence. Our no-code AI model builder allows anyone to develop their own AI models within minutes, to suit their business needs, and deploy them within seconds – that is, if the hundreds of models that ship out of the box don’t already fit your needs; and each one of those is customizable.
Additionally, our AI lets you understand how its decisions are made, so you can easily defend the results. And finally, smarter. I’ve already mentioned that you can build your own AI models with our no-code UI, or leverage the hundreds of models that come out of the box. Our AI allows you to prioritize what needs to be looked at first,
letting you understand the types of documents in your data set, what people are talking about, what kind of PII or PCI you may have, or where your risk is. Today you can easily check a box to find all your PDFs or emails; now you can just as easily check a box to identify your contracts, your legal documents, or where people are talking about politics or terrorism. All of this enables you to work smarter.
The Nuix Neo platform is backed by the Nuix engine and allows you to bring in over a thousand file types of unstructured data. Neo is a true end-to-end platform: from data identification, collection, enrichment and intelligence extraction through our AI, all the way through to review – all automated.
On top of our platform sit our tuned solutions, with Data Privacy and Investigations available today and Nuix Legal coming soon; each one of these solutions is tuned to solve your use cases. We also have additional extensions that let you augment the Neo platform depending on your organizational needs: identifying what risky data exists on your users’ endpoints and machines, our threat detection and data protection capabilities as part of Adaptive Security, and our extensible SDK for customizing the Neo platform.
I’m hopeful that, with this brief overview, you can see that Nuix Neo allows you to drive to outcomes faster: pushing your data, regardless of where it lives, through our engine at scale, and using AI to understand your data and extract vital intelligence to cut through the noise – all automated and all ready for review.
So, I want to share with you some of the exciting updates that we’ve made as part of this latest version of Neo. We’re going to talk about our knowledge graph and how it’s redefining data analysis. How our enterprise automation can reduce your backlogs. How our solution packs reduce time to results. Enhancements to our connectors ensuring you can get access to your data anywhere.
We’ve expanded our forensic ecosystem. We’ve got new UK and Australian AI models to help you extract intelligence more easily. And finally, I’ll talk about some of the existing Neo capabilities that have massively improved the lives of our customers.
So, one of the most exciting aspects of this release is our knowledge graph. As we know, relationships are key in any investigation. Questions like: how are these two people connected? What connections do these individuals have in common? That’s where our knowledge graph really shines. It allows you to uncover hidden patterns and connections in your data that traditional investigative methods might miss.
Our knowledge graph empowers you to extract meaningful insights, identify trends, and make smarter decisions faster. The intuitive user interface allows even non-technical individuals to do deep data analysis and ask complex questions. They always say a picture’s worth a thousand words. So, take a seemingly simple question: how are these two nodes connected?
Traditionally, this would be a painstakingly manual process, having to do search after search while trying to uncover a path between those two nodes. What happens when you uncover a new person of interest? You just start again. This process is time consuming and doesn’t lend itself to quickly iterating on the different questions that come up during an ongoing investigation.
Using our knowledge graph, we can select those two nodes, simply right click and choose shortest path. Within seconds, we’ve uncovered not only that these two entities are connected, but by what means. Now, can you imagine having to do this manually when there’s multiple degrees of separation?
Another example, can you spot the hidden patterns in this data? Again, a simple right click using the All Neighbours algorithm, and we’ve managed to cut through the noise, and you can start to see patterns in the data. For those familiar with graph databases, you may be aware that traditionally, for you to do what I just showed you, the data all needs to be pre-processed ahead of time, with all those relationships defined up front.
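As a rough illustration of the two graph operations just described, shortest path and neighbour expansion, here is a minimal sketch using the open-source networkx library with invented entities. This is just the underlying idea, not Nuix's implementation.

```python
# Illustrative only: a toy entity graph queried the way the demo describes.
import networkx as nx

# Nodes are people/accounts; edges are observed links between them.
G = nx.Graph()
G.add_edges_from([
    ("Alice", "acct-001"), ("acct-001", "acct-017"),
    ("acct-017", "Bob"), ("Alice", "Carol"), ("Carol", "Bob"),
])

# "Shortest path": how are these two nodes connected, and by what means?
print(nx.shortest_path(G, "Alice", "Bob"))  # ['Alice', 'Carol', 'Bob']

# "All neighbours": expand everything directly connected to a node of interest.
print(list(G.neighbors("acct-017")))        # ['acct-001', 'Bob']
```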
Traditionally, defining those relationships up front is done in a spreadsheet. It’s extremely time consuming and expensive because it’s a manual process. With Neo’s Knowledge Graph, we’re leveraging the power of the engine and what it does best: taking unstructured data and normalizing it. We then layer our AI on top, which helps extract that intelligence and enrich that data.
All that data is then loaded into the graph, all automated, no manual work up front, and all tuned to solve your specific use cases. Our Knowledge Graph works seamlessly with Neo, being able to quickly send results back and forth, providing you with traditional review capabilities of the Nuix platform, enhanced with the power and visualizations of the Knowledge Graph.
So now you can take any data and uncover hidden relationships that would otherwise be missed. Another exciting enhancement as part of this release is a monumental leap that we’ve made in our automation capabilities. We know that data volumes today are growing in variety and variability and are increasing at an exponential rate.
We know that there’s increased pressure for organizations to deliver results faster, driven by advances in technology and the 24-hour news cycle. We know that organizations are being asked to do more with less, and the skilled labour shortage is causing a backlog of cases. Our enterprise automation allows you to automate hundreds of steps, not just across Nuix but across your entire enterprise, allowing you to collect, process, and push data to review, all automated.
Our enterprise automation comes with granular security controls that ensure consistency by preventing incorrect usage, and predefined out-of-the-box workflows to reduce time to value, all customizable so you can fit them to your business needs. It also offers easily extensible scripting capabilities and the ability to hook into your enterprise systems to make more informed decisions.
And finally, the ability to manage cloud resources. Being able to spin up and spin down machines in the cloud to take advantage of those unlimited workers and to cut through your backlog.
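To make the automation idea concrete, here is a minimal Python sketch of a fixed, ordered workflow where every case runs through the same logged steps. The step names and functions are hypothetical, not Nuix APIs.

```python
# A minimal sketch of an automated workflow: a fixed, ordered list of steps,
# each logged, so every case is handled the same way every time.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def collect(case): logging.info("collected sources for %s", case)
def process(case): logging.info("processed data for %s", case)
def ocr(case):     logging.info("ran OCR for %s", case)
def promote(case): logging.info("promoted %s to review", case)

WORKFLOW = [collect, process, ocr, promote]  # same steps, same order, always

def run(case):
    for step in WORKFLOW:
        step(case)  # a real system would also retry, checkpoint, and audit here

run("case-2024-0042")
```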
One of the guiding principles we’ve had since the inception of Neo is how do we reduce time to value and time to answers? You’ve already seen this with how Neo is faster, easier and smarter. But as part of that guiding principle, we’ve developed what we call solution packs. Every Neo solution comes with a solution pack tuned to those specific use cases.
These solution packs are made up of three main areas. We’ve distilled best practices when it comes to processing, automation, and review to take the guesswork out of it. These include things like default processing profiles, search filters, metadata profiles, dashboards, and more.
Each solution pack also comes with AI models designed to identify data relevant to those use cases. This is in addition to the hundreds that already ship out of the box. And finally, each pack comes with automated graph analysis playbooks that leverage the power of the knowledge graph to help you identify hidden connections in what matters most to your use cases.
Each of these solution packs is fully customizable and provides a great springboard, allowing you to just add data. More and more of our customers’ data is moving to the cloud, and the cloud makes deployment and management of enterprise applications seamless and easy. The challenge is these services don’t make it easy for you to export your data.
Not only are the tools provided complex, leading to mistakes, but exporting data, especially in large volumes, is time consuming. That’s why we’ve continued to enhance our connectors to take advantage of the new capabilities released by these vendors. We’ve meticulously optimized each of these connectors, ensuring each one is simple, intuitive, and performant, so you can get access to your data faster.
I’ll talk more about Microsoft 365 shortly, but with both Slack and Gmail, we’ve implemented Search Ahead. So now you can reduce the volume of data that you need to explore and review, saving you time and costs. And these are just really a subset of the connectors available within the Nuix Neo platform.
We know from speaking with our customers that the collection of Microsoft 365 data is paramount. We know that collection of this data, be that emails, Teams data, SharePoint, or OneDrive, is complex. Microsoft offers multiple ways to collect the data: there’s the Graph API, there’s Microsoft Purview, as well as traditional PST exports. It’s time consuming.
Manual collection has multiple steps, each requiring the user to sit and wait while each stage completes before moving on to the next. And it’s error prone: these tools are confusing and lead to mistakes that can cause huge amounts of rework. All of this is in addition to Microsoft continuously making changes to their platform.
Other tools provide you with a single way to collect data from Microsoft 365. Not great if that doesn’t align with your organizational needs or security requirements. Neo offers a comprehensive set of capabilities to collect Microsoft 365 data that can meet your organization’s needs. We’ve automated the collection as well to reduce human time and errors.
We’ve expanded our forensic ecosystem. We know customers today use other tools as part of their workflows, especially in mobile forensics. Once you’ve collected data in these various tools, our customers run into several challenges. You can’t search across your disparate data sets, which results in missed connections.
You can’t dedupe across tools, resulting in additional review and lost time. You can’t easily collaborate on complex investigations in these siloed tools. That’s why customers love the holistic view they get from Nuix. The ability to bring in data regardless of source. You can search, dedupe and collaborate all in one platform.
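As a rough sketch of what cross-tool deduplication means in practice, the fragment below hashes file contents so identical items match regardless of which tool exported them. The folder names are invented.

```python
# A minimal sketch of cross-tool dedupe by content hash, assuming exports
# from different tools land as files in separate (hypothetical) folders.
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    # Hash file contents so identical items match regardless of source tool.
    return hashlib.sha256(path.read_bytes()).hexdigest()

seen: dict[str, Path] = {}
for export_dir in [Path("cellebrite_export"), Path("magnet_export")]:
    for item in export_dir.rglob("*"):
        if item.is_file():
            h = digest(item)
            if h in seen:
                print(f"duplicate: {item} == {seen[h]}")
            else:
                seen[h] = item
```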
In this release, we’ve enhanced our support for ExpertWitness format, the latest version of Oxygen Mobile Forensics. You can also bring in your existing Magnet cases, including any work that you’ve done, such as tagging, into Nuix. This is on top of our existing support for tools like Cellebrite, MSAB and Hancom.
Being able to extract intelligence from your data set is critical. We know that. We also know that traditional ways of identifying PII or PCI, such as regexes, are fraught with issues. Writing regexes is complicated, highly technical, and error prone. This leads to lots of false positives or, worse, false negatives where you miss critical pieces of data.
By leveraging the AI capabilities in Nuix Neo, we’re able to reduce false positives because Nuix NLP understands the data. So rather than the system blindly assuming that any nine-digit number is a social security number, our language models understand the context. Is this a nine-digit number near someone’s name inside a paycheck? Probably a social security number. Is this nine-digit number in a Guardian magazine? Probably not.
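The difference between a blind pattern match and a context-aware one can be sketched in a few lines of Python. A real NLP model learns context rather than consulting a keyword list, so treat the window check below purely as an illustration.

```python
# Blind regex vs. context-aware check for nine-digit numbers (illustrative).
import re

NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
SSN_CONTEXT = {"ssn", "social", "security", "employee", "payroll", "wages"}

def likely_ssn(text: str) -> list[str]:
    hits = []
    for m in NINE_DIGITS.finditer(text):
        # Look at the surrounding text, not just the digits themselves.
        window = text[max(0, m.start() - 40): m.end() + 40].lower()
        if any(term in window for term in SSN_CONTEXT):
            hits.append(m.group())
    return hits

print(likely_ssn("Employee payroll record, SSN 123-45-6789"))          # flagged
print(likely_ssn("The Guardian printed circulation 123456789 here"))   # []
```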
This context awareness reduces the time spent on review as well as your overall risk. So, in this latest release, we’ve developed new models to identify PII, PCI, and banking data specifically for the UK and Australian markets. This is in addition to the hundreds of models we ship with today, each one customizable in our no-code UI, with the ability to understand how the AI makes its decisions.
Finally, I just want to touch on some of the key features of the Nuix Neo platform that we’re seeing transform our customers’ workflows.
I’ve already touched on it, but with our AI-enabled workflows, each solution comes equipped with hundreds of models out of the box, including ones that are custom tuned for data privacy, investigations, and soon legal. This allows our customers to drastically reduce the time it takes to identify important document types, topics, and risky documents in their data set.
Search while loading. Today you have to wait until tasks such as processing or OCR have finished to start reviewing your data. That slows down time to results, but it also forces customers to complicate their workflows by creating subcases so they can continue to review their existing data while new data is processing.
I know that when I was a customer, I remember having to continuously create subcases and move them so they could be added to compound cases. This really complicates workflows and puts an extra burden on administrators. With search while loading, you no longer need to wait until tasks like processing or OCR have finished to start looking at your data.
While previously this was only available as part of Classic, now every Neo customer can review their data while it’s still processing.
We’ve made it even easier to onboard users to the Nuix platform with single sign on across Neo. Not only can you quickly switch between capabilities within the platform, but our enterprise authentication also makes it easier to hook into your organization’s existing security infrastructure.
We’ve added RSMF support. With the increase in chat data and the complexity it brings, we’ve added support for RSMF, which enables you to seamlessly bring in and export chat data. This enables interoperability between Neo and other platforms. And finally, enhanced promotion to Discover makes it even easier to move data from ECA all the way to review, with all the control you need, all automated.
Overall, these are just a small subset of the capabilities that we’ve introduced since the launch of Nuix Neo. I’m excited to hand over now to Stephen Stewart, our field CTO, who’s going to talk to you about our Neo Investigations solution.
Stephen Stewart: Thanks, James. I’m excited for the opportunity to talk to everybody today. Again, James, that was an excellent overview of Neo and how we’re thinking in a broader sense about the Neo platform. It really grounded a couple of fundamentals around faster, smarter, easier, and solutions.
To get us started, I’m sure many of you know me; my name is Stephen Stewart. I am Nuix’s field chief technology officer. I’ve been around Nuix for close to 16 years at this point, and I’ve really spent my entire professional career in the document management, archiving, discovery, investigations, and big data space.
What I find so interesting about the culmination of all of those different experiences is that it all comes down to getting to know and understand the data. In almost every fashion, you’re basically conducting a large-scale investigation, and that large-scale investigation is really about how you answer those fundamental questions of who, what, where, how, when, and why.
As you start with those basic questions, and you expand that out into all of the disparate data types that our investigators are confronted with on a daily basis, across the thousand different file types and the 10 dimensions of data and all those different things, the goal is the same.
You’re really very much trying to just understand what happened: pull those pieces together and get a true sense as to exactly what happened, when, and where. That’s nowhere more prevalent than in the amazing work that all of our customers and partners are doing out there in the wild.
I take tremendous pride in being at Nuix over the years and understanding our software’s relationship to tremendous events that have made the national stage. It’s interesting to be asked a question one year and then, several years later, understand the nature of that question.
In the top left, mob storms the Capitol; that’s January 6th here in the United States. Some number of months after that, I was asked a super random question that came out of nowhere: “Hey, we’re trying to find flags in some data that we have. Do you have anything that will help with image classification?”
The answer was yes, we have an image classifier, it does these specific things, here’s how you might be able to use it, and they were like, okay, great. And that was all we heard. Then they came back a year later and said, “Oh yeah, we used the engine and the image classifier. We were looking for flags, and it was this great experience; it cut us down from about 6 million images to about 100,000 that we had to look through, and it was fantastic.”
Similarly, in the upper right, Operation Ironside, a global law enforcement engagement. Several years before it was made public, I got a random question: “Hey, what if I put 12,000 mobile phones into the Nuix Investigate canvas? What would happen? Would it basically tip over all of a sudden? Would it work?” And I was like, I don’t know, let’s try it. Let’s see what we can do and what we can change.
Lo and behold, Operation Ironside comes out, and it contextualizes the types of problems that our customers are trying to solve. The reality is it’s out there, and you guys are working with tremendous volumes of data trying to solve that. At the end of the day, it comes down to crunching through some numbers: the scale and complexity of large investigations is increasing. And not only is it the size and scale of the investigations, it’s the dollars that are involved.
The UK actually estimates that $219 billion could be lost to fraud. That’s a staggering number, and that’s just within the UK. In the US it’s anticipated to be closer to $364 billion. Those figures are just staggering. Together, that’s more than half a trillion dollars that could be lost to fraud.
The problem is that it’s very difficult to investigate. Some research says that there are upwards of 25,000 devices in the backlog. Again, this is the basics of how you hit the singles to accelerate the investigative process: how do I allow organizations to work through their backlog more quickly and more efficiently?
It’s certainly not feasible to hire another 10,000 investigators to look at this. You’ve got to come up with ways in which you can work faster, smarter, and easier. Not to mention that within that backlog, only a small amount is actually being investigated. You take that backlog, and you take the small percentage investigated relative to something like the 40% of matters or crimes that need to be investigated.
We’re facing a global investigative crisis, and we’re basically falling further and further behind. So, in order to start to think about that, you have to think about how you scale, scope, and approach your investigations: high volume, low complexity in one corner; low volume, low complexity in another; and then in the upper right, you’ve got high volume, high complexity.
High volume, high complexity cases are your most difficult ones. There are a lot of them that you have to work through, and you’re dealing with a lot of devices and a lot of items. But what is high volume, high complexity? Is it only top-tier serious and organized crime, the multinational aspects, all of these other elements? What is it that characterizes it?
So, when you think about it, you can actually roll back the clock to one of the very first and largest investigations in UK law enforcement, into one of the earlier serial killers, Peter Sutcliffe. The data was incredibly complex.
Over 250,000 people were interviewed, 32,000 witness statements were taken, and 5.2 million car registrations were checked. As that investigation took place, they generated so much paper that they had to reinforce the floor. By any measure, that’s a hugely complex investigation.
There were hundreds of people working on it over the years, and a tremendous volume of effort went into trying to figure it out. Now roll that forwards and start to think about a single mobile device: the one that each of us probably has in our pocket, maybe more than one. As I sit here doing this presentation, I’ve got my mobile phone on my desk, one laptop in front of me, and all of the other laptops around my house.
Investigating a single suspect could lead to hundreds of millions of messages, all contained within a single household. So now everything is basically high volume in terms of data quantities, and high complexity, because you’re trying to understand how all of this stuff interrelates. You need to be able to feed it into a system that will let you index and process it, extract that information, and get to those outcomes as quickly as possible.
We take that step back and we think about Nuix Neo and solutions. This industry has known Nuix for decades as one of the leaders in digital investigations. The sweet spot where Nuix excels is the complexity of being able to pull lots of different, disparate data sets together.
James, in the feature overview of Neo, mentioned being able to pull in data from Magnet cases, ExpertWitness images, or Oxygen Mobile Forensics. The ability to aggregate and roll up data collected from all different sources (Purview, Google, mobile phone devices, UFDRs, etc.) and present it in a single view is really what differentiates Nuix and the Nuix Investigation Solution, going from labs and smart labs all the way up.
But when you think about the life cycle of a typical investigation, how do I start to think about reducing that time window? You see the longest bar there: DF, digital forensics and investigation. That is basically the long pole in the tent.
How do I crush that data, understand it, handle it in a consistent, repeatable, and defensible process, and then move it downstream into legal? Within that, there’s a huge amount of repetitive process, and the key thing is: how do I shorten those windows? And you can shorten those windows pretty easily.
You can shorten those windows by operating smarter, basically using AI to find the answers more quickly; faster, by running more operations horizontally; and easier, by automating a consistent, repeatable, and defensible process that ensures all of the work you’re doing across all of your data is well understood and easy to manage.
So, with that, you take that idea of what is available and look at what is being done by organizations out there. Police Scotland are a great testament to what you can do and how automation can dramatically improve the process. They took this upon themselves and built their automation from the ground up to quickly drive that process.
They’re now looking at all of the other ways in which they can expand that automation and take it to the next level, in and around adding AI into their workflows and other things.
Vodafone, on the other end, went for no backlog with 99 percent processing. These organizations are essentially chewing through that device backlog we heard about a few minutes ago, of about 25,000 devices. Working through the automation, making it faster, smarter, easier, is really at the heart of what Neo is designed to offer. And so, when we talk about Nuix Neo, specifically Nuix Neo Investigations, it’s about how we’re thinking about taking these investigations to the next level.
People have used Workstation and Investigate, and they have built labs, and they’ve scaled it out and managed it. So, with Neo, it’s about bringing all of those components together into a single unified platform that enables automated, case-specific workflows. Those automated, case-specific workflows are very much about how I can take every bit of investigative intuition that each of your investigators has and memorialize it into a specific workflow step.
How can I then repeat that every single time, so it’s tagged and processed the same way? How can I then scale that out horizontally so that I’m going faster, not just per unit, but across all of my horizontal resources? Then there are AI-powered language models to get you to that answer more quickly, and that last bit around greater insights through link analysis.
At the end of the day, it all comes down to where James started about being able to take huge volumes of information and drive that through to data insight as quickly as possible.
Those data insights are so much more than just search and tag. They’re about how I can understand the hidden relationships, and how I can prioritize what I need to look at first through risk scoring associated with those items. Pushing that as part of the Neo platform and Investigations is really what we’re driving here today.
When I think about what it means to drive this from the bottom up with investigations, it’s about taking the Nuix Neo Investigation Solution and thinking about larger, more complex investigations and all the people that need that. We’re talking about the public and private sector. We’re talking about regulatory investigations. We’re talking about fraud on a national or global scale. We’re talking about serious and organized crime.
Each one of these individual layers builds up to uncovering elements of fraud. If you can start to understand how fraud works and how that money flows across all of these, and the hidden relationships, you really can start to see how the overall Nuix Neo platform, and our specific focus on fraud for investigations, can drive tremendous acceleration across all of your investigations.
So, back to the faster, smarter and easier, and we really reiterate this because by being faster, smarter and easier, we think we can provide tremendous value. That tremendous value is pretty simple. How can I get more data through more quickly? How can I work through my backlog? How can I start to handle data in real time?
So, we start to think about some of the added capabilities, like real-time analysis. Many of your use cases may deal with collected or static evidence. Others of you may want to stream license plate reading information, financial transactions, or other third-party information into the system so that it can be correlated.
One of the most exciting things about Nuix Neo Investigations and the knowledge graph is being able to look for hidden relationships beyond just the evidence, leveraging reference data that can further inform your investigation. Ask those simple questions: how far away is this from a known bad transaction or a known bad actor?
It’s all about making things faster and easier. When we think about easier, easier is all about getting through the work with less friction. So how do I make it easier? I make it easier by automating the process. As I’ve said, I’ve been at Nuix for 16 years, and in those 16 years, I’ve pushed a lot of buttons in Workstation.
I’ve also done some of our gigantic investigations by hand, supporting different financial institutions. The largest one was 340 terabytes of legacy email data that resulted in, I think, 3.2 billion emails. The reality is that took an inordinate amount of button pushing, and that button pushing, even though we automated specific elements, required us to run almost a twenty-four by seven clock amongst us for, I think, 45 days. That was brutal.
I now look at what we’re doing with automation and the ability to report across that process, automate those workflows and drive that. Let’s just say, I could have gotten a lot more sleep over those 45 days if I’d had Nuix Neo today.
Making that process easier means focusing on things like how I can leverage all of my internal subject matter expertise, as opposed to always having to go outside. I can start to do things like building AI models using our no-code user experience. I can work in a seamless web-based user experience with single sign-on across my various tools, kick off my jobs, and get email notifications. All of these things are about making this process much easier.
The last bit is around smarter. So: faster, smarter, and easier. When I think about smarter, obviously you want to work smarter, so things like automation and scaling horizontally get more done more quickly. But that ultimately leads you to: how can I use AI to point me to the items that matter most?
So, depending on your use case (here today we’re talking about investigations), we’ve focused a lot of our time and attention on fraud: things like identifying financial transactions, looking for financial documents, ledgers, all of those types of things, and being able to effectively pre-tag those.
If you think about the experience of your most seasoned investigators, they probably have a pattern: in almost every single investigation, or every investigation of a specific type, be it fraud, be it serious organized crime, they know what they look for every single time, and that’s where they start.
What this allows us to do is take that intelligence and wisdom and start to build models that can pre-tag, pre-tune, and push that information forward, so you can start to understand and elevate those scores. Again, that idea of faster, smarter, and easier.
What drives that is really the enrichment. When we think about the workflow, the data rolls through the Nuix engine, and text and metadata are extracted. It’s then pushed back into the Nuix case, and when it’s pushed back into the Nuix case, it becomes searchable across all of the Nuix aspects. So, you can search it from Workstation, from Investigate, or from the engine via the API. You can promote that information out to Discover, and you can promote it out to other third parties.
So, by enriching those individual items with classifications, categorizations, and extracted facts, prioritizing them with risk scoring, and then helping you visualize this, it really allows you to work much smarter.
One of the elements is the classification. So, what is this document? Is this a financial ledger? Is this a tax document? Is it a credit card agreement? We had someone yesterday asking us about trying to be able to use these classifications to rule out garbage. They were tired of looking at help files.
So, they said, can you build a model for that? Sure. That’s really easy, and one of the most important things making it easy is the no-code model builder. The no-code model builder allows your subject matter experts to take, tune, and even augment our existing models, and then easily build new models from scratch.
Nuix Neo is all about pushing the AI as far to the left, or as far forward in the process, as possible. A lot of the other investigative platforms out there push the analytics to the right; they’re purely reliant on using analytics to accelerate how you review documents.
We’re using AI to accelerate what you look at first by telling you what it is. Each one of these elements is stackable, so they basically accumulate value as you go through.
If I ask, what is this document? Then I ask, what is this document about? Is it about sports? Is it about politics? Is it about terrorism? You can then use these layers to start to pull different information together and think about how you can take and tune it. As it relates to the Investigation Solution packs, from a categorization perspective, it’s looking for things like pressure, rationalization, and opportunity.
So, how can I start to understand and categorize data based on human language? It’s not just keywords; it’s using vectors and the contextual information to really start to understand how that information works. You then take that next step and start to think about extraction: how can I extract this information in a meaningful fashion?
This is what James was talking about: the idea that regular expressions produce lots of false positives and, worse, false negatives. Anybody that’s ever tried to create a regular expression for a social security number, and doesn’t want to miss anything, knows that’s pretty dicey. You basically pull in every single nine-digit number out there.
But what you can’t do is contextualize it. Well, at least until now, with Nuix’s cognitive expressions. What this allows you to do is take a whole bunch of regular expressions, or easily add in your own. Some of the work that we’ve done with the release of Neo 1.3 is we’ve started to dial in those regular expressions and cognitive expressions for the UK and Australian markets.
We started with the US market, so we have a robust base, and we’re now augmenting them with things like UK driver’s licenses or Australian identification cards to ground them.
The value of the cognitive expression, which is the other part of the fact extraction piece, is how do I then say, what contextual information around this lets me know that this is a social security number? Is it something like SSN, or is it national ID, or what are these other aspects?
You then take that to things like credit card information. So, you’re looking for a security ID, you’re looking for CVV, you’re looking for other things around it that provide context. This contextualization and this fact extraction is really what drives our knowledge graph expertise.
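In the same spirit, a structure-plus-context check for card data can be sketched by pairing the standard Luhn checksum with nearby contextual terms. The term list is invented and is not Nuix's cognitive expression syntax.

```python
# Structural check (Luhn) plus contextual terms for card data (illustrative).
def luhn_ok(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0

CARD_CONTEXT = {"card", "cvv", "expiry", "expiration", "visa", "mastercard"}

def likely_card(text: str, number: str) -> bool:
    return luhn_ok(number) and any(t in text.lower() for t in CARD_CONTEXT)

print(likely_card("Visa ending soon, CVV on back", "4111111111111111"))  # True
print(likely_card("Invoice reference number", "4111111111111112"))       # False
```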
For years, we’ve been talking about how to do this better. Through the addition of NLP and our context-aware entities, we are really able to nail a highly reliable and curated knowledge graph. So, I’m going to take that: I can understand what the document is, I can categorize it, and I can extract the information out of it.
The next thing is how do I start to think about prioritizing. Prioritization goes to things like risk scoring.
So, in each one of those instances, let’s take, for example, a US tax document. A US tax form, let’s say a blank W2: it’s not that interesting, but I would like to know what it is. If I have a populated W2 form, that means it has PII and PHI. That is more interesting; that is potentially more risky.
If I have a PDF file that has a hundred pages of W2 forms and a hundred people’s PII, that is incredibly interesting, and I want to prioritize it by telling you to look at it first. I can use the combination of scoring with NLP, as well as classic search and tag, to bubble that information up to the top.
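That stacking idea can be sketched as a toy scoring function in which each signal adds weight, so the bulk-PII document bubbles to the top. The weights and field names are invented for illustration.

```python
# Stackable risk scoring (illustrative): each signal adds weight, so a blank
# form scores low and a bulk-PII document is prioritized for review first.
def risk_score(doc: dict) -> int:
    score = 0
    if doc.get("classification") == "tax_form":
        score += 1                              # worth knowing, not risky alone
    score += 5 * doc.get("pii_entities", 0)     # every extracted identity adds risk
    if doc.get("pages", 1) > 10:
        score *= 2                              # bulk documents multiply exposure
    return score

docs = [
    {"name": "blank_w2.pdf", "classification": "tax_form", "pii_entities": 0},
    {"name": "my_w2.pdf", "classification": "tax_form", "pii_entities": 1},
    {"name": "all_staff_w2s.pdf", "classification": "tax_form",
     "pii_entities": 100, "pages": 100},
]
for d in sorted(docs, key=risk_score, reverse=True):
    print(d["name"], risk_score(d))  # review the highest score first
```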
It’s this idea of being able to drive that and push it forward, to understand what needs to be looked at first. We’ve put this into the Investigations solution for Neo, but we also have a framework that allows you to take your own explicit knowledge and augment it in a really simple and easy fashion.
As we go through the multidimensional text analysis, the question is then: how do you pull this stuff out? It’s that idea of running all of these systems under the hood to drive it to “hey, look at me first.”
So, when we get to the “hey, look at me first,” we’ve actually built in logic that applies this and bubbles this stuff up to the top. As your investigators land in Nuix, in the web application, Investigate, they can immediately go and look at this stuff first.
So, taking our knowledge, our experience, working with subject matter experts in the industry to really help and drive that prioritization, delivering dashboards that allow you to accelerate and understand that information and most importantly, prioritize.
When we think about that fraud timeline from a couple of slides ago, it’s all about how I can compress time. Not only how can I process the data faster, but once I’ve processed it, what can I do to enrich it so it points the investigators to look here first? Instead of having to start from A and go to Z, they can jump right to M, where the most relevant information may lie.
They’re then welcome to go back from A through M. If we can point them to what they need first, through AI automation, in a consistent, repeatable, defensible fashion, with all of the other aspects around explainability, specificity, and transparency in our AI, those are all really valuable elements that help you get to that answer more quickly.
The last piece, and James touched on this, is how you start to think about understanding and visualizing this information. James touched on the knowledge graph. The knowledge graph, for me, is an incredibly exciting advancement in our technology. In practice, we’ve tried this in the past using things like regular expressions to drive a knowledge graph, or we’ve created a knowledge graph based purely on communications data: To, From, Cc, Bcc.
The reality is they were okay, but the question from everyone in the field is: how do I correlate and understand the relationships across all of this data? How do I link email communications to financial transactions to a list of people and known bank accounts? How do I put that all in one system and ask, how is Stephen Stewart related to Keyser Soze, my favourite fictional bad guy?
Before, as James said when he was talking about the features, that was really, really difficult. It’s a lot of interrogation of information. It’s a lot of repetitive searches. It’s a lot of re-reviewing documents.
Wouldn’t it be better if I could feed all of that information into the Nuix Neo engine, extract that information, normalize it, and then create a highly curated knowledge graph? That highly curated knowledge graph has well-defined nodes and edges, and a well-defined schema that can be targeted directly from the engine or through NLP, so that you can start to build that information up and really visualize how it’s connected.
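As a rough sketch of what a curated graph with typed nodes, labelled edges, and a defined schema might look like, again using networkx and invented facts rather than Nuix's graph back end:

```python
# Loading extracted facts into a graph with typed nodes and labelled edges.
import networkx as nx

facts = [  # (subject, subject_type, relation, object, object_type) - invented
    ("Stephen", "person", "emailed", "Keyser", "person"),
    ("Keyser", "person", "controls", "acct-099", "bank_account"),
    ("acct-099", "bank_account", "sent_funds_to", "acct-001", "bank_account"),
]

G = nx.MultiDiGraph()
for subj, s_type, rel, obj, o_type in facts:
    G.add_node(subj, type=s_type)     # well-defined node types...
    G.add_node(obj, type=o_type)
    G.add_edge(subj, obj, label=rel)  # ...and labelled edges, per the schema

# "How is Stephen related to that account?" becomes a one-line query.
print(nx.shortest_path(G, "Stephen", "acct-001"))
```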
So, on the screen, we’ve got a screenshot looking at Bitcoin and asking how Bitcoin is connected to open-source intelligence information. This is really about how you can do this without creating the gigantic hairball that many of you will have seen if you’ve tried to use graph technology. It just hasn’t been smart enough until today.
But now, with Nuix Neo Investigations, we’ve got the smarts. We’ve got the engine that can extract that text and metadata. We’ve got the schema and the playbooks. That allows us to define and drive a really accurate and highly curated knowledge graph.
To recap a little: within the Investigations solution, the idea is to make it so that you can just add data. We’ve got the solution pack that includes targeted investigative NLP models, targeted extraction of PII and PHI, metadata profiles, processing profiles, workflow automation steps, and visualizations and dashboards within Investigate.
Then there’s the playbook that allows you to drive the customized linking associated with the knowledge graph. So there are really powerful ways you can take it out of the box, or you can tailor it to your specific data and your specific use case, and really take this solution framework to the next level and beyond.
It’s all rooted in our core tenets of Nuix Neo. It’s really trying to make your work faster, easier, and smarter, because ultimately the challenges you’re facing are backlogs, pressure, time, resources, and the desire to actually get home and see your family. All of those things can be helped through better automation and an easier user experience: how can I get you to those answers more quickly?
Whether you’re in the lab trying to process that data and get it prepped, or you’re the investigator trying to answer time-critical and sensitive investigative requirements, it’s about how we can get the right data in front of the right people as fast as conceivably possible.
So, with that, we really are on a journey, and the people that are here today, you guys are definitely on the journey to Neo with us. You’ve been a part of Nuix Neo and Nuix Investigations, and it’s about how we can take you from where you are today to what we think is an amazing way to accelerate your outcomes and ultimately get to those answers much more quickly.
With that, I’d like to turn it over to my colleague, Rick Barnard, to take you through the pathways to Neo.
Rick Barnard: Stephen, thank you so much. Also, James, appreciate it. Great discussion and overview. I’m going to conclude the discussion with options for any customer, including existing customers, on our journey together to Neo. Then we’ll open it up for Q&A. So, if you’ve had any questions throughout this presentation, take the time now to submit them, and we will address them in just a couple of short minutes.
I’ll start with the bottom. That’s the foundation that we’re building on: the Nuix that you’ve been using for decades, our Classic capabilities. You can continue using those. The Nuix engine, you can build around it; you can augment it with third-party products, or with automation via Nuix automation and EVA, to enhance the workflow end to end, from collection to processing, all the way to review.
There’s a lot of enhancement and innovation that we can build around our traditional components, and we can use that as a pathway to build upon: to enhance your workflows, your playbooks, your visualizations, and your reporting, and to bring enhancements on our journey together. The second option, building off of that, is what we call the pathway to Nuix Neo, which is to extend and enhance.
That gives you unlimited cores, it boosts your productivity, and we start building templates and playbooks together, with some of the same elements that I just talked about, with extensions from Legal Hold to Purview collections. It also gives you access to Nuix’s tailored Advantage offering, which is a customer success journey that allows you to unlock the full capabilities of the Nuix Neo platform as we move to a more usage-based licensing model, also available with Nuix Neo.
The third option is to go to Nuix Neo, which a lot of our customers are doing. You’ve got tailored solutions, whether it be data privacy, including data breach notification, or the Investigations solution that we talked about today. It’s really focused on addressing your toughest data challenges in a customized, configured solution. It has all the configured workflows, all of the customized NLP models, and the user profiles to really unlock the power of the Neo platform.
We’ve talked a lot about that in the last 40 minutes, in terms of all the benefits and features it provides: unlimited cores for processing and usage-based licensing.
Whether you’re ready for Nuix Neo or you want to build a pathway to Neo, we have lots of different options for customers to consider. So, with that, I think I will transition to the Q&A session. We’re going to move to a panel view with all three of us showcased here, and I just want to open it up for a few questions.
Hopefully, Stephen and James have been able to catch a quick drink to answer some questions.
We’ve got a number of questions that have just come in and a few that came in throughout the presentation. I just wanted to highlight some of those specifically.
We’ve obviously been focused a lot on investigations today. One quick NLP question that came in was around the customizations that have been built into the Investigations solution with NLP. I’ll direct this to James. James, can you touch on some of the entities, topics, and document NLP models that have been configured for the Investigations solution?
James Silman: Yeah, absolutely. We were discussing earlier the solution pack that comes with the Investigations solution. We’ve developed a number of models: things like looking for crypto values, crypto transactions, and crypto amounts.
We’re looking for other things like financial transactions as well, and financial documents. We’re looking for indicators of fraud inside documents, emails, or chat messages: the traditional opportunity, pressure, and rationalization. We’re looking for those types of attributes inside that data, to see if we can get ahead of someone potentially committing fraud.
We’re looking for things like the 419 scam, phishing emails, and a variety of other aspects that are all associated with fraud. Those ones are specific to the Investigations solution and sit on top of the hundreds of out-of-the-box models.
I know I’ve mentioned it a few times, mainly because I’m super passionate about it, but our no-code UI model builder allows you to build models. You can drag files from your desktop into a browser and build your own model. You know your data best, so we can help you on that journey of creating your own models, or you can take the models that we built for you and tune them to your specific use cases.
Rick Barnard: Fantastic. Thanks, James. The other thing we talked a little bit about was the unlimited processing capacity that Nuix Neo provides and the firepower that enables you to process data faster. Stephen, can you talk a little bit about how that, coupled with search while loading, translates into a lot of benefits and value for customers’ investigations?
Stephen Stewart: Yeah, absolutely. Thanks, Rick. Rick touched on a couple of things. Prior to Neo, if you had unlimited workers, you were on your own to automate their usage, so it was a little bit of a slippery slope.
With Neo and the automation layer, it is incredibly easy to add resources to a pool and then have Nuix Neo’s automation layer automate the entire process and orchestrate it across however many machines you may have, including doing things like dynamically spinning up new EC2 instances in AWS or Azure. So, you really have the ability to take advantage of the tremendous horizontal scale that’s available with the unlimited workers.
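As a hedged sketch of the spin-up, spin-down idea on the AWS side using boto3, with placeholder values throughout; real orchestration, as described for Neo's automation layer, would also install and register the workers on those machines.

```python
# Spin up a pool of worker machines, then tear them down when the job is done.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch worker instances for the processing job (placeholder AMI and type).
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical worker image
    InstanceType="c5.4xlarge",
    MinCount=1,
    MaxCount=8,
)
ids = [i["InstanceId"] for i in resp["Instances"]]

# Terminate them once the backlog is cleared, so you only pay while working.
ec2.terminate_instances(InstanceIds=ids)
```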
From there, you then think about other advancements, and I don’t think we’ve really touched on this idea of searching while loading. Since 7.0 of the engine, when we started offering Elasticsearch as a backend, that was the first opportunity to interact with a case while data was being processed. Processing in this context means things that use worker activity: loading data, OCR-ing, exporting, imaging, all those types of things.
With Neo, there’s actually something called the Derby service, and what that Derby service does is allow you to share and have simultaneous access to an Apache Derby Lucene case while worker operations are taking place.
So, that’s a long-winded way of saying that one of the biggest challenges we’ve had feedback on, needing to be able to do two things at once on the same case, is now a thing of the past. With Neo, I can be loading data, I can scale it out horizontally, and I can be searching it while that data is being loaded.
When we talk about time to value and reducing the overall amount of time, you no longer have to wait for stuff to happen. You can also do things like a light metadata scan and then enrich that case. There are lots of ways in which you can leverage the Neo stack to really get to those answers quicker.
Rick Barnard: Awesome, Stephen. We’ve been focused on investigations in this discussion; that is one of the solutions or modules available with Nuix Neo. This question is for both James and Stephen: what’s next? What’s the next solution that Nuix is releasing, and how does it layer on top of Nuix as a platform, unlocking these different use cases on the same platform?
James Silman: Great question. I’ll take this one first and let Stephen follow up. To answer the first part of the question, what’s next: the Legal solution is next. In terms of what that timeline looks like, we are starting to look for early adopters now, and it releases within the next few months. So, it is very soon when we say soon.
In terms of how does that work? As we continue to enhance the Neo platform, which serves as a base for all of our solutions, enhancements to the platform bubble up into each of those solutions.
As we add new file types, new connectors, and new capabilities, those all manifest themselves in all of the solutions. Then on top of those solutions, we’re layering the solution packs that help identify the different aspects of those use cases, much like the Investigations solution, where we’ve got the knowledge graph driving investigations. Stephen, do you want to follow up on that?
Stephen Stewart: I think one of the key things about the Neo platform is it is a platform that allows us to start to think extensively. If you think about the funnel slide that both James and I showed, it starts with a lot of data and it winds up with insights.
That is a pattern that can be applied to lots of different things. We started with data privacy and then released Investigations. Next is Legal. You can easily see things like consumer complaints or other general-purpose data and analytics processes following. So, for Nuix and the journey through Neo, it’s all about getting answers out of your data in a smart and intelligent fashion.
Rick Barnard: We’ve had so many questions that have just come in. I don’t think we’re going to be able to get through all of them as we’re trying to wrap up. So, I think we’ll just take one more question and I just wanted to make everybody on the call aware of how they can get all their questions answered.
You can reach out to me. My name is Rick Barnard. My email is rick.barnard@nuix.com, or you can reach out to your account exec that you currently work with. I just wanted to wrap up with one last question as we reach the top of the hour. Obviously, investigations are very complex. We integrate with lots of different solutions to make that possible.
I just wanted to open it up to James or Stephen in terms of what third-party integrations you think are super valuable that we haven’t talked about, that support image analysis, translation, or transcription services, and that are really supporting the most complex investigations we’ve been talking about today.
James Silman: In terms of services that we haven’t touched on, for investigations specifically, we’ve got several different partnerships. Passware, we’ve partnered with them for a while now; that helps us decrypt disk images. We’ve got a partnership with T3K; that’s image analysis capability. We’re partnering with Veritone to provide transcriptions and translations. And we’re looking further ahead as well.
We know that media analysis is critical to investigations, and we’ve got some of that; Stephen’s already touched on it. But we are looking ahead to see how we can more tightly integrate that into the Neo ecosystem. So regardless of whatever data you bring into our platform, and regardless of where it comes from (disk images, Cellebrite data, Magnet cases), being able to search across it in a holistic way is the direction we’re going.
Stephen Stewart: James, you’ve picked up on the main ones as it relates to extending investigations. For me, it’s about always being able to define a pattern that allows us to handle what’s next, either in real time, by feeding the data in via Kafka, or by containerising and packaging it into a Nuix logical image file format. The platform itself allows us to consume this information and enrich the items within the Nuix case. Once it’s enriched inside that Nuix case, it’s available to all of the other services that are part of the Nuix ecosystem.
You can send it downstream, push it out to Discover, access it from Investigate, from Workstation, or through the APIs, and build custom reports on top of it. So, for me, the heart of the investigative ecosystem really becomes the Nuix case.
I saw one question pop through: the Nuix case is a self-contained database; it doesn’t rely on a third-party external database, for anyone who is new to the Nuix world. But again, it’s just about being able to build on top of the Nuix platform and get as much data into it as you can, to make it searchable and drive outcomes and insights.
Rick Barnard: Stephen, one last question. You touched on Kafka. Can you talk about some of the emerging real-time investigation use cases, or just real-time processing, that can be enabled with the Nuix platform using Kafka and those types of services?
Stephen Stewart: So, it’s really just another entry way of getting data into Nuix.
Some of the advancements in Neo around being able to connect to Kafka mean that we can consume JSON, consume web links, and reach out to those URLs. We can also sit and listen to financial transactions (obviously, you have to tell the source about us; it’s not like going in the back door) or listen for license plate scans.
Essentially, any data structure that allows us to connect Kafka to it can now be flowed into the Nuix engine. We’ve had use cases around customer complaints, where organizations are trying to track different web-based systems.
They’re not necessarily traditional emails, but they have an element of what feels like a communication, and you can flow them in. The great thing is, once you flow that data into the Nuix engine, you can investigate it alongside all of your other traditional data: your Teams data, your email data, your mobile digital forensics data.
The whole idea of Kafka is adding a new, less batch-oriented mindset to how you get data in, and opening the door to new use cases that we may not even have seen yet. It’s the idea that it’s not just a batch of data sitting on a share: being able to interact in real time with databases and drive that data in allows for tremendous power and correlation.
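As a minimal sketch of that streaming entry point, here is what consuming JSON records from a Kafka topic looks like with the kafka-python package. The topic name and the ingest step are hypothetical stand-ins for flowing data into the engine.

```python
# Stream JSON records off a Kafka topic and hand each one to an ingest step.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transactions",                          # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    record = message.value                   # one financial transaction, say
    print("ingesting", record)               # real code would push this onward
```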
Again, you take all of those elements around analytics for classification, categorization, and fact extraction, layer that with search and now the knowledge graph, and you have a huge opportunity to start to understand hidden relationships across all of your investigative data, whether you’re conducting enterprise investigations or law enforcement-style investigations.
All of a sudden, what you can do with the system elevates.
Rick Barnard: Fantastic. Well, James, Stephen, thank you so much, and thank you all for attending today’s webinar. We have recorded it, and we will share the recording out with all of you, so you can share it with your colleagues. And again, I’m sorry we didn’t get to everybody’s questions.
Feel free to reach out to us. We’re happy to answer each and every one of your questions. Continue this conversation and continue the journey together. Thank you so much for joining. Have a great rest of your day.
Emi Polito: Hello, everyone. Good morning, good afternoon, good evening. I’m pleased to see that we have people from around the globe; the joys of being on the Internet, and that’s fantastic. So, welcome everyone. David from Avon and Somerset Police and I are going to give you a little bit of an introduction to Amped Replay.
We refer to it as the forensic video player for law enforcement and applications regarding criminal justice and investigations. So, we’re going to talk to you a little bit about that. We’re going to show you some of the stuff that you can do and how it can help you.
I’ll show you a little bit of the software, and then David will talk about how Amped Replay has already helped, and is helping, them with their processing, with a view to accreditation and tiered access to certain capabilities and things like that.
Anytime you have a question, feel free to put it in the chat. I think there’s also a Q&A box. I might not see them straight away, but rest assured that I will have a look and catch up with all the questions that we have.
So, I’m Emi. I’ve been with Amped for a couple of years now. Initially doing support, now mainly doing training, technical marketing, conferences and things like that. I do like to move around. Every now and then go and travel the world and then see people. Wherever the training and the conferences take me, I always like to catch up with the colleagues and friends all over the world.
I was in the US a couple of weeks ago. There might be some of you guys from there that attended the LEVA symposium. I was in Singapore last week and here I am now online.
I am based in the UK, and I used to work for a number of police forces: Bedfordshire, Essex. I also worked for a private forensic firm doing mainly video forensics. I’ve got a Bachelor of Technology in Multimedia Systems Engineering, and I’m also a LEVA certified analyst. I used to work in television back in the day until I got laid off, stumbled into this industry almost by accident, and then actually fell in love with it and never looked back.
A quick introduction to our company. We are a fairly small independent company; we have between 40 and 50 people now. We’re an Italian company founded in 2008. We’ve got a subsidiary in the US; it’s actually a separate entity, but still part of the Italian Amped company.
Our mission right from the start, when Martino, our CEO, started it as a thesis project (that’s how Amped began), was to become a one-stop shop for any needs relating to image and video analysis and enhancement. So, anything to do with forensics, whether that’s analysis, processing, clarification, measurement, or presentation: we’re in that business alone.
We do that, and we try to do it as best we can. Our vision is justice through science. Everything we do is transparent, using forensically safe methods that you can report on and explain. We have a lot of tools in our products to help our users understand how they process and analyse their imagery and be able to explain it in a court scenario, etc.
We operate in more than 100 countries and we have four products which we refer to as being part of an ecosystem. They are all linked together.
They each address a specific need in video forensics. Starting from DVRConv, which is effectively a batch forensic video converter. Then going on to Amped Replay, the product we are going to concentrate on a little bit more today, which is a forensic video player. It also allows you to do some basic clarification as well as annotations and redactions.
It has the tools to present your evidence quickly, even if you’re not an expert or a skilled video technician. Perhaps you’re a police officer or an investigator who’s been tasked with looking at CCTV and maybe producing some stills or some video for the prosecution, for the media, etc. That’s what Amped Replay does.
Amped Five is our mainstream product. Amped Five is mainly for the analyst, for the technician. It takes care of anything to do with conversion, analysis, clarification, advanced clarification techniques, measurement and presentation.
Then we go on to Authenticate, which doesn’t process images and video; it’s got analytical filters for the purpose of establishing the origin and history of images and video, and whether they actually show what they purport to. So, authenticity.
That’s our ecosystem. Why do we talk about video so much, more and more, in connection with criminal justice? Because the opportunities keep increasing all the time. Video evidence is the best visual representation of what has occurred, but it needs to be reliable. Therefore, we need some niche tools that we can use to analyse it.
The issues we get associated with video start with the format not being supported. How many times do we come across this? I’m sure there will be a lot of people in this online room who have got a video coming from a specific place and it just won’t play. We need proprietary software, we need codecs, we need elements that, if we don’t have them, mean we simply won’t be able to view that video.
There are so many of them, and each manufacturer does it in a different way. We have limited control over correcting the image. We also now have a specific need to redact sensitive information quickly out of the video. Why? Because there are legislations out there, data protection, the Freedom of Information Act, which require us to take out or hide certain elements of the video, or of the audio as well, before they can go to court.
For example, personal details of witnesses, unrelated content and things like that. You will see that with Amped Replay you don’t need to be a video editor to do these kinds of things. You can do them easily, efficiently and, most importantly, safely.
Now, other challenges: incorrect techniques, and insufficient software to correctly capture individual frames or clips. This is quite a big issue. It’s always been a big issue, but more recently there’s an understanding from the legal community that the acquisition of video is very important.
It needs to be done correctly, not just for visual quality, but sometimes to retain data associated with the video, for example timing information, timestamps and things like that. We will see how we can do that with Replay, fairly easily and very safely.
It seems easy, but it’s really easy to get it wrong. The problem is that if something goes wrong, it might go unnoticed for a very long time. That’s the issue. Proprietary video formats complicate investigations because you need players, and you can’t even access certain players on the internet anymore, because there are IT restrictions on downloading and installing software, etc.
The image quality is always an issue. We have come to accept that especially with surveillance video, if the video was of pristine quality, then we wouldn’t be able to store so much of it. So, it’s something that we have come to accept.
There’s now also the issue of trust. We can’t always trust it. In fact, I remember when I started, every video was taken at face value. If that’s what the video shows, then that’s what happened.
Now I see the opposite. Now you bring a video to the courtroom and the first question they ask is: is it authentic? We first need to prove that it is authentic before it can make it into the courtroom.
Then we need to interpret the video in the right way and that can sometimes be a challenge because of technical limitations of the footage. We need to do so without bias as well, which is quite a current issue, very relevant at the moment.
Then of course there is the black box topic: the explanation of technical topics to a lay person, to judges and juries. They need to be able to understand what the image shows or doesn’t show, and we, as experts, need to enable them to understand this before they can make their decisions about guilt or acquittal.
The impact of artificial intelligence, and the balance between privacy and security: all of this needs to be kept in the back of our minds. In the past we always had different tools; there might be people in this room who started by using products that were designed more for broadcast and media and adapted them to our industry.
Then before we know it, we’ve got all these different tools; we need to export from one tool, then import into another, and it becomes a very complex workflow.
None of these tools were created with forensic usage in mind; we just adapted them. Moving from one product to another, you could potentially be degrading the quality of your evidence as you do so, and that could be an issue as well.
Reporting is of course very important. We will see how all of our products come with reporting tools that can assist you with that.
Before we move on to show you a little bit of what Replay can do, I just wanted to talk to you about the very first stage of a video forensic workflow, which is the acquisition. How do we actually get the video in so that we can look at it, analyse it, process it, etc., especially when it’s in a proprietary format?
We need to decode and convert proprietary CCTV formats. In all of our products, Replay being one of them (the only one that doesn’t have it is Amped Authenticate), we have a video engine, which we now refer to as the Amped Engine. This will analyse a video, both in terms of container and video stream, and perform a process of analysis to see what in this video are genuine video elements and what are proprietary elements.
There will be a process of stream extraction and segregation of the usable video elements from the stream, which will allow for correct decoding. What I mean by decoding is reconstructing the compressed video: putting it back onto your screen, reconstructing the pixels and all the frames correctly, so that you can view exactly what that video intended to show.
Once we’ve done that process, we will repack it into a compatible video container, because sometimes a video could be encoded with a standard compression method such as H.264 or H.265 but packed into a proprietary container. In that instance, all we need to do is take that video and put it into a standard video container.
This process is actually quite important, because in doing this stream extraction we can also extract the timestamp, where present. A timestamp is not just what’s on the screen; it’s actually a data set which you can extract from the file and use not just to show the time and date, but also when you’re doing a timing or speed assessment. For that, you need correct and genuine timing information about the frames in your video.
This is how the process works. We have a proprietary file with a proprietary container. Inside we usually have a video stream, and in this video stream there are usable and recognizable video elements: for example, H.264 or H.265 I-frames and P-frames, channel numbers, timestamps and things that are necessary for video playback.
These are scrambled together with proprietary data, which we don’t know the meaning of; usually only the manufacturers do. So, using the Amped Engine, we do a process of stream extraction where we segregate this usable video data and put it all back together in the correct order.
If the video is multiplexed, we will have a separate video stream extracted for each channel, and then we extract the timestamp. Then, using FFmpeg, once we have extracted the stream, we put it into a standard container like an AVI or an MP4.
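As a rough illustration of that last repackaging step (not Amped’s actual code; the proprietary stream extraction itself is not shown, and the file names are hypothetical), here is how a raw H.264 elementary stream can be remuxed into a standard container with FFmpeg’s stream copy, so no pixel data is re-encoded:

```python
import subprocess

raw_stream = "channel0.h264"  # hypothetical: elementary stream already extracted from the DVR file
output = "channel0.mp4"       # standard container for normal playback

# "-c copy" repacks the compressed stream without re-encoding,
# so the recorded frames are not altered in any way.
subprocess.run(["ffmpeg", "-i", raw_stream, "-c", "copy", output], check=True)
```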
That’s a little bit of background on video acquisition; you’ll see how easy it is to do in Replay. So, what is Replay? It’s a video player designed specifically for forensic use: the video player for police officers, investigators, members of the criminal justice system and the law enforcement community.
It’s got an embedded Amped Engine, which we just discussed very briefly. It can perform basic correction and enhancement safely and quickly. It now also has the ability to provide tools and features on a level-access basis. Especially in the UK, where there’s now a need to identify whether a person is skilled and trained enough to do certain things, Replay can be customized to provide or hide these features based on access level.
We can easily annotate video and redact sound, and it doesn’t require a technical background. It is designed for police officers and investigators who are tasked with working with CCTV, even if they are not video experts.
So, without further ado now, I’m just going to launch Replay and show you really what it can do for you.
The first thing I would say, in regards to formats, a lot of us will have to deal with something like this. So, a folder or something like that, a device which contains possibly video, but it’s in proprietary format.
If we double click on this, well, Windows simply doesn’t know what a .dv4 extension is, so it’s asking you what to do. If you try to open it with something like VLC, VLC will tell you it doesn’t recognize this file.
So, sometimes, if we’ve got a player, we can attempt to play that video with the player, but then we are limited in terms of what we can do after we’ve opened it. Can we export this video in a proprietary format to a video in a standard format that we can use to send to the prosecution, or something like that?
But maybe the player doesn’t allow you to do that. Maybe it’s only got a snapshot tool, so you can only take still images from that player. Or if it does export, it won’t export at the same quality as the original. So, the purpose of the Amped Engine, in Replay and all the products that use it, is to bypass the player and deal with the video exactly as it is. Exactly as it was recorded.
In Replay it’s just as easy as dragging a file from any location on your computer into the platform, and then you’ll see that process of stream extraction that I have been talking about.
So, in this instance here, I’m navigating into a folder to show you that the process of stream extraction has actually happened. We have also got a timestamp extracted from the video, and the stream has been extracted and converted into, in this case, an AVI.
We have got an audit log with the process that has taken place. This is an FFmpeg log. It shows you that the stream has been copied over into an AVI container and everything is in order. Now we’ve got our video in here, we’ve got our timestamp that we’ve been able to extract from the video, and we can deal with all the things that we need to.
Now, I’ll show you some of the practical stuff that you can do in Replay. For that I’m going to go back and import something where we can start drawing some annotations, do some basic clarifications and things like that.
So, Replay works like this. You drag your video in and it’s right in the middle of your screen. At the top here, you have six tabs. The first two tabs deal with locating video and importing it into the platform. You have options to navigate into folders, look at thumbnails and things like that.
Of course, you have access to most recent projects. One of the good things about Replay is that anything that you do in it, whether you’re taking some stills, doing some bookmarks, doing some clarifications etc, when you close the program, you haven’t lost anything. Everything will still be there.
So, for example, I’m closing my program and reopening it. I haven’t actually saved anything; I just closed it down. But when I reopen it, all the stuff that I was working on is still there.
This is designed specifically for when you are doing some work and you are in a rush, you have to go out on a job or whatever. You don’t have to worry about having to save your project or losing your progress. Replay will remember it for you.
You’ve got a play tab here. The play tab is where you do your triaging of the video. You’ve got a timeline there and you can scroll through it. You’ve got an audio waveform as well; in fact, you’ve got two. If there is sound, it will be shown here as a waveform.
You’ve got two bars here. The one at the bottom shows the audio of your entire video. In the top one, you can zoom in and out. Later you’ll see that when we do some audio redactions, i.e. we take some audio away from the video, we can go and zoom into specific segments, even on a frame-by-frame basis. We can do some really tailored and very meticulous audio editing because you’re zoomed in very close.
At the bottom, you see exactly which section you are zoomed in on at the top. Here you’ve got it highlighted in blue. So, at the moment, we are just looking at this portion of the sound in this clip, in the top bar.
You will see that a little bit later on, but it’s very easy. You just hold the Control button on your keyboard and use the scroll wheel on your mouse to zoom in and out.
What else can we do? We can triage our footage. One of the things our users use Replay for is creating a storyboard of events. For example, in this video we have a car robbery. We see the bad guy coming over. Straight away I can bookmark an event: I just click on this button here, add/edit bookmark, and we’ve got a bookmark in there. You can see it represented here as a yellow line in your timeline.
You can navigate through your bookmarks with buttons on the interface. Then, when you are on a bookmark and click on the bookmark button again, you can add a description, for example “suspect arriving” or “visible on CCTV”. Then click OK and you can go and have a look at other frames.
For example, here we’ve got a frame where a weapon may be seen. So, we can add a bookmark there and a description like “suspect carrying weapon”, then click OK. Go forward; here there’s contact between the suspect and the victim. Possible assault? Who knows? We can bookmark that and have a look later.
Then once we’ve done this, we can go into the export tab at the top. In the export tab, you can export your video as an MP4, something that you can pass on to the prosecution. You can export the bookmarks, so all these images that I’ve bookmarked I can export as individual images.
When you export bookmarks in Replay, they are exported with the frame number. A video is obviously made of many frames and each one has an individual frame number. So, when we export them, we can simply refer to them by their frame number.
All of our bookmarked images have got the frame number at the top, a timestamp of when the image was created, and a reference to the original file. You can always refer back to your original video, even when you export images.
Then you can generate a report. The report is very streamlined. It’s a PDF document with details of your software, when you created the report, and details of your input file name with file hash, size, codec, resolution, frame rate, etc. It’s got the first and last frame, and then it’s got your storyboard in there, with your images, the frame number and the description, if you’ve put one.
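For readers wondering what the “file hash” in such a report is for: it is a digest of the input file that lets anyone verify later that the evidence was not altered. A minimal sketch of the idea (not Amped’s implementation; the file name is hypothetical):

```python
import hashlib

def file_hash(path, algorithm="sha256"):
    # Hash in chunks so large video files never need to fit in memory.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(file_hash("evidence.avi"))  # hypothetical exhibit file
```

Re-running the same function on the exhibit at any later date should reproduce the value printed in the report.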
In Replay, it’s very easy to create a storyboard without having to mess around with a Word document or stuff like that.
Something that we have recently implemented into Replay, which is really useful for triaging, is motion detection. Motion detection is basically the process of identifying motion in a video. It’s easy and straightforward in Replay in the sense that motion is defined simply as pixel regions that change from previous frames to future frames.
So, whenever that happens, that is detected in Replay as motion. To do motion detection from the play tab, you see every tab has got your video here and on the right side, you’ve got some panels in there.
On the play tab, you’ve got a basic file information panel, some case info that you can fill out. This is for your reports, if you’re producing reports, you can fill out the case information, location, etc. Here, we’ve got a motion detection panel, so it’s fairly quick.
This video is half an hour long; you can see the duration of the video here. Once the process is done, what you will have is your motion-detected pixel regions highlighted with this kind of transparent red overlay. So that is all the motion that has been detected.
Once we’ve done this, then we can look for specific areas of motion. For example, if we have intelligence that someone has exited this building and has crossed the road, then we can just click on this button here, set region of interest, and then pick out this area in here. That will basically look for the motion in that specific area.
Then we can show and hide the motion plot. What you see here is no longer the audio waveform; this is actually the amount of motion shown on the timeline, from the beginning of the video to the end.
The bigger the motion activity in a specific frame, the higher the peak you will get. So, what can we do? Well, all these highlighted areas that you see have been detected as motion. If we were to go through each and every one of them, we would spend all day; it wouldn’t be any easier than just scrolling through the video looking for activity in this area.
So, what we can do is set the threshold. As I move this slider, you can see this yellow line going up. That will basically filter out events whose amount of motion is lower than the set threshold.
In this case I would start from a very sensitive threshold and see if I can find some activity. At this level, there are only maybe 1, 2, 3, 4, 5, 6 motion events. Now that I’ve done this, I can navigate through them and see. For example, here we have this big vehicle passing through the road; it has got a lot of motion, so it’s been detected. There is another vehicle going through, and if I think I am missing something, I can lower my threshold and check.
So, vehicle in there. Let’s go a little bit lower. Okay, there’s no suspect detected in that area, so maybe we can go and have a look at another region. Maybe here and see if we can find something. Here we found some activity.
This system really allows you to be time efficient, but at the same time you can adjust the threshold slider to ensure that you’re not missing anything; that’s the principle. This process doesn’t use any artificial intelligence; it’s just based on standard pixel comparison, detecting pixels that change from one frame to another.
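Since the technique is just pixel comparison between frames, it can be sketched in a few lines with OpenCV. This is only an illustration of the principle described (Amped’s implementation is not public; the file name, region and threshold are hypothetical):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("street.avi")          # hypothetical input
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

roi = (100, 50, 200, 150)                     # region of interest (x, y, w, h)
motion_plot = []                              # one "amount of motion" value per frame

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)            # per-pixel change versus the previous frame
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    x, y, w, h = roi
    motion_plot.append(int(np.count_nonzero(mask[y:y + h, x:x + w])))
    prev = gray

# The slider in the demo corresponds to this threshold: frames whose
# motion exceeds it become the events you step through.
threshold = 500
events = [i for i, m in enumerate(motion_plot) if m > threshold]
```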
So, that’s how the system works. Once you’ve done that, you can also add bookmarks for your motion events. Once you set that threshold, you can bookmark them and you can export them exactly as I showed you before. Nice and easy. What else we can do?
Replay also has the ability to handle multiplexed video. For example, we have an executable file. If you double click on it, it will launch a player. If you drop it into Replay, the process of stream extraction will be attempted. That’s because Replay can see past the container and identify the video frames in whatever the container is.
In this instance, what’s happened is that multiple streams have been detected. Whenever that happens, you have a number of additional buttons in your play panel, which will show your individual streams. Bear in mind that the camera numbers may not necessarily be the actual camera numbers of the DVR; they are the stream numbers detected in this process of extraction. That’s why you get camera zero: that is video stream zero in there.
When we’ve done this, we’ve also been able to get the timestamp that you can see here. The timestamp, even though it’s not on the screen, as I was saying before, can be part of the data. In this instance, we were able to identify it in the process of stream extraction and add it in as data.
Here there’s a robbery going on. What I want to show you on this one is quickly how you do annotations in Replay. I’ll show you the enhance tab in a second, but I’ll skip that for now.
You can go chronologically in this workflow: you load your video first, then you play it, then you might do some basic enhancement, then some annotation and redaction, and then you may export. That’s the standard workflow, but you can deviate: do annotations first and then enhancements.
I’m going to click on the annotate tab here. In here you have a bunch of options for annotations. You can use arrows; you can do freehand drawing, but you need to have good handwriting for that. You can add text, and you can add images onto an image or onto a video. You can do some redaction, such as hiding someone’s identity in the video. You can do some magnification, which is actually one of my favourite annotations in Replay and Five, because Five also has it. You can also do some spotlighting.
One of the cool things that we’ve added recently into Replay is that if you have a timestamp and have extracted this data, you can quickly drag it onto your video right from the player. Just drag it onto your video and it becomes an annotation that you can have on your video.
Once you’ve got your annotation there, you get new options available. You can customize the colour, make it red for example. You can customize the background, make it bold, and do whatever you like. It’s timestamp data that has been added to your video as an annotation.
In this video here we’ve got a view of what may possibly be a weapon. It looks like a pistol; we can’t be a hundred percent sure, but that’s what it looks like from the imagery.
We can do some magnification. Click on the magnify button, and then here we have settings related to that particular annotation type. You draw it on your screen and that will magnify this region of the video. Then there’s this blue dot in the middle of the rectangle, which we can use to move to the area that we want to magnify, for example to show the weapon.
Now we can do some customization. For example, we can increase the magnification size, make it bigger, and then add a border. A very good one that I use a lot, in Replay and Five, is the border type “point zoomed area”, which shows you where the magnification comes from in your original video.
If you’re asking why this looks so ugly in the magnification, that’s because it’s using nearest neighbour interpolation, which is a method of adding new data to your existing data. When we enlarge or magnify an image, we add new pixels to the original pixels. Nearest neighbour adds pixel values that duplicate the colour and luminance of neighbouring pixels.
It does so in two dimensions, horizontally and vertically. Your original pixels, which in reality are small dots that you cannot see individually, are represented as squares; that’s what nearest neighbour allows you to do. Using this interpolation method, you can see the original pixels of the video.
So, each one of these little squares is now an original pixel of the video. But when we do this, we lose what those squares were actually meant to depict when they’re all combined together. This is why another interpolation method we would use is the one called bicubic. This no longer duplicates neighbouring pixel values but averages values out between neighbouring pixels, which allows our brains to register the information better.
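The difference between the two methods can be reproduced with OpenCV’s resize, as a small sketch (file names hypothetical): INTER_NEAREST duplicates neighbouring pixel values, while INTER_CUBIC averages between neighbours for a smoother result.

```python
import cv2

patch = cv2.imread("weapon_patch.png")  # hypothetical cropped region

# Nearest neighbour: every original pixel becomes an 8x8 block of
# duplicated values, so you literally see the original pixels.
blocky = cv2.resize(patch, None, fx=8, fy=8, interpolation=cv2.INTER_NEAREST)

# Bicubic: new values are weighted averages of neighbouring pixels,
# which our eyes register as a more interpretable image.
smooth = cv2.resize(patch, None, fx=8, fy=8, interpolation=cv2.INTER_CUBIC)

cv2.imwrite("magnified_nearest.png", blocky)
cv2.imwrite("magnified_bicubic.png", smooth)
```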
For a forensic player, you need to be able to decide which interpolation method you want to use. You can change the contrast and the brightness, etc. You can drag any annotation outside the image area; then, when you export, the image that will be exported will be this whole canvas here, which can make things a little bit easier. We can add arrows, very easily change the colour, make it blue, and add text as well.
One of the things I really like, a fairly new feature, is that when you draw new annotations, instead of changing the colours individually, you can now use the colour picker.
The colour picker changes the colour of your annotation to whatever is in the viewer where you are hovering your mouse. The mouse cursor basically is the colour picker. So, I click here on this arrow and my text will go blue.
I’m very pedantic when it comes to centering things and making sure they are in the right position so they don’t look disordered and disorganized. Replay has got a snapping function: when you drag your annotation around, it will snap to specific landmarks, for example the middle of another annotation, the middle of your video, or other specific locations.
You can also select multiple annotations and group them together. Right click, group them and now you’ve got your annotation locked together there.
We have got a few questions. I’ll stop for a second and have a look at this question here.
How much does Amped Replay cost? I currently have Amped Five.
Bill, I wish I had the answer to that question. I’m only a techie guy. For all that kind of stuff, I’m sure Michelle or someone can point you to the right place and you can certainly find that information out.
Jason is asking if the timestamp is inaccurate, can you adjust it so that it shows the correct time?
That is a good question. At the moment in Replay, you can’t. You can customize any textual annotation as you like, but this particular timestamp is what we call a macro.
It’s a dynamic annotation, which takes its data from the actual timestamp extracted from the file. The timestamp extracted from the file is whatever the proprietary system has written into that video. It might be inaccurate to start with, because it’s proprietary and could be misinterpreted, something we are obviously very careful not to do.
It may also be inaccurate because it’s shifted from the real time. The bad news is that you can’t correct it in Replay at the moment, but you can do it in Five. The good news is that the ability to shift it, to put in an offset so that you can show the real time if there is a difference between the DVR time and the real time, is in the pipeline. You will be able to do that very soon.
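The correction being requested amounts to simple date arithmetic: add a known offset between the DVR clock and real time to every extracted timestamp. A hypothetical sketch of the idea, not a feature of Replay today:

```python
from datetime import datetime, timedelta

# Suppose the DVR clock was found to run 3 minutes 12 seconds fast.
dvr_offset = timedelta(minutes=-3, seconds=-12)

extracted = datetime(2023, 5, 14, 18, 42, 7)  # timestamp read from the file
real_time = extracted + dvr_offset
print(real_time)                              # 2023-05-14 18:38:55
```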
A lot of the new features that we add to our programs come mainly from our users, from sources like the Discord channel or webinars. So, the fact that you asked here will be a further incentive for this feature to go into the program as soon as possible.
I see David is also helping as well with the questions. Thank you very much. All right. I think I’ve caught up.
Now, what I’m doing here, without losing any progress, is loading another video. I’ll show you how easy it is to animate annotations. This is something that in the olden days only a video editor could do; you had to go to the video unit for this.
It’s very easy, and there are quick and efficient ways of doing it: an automatic way, and a manual way for when the image quality is not good enough for the automatic one to work.
So, we’ve got another bad guy here. What happened in this video is that clearly he had an eye on the bag for quite some time, and now he’s decided to come and snatch it. As he does, he’s chased by a very brave member of the public who, I believe, got it back to the owner. So, all good news in this case.
We can track the suspect and animate the tracking. We go to the annotate tab, just like I did before. But this time we want to do it in a video sequence, and therefore we want to animate our annotations.
In this instance, I’m going to magnify the guy, because I’m also interested in his clothing as that might be important to the investigation. I’m going to do just what I did before: magnification, using the better quality bicubic interpolation, and zoom in a little bit. But I want to make this rectangle smaller so the area that I’m looking at is even more zoomed in.
I then move that blue dot there, like this. Now for the tracking: as the subject moves towards the victim, we want the rectangle to follow the subject around. It’s a process of animating our annotation, basically. You can do this with all the annotations and they all work exactly the same way. We can choose between two methods.
If the quality of the footage is good enough then we can use a method of automatic tracking. This is where we tell the program what pixels to look at on a particular frame, then, throughout the other frames, the program will attempt to look for those pixels in that pattern/configuration and follow them around.
So, we’ll make this border a little bit smaller; it’s a little bit too big. Here we go: the border thickness slider, make it a little bit smaller. I’ll press the track button here and you see two kinds of outline. These are not rendered on your image when you export; they are just visual aids. With them, you tell the program which areas of pixels to look at.
For example, the logo on this t-shirt is quite unique. It’s got enough pixel definition in there, there’s good contrast of values, almost to the point where we can see a shape. So, I’m telling my program to follow this around. What I don’t want to do is include the arms, because the arms will move as the subject moves, and that may confuse the program in tracking the suspect.
Then, with the yellow one, we tell the program the estimated direction of movement of our tracked area. We are helping the software to look for this particular region in a specific direction, rather than looking up, down, left and right, which makes the tracking a little easier and quicker.
Once we’ve done that, we can go ahead and click on the track button. We can also hold the button down, or double click on it, and the region will be tracked. As you see, the rectangle is moving and the magnified area stays on the subject constantly, until the area of pixels that I initially set as my reference is disrupted and no longer visible, because it’s obscured by the arm.
Then the process falls apart. Replay is still looking for the pattern and can’t find it; it just keeps the annotation there and the tracking doesn’t work anymore.
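The behaviour just shown, following a chosen patch of pixels until it is obscured, can be illustrated with plain template matching in OpenCV. This is only a sketch of the principle, not Amped’s tracker; the input file, patch position and score threshold are hypothetical:

```python
import cv2

cap = cv2.VideoCapture("bag_snatch.avi")      # hypothetical input
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

x, y, w, h = 300, 200, 40, 40                 # distinctive patch, e.g. the t-shirt logo
template = gray[y:y + h, x:x + w]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Search a window around the last position, extended in the
    # estimated direction of movement (here: to the right).
    sx, sy = max(x - 20, 0), max(y - 20, 0)
    search = gray[sy:sy + h + 40, sx:sx + w + 80]
    result = cv2.matchTemplate(search, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, loc = cv2.minMaxLoc(result)
    if score < 0.6:  # pattern obscured (e.g. by the arm): automatic tracking gives up
        break
    x, y = sx + loc[0], sy + loc[1]           # updated annotation position
```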
Not a problem, because up to this point I was happy, and now I will complement my automatic tracking with some manual tracking. Manual tracking involves the use of keyframes.
For those of you who are not into the video editing world, keyframes are registered positions of an annotation at a particular point in time. I will add a keyframe: I right click on this, select toggle keyframe, and the icon you see there is a visual aid which tells you that there’s a keyframe at this particular frame.
So, what happened is that at this particular frame, frame 222 of the video, the annotation is at this particular position and has been registered. Then I move my playhead across, all the way up to here, which is where I want my annotation to stop. Then I move my blue dot to the new location where I want it to be at this particular point in time, and add a new keyframe.
Now we have two keyframes at two different positions in time and at two different locations. There will be a linear movement in between the two. So, they will be joined in a line. If you go back, you’ll see that’s exactly what happens.
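That linear movement between two keyframes is plain linear interpolation of the annotation’s position. A minimal sketch of the idea (the positions are hypothetical, matching the frame number mentioned in the demo):

```python
def position_at(keyframes, frame):
    """keyframes: sorted list of (frame_number, (x, y)) registered positions.
    Returns the annotation position at `frame`, moving linearly between
    the two surrounding keyframes."""
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    for (f0, p0), (f1, p1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return (p0[0] + t * (p1[0] - p0[0]),
                    p0[1] + t * (p1[1] - p0[1]))
    return keyframes[-1][1]  # hold the last position after the final keyframe

# Keyframe at frame 222 at (150, 90), keyframe at frame 280 at (310, 120):
print(position_at([(222, (150, 90)), (280, (310, 120))], 251))  # halfway: (230.0, 105.0)
```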
Of course, this subject is not traveling in a straight line. So, once I’ve got my keyframes in there, I have to refine it a little bit: go into this whole section here and keep adding more keyframes. There is a keyboard shortcut for this, which is the U key. You will find it so much quicker than having to right click and select the right option from the menu.
I do this as many times as I need to, to ensure that the suspect stays inside the rectangle. Here I’ll put it in position and add a keyframe, maybe one here too, up to the point where I am happy. Now I want my annotation to finish there, so I right click and select “set until this frame”. My annotation will stop here, and now I’ve got my subject being tracked fairly easily.
You will see that once you’ve done this a few times, it becomes very quick and efficient. Now, I’m going to show you quickly how to top and tail a clip, which you can do very easily in Replay.
What do I mean by top and tail? You might have a video that is very long. This video is not very long, actually, it’s just short of 20 seconds, but you may have an hour-long clip, and of that hour you’re only interested in the 20 seconds where something like this happens. So, you put your playhead wherever you want your video clip to start, then click on the start range button. Then you put your playhead where you want your sequence to end, for example here, and click end range. Now you’ve got a shaded area here, which is basically your highlighted region.
Then you can look at the whole range, stretch it, and see the range that we just selected. When you go and export, that’s the video that will be exported: the video that you topped and tailed.
Sometimes you may not do annotations at all. You may just want to top and tail a clip and send it out. It’s very easy: you import it, set your range, and then export. You can export the actual clip exactly as it was in the original, just a section of it if you like, as a lossless stream copy; or, if you’ve done annotations etc., you can export the processed version.
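A lossless trim of that kind can be sketched with FFmpeg’s stream copy (file names and in/out points hypothetical). Because nothing is re-encoded, the cut points snap to the nearest keyframes, which is one reason forensic tools handle trimming carefully:

```python
import subprocess

# Keep only 00:00:05 to 00:00:25 of the clip, copying the streams untouched.
subprocess.run(
    ["ffmpeg", "-i", "full_clip.mp4",
     "-ss", "00:00:05", "-to", "00:00:25",
     "-c", "copy",          # stream copy: no re-encoding, no generational loss
     "trimmed_clip.mp4"],
    check=True,
)
```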
Before I pass the word to David, I want to show you how easy it is in Replay to do an audio redaction. You can do this with both video and audio, because Replay supports audio-only files now. That’s a recent thing; before, you needed to have a video, but now you can have just audio. When you have audio only, you just see your audio waveform.
So, earlier when I said you can zoom into a specific audio section, that becomes useful when you want to do some redaction. For example, I want to do a quick redaction. What I mean by that is taking chunks of audio away from this file and replacing them with silence or with a standard tone, etc.
I go into the annotate tab, to audio redaction, then hold down the Alt key on the keyboard and drag over the section of the timeline that you want to remove, and it will be highlighted. Then you can set the volume of your tone, and that section will be replaced. Once you’ve done it, you can hover over it, trim it, etc., and add as many annotations as you like.
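What such an audio redaction amounts to, replacing a time range with silence while leaving the video untouched, can be sketched with FFmpeg’s volume filter (file name and time range hypothetical; this is not Replay’s implementation):

```python
import subprocess

# Silence the audio between 12s and 14s; the video stream is stream-copied.
subprocess.run(
    ["ffmpeg", "-i", "interview.mp4",
     "-af", "volume=enable='between(t,12,14)':volume=0",
     "-c:v", "copy",
     "redacted.mp4"],
    check=True,
)
```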
On all of your annotations, you can add comments, etc. All of this information, the times and comments of your annotations, will be in your report when you later export it.
Finally, I’ll show you just how quickly and easily you can do some basic clarification.
For example, I’ve got this video and I’ll drag it in there. Click on the enhance tab. In here you can correct things like interlacing, or technical issues caused by analogue-to-digital conversion. Replay has got two ways of doing this. It can do it automatically: it detects whether a video is interlaced or not from the metadata in the video and acts accordingly.
The same goes for aspect ratio. The aspect ratio is the ratio between the horizontal and vertical resolution of your video. You can set it manually, or you can do an automatic adjustment where the data is taken from the metadata of the file. You can do rotating, cropping, all the stuff that you would expect an image processor to do, along with light correction, sharpening and resizing with whatever interpolation method you choose.
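The automatic adjustments described rely on fields stored in the file’s metadata; you can inspect the same fields yourself with ffprobe, as in this sketch (file name hypothetical):

```python
import json
import subprocess

out = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json",
     "-show_streams", "evidence.mp4"],
    capture_output=True, text=True, check=True,
).stdout
video = next(s for s in json.loads(out)["streams"]
             if s["codec_type"] == "video")

# field_order indicates interlaced vs progressive scanning; the sample and
# display aspect ratios drive the automatic aspect ratio correction.
print(video.get("field_order"),
      video.get("sample_aspect_ratio"),
      video.get("display_aspect_ratio"))
```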
That is pretty much it for Replay.
I’m going to pass the ball to David from Avon and Somerset. He will give you some really good information on how the program has helped them out and will also show you some statistics, which might be very interesting to see.
While David does that, I’ll keep an eye on the questions. Thank you very much for your attention. I hope you enjoyed it. I’ll stop sharing my screen and I will pass the ball to David, if you can hear us.
David Matthews: Hello. Thanks Emi. I just want to apologize to everybody. I don’t have my camera working today, so you’re just going to have to listen to my dulcet tones, I’m afraid.
Let me start sharing my screen quickly so we can get this show on the road. Good stuff. Let’s get started then. Good afternoon, everyone. Or good morning or good evening. Don’t know where everybody is coming from today.
Just want to say thank you to everyone for taking the time to see Emi and me today. As you can see on screen, my name is David Matthews and I’m the digital video unit manager for Avon and Somerset Police, which is a member of the Southwest Forensics Collaboration, covering pretty much the Southwest of England.
My team itself is a pretty small team. We cover a fairly large geographical area, by UK standards anyway, and we’re responsible for the complex recovery, processing, analysis, enhancement and court presentation of all multimedia evidence. So, the presentation that I want to go through with you today is largely going to provide you with a high-level overview of how Amped Replay has supported our force with their investigations.
I’m going to spend most of my time walking you through the business issues that we were faced with, which led to the decision to move ahead with Amped Replay as a product, as opposed to some of the other products that are on the market. I’ll talk about the features in Amped Replay which have helped us to remain legally compliant.
Then lastly, I’m just going to go over a few stats, as Emi mentioned, to show how the system has helped us in fairly high-level numeric terms. I’m not going to be getting into the weeds of it. If you wouldn’t mind not throwing questions at me during the course of the presentation; I’ll pick those up when we get to the end, because I may answer some of them as we’re going through.
So, to give you a bit of context to the whole thing. In July 2022, our force was faced with what was pretty much locally referred to as the perfect storm. We were faced with a host of internal and external issues, which needed to be worked on for a whole host of various different reasons, all of which were going to impact on our ability to investigate CCTV.
The first one being the Forensic Science Regulator’s Code, which is a statutory document brought in to ensure that any and all forensic science activity in the UK is completed to the standard expected of the Forensic Science Regulator. In addition to that, we have the NPCC, or National Police Chiefs’ Council, framework for video evidence. I notice we’ve got a few people from that group on the call today. Andy, I see you there.
That is another statutory document which supports the code by going into further detail on how CCTV evidence must be managed. Whether that be the delivery of competency training for some staff or full ISO 17025 accreditation for others.
Another one of the issues we were facing was that our criminal justice unit, or case-building team, was seeking to significantly increase, to more than double, its full-time-equivalent staff levels. The CJU roles themselves were updated to own multimedia preparation and redaction for case files. On top of that, we also introduced a new blended working policy in force, allowing those CJU staff to work from home.
So, essentially what we’d done is created a situation where we’re dramatically increasing the size of a department, increasing the type of work that they’re going to be doing, so they’re going to be getting a lot more involved in the video and CCTV side of things, and then allowing them to work from home.
But if they’re going to work from home, how are they going to do that? Then the last thing that we were faced with was The National Enabling Program. Essentially what that is, is a national IT change process to bring all forces in the UK up to a standard, secure, modern cloud based system for all of their IT.
So, any of the products that we were going to purchase within Avon and Somerset to deal with CCTV and video evidence needed to be compliant with that NEP rollout. We’d also introduced restrictions on general software use within force.
On top of that, we were going to make a decision to remove the majority of our standalone computers for this sort of thing, where historically we would have had standalone terminals in every single office around our whole force area for CCTV processing and review, we decided to remove those.
So, a bit of a difficult situation to be faced with. Why did we choose Replay?
After a thorough examination of the market and the available software tools, and through some of our own testing done locally in my department, we went through the process of reviewing the forensic tools we were going to be using. We landed on Replay as the most appropriate choice to meet all of those needs.
It’s a process-driven and simple-to-use piece of software, meaning that the training and competency requirements could be really easily met. It’s configurable, so we could ensure that the tools provided to our officers met the requirements of the FSR Code and the NPCC framework by tailoring the build itself to meet those requirements, like I was speaking about in the comment section a few minutes ago.
It can also be deployed across our network, meaning that the management of that software by our IT team then becomes a simple situation. And crucially, going over what Emi has just been speaking about, it is a forensically sound application.
On the simplicity side of things, what we really liked about it was the drag-and-drop feature for getting videos into it. It sounds like a really simple thing, but when you’re a regular frontline police officer who doesn’t routinely have to work with multitudes of CCTV playing applications, what my department often gets asked for is technical support on “how do I get this thing playing?”
So, just having a system where you can drag and drop the files and have them automatically decoded and loaded was a massive selling point for us. The system itself has a simple interface which leads the user through that process, essentially what Emi was just showing: being able to import the footage, review it, apply any optimizations or basic enhancements, get your annotation done, and then export a report.
Then there’s lots of training material, which is updated regularly through the Amped software blog.
The system owner can add and remove features. Entire tabs can be switched on and off in the system if you so wish, and individual filters can be switched on and off. Custom groups of different features can be created for specialist teams. Crucially, there is no ability for users to add or remove features themselves, and we’re able to deploy it across the network.
Essentially what that means is what we have at the moment is a streamlined version of Replay deployed to all of our frontline officers.
So, I sat down with some of my colleagues, we reviewed the FSR Code and the NPCC framework, and we looked at which processes are suitable for our frontline officers, so that they can continue doing their work without us having to be concerned about them being non-compliant in the type of work they’re doing.
So, the system gave us the ability to cut out some of those features so that we don’t have to worry about those things.
The system itself operates a hierarchical review-and-conversion process: it will start with your original format and, if it needs to do a conversion, it will opt for a stream copy before going to a transcode. That’s a process I really liked. Source files obviously remain unaltered, there’s high-quality forensic reporting, and again, users are unable to alter those export settings.
Removing the opportunity for people to be able to mess around with bit rates, to be able to mess around with formats and all of that on their export settings was an absolute must from our perspective, but also a must from our customers perspective, our frontline officers. What they need is a system to be able to do their jobs, to be able to review, redact, annotate where necessary and export without having to have significant technical expertise in how video works.
So that was a massive selling point for us. So, how has it helped?
November ’22: if you see the numbers there on screen, around that 338 mark, that was pretty much the average number of exhibits submitted to my office up until that point. Some months it would be a bit lower, some months a little higher, but on average we were sitting around that 338-to-350-exhibit mark.
December ’22: we ramped up all of the testing with our primary CCTV stakeholders who were working on this project whilst we were testing it. They were made up of our burglary squads, our major crime team and our incident assessment unit.
So, the primary teams within our force whom we’d identified as the main stakeholders for the system; we ramped up all of their testing on it, which gives you an idea as to why those numbers came down.
In January 23, we then sent out our communications to our customer base to let them know that the system was coming and to inform them of the training that they needed to complete and the processes that they had to go through in order to secure access to the software.
Then in February ’23, that’s when we released the software. The 127 number that you see at the end is the result of that reduction in exhibit submissions through to my department in the lab. It has consistently stayed around that mark since we deployed Replay.
So, from around 330 to 350 exhibits on average per month, down to around 120 to 140 exhibits per month since loading the system onto the force network.
Essentially, what we’ve seen overall is a 20 percent reduction in the cases submitted to my department and a 50 percent reduction in the evidence items submitted. Where those two numbers don’t line up, it means we’re still getting roughly the same number of cases submitted to our office; we just have fewer exhibits attached to those cases, meaning that we can get through them much faster.
The turnaround time for my department, from an exhibit being submitted to completion, has now been reduced from around 12 weeks to around three. Many weeks, we can now get it done within one or two.
Most of our officers now have the ability to immediately investigate their crimes rather than waiting for their evidence to be processed by my department. The feedback that I routinely get from our officers out on the ground is that they actually feel listened to and that they’ve got a system that works for them to help them with their jobs.
Emiliano Polito: Okay. If there are no other questions, I’d really like to thank David for his really interesting insight, and also our users for the interactions, which have been really good. I’ll put up these slides here so people can get in contact with me or with the company.
Talking about training and education, we have our blog, which is available from our website, from this address here. It is full of articles that are not just product based, but over the years have become a knowledge base with anything to do with video forensics.
In the blog, the easiest way to find articles is by using the search engine. For example, if you’re looking for more information about interpolation, which we were talking about earlier, start typing that word into the engine and it will come up with a bunch of articles for you to educate yourself with.
We tend to provide this information to users when they have a specific problem because we know we don’t have to explain it again as it is already somewhere in the blog.
Michelle: Hi, everyone, and thank you for joining us for today’s webinar, Collecting Essential Mobile Data for eDiscovery Today. I’m Michelle Durenberger, and I am the field marketing manager here at Cellebrite Enterprise Solutions. Before we get started, there are a few notes that we’d like to review. We are recording the webinar today, and we’ll share an on demand version after the webinar is complete. Now, if you have any questions, please submit them in the question window, and we will answer them in the Q&A. Now, if we don’t have time to get to your question today, we will follow up with you after.
Now, I’d like to introduce our fabulous speaker, Andy Jacobs. Now, for those of you that don’t know, Andy enjoys the challenges that come with complex litigation, focusing on digital forensics and eDiscovery. He has spent the past 10 years consulting law firms, service providers, and enterprises as an expert witness in digital forensics.
As a man for others, he assesses the needs of his clients to provide critical feedback to posture them for success. He believes that management and preservation of data is a critical component to a legal team’s arsenal. Andy now resides in Denver, Colorado and can be found enjoying the mountains or the wonderful food scene around the state. Wherever he may be, he will be supporting his Ohio State Buckeyes and of course, pestering his wife. Thank you so much for joining us today, Andy. And if you’re ready, I’ll hand it over to you so we can get started.
Andy: Hey, thanks, Michelle. Happy to be here. So, a few weeks ago, I was at ILTA, and I wanted to put this little webinar together based on some feedback that I got from ILTA. There were a few big topics there: generative AI, mobile data, deep-dive forensics. The one I’m going to focus on today is eDiscovery in general, because there’s a soft spot in my heart, if you will, for eDiscovery.
The world has changed so much in regards to eDiscovery, even in the past 15 years. When I started in the industry, eDiscovery and forensics were kind of two different things, if you will. Now it seems they’re the same coin, just two different sides. We have these service providers, enterprises, and law firms all trying to get data in a defensible manner and review it in their review platform, whatever that might be; we’ll say Relativity, I think that’s probably the big one right now.
We’ve seen that 85% of the population has a smartphone. That’s worldwide. And in the US, 90% of adults have a smartphone. Data just seems to be everywhere these days, and what’s relevant for your matter might not be relevant for another matter. Mobile phones being, like I said, pretty much everywhere, I can pull my Office 365 data and my Slack data, I can send emails, I can go on Facebook, all from a computer in my pocket. And depending on what your case might need, I have a feeling some data might live there.
So I was playing around, and it looked like 2013 was the first time smartphone sales surpassed feature phones. I’m almost curious if that was when eDiscovery and forensics really started to go hand in hand, when you said, you know, we have that tablet, we have our smartphones. When I was at ILTA, everyone seemed to be on a tablet, kind of working, but kind of still enjoying the show: learning, going to their conferences or their meetings. Everyone was on something because, you know, work never really stops. I think that was the big change, 2013. It used to be just loose documents, and paper documents as well, but loose documents and email from computers. We didn’t really see a lot of mobile phones on the eDiscovery side; it was always forensics.
Now, though, we’re seeing communication can be everywhere: Slack data, Signal data, messaging back and forth, whether that be for work or not for work, whatever. I think we’ve all seen that there’s a need for this data in our litigation, and I’m guessing that’s why some of you joined me today. At least enough to, you know, join this webinar and maybe kick the tires and see what we actually might be looking at for eDiscovery data. I’m seeing a lot: mostly just messages, a lot of email, a lot of, you know, “I’m leaving my company and I’m going to try to solicit someone to come with me,” or “taking some screenshots off my phone.”
A 2024 industry trend highlighted that 97% of cases now involve mobile phone data. That’s huge. What kind of data are we looking at? I think what kind of data we’re looking at really depends on the scope of our case. That’s one of our biggest challenges, right? Is: where is this data? Where does it live? Where do you know, if it lives with me, I’m going on vacation in a few days. (Shout out: we’re going to Alaska. It’s going to be awesome. I don’t know if I’m coming back! Can’t wait. And that’s going to be fall and it’s going to be wonderful. I’m out here in Denver and I cannot wait for it to get to beanie season and college football season. Go Bucks.)
So if I need to collect my phone and I’m going to be gone here in a few days and a service provider is not going to stop me on that one: been looking forward to this trip for a while. That’s a challenge that our service providers, our law firms, and our enterprise corporations are running into. “The data lives on Andy’s phone. We know he’s going somewhere else. How do we get that data?” It needs to be cost effective. It needs to pull what we need. And we need to make sure we have a happy custodian so they’re not, you know, raging up a storm. What are we getting with some of these collections? That’s another challenge that we run into. We don’t want pictures of Andy on a cruise.
You might; you know, I might post some on LinkedIn. You want the text messages that are on Andy’s phone, maybe some documents, maybe some email, if they live there. What are we getting with these collections, and how can we make it efficient? I’m a big fan of remote collections, and I have been for quite some time: send an email, collect what you need, send it to your S3 bucket. You can imagine, I could be doing that at the airport. And then, we’ll say you’re the law firm today, you’re going to throw that into your review platform. Maybe it’s Relativity, maybe it’s something fun. You want to see your buttons, you want to see your bubbles of me texting Michelle back and forth: “here’s the weather in Anchorage.”
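That remote flow, collecting on the custodian’s machine and landing the archive in cloud storage, reduces to an upload like this sketch (bucket, key and file names are hypothetical; assumes AWS credentials are already configured):

```python
import boto3

s3 = boto3.client("s3")

# The collection archive was produced on the custodian's own device;
# it goes straight to the intake bucket, no lab visit required.
s3.upload_file(
    "collection_andy_phone.zip",
    "ediscovery-intake-bucket",
    "matters/2024-017/collection_andy_phone.zip",
)
```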
Everyone wants to see what it looks like on the phone. That’s another challenge. We were saying, you know, a few years ago data sets have changed. It used to be forensic investigations, used to be Excel sheets and PDFs and link files, all this fun stuff. Now we just want a large scale review of messages. Depending who you are, all these cases are different, right? Are you the internal corporation that’s trying to look for work with legal and make sure you’re just under compliance? Are you the law firm tasked with collecting and reviewing a bunch of data? You know, whether it be Andy’s one off or is it a class action lawsuit? Are you the service provider? The expert trying to collect all this data and maybe do an investigation as well as the review? You all have different things going on, but you’re all kind of doing the same thing, right? “Let’s collect some data and let’s get it so the powers can review it and make sure it’s good to go.”
My last slide, my last little bubble here, is on compliance, because this one has been coming up quite a bit too. And this goes back to me at ILTA, when we had internal law firm teams, or legal teams from corporations, coming up and saying, “hey, we need to make sure our IP is set, make sure we’re safe. We have people all over the country.” It’s a hybrid environment these days: some people are in the office, some people don’t ever go to the office, some kind of go back and forth.
So you have an employee departure and, you know, I’m sure you already have a plan for collecting email. I’m sure you have a plan for collecting that workstation. But what about the phone? Is it their phone? Is it a bring your own device? Is it a company phone? I think some people are going to have more fun being able to collect targeted data sets. And I say fun: an easier time collecting those targeted data sets if it’s a bring your own device compared to pulling that full device image. That full acquisition, that full file system and data that is not really relevant.
I did a previous webinar a few months ago about how data sizes are just growing, and I still believe (and that was one of the things we discussed at ILTA as well) that with data sets growing, lawyers and attorneys and review folks and you, Mr. Enterprise, aren’t going to be able to dig through millions of different haystacks when you can just target maybe the one haystack that might have your needle. Sure, there’s AI out there.
Like that very first slide that I’m not discussing today. I’m sure there are, you know, review teams out there that can go through a bunch of data and help you out. Our review platforms have quite a few algorithms to see what might be more relevant than what. But if we’re just looking at text messages today, I don’t need Andy’s picture album, I just need maybe that SMS DB. I think with compliance, we’re really going to want to have a plan. Like we said, we already have plans for some things: plans for computers, plans for email. But what’s our plan for mobile devices?
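To give a rough sense of how targeted that SMS DB idea is, here’s a sketch of reading messages out of a collected copy of an iOS sms.db with Python’s built-in sqlite3. The schema shown (a message table joined to handle) matches common iOS builds, but table and column names shift between OS versions, so treat it as illustrative only:

    # Sketch: pull texts from a collected copy of an iOS sms.db.
    # Table/column names reflect common iOS builds and vary by version.
    import sqlite3

    con = sqlite3.connect("sms.db")
    rows = con.execute("""
        SELECT handle.id AS contact,
               message.is_from_me,
               message.text,
               -- recent iOS stores dates as nanoseconds since 2001-01-01
               datetime(message.date / 1000000000 + 978307200, 'unixepoch') AS sent
        FROM message
        JOIN handle ON message.handle_id = handle.ROWID
        ORDER BY message.date
    """)
    for contact, from_me, text, sent in rows:
        print(f"{sent} {'->' if from_me else '<-'} {contact}: {text}")
    con.close()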
The biggest challenge I had as a service provider was being everywhere at once. Let’s face it, it’s all really about timing. And if you’re implementing a program, is the headache of data collection really worth the risk, worth the reward? We’re all busy, we all travel, and no one wants me or a service provider or a law firm to come in or sit at HR for hours to collect a phone. Or, “send me your phone today, I’ll get it back to you once the collection is complete.” Is that a scare for some folks? Is that why we’re not implementing some things on mobile devices: that it’s inconvenient?
I think we can make that pretty convenient. I think we can schedule it at the convenience of your custodian: “hey, plug in a device. Here’s what we’re getting. Go grab lunch. Go watch a little bit of the game. Go to bed, come back, pick up your phone. It’ll be done.” All from the comfort of your house, all from the comfort of your office, maybe your law firm. Let’s collect what we need to collect.
I mentioned I did a webinar a few months ago, and I loved it because we discussed just how massive data sizes are. When we talk about the convenience and timing of our challenges, I think if we can pinpoint what we need to get right off the bat, it’s probably going to help. My two cents, maybe three cents with inflation. Collections will be quicker. Custodians aren’t going to be annoyed by the collection of non-relevant, personal data on their devices, especially if it’s a bring-your-own-device. When I started, you know, UFED was the premier tool, the Cellebrite product.
This was 10, 11 years ago when I was starting. And it was always phone in hand, never a remote acquisition, never a good way to pull just what we need. It was everything or nothing, or as much as we could get or nothing. We used to go to boardrooms, used to get called for depositions and fly across the country to collect a device sitting right there. Gosh, service providers, you’ve all been there. We’ve all been to someone’s house to collect something and it’s kind of awkward, isn’t it? They stand over your shoulder or, you know, you kind of make idle chit chat. “What are you collecting?” “Oh, I don’t have anything on here.” “That’s fine, sir. I’m just here to collect some data and get out of your hair.”
Or like we said, maybe you drop it off at the lab. Now someone needs to drive to me. I get chain of custody. “I’m not letting you in my lab. Sorry, you can sit in the waiting room, and I’m going to have your phone for quite some time, till it’s done.” These ways are quite disruptive to daily lives, with people being busy, and people even have to be willing to cooperate in the first place and hand over a phone. So like I said previously, I’m a big fan of the remote collection. “Let me send you an email, plug in your phone. Let’s call it a day.”
I think a lot of this knocks out our security concerns and our compliance from the last slide. IP theft is everywhere. We all know it. It’s a big thing in litigation. It’s easier now than ever. Sure, things are more locked down with security on a device, we’ll say, but there are more devices out there. It’s not just emailing yourself some intellectual property or throwing it on a USB drive anymore.
I can do that when I have it on my phone. I can access anything off my phone. My Dropbox, my company this, my company that. Then we have our, you know, solicitation. Those messages of, “hey, you know, I’m leaving, come work with me.” Or, “hey, I’m long gone, but you don’t like the company anyways, send me some information, maybe, you know, let’s be sneaky about this.” I’m hoping that if you’re listening right now and chatting with me, especially you, Mr. Enterprise team, you have a procedure for preservation of data, a procedure for your mobile phone data, just an SOP that says, “if X happens, we do Y.”
We discussed previously some of our real-world challenges. Some we didn’t discuss would be different phones, different makes, different models, different MDMs. Is it an Android, is it an iPhone? Where is this data and where does it live? I’ve been seeing a lot of the, “hey, can we just collect some messages and throw them into Relativity for review?” Along with the custodian’s PST file, along with the custodian’s Outlook, along with Slack, along with all these messaging applications. The full world of Andy in one little tab. If that’s what your case requires, I can’t see why not.
Target what you need, send it where it needs to go. Mr. Service Provider, your case might not look like that. You might be doing an accident case, a traffic accident. That’s going to be a little bit different for you. That’s absolutely going to be a, “pull the full phone. Let’s see, you know, screen time and app time and usage.” Every case is a little bit different. You guys are the experts. You know what you need, you know where to look, you know where to find it. I’m sure I have some lit support folks with me today. I think you’re the unsung heroes. Probably trapped in the closet right now, collecting email to be loaded into Relativity as we speak. Hats off to you guys.
You work your butts off! Again though, we’re going to need a plan. What are we collecting? Where is it going? What are we doing with this data? What happens if there’s an issue? What happens if something doesn’t parse? Always got to have a plan. I was talking with a CISO from a service provider at ILTA, and the one big thing that they came back with was: have a plan. Even if it is a remote collection, maybe a cord isn’t working. How are we troubleshooting? Maybe it is an onsite collection and the MDM is just not playing well. What are we going to do? Have a plan. Different collection methods require different workflows. Different phones require different workflows. For my folks here, like we said, whether you’re a service provider, a law firm, or an enterprise, you’re still using the same tools, you’re still collecting the same types of data, but you might have different workflows for your environment. Have a plan.
Part of that plan is standardization. Make sure you’re trained, make sure you know what’s going on and stay up to date. New applications come out all the time. Say I’m collecting Signal today: I’m going to need a full file system. Say I’m collecting something else today: maybe it’s just the SMS DB. Maybe I can just do that remotely. It really depends, again, on that phone. Or, “I know my enterprise will have an MDM installed.” Okay. Litigation happens. My plan is to instantly put these on lit hold.
Maybe your Intune MDM has a different setting that says, “I’m only allowing this for right now.” And then after my support team’s done, I’m going to email infosec and say, “hey, I collected my 15 phones. You can switch their profiles back.” That’s part of your plan. That’s part of your workflow. I love my service provider guys. Like I said, I was a service provider for quite a few years. You all have workflows for everything: parsing, collections, what kind of collection is it. Your project managers for Relativity have workflows on literally everything; it’s all documented. And it’s a brilliant thing to see, because that way it’s defensible. That way we can come back and say, “yes, I did this. Here it is, written out in my SOP.” Have your plan. And a lot of that plan could be around targeting data, if that’s part of your plan.
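That “if X happens, we do Y” logic doesn’t need to be fancy; even encoding the SOP as plain data gets you a repeatable, defensible starting point. A toy sketch, with triggers and steps that are made-up examples rather than product features:

    # Toy sketch of an "if X happens, we do Y" mobile-device SOP.
    # Triggers and steps are illustrative placeholders, not product features.
    PLAYBOOK = {
        "litigation_hold": ["notify_infosec", "switch_mdm_profile",
                            "remote_collect_messages"],
        "employee_departure": ["collect_email", "collect_workstation",
                               "targeted_phone_collection"],
        "collection_failed": ["retry_with_new_cable",
                              "escalate_to_service_provider"],
    }

    def run_plan(trigger: str) -> None:
        for step in PLAYBOOK.get(trigger, ["escalate_to_service_provider"]):
            print(f"[{trigger}] step: {step}")

    run_plan("litigation_hold")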
Shout out to me again for my old webinar. We discussed how big data is getting and I’m sticking to that fact. If you’re an attorney or you’re on review, do you really need to see the same data sets over and over and over again? There are things like deduping; that’s been around for quite some time. There are things like, you know, targeting just the user profile. We don’t need any deleted data for this case; it’s not relevant. On the mobile phone side, let’s just collect messages. We want to see who Andy’s talking to and about what. We don’t care about Andy’s vacation. (I do. Andy’s very excited about his vacation, clearly.)
We don’t care about the ESPN app for his fantasy football league. We care who he’s talking to and why. Maybe there’s a pointer in there that says where he’s storing some data. Who knows? Again, it could just be for preservation. Andy’s retiring. He doesn’t need the data anyways. Take it. Yeah, maybe he’ll never come back from Alaska. We’ll see! Yeah, this next webinar will be from Alaska. I love it. Or a beach. A beach sounds good.
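Deduping, mentioned a moment ago, boils down to hashing each item and keeping one copy per hash. A bare-bones sketch with Python’s hashlib; real review platforms dedupe on richer criteria (email metadata, document families), not raw bytes alone:

    # Bare-bones dedup sketch: keep the first file seen for each content hash.
    import hashlib
    from pathlib import Path

    def dedupe(folder: str) -> list[Path]:
        seen: set[str] = set()
        unique: list[Path] = []
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                if digest not in seen:
                    seen.add(digest)
                    unique.append(path)
        return unique

    print(len(dedupe("collection_export")))  # "collection_export" is a placeholder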
So why collect a lot of data and rack up charges depending on your platform, rack up hours depending on what you need to review, when we can target certain things, streamline certain things? That goes back to that plan, that procedure. Let’s collect data. Let’s put it somewhere where you, the smart folks, can review it. You, the people going and fighting in litigation, can review and say, “boom, here’s my needle in the haystack.”
Or, “hey, we did our due diligence. I collected 50 phones, all remotely, all using this standard. None of them have what we’re looking for. This case is wild.” Streamline the things that you need. What are we going to get? If it’s, say, RSMF, which is a beautiful, beautiful format, I’m going to put that into Relativity. I’m going to put it right next to my PST file and right next to my workstation user docs. We’re after speed and accuracy. I need it to be right. I need it done quick. (Also need a drink.)
eDiscovery professionals work with so much data. You guys are crazy. Multiple cases going at the same time, different instances, different everything. What are we collecting today? What are we reviewing today? Are we using machine learning? Are we using any algorithms to try to speed this up? Everything’s different for you guys, and you guys go through a ton of it. If you’re external, like a service provider, and you’re doing all that, wonderful: high five to you guys.
Same thing if you’re a law firm. A lot of my law firms have, we’ll say, RelativityOne in house: easy to use, nice to set up, someone else can manage it. All you need to do is log in and review your stuff. Let’s say you are a corporation. You are an enterprise where people come and go and litigation happens all the time. Maybe we need to collect groups of data sets. All of HR today, and maybe tomorrow it’s all of IT. Two different needs, two different litigations.
Build your SOP; make sure you have a plan that if A happens, we do B, and if C happens, we go to plan D. Your service providers do this all the time; your law firms have a plan. I’m hoping for my enterprises here with me today, my corporations, my companies: have a plan for your mobile data. A lot of that goes with, you know, choosing the right tools. You have a toolbox at home, I’m sure. You have a garage full of tools for your car…and maybe not. I’m honestly not handy with a car. The wife, on the other hand: brilliant. Me, I can change my look. Invest in the right tools.
Tools that work, tools that are quick, tools that are defensible, and tools that can streamline the process for you. Say I’m collecting, again, Andy’s text messages. I want to see the bubble format that they go in between him and Michelle. And I want to see that next to his PST file. Great. Let’s collect some data. Let’s throw it right into Relativity for you with one little streamlined button click, if you will. Click button, email goes out: brilliant!
This goes back to some of our previous slides about the issues we’re running into with data collections. Gone are the days of having the phone in the lab. You might need that for some things; you might not for others. Invest in the right tools. It saves you time, saves you money in the long run, it’s defensible, and everyone’s familiar with them.
Stay informed and keep up to date with the trends. Things change all the time. New phones come out, new tech comes out, new apps come out. Hopefully you see it in a lab environment or a test environment prior to running into that issue live, right? Someone else probably ran into it already. Look at your forums. Look at your groups. Stay up to date on webinars like this; there are a few of them out there. There are podcasts. I love them. I listen to them all the time. Stay up on your CLEs. Someone probably ran into this issue before you; learn from their experiences, right? Let’s stay informed. We’re a team. We’re all trying to do similar things. We’re all trying to get data defensibly, make sure it’s reviewed, and we all want to talk about it. We want to have fun.
I think, finally, one thing I want to end on is that mobile data really used to be a pain. Mobile data has gotten a lot of love recently. And I say recently, you know, the last 10 years or so, since around 2013, when smartphones surpassed feature phones and we had databases that we could start pulling from. Now we can collect remotely and build an RSMF instead of having the phone in the lab. (RSMF is Relativity Short Message Format, by the way, which looks brilliant in the Relativity review platform. All your little bubbles and who’s talking to who.) Gone are those days of phone in lab. “You’re not getting your phone back for hours.” Cool. “Mr. Attorney or Mr. Whoever, Ms. CISO, you want to review messages? Here’s an Excel sheet that looks kind of dorky. Or a PDF or an HTML.” Mobile phone data has gotten quite a bit of love. And I think it’s because there’s a ton of data there. Like we said, it’s always with you and it accesses everything.
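Under the hood, an RSMF is essentially a zipped JSON manifest of participants, conversations, and message events, wrapped in an EML container. A hedged sketch of what a minimal manifest might look like; the field names here are approximate, so check Relativity’s published RSMF specification before relying on any of them:

    # Hedged sketch of a minimal RSMF-style manifest (rsmf_manifest.json).
    # Field names approximate Relativity's published spec; verify before use.
    import json

    manifest = {
        "participants": [
            {"id": "andy", "display": "Andy"},
            {"id": "michelle", "display": "Michelle"},
        ],
        "conversations": [{"id": "conv-1", "display": "Andy & Michelle"}],
        "events": [
            {
                "type": "message",
                "id": "msg-1",
                "conversation": "conv-1",
                "participant": "andy",
                "timestamp": "2024-08-30T09:15:00-08:00",
                "body": "Here's the weather in Anchorage.",
            },
        ],
    }
    print(json.dumps(manifest, indent=2))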
We talked about some of our challenges. That’s one of the things that I think has changed for the good for mobile phones, along with our challenges. Where’s that data? Is it personal or is it company owned? Do we need the full phone? Great, we have a plan for that. Do we just need a single database? Great, we have a plan for that. If the challenge is that Andy’s going on vacation and we need his phone, or Michelle’s got a football game tonight (season starting), we can easily work around their schedule by doing a remote collection: “hey, awesome, you’re grabbing dinner at some point, plug your phone in, grab some food, come back.” That single database we needed? It’s collected, and it’s already in the hands of your attorney, or your lit support team, or your HR team. Like I said, maybe it’s just preservation. You don’t need it for anything other than to hold it per policy for three months. So the challenge I see right now is just trying to be everywhere at once, and convenience.
I’m a firm believer in workflows. Some of our best practices are based on workflows. Having a plan was one of the best pieces of advice from talking to folks at ILTA. Be prepared, have your workflow, and know your data: what you need to collect, how you’re collecting it, and where it’s going. Every case is different. Every need is different. Have multiple plans. That way I can always say, “yes, I did it this way.” “Yes, hey, new guy, here’s how you do it.” Have a plan.
Ongoing training and staying up to date is really a must, right? I listen to podcasts once a week, different forensic and eDiscovery podcasts. There’s a group of folks out there who are brilliantly smart and can actually explain it to laymen, or not-so-smart people like myself. People have already run into some of the issues that we’re running into, right? Learn from their experience. Continuous improvement, continuous training. And as things change, data changes, technology changes, we evolve with it. So make sure you’re staying up to speed, continuously learning and updating your plans.
I think that’s going to be it for me. I do have a white paper on this, a few white papers, and my previous webinar is already posted if you need it. But that was it for me. Michelle, did we have any questions based on ILTA and some of the stuff we saw?
Michelle: Absolutely. Andy, thank you so much for giving us that fantastic overview. And we did have a few questions that came in. So let me start with this one. Okay, so first question is: you mentioned that mobile phones have become more prominent in the eDiscovery space recently. Definitely getting some more love. Can you elaborate on this a little bit more?
Andy: I think so, yeah. Getting more love. Mobile phones used to be very forensic oriented, if you will, not really on the eDiscovery side of things. There was never really, especially when I first started, a good way for a layman to review mobile phone data. It was, like I said, kind of a goofy Excel sheet that was just overwhelming. You had UFDRs, reader reports from Cellebrite. They were brilliant. But again, that’s something else that maybe I need to load on a computer and review outside of a review platform.
And finally, gosh, maybe it was, oh…you know, you’re going to want to fact check me on this one, maybe five or six years ago, RSMF came out and we could finally get some load files off of mobile phones. And Relativity Short Message Format was brilliant. I could run search terms in Relativity on my phone collections. I could compare those against other phone message conversations.
So I think in regards to getting more love, especially on the eDiscovery side: you no longer need to be in a lab with specialty tools to review phone data. You can have your service provider, your law firm, your infosec team pull the phone and put it right into your review platform. Speaking of Relativity Short Message Format, we’ll stay with the Relativity side of things. You pull your phone data and it goes right into Relativity along with everything else. So it makes it nice and easy. Everything’s in one spot. And it’s got that awesome little bubble format that everyone is used to for messages back and forth.
Michelle: Perfect. Thank you, Andy. Okay. We have some more questions that came in, and somebody said that they were actually just at ILTA and had a chance to walk by our booth. They say collection of messaging is a priority of theirs; they’re the typical, “I just need messages and I need them in a review platform.” How do you pull data, and is it similar for iPhone and Android?
Andy: I love it. So here at Cellebrite, we do have a tool called Endpoint Inspector. It is one of my favorite tools that we’ve had. I used it as a service provider for quite some time, and it’s definitely your eDiscovery collection method of choice. You might have seen it at ILTA; we were doing some demos and stuff. (I’m going to have to see who sent this so I can reach out and say hello and thanks for visiting.)
Endpoint Inspector will allow you to collect not just mobile phones, but chat applications such as WhatsApp and Telegram, and workstations: your Linux, your Macs, your Windows machines. But as you mentioned, Androids and iPhones too. And it’s as simple as sending an email to a custodian, whether they’re at your corporation, whether you’re a law firm sending it out to a custodian for litigation, or you’re that service provider sending out a bunch of these. You can send your custodian an email that says, “hey, please follow these few easy steps, plug in your phone, walk away.”
And on the backend, you’re going to set what you need to collect, whether you’re collecting maybe an advanced logical, or maybe just targeted data sets: contacts, calendar, messages, call logs. You can target your data sets and pull those back. And we’re actually doing some stuff with Relativity to make that pretty easy for you to review. So, to answer your question: yes. It is the same kind of collection method for both Android and iPhone. We pull messages, targeted data sets, advanced logicals, logicals, and you can save that wherever you need. And I’d love to go over a demo with you if you didn’t see it at ILTA.
Michelle: Awesome. ILTA was fantastic. I know that we had so much activity in our booth. Okay, we have one…two more questions I think that we can do in terms of timing. So, another question around ILTA: with ILTA 2024 in the books, the next big eDiscovery event that we are attending is Relativity Fest Chicago. Will you guys be there, and can you provide demos for everyone interested?
Andy: I personally will be at Relativity Fest Chicago. I go almost every year, and we have quite the team going this year. I can’t say the full team, but we will be there. Cellebrite will be there. Come say hi, hang out, let’s do a demo. Let’s set some time to go one on one. I’m happy to show you our product. I’m happy to show you how it integrates with some of your favorite review platforms. If you’re going to Relativity Fest, you’re more than likely a Relativity user, so I can show you how it integrates, from collecting data with Cellebrite to getting it right into Relativity. And, you know, we can get the teams together and collaborate. So, yes, I will be there. We have a booth. I’ll probably be in a meeting room doing demos. Come say hi. Give me a high five. Looking forward to seeing you.
Michelle: Awesome. And I will see everyone there as well. Okay, let’s see. I think we have time for one more question, so here we go. We are a law firm. What kind of mobile phone data can we get into a review platform such as Relativity? Anything other than messages, or is it just RSMF?
Andy: I love it. You’re familiar with RSMF, which is the Relativity Short Message Format, which honestly was kind of a game changer when it first came out and I picked it up at that service provider. Everyone needed a good way to get data, you know, everything in one spot, right? Why do I have mobile phone data here and everything else up in review? RSMF is a beautiful product. It shows those bubble views for chat messaging, like we spoke about.
But RSMF has a limitation: it is only for messages. SMS, MMS, chats, whatever, with all your metadata behind it. It’s beautiful, but it’s only for messages. Using some of our other Cellebrite tools, like Physical Analyzer, you can also create load files. Load files are brilliant for non-message data on the phone, or you could use them for your message data too, whatever your workflow, whatever your plan says for your data set.
But yes, not only can you do RSMF for messaging, you can build a load file for, say, loose documents on the phone. I must admit, me and my wife were looking at a dinner menu the other day, and I, you know, scanned a little QR code, and it downloaded the menu to my phone. That’s a PDF. The load file would say, “what was it? Where was it? How big is it?”, plus the native itself, and you can upload all that into your review platform. So you can double dip. You can get your messages as well as your loose docs.
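For the curious, a classic Concordance-style load file is just delimited text: þ (0xFE) as the text qualifier and the 0x14 character between fields. A toy sketch writing one row for that menu PDF, with made-up column names:

    # Toy sketch of a Concordance-style DAT load file row.
    # þ (0xFE) qualifies text; 0x14 separates fields. Columns are made up,
    # and real productions follow each party's load file specification.
    QUALIFIER = "\u00fe"
    SEPARATOR = "\u0014"

    def dat_row(values: list[str]) -> str:
        return SEPARATOR.join(f"{QUALIFIER}{v}{QUALIFIER}" for v in values)

    header = dat_row(["DOCID", "FILENAME", "FILESIZE", "NATIVEPATH"])
    row = dat_row(["DOC000001", "menu.pdf", "184320", "NATIVES\\DOC000001.pdf"])

    with open("loadfile.dat", "w", encoding="utf-8") as f:
        f.write(header + "\n" + row + "\n")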
Michelle: Perfect. Thank you so much for going through all of that with us, Andy. I cannot tell you how thankful we are for all of your valuable insight today around identifying and addressing the current gaps in the mobile data collection process, and for encouraging everyone to build a plan and providing them with strategies on how to optimize mobile data collections. Unfortunately, we are running out of our allotted time here, so we’ll wrap this up. We will reach out to you individually after the webinar to answer any of the questions we were not able to get to today, so don’t worry. Andy, a huge, huge shout out to you. Thank you so much for such a great discussion.
Andy: Of course! Thanks for having me.
Michelle: Now everyone…thank you! And remember, for any additional questions or to learn how you can get started with any of our solutions, please feel free to reach out to us at enterprisemarketing@cellebrite.com. Thank you again, Andy. And thank you everyone for attending today. Have a great day.
Michelle: Hi, everyone, and thank you so much for joining today’s webinar, Revolutionizing Mobile Data Collection: Streamline Investigations with Cellebrite Inseyets. I’m Michelle Durenberger, and I’m the field marketing manager here with Cellebrite Enterprise Solutions. Before we get started, there are a few notes that we’d like to review. We are recording the webinar today and will share an on demand version after the webinar is complete. If you have any questions, please remember to submit them in the questions window, and we will answer them in our Q&A section. If we do not get to your question, we will follow up with you after: don’t worry. Now, I’d like to introduce you to our speaker today, Paul Murphy.
Paul enjoys the challenges and ever evolving nature of digital forensics, especially when it comes to mobile phones. He previously spent nearly 30 years in law enforcement, with the last fourteen years as a digital forensic investigator in the world of counterterrorism, where he worked on a number of high profile cases. He joined Cellebrite three years ago and now works as a solutions engineer bringing his knowledge and experience to assist customers on a day to day basis. Paul resides in Manchester in the North of England, where he enjoys mountain biking and watching one of Manchester’s two famous teams: is he a red or a blue? Thank you for joining us today, Paul. If you’re ready, I’ll hand it over to you so we can get started.
Paul: Thank you, Michelle. And welcome everyone to this webinar. So what we’re going to look at today is Inseyets for Enterprise: what is it, and how can it benefit you? Inseyets for Enterprise combines a number of products, which are going to enable you to access data, extract data, decode data, and, at the end, produce your evidence. And all this has come about because the world of digital investigations changes on a day to day basis. People now communicate on many different platforms and use social media more than ever before. And this requires you to change your approach to the way you’re gathering data.
So, what actually is the Inseyets solution? Well, it’s made up of a number of products. Some of them you may be familiar with, and some of them you may not be. First of all, we’ve got UFED. UFED is a longstanding product and is used for logical extractions and advanced logical extractions from smartphones, feature phones, and SIM cards.
Combine that with Mobile Elite, which allows access to file system extractions from all the modern smartphones. Combined together, this gives you access to devices and data previously unreachable, and enables you to extract the full file system, including data from containerized applications. We’re talking about things like Telegram, Signal, WeChat, these types of communication platforms. We’ve also got a new version of Physical Analyzer called Inseyets Physical Analyzer, and this is going to enable you to process, decode, and analyze your data from the broadest range of apps and data sources.
It will enable you to get immediate insights into key information, such as most visited locations, top five messaging parties, and the last ten searches conducted. It also uses AI for media classification, which enables you to quickly identify relevant media. And a really useful feature of Inseyets PA is that it enables you to process the data only once and reopen cases in seconds, as many times as you want. You can save space by exporting cases, and it gives you the ability to import them at a later date if you want to review them. More importantly, it also allows you to dive into the hex view or leverage the database viewer or the directory browser. The capabilities of Inseyets Physical Analyzer just go on and on and are being developed on a daily basis.
Added into Physical Analyzer is an option called Legalview. This enables you to create the relevant eDiscovery formats: load files and RSMF. These are useful for RelativityOne platforms. Also built into Inseyets PA, we’ve got UFED Cloud, and this enables you to extract, preserve, and analyze cloud data using device-based tokens or known user credentials (if you’ve got them), giving you access to valuable data that is not usually stored on the device.
So let’s just look at some of the different types of data that you’re going to be getting. On the left hand side there, we’ve got logical extractions, and on the right hand side, we’ve got the advanced extractions: full file system extractions. As you can see, there’s substantially more data at the full file system level. It gives you stuff like activity sensor data, application usage, application usage logs, the applications themselves, access to the databases within the applications, locations, email, device activity; the list goes on. All the data is there using Cellebrite Inseyets full file system extractions.
So let’s just have a look at an average workflow when you’re using Inseyets for Enterprise. It’s made up of a number of components. From the left hand side there, you can see we’ve got a mobile phone. We’ve then got a piece of hardware called the Turbo Link adapter. That’s connected to the software on your workstation called Inseyets UFED. And that, in turn, requires a connection to the Cellebrite Enterprise Vault Server, which is a cloud-based server where all the resources and mechanisms needed to exploit the device sit.
So, first of all, the user connects the device to the Turbo Link adapter. The UFED software identifies the device and makes a request for resources to the Enterprise Vault Server. These are downloaded back to the UFED in an encrypted form, then move to the Turbo Link, which decrypts the resources and exploits the device. Now, the number of resources will depend on the exact make and model of phone.
So steps three, four, and five may happen two or three times during the initial exploitation of the device. Once you’ve exploited the device, it’s as simple as the user selecting the acquisition type that they want to do. An important point to note here is that there is no exchange of user data between the device and the Enterprise Vault Server.
So what I’m going to do now is just switch over to a live demo of the product just so that I can show you what the dashboard looks like, and then we can look at a phone that’s actually connected to it and see what type of extractions we can get from it.
So this is now a live running version of Cellebrite Inseyets. As you can see here at the top of the screen where it says resources, we’re connected to a server, which is providing us access to the resources we require. It’s made up of three main components. Here, we’ve got full file system extractions from supported iOS devices. Here, we’ve got full file system extractions from supported Android devices. And here, we’ve got UFED, the UFED for PC that you may well have used before. This is quite a familiar tool to a lot of people; it has been around for quite a number of years. And you move between the two using this.
So what we’re going to do is have a look at an extraction of a device and look at the different features that are built into Inseyets, because there are a number of new features built in. One of them is Quick Insights. Another really important, really useful one is called Streamline, and this enables you to streamline the process from the extraction all the way to decoding and reporting. So it’s as simple as selecting the operating system.
In this case, we’ll go with Android; the workflows are the same. And we’ve now got a choice to make: is it locked or is it unlocked? But what do we mean by unlocked? Unlocked means that the passcode is known or no passcode is set, and locked means that the passcode is unknown. Inseyets for Enterprise comes with the unlocked version of the tool. If you’re interested in the locked version of the tool, you can speak to us separately and we can discuss options with you. But for now, we’re just looking at the unlocked side of the tool.
So you need to enter a few case details. This is configurable so that you can match it up to your workflows. Follow the simple instructions to prepare the device, and then you’ll see the instructions here. Number one, connect the Turbo Link to the computer; this will turn green when it’s ready. It will then initialize the Turbo Link environment, which simply means that the environment is verifying its connection with the Enterprise Vault Server to get the resources. This will then turn green. And then it will ask you to connect the device.
Once you’ve done that, it takes a few minutes and it gets to this state, and it’s now ready for us to make extractions from the device. So what’s been happening in the background? If we look at the progress console, we can see that it’s downloaded a number of resources. You can see a resource here that’s been downloaded, and there’s another one here.
This one takes about three resources to download, and then it gets into a state where we can exploit the device. We mentioned earlier a number of other features that are available with Inseyets, one of them being Quick Insights, which is a new tab that’s been added here. If we look at that, this is going to give us some really relevant device information before we’ve even done an extraction of the device. And if we look at one or two of the items here, we can see the last SIM cards that were installed in the device.
You can see some information about the IMEI number, the chipset, the make and model of the phone, and the battery level. If we look at this page, we can see the email associated with the device, and a list of different accounts on the device. You can see here, it’s an Android device: we’ve got a Samsung account, we’ve got a Signal account, and various other accounts, Facebook in between.
And on this page, we can see (what I think are two really relevant items) a list of the Wi-Fi networks that this device has been connected to, but also a list of installed applications. And this is searchable, so you can actually type in the top. If we look for Telegram, you can see that org.telegram is installed on this device. It’s worth checking this before you do an extraction, because if you’re particularly looking for an application, why don’t you just check it’s on the device? Because if it’s not on the device, you can save yourself the extraction in the first place.
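Quick Insights does that check inside the tool; for reference, the same sanity check on an unlocked Android with USB debugging authorized can be sketched with plain adb. This is a generic approach, not how Inseyets works internally, and the package name is Telegram’s public ID:

    # Sketch: check whether an app is installed before running an extraction.
    # Assumes adb is on PATH and the device has USB debugging authorized.
    # Generic adb approach; not Inseyets' internal mechanism.
    import subprocess

    def is_installed(package: str) -> bool:
        out = subprocess.run(
            ["adb", "shell", "pm", "list", "packages", package],
            capture_output=True, text=True, check=True,
        ).stdout
        return any(line.strip() == f"package:{package}" for line in out.splitlines())

    print(is_installed("org.telegram.messenger"))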
There’s an export option here, which allows you to export it to a PDF. So going back to the device status, we’re now ready to look at the different extraction methods. If we simply click into that, we’ve got two options: Streamline and manual. Let’s look at the manual one first, because this is what you may be familiar with. This gives us a number of options. We can do a full file system extraction. We can do a selective file system extraction. If it’s an older device like this one, you can actually do a physical extraction. We’ve also got triage built into this, which is a first-to-market full file system scan of mobile devices against a set of criteria and profiles which you create.
Full file system extractions, depending on the amount of data, may take quite a long time. So sometimes it’s worth considering doing a selective file system extraction first. That means you can actually start working on your data before you’ve done the full file system; you’ve got some data to work on at that point. And if we look at what’s included in there, we can see that I can just pick off the data types that I want to extract quickly. For example, I could just take chat applications, I could take finance, or I could simply take Telegram and WhatsApp and do an extraction based on those. And that would be fairly quick.
So if we now look at Streamline: Streamline is a new feature that’s been added into Inseyets. What this enables you to do is start the full file system extraction running, and then in the background it will also create a case for you in Inseyets Physical Analyzer. It will decode the case, and it will also create a report for you. This enables you to really streamline your workflow process, because no longer do you have to sit in front of the machine, waiting for things to happen, clicking buttons.
And it’s as simple as deciding what type of extraction you want to do. In this case, we’ll look at the full file system extraction and fill in some case details. Do we want a report created at the same time? We’ve got a number of options here. We can just go with decode and create a case only, but we can also select from a number of different report options there, including UFDR, PDF, HTML, Word, and XML, and if you’ve got the Legalview option, you can use Relativity Short Message Format and the eDiscovery load file as well.
So we’ll just select that one. We can actually move the report to a particular save location using this button here as well. And this is a summary of what we’re going to do before we actually do it. Here we’ve still got an option to change the path of the extraction. This can be set globally at the beginning in the settings, but you’ve still got an option to change it for each individual extraction if you want to. We’re using the Streamline method, we’re doing a full file system extraction, and we’re also going to create a UFDR report. So it’s as simple as selecting extract from there.
This will then start the full file system extraction, but it will also call Inseyets PA in the background. There you go: perform full file system, and it will start running on that. And let’s take a quick look at what’s happening with Physical Analyzer. This, for those of you who have not seen it before, is the new Inseyets Physical Analyzer. We’ve got a new tab now called pending cases. This is the case that I just created; it’s now doing a full file system extraction.
So as you can see, Inseyets PA has also turned into a dashboard for your extractions. What about these other ones above it? We’ve got a number of them on hold, and these are simply on hold because I’m actually using Inseyets PA to do some work today, so I don’t want it taking over the processing power. But when I go home in the evening, or on a Friday before the weekend, I can simply click resume all or pause all, and this will queue up the work for me. So once I come back to it, these will all be processed and the reports generated.
So just moving back to Inseyets PA, you can see the cases here. You can have as many cases as you want on here, and they open really quickly: they’re processed once and stored in a database file, so it takes only a few seconds to reopen them. At the beginning, I mentioned some insights into the data you’re going to get. Here you can see visited locations, last ten calls, messaging parties, and media classification. I also mentioned the cloud capability; that’s here, and it’s built in. Everything down this left hand side uses device tokens, which enable you to bypass two-factor authentication, and on the right hand side, for any of these, if you know the user credentials, you can simply enter them and gain insights into that cloud data as well.
That’s a quick overview, Michelle, of Inseyets. Are there any questions that we’ve got so far?
Michelle: Yes, absolutely. Thank you so much for that demo and the overview. I know that it’s great to see the capabilities of our product. So, thank you. Okay, we have had quite a few questions. So, let’s start with this one. Okay: I need to be able to access data from several chat applications, including WhatsApp and Telegram. Is that possible with Inseyets UFED?
Paul: Yes, of course. Inseyets UFED enables you to extract a full file system, as we mentioned earlier. Once it’s decoded in Inseyets PA, you’ll have access to a lot of data, including all the chat applications, all the way down to their database structure. You’ll also be able to browse through data at a file system level and validate all the information you’ve got.
Michelle: Great, that’s fantastic to hear. We do have a few more questions. So, let’s see. We normally have access to the pin code for phones we examine, but if we do not, what could we do?
Paul: So on the initial screen I showed you, there was a locked and an unlocked option. Cellebrite Inseyets UFED has an additional option to access data from locked devices, including using supersonic brute force and what’s called after-first-unlock mode. If you’re interested in this side of the tool, contact us directly and we’ll be able to discuss the various different options with you.
Michelle: Great. It’s good to see that that’s an opportunity there. Okay. And: does Inseyets UFED support the latest iPhones and operating systems?
Paul: Yes, it does. And for unlocked or known PIN code devices, Inseyets is able to extract a full file system from iPhones ranging from the iPhone 5 (if anyone’s still got one of those) all the way to the iPhone 15 running the latest iOS 17.6.1. Inseyets also supports an unrivaled range of Android devices allowing for full file system extractions of those devices as well.
Michelle: Perfect. Okay, a few more here. Okay: from what I understand, Streamline is automation. What happens to the queued and pending cases when I am not there to process them?
Paul: So Inseyets PA is acting as a dashboard now, and it enables you to maximize efficiency through the automation that is Streamline, which is built into Inseyets: the simplification of the entire examination process, from device extraction to reporting, in just a few simple clicks. So when you’ve gone home in the evening or for the weekend, Streamline will continue to work, processing your data into cases and producing your reports, ready for when you come back.
Michelle: That is incredible. Definitely a time saver. Okay, I think we have time for one more question. Let’s see. Oh, here we go. Ah! We use RelativityOne Review Platform. Can I create the required file formats directly from Inseyets, and can I also push them directly to RelativityOne?
Paul: Yes. As we mentioned earlier, there’s a Legalview bolt-on to Inseyets Physical Analyzer. Using this option, you can create exports in our RSMF format and also in eDiscovery load file format. We can also use the Inseyets PA API that’s built in to push these files directly into your RelativityOne instance, into the staging platform.
Michelle: Perfect. That is great to hear. And I think that brings us to time. So, Paul, I just really wanted to say thank you so much for your valuable overview of Inseyets. It was so useful to understand the various features available in the platform and how people can use them on a day to day basis.
Paul: You’re welcome, Michelle.
Michelle: Thank you. Unfortunately, we do have to wrap this up. And if we did not get to your questions, please know that we will reach out to you individually after the webinar to answer your questions that we didn’t have time to get to. But Paul, again, thank you so much for a great presentation and remember, for any additional questions or to learn about how you can get started with any of our solutions, please reach out to us at enterprisemarketing@cellebrite.com. And remember to follow us on Twitter, Facebook, and LinkedIn at CellebriteES, that’s Cellebrite Enterprise Solutions. Thank you again, Paul, so much. And thank you all for joining us today. Have a great day. Thank you!
Dan: I’d like to welcome everyone to our Oxygen Forensics Tech Takedown webinar today. Our topic is A Remote Journey. My name is Dan Dollarhide and I’m the director of global solutions at Oxygen Forensics. I will be off screen manning the chat today. Sharing a screen and guiding us on the remote journey using our Oxygen Remote Explorer tool will be our vice president of training and technology, Keith Lockhart. We will be getting started momentarily, so buckle up.
Keith: Hey everybody. My name is Keith Lockhart. I’m the VP of technology and training at Oxygen. We decided to get together and do this webinar called A Remote Journey because we want to talk about ORE, which is Oxygen Remote Explorer.
And if you’re a Detective user, an Oxygen user with OFD, Oxygen Forensic Detective, this is the natural evolution from, you know, doing local extraction, where you’re hooking up a device at your desktop, to “let’s go get things from other places around the world remotely.” And we want to see how that works. So I’ve got a few slides together just to keep us on track from a talking point perspective, and I have several windows open because I want to show different aspects of the interfaces and things like that. But let me just start here and give a little background.
So, like I said, with Detective you would start a device extractor, and, you know, let me just play along while I’m talking about this. If I came here (that’s our AMC, we’ll get back to that in a second) and ran the ORE interface, which is already running: it’s kind of like Detective with a few minor differences, and they won’t make a difference for this conversation. And if I ran the device extractor locally, that means I’m running this application, hooking up a phone with a USB cable, and extracting everything locally to this PC, workstation, whatever I’m on right now. That’s fine. That’s what we know, but we want to move away from that model in this conversation.
Think about, “oh, I’ve got a PC…”, and this is kind of like an enterprise technology that you would hear about from a PC perspective, but it also includes mobile devices, because that’s how we roll and what we do. In the old days, you know, if something happened on a machine somewhere, I might either have to travel to that machine or have somebody take it down, package it up, and ship it to me, you know, with FedEx or UPS or whatever it was.
And, you know, those were the days of, “wow, that’s tough, because I’ve got a whole office building full of machines I need to respond to if something happens.” Well, then we come up with the world of, “hey, let’s send agents out to those machines that I can control with a console somewhere else,” and not have to get on a plane or have someone ship a hard drive somewhere, or things like that.
So that’s the one thing: moving away from a desktop, local, in-place thing and getting the ability to reach machines or endpoints or workstations, whatever you want to call them, remotely. Well, then we add in the mobile device factor. Not just PCs and workstations like that today, but phones. And how are we going to accomplish that, rather than hooking up with a USB cable right here at the desk?
So ORE, you know, Oxygen Remote Explorer, has two components to it. One is this Explorer interface, which is kind of like the Detective one you’re used to, the one I just started here. Okay, there are a couple differences. But the other big component is this Agent Management Center.
And we’ll come back and look at it from this interface in a minute, because I already have it running administratively. We call it the top admin: somewhere in the world, there’s one Agent Management Center that rules them all, essentially. And this is what I want to walk through, because this is the new component that allows us to collect from and manage endpoints, our workstations and mobile devices: you know, make jobs to run against them, make profiles for those jobs, schedule jobs, maybe in the middle of the night or whatever it is, and build other users into the system who can log in and do remote work, things like that.
So let’s walk through this interface, and we’ll just take for granted everybody knows the Explorer interface, like Detective, and have another conversation about that another day. So here’s the AMC, as we call it: Agent Management Center. And what’s on the screen right now is an endpoint list. As you can see in this list, I’ve got a couple groups generated in my endpoint list. The red group is a group of PCs. The green group is a group of mobile devices, right?
And what I’m allowed to do here is add an endpoint, you know, throw an IP address out there or a machine name and credentials, and then deploy agents to those endpoints, whether they’re PCs or mobile devices. Then I can assign those to groups, I can issue licenses to them, I can see all kinds of pertinent information about them here, and I can run tasks against them.
So this is my endpoint list, and look, in a corporate environment, there could be 10,000 in this list. In my house right now, I’ve got six going. One’s offline, the other five are online, but, you know, consider this conversation at scale as we move along. So here are our endpoints.
And again, they’re separated into groups right now, because the next thing in the list right here is groups, which allow me to segregate devices for later, maybe scheduling jobs against a group of devices, kind of like an IP range, or inheriting access to a group of devices from a user perspective, so I can assign a user to this group as well as devices, and we’ll see that later. Okay, so I’ve got a group section, and it’s very simple: “hey, create a group, you know, give it a color, name it, and comment it,” and then assign endpoints to that group or take them away, okay?
Profiles. Here’s an interesting conversation. A profile is what we’re going to build with the variables and qualifiers in it, for when I say, “look, I’ve got this computer hooked up right now and I want to collect the following information from it,” or, “I’ve got this mobile device hooked up right now and I want to collect the following things from it.”
So let’s walk through this process, because it’s super important. I’ll create a profile, or click the button to create a profile, and, you know, I can just name this “webinar test” or “demo”, whatever. But here’s where things start going: is this a workstation or a mobile device profile? We’ll go with workstation first, because that’s half the equation. And I want to point out that in the workstation world we’re talking about Windows, Mac, and Linux, right? All the flavors that we want to get through here.
And then, if you’re a Detective user and you’re familiar with the technology called KeyScout, this section of tabs here might be very familiar to you, because in this workstation profile, this profile of things to collect, what do I want to go after? Well, if I click search…well, in general, I can just do this, you know, all applications, just browsers, just messengers, some kind of prefab filters there, or I can start from scratch.
Do you recognize this if you’re a Detective and KeyScout user? What paths do I want to search through specifically, and at what depth? What places do I want to avoid specifically, to not waste time? If I have other passwords, I can put them in this list to try against password vaults or other things that have passwords.
From a file filtering perspective, I can add all kinds of rules. You know, “hey, does this file name include this? What are the date ranges, sizes, file types?” From an application perspective, what do I want to collect or not from all these different applications? And you can see Windows, Mac, and Linux variations all the way through this. From a system artifact perspective, you know, file system type things, registry type stuff, what am I after? From a memory perspective, what do I want to grab: processes, keys for different encryption things, file handles? What are you after there? And do I bring in some YARA rules from a malware analysis perspective?
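If YARA is new to you, the idea is simply pattern rules matched against files or memory. A minimal illustration using the yara-python package; the rule and target path are made-up examples, and ORE compiles your rules into the profile rather than running them like this:

    # Minimal YARA illustration using the yara-python package.
    # The rule and target path are made-up examples.
    import yara

    RULE = r"""
    rule suspicious_marker
    {
        strings:
            $a = "mimikatz" nocase
        condition:
            $a
    }
    """

    rules = yara.compile(source=RULE)
    for match in rules.match("collected/endpoint_dump.bin"):
        print(match.rule)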
All of these things can be built into one profile for a workstation. Kind of crazy like that. And then, do you want to recover deleted files? Put a description in there. Terrific, and you can save that. So that’s the workstation flavor. Let’s do a mobile device flavor.
Okay, so general: am I doing a full extraction, which is kind of a what-you-see-is-what-you-get type thing, all files, all sections, applications, activity, media, preordained filters there? Or do I want to get gritty, you know, and maybe filter by operating system, additional files, application files, audio, databases, documents, pictures, APKs, WhatsApp, shared media things? I come over to data: you know, apps, browsers, calendars, some of the default things you would see if you’re maybe using an Android agent locally through a USB cable. You know, and then iOS applications.
Do you want to target your iOS extraction? Very cool. Targeted collection is the name of the game these days. What do you want to pick? Everything or not? Android applications: this is all really new. If you’re in the ORE world, we’ve finally started targeting third party applications for Android now, and WhatsApp is the name of the game to start that out.
If you think back to the Android agent locally, that was not even available. If you could get the Android agent on a device with an OTG device, you had access to a third party application menu, and I think there are probably 18 of them, where you can go get individualized application data off your Android. You know, if you had it right there on the phone, in your hand. Now we’ve started that process for remote. Very cool as this grows up.
So we’re building profiles and we’re saving them, right? So here’s an Android targeted profile. Here’s a C-suite collection profile that hasn’t been run. Here’s a new local targeted mobile phone one. Here’s a top-secret custodians-only profile. Because the point is: what are you trying to collect, and who are you trying to collect it from? Maybe you have different needs for different groups of people or different devices or different whatever. So you have a whole repository of profiles that you can build, with all those different parameters, to target very specific things or full things or whatever. And here’s where they’re maintained.
Here’s a list of tasks: a list of profiles that have been run against different devices, or canceled, or whatever they are. This is just a big repository of things that have been done on the AMC.
Here’s the schedule section. So look, if I create a schedule, what am I going to call this? We’ll also call this “web demo”, and I could put a description in there. So I add that. That’s fantastic. Do I want to add an endpoint to this? Well, here’s my endpoint list. Let’s say we’ll do that computer and that computer. Terrific. Then what do I want to happen here? Well, I want to…details, great.
Additionally, I want to run a weekly task starting on a particular date and a particular day and time, and then I want to add different parameters. “Hey, do you want to queue things up if there’s already something going on? Do you want to export the data over to the ORE Explorer interface when it’s done?” So you set a big thing to go overnight, and in the morning when you come to work, you want it all to be processed and waiting for you. If there’s a failure, do you want to retry every so often, and how many times? If it runs longer than X amount of time, you know, stop it, or whatever. So there are a lot of different variables you can build into a task, or a schedule, like that. We’ll come back and try to do it at the end so it doesn’t go and take up all my space.
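Those retry and timeout knobs map onto a familiar pattern you could sketch in a few lines; this is purely conceptual, not the AMC’s actual implementation:

    # Conceptual retry-with-timeout loop mirroring the scheduler's knobs.
    # Purely illustrative; not the AMC's actual implementation.
    import time

    def run_with_retries(task, retries=3, retry_delay_s=600, timeout_s=8 * 3600):
        deadline = time.monotonic() + timeout_s
        for attempt in range(1, retries + 1):
            if time.monotonic() > deadline:
                print("Task exceeded its time budget; stopping.")
                return False
            try:
                task()
                return True
            except Exception as exc:  # a real scheduler would be more selective
                print(f"Attempt {attempt} failed: {exc}")
                time.sleep(retry_delay_s)
        return False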
Agents. So here’s the repository where we build the agents that we want to deploy, whether it’s to a workstation or a mobile device. And there are a couple of things I want to point out specifically. One of them: when we create an agent, what do we want to call it? What operating system are we dealing with here? So we have to have a little conversation about how workstations work versus how Android devices work and how iOS devices work.
So say I’m making an agent to deploy brand new to a workstation, let’s say a Windows one. That’s fine; this is the version of the agent that’s being created. Here is something super duper important: when this agent goes out in the world and it’s supposed to bring data back, or send data back, where is it sending it to? The default is this local machine loopback address here. However, you can see up above, of these three agents I’ve created, two of them are pointing at 10.0.0.199 inside my local network. My AMC machine is 10.0.0.199.
However, what if I had machines outside my network, somewhere else in the world, that I wanted to deploy out there, but still call back to this particular location? Well, you can see here is a public IP address for a remote workstation. I just named it remote. I named it local. And my description is “this is for machines outside the local network.” No, that is not my actual public IP address, all of you that are like, “we just got Keith’s public IP!” No. However, if I was to set up a port forward to this machine and use my public IP address, that agent would knock on the door, come through the port it’s supposed to and have access to the AMC environment, the ORE environment here to do the work it’s supposed to do.
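For readers who want to see what that router-side port forward accomplishes, here is a minimal sketch of the idea as a plain TCP relay: traffic arriving on the public-facing side is piped to the AMC inside the LAN. In practice you would use a NAT rule on the router or a VPN rather than a script like this; the addresses are illustrative, and 23891 is the default communication port mentioned later in the Q&A.

```python
import socket
import threading

AMC_ADDR = ("10.0.0.199", 23891)   # internal AMC machine (illustrative)
LISTEN_ADDR = ("0.0.0.0", 23891)   # public-facing side of the forward

def pipe(src, dst):
    # Copy bytes one way until either side closes.
    try:
        while (data := src.recv(65536)):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        for s in (src, dst):
            try:
                s.close()
            except OSError:
                pass

def main():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN_ADDR)
    srv.listen()
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(AMC_ADDR)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

if __name__ == "__main__":
    main()
```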
So you've got to think about that, right? And agents are not licensed. The only thing that really gets licensed here are endpoints. And we'll talk about that a little more as we get into some of our additional points. But you could have a whole list of agents that are local, that are remote. You could have a Windows set of agents…like almost every one of these deserves a local and a remote, right?
A Windows local-network one and a remote one. Linux: local and remote. Mac: local and remote. Android: local and remote. Now, you don't see iOS here, so let's have that conversation as well. The Android agent today, the recently grown-up version (I guess it goes in growth spurts), is now an agent that we can deploy to a mobile device that will collect from anywhere without necessarily needing friendly hands to do anything.
And what I mean by friendly hands is sitting…let’s say Dan is my cohort in crime here. Dan is in Alabama. Dan and I routinely do this exercise where Dan can log into this environment, control…build an agent that would talk back to his house, control this AMC, and say, “okay, Keith, I’m ready to collect that phone.” Because I’m an employee of Dan’s somewhere in the world. So I would hook up the phone with friendly hands and initiate the collection and Dan can pull it back to his house.
So that’s one way of doing things. And that’s the current way of the iOS environment. And we’ll talk about that a little more in a second. But the Android environment has grown up now to where if Dan deploys an agent to this Android phone, I can take it anywhere and Dan can call it from afar and I don’t have to do anything. So you can apply lots of scenarios to that. But that is the way an Android agent has had a growth spurt in recent builds of ORE.
Okay, I do want to then come back to the endpoint world and say, “look, from an iOS perspective, we have to close that loop on agents and how they deal with iOS.” So a workstation, like, look at this Burnett iPhone X. That’s a phone. It is currently online. It is assigned to the O2 workstation, because the O2 workstation down here is a machine. For iOS (and if I do this and just let’s see if I can smash that there) here is the workstation, O2, that has that phone hooked up to it right now.
And there is a remote device collector on that machine. This is it running. So, from the iOS perspective, I would take a machine, I would add an agent to it, and on that agent, I would also initiate a remote device collector component. So, not only could I collect from this machine, I can use it to collect remote mobile devices.
Now, the way the mobile device world works, look, this one, this Burnett iPhone X can only be collected from this O2 machine. The iPhone 7 can be collected from any machine out there with a remote device collector on it. So you have that kind of, I don't know if security is the right word for that, but you have that kind of lockdown ability to say, "yeah, no, there's only one machine in the world that can collect that phone. It's in a closet, and only one person has access to…"
I mean, again, build whatever scenario you want there. The remote device collection of an iOS device will require a friendly hand at this machine to hook it up, run the RDC, and once it’s connected like this, then you can initiate a task back here, run a task, and have that whole process complete like that.
Okay, so big conversation about profiles, big conversation about agents, especially when it comes to, are they local or remote, and what exactly are you trying to get? A Windows environment, a Linux environment, or a Mac environment. Okay. Then we have users. And I just have four users in here, but each user has its own kind of nuance.
So first, let's go look at the set user roles. If I make a user, what do I want this user to be able to do? Be a complete administrator and do whatever? Or be just an operator with some limited things? Or I can set up a guest role, kind of like when you put that guest code on your garage door opener outside. Or I can make a whole custom one, a brand new role with anything I want, and assign people different things to do. There's a whole permission matrix of, "hey, can you make a schedule? Can you make a user? Can you delete users? Issue licensing?" Right?
And that might be important, but, you know, reserve that for the administrator, because your available endpoint licensing lives down here in the bottom left. I have 64 licenses left; every time I hook up an endpoint, that decrements a license here. That's where your licensing comes into play. So it might be important to reserve that only for special people, but I can assign permissions to users this way. So I've got an Amanda user, and interestingly enough, if I edit the Amanda user, Amanda is part of the green group, right? So I think that's my mobile device group.
So if Amanda logged in, all she would see is the green group devices or endpoints. I’ve got a Dan user. I think Dan is assigned the red group. And we’re going to use the Dan user here in a minute to actually see how that looks when the Dan user logs in because Dan is also an operator. So, take a mental note: I’m logged in as the administrator right now of everything. I’ve got one, two, three, four, five, six, seven buckets of things over here to do. I don’t think Dan will have all seven.
Klavdii (oh, let me just look over here) is assigned everything: unassigned, green group, and red group. And Randy has no groups at all assigned. And I can see their individual permissions and things like that. So, watch this nuance, right? I am somewhere in an ORE environment. There is an agent management center, as I said earlier, the one to rule them all, kind of the top admin. One of those runs all the time somewhere so other people can do remote work through this AMC if it has to come down to it like that.
So I'm going to put this, just minimize it like that. We've got top admin here. And then I'm gonna come back to this: somewhere in the world, somebody has the ORE Explorer interface, right? Two components in this environment. One place has the AMC server to rule them all, always running; other places in the world, there are users that have this interface, and they say, "that's great. It's time for me to do some more collection." So I'm just going to start my interface to the AMC. Oh, I'm already logged in as Dan. Let's disconnect. That ruined my surprise.
Okay, so I'm going to go start up my AMC, and I'm looking for this one, this top admin right here at 10.0.0.199. I'm going to log in as my Dan user. I've got my password here, so I can connect. So now Dan's logged in, and if I look, Dan's green on my board here. Where's Dan logging in from? Terrific. When's Dan last logged in? Terrific. And here, Dan only has access to the red group environment things, because, as we saw in groups over here, Dan has the red group assignment.
So Dan can't see the green things. He can only see the red things. Dan inherits permissions like that. And Dan has endpoints, groups, profiles, tasks and schedules, not the agent library, not users, because he's an operator, not an administrator. Okay. So big premise there. You could have a thousand users, not licensed. Available endpoints are what's licensable for the AMC environment as you hook up different endpoints.
Okay. What I want to do for a second before I talk about some of the additional points we want to do is come over here and see if we have any questions about this conversation to this point. Okay, so I don’t see any actual questions there, so that’s fine. That being the case, I want to come back here. And, okay, we’ve talked through those things about the interface. Let’s talk about this. So, covering Windows, Linux, Mac, Android, and iOS. So we’ve had that conversation, this is true. Allows for targeted collection. We’ve had that conversation. This is true. Workstation collection. Ah, scheduling.
Okay, so let me come back here and I’ll just minimize the Dan log in. I’m going to come back to schedules and we’re going to create one and try to run it against some endpoints here. Now, let’s go to endpoints. All of my PCs that are online right now are in the red group, okay. So I’m gonna come back to schedules. We’ve got our web demo. I’ve added those, so I’m going to remove those. And I’m just going to assign the red group to this.
Okay, so I’ve got three endpoints, three users, terrific, PC workstations, great. Then…what time is it? 10:31. So I’m going to come over here and do this once, and I’ll start it today. (Today’s the 12th, right? Yes. 10:31. How did I get to 11:17? I don’t know. So 10:32. Apply that.) Okay, so I’m right on the cusp of 10:32. Let’s do a scheduled workstation profile. I’ll apply that. Got my profile set now. Got my group. Let’s go back and just watch the endpoints and see what happens. I can see I ran one earlier and canceled all of them as they were going. So I’m just going to stand here and watch until 10:34 and I saw a question in the interface here. So let me see if I can get to that.
Is it imperative to know Dan’s IP address? Amy asked this question about that. Is it imperative to know Dan’s IP? Yes, Amy, if we were going to create an agent to call back to Dan’s house, we would need to know Dan’s IP address for sure. Okay, hope that answers that question. Kind of like his public IP address versus mine. And you’d have to port forward at that end to get through the door or, you know, whatever kind of VPN…however you set that up is kind of on you, but you’d obviously have to know his to make it go back there. Amy, good question. Okay.
Doctor asks: what are the main features that distinguish Oxygen software from other forensic tools? Big conversation, not quite for right now. So I'll come back to that if we have time. Next question. So this is tailored towards organizations, which would work for like an ESI collection; we could work with the IT staff to allow access for collection. What if I want to use it more as a collector system for like an individual client that has a single computer and a single iPhone or Android? Is that possible? Ty, absolutely. Great use case. Right? You could be a service provider that works with one person that says, "hey, listen, I got to do this."
You know what you say? "Hey, I need to send you an agent. I'm going to teach you how to deploy it on your machine, and then you're going to hook up the phone for me, and I can collect it back to wherever." Or, you know, maybe it's a consenting victim, maybe it's a…who knows what it is. But Ty, you can absolutely do that use case, which is: I want to use it more as a collector for an individual client that has a single computer and/or phone. Is that possible? Yes, it is.
Are there trial versions of Oxygen software? You know, Dan, you might have to weigh in on this…we set up POCs, proof of concepts, for our enterprise technologies versus a trial like that, I think. I'm going to hope Dan can clear me up on that answer. New question: I presume you could use a solution like Tailscale to create a VPN network that doesn't require port forwarding, right? Jacques, I think that's, you know, networking, not my forte. There are a few ports that have to be open for communication; above and beyond that, you can secure that connection or make that connection any way you need to. 23891 is our port of communication by default. As long as you have that free and clear to connect to, you can tunnel your way there any way you need to. So, good point there.
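Since the only hard requirement is that port 23891 is reachable from the agent's side, a quick sanity check from a prospective endpoint might look like this (a minimal sketch; the host value is illustrative):

```python
import socket

def amc_reachable(host="10.0.0.199", port=23891, timeout=5.0) -> bool:
    """Can this endpoint open a TCP connection to the AMC's default port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("AMC reachable:", amc_reachable())
```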
Oh, and I'm watching that, not even looking at my screen. So if you look back at my screen, this PC has kicked off its process down here. So I think I made my 10:34 mark. (What's this one doing?) So is this one. (And what is this one doing?) So all three of them, the red group, kicked off at 10:34 it looked like, and I think that profile is go collect Firefox. So that's doing Firefox, that's doing something, and that's doing something, but this one's already got to its Firefox point. So cool, I created a schedule, made my scheduled start time, and all three of them kicked off and are doing their thing. Now, from a size perspective, I just want to cancel them all, because I'm populating space that I don't know that I have right this minute, but that's okay, I'll let it go.
Let me come back to the questions over here. Okay, nothing there right this second. So then I’ll come back over here to these points. So workstation collections can be scheduled, we just did that. No recurring cost. And I said, you know what, Dan, you can have that conversation. But I think Dan’s point to that at that point was, yes, look, once you license an endpoint, I’m pretty sure the default endpoint count when you purchase an ORE environment is 20. And then there are, “hey, you want to purchase 20 more? You want to purchase…?”
I don’t know what that scale is. Price point per number of endpoints. I’m sure a great sales person beyond me could have that conversation with you. But once those are…that’s it, that’s the only real cost. And once you license them, your endpoint, your environment, you know, you don’t have to redo it every time. It’s not like you’re charging per collection.
Once it's endpointed, it's an endpoint, and that's what you're paying for. So the cost savings versus travel: yeah, look, if there's one big pain point, like what does this do for me? It eliminates you having to have 30 people in an organization package up their phones and send them to you, taking them offline all week so you can collect data from their phones for your "oh no, we're getting sued!" ESI. And man, it's just crazy that you can do all this remotely and save all that time, effort, and pain. I mean, think of the COVID implication all by itself. Nobody's going anywhere, and we don't have to.
Super important point, though: this is on prem. We don't host any of this. We don't do your data. We don't want your data. We don't want anything to do with it. This is all you, right? However you want to host it, whatever you want to hook it up to to collect to because it's big, that's all you. But we do provide deployment. So, let's take the example: Ty's question.
So if it's tailored towards this and I want to use it like this, I want to do this…hey, we can help you set that up. Here's the kicker when people do what Ty had asked about, single computers, single iPhones, helping a client like that. What I had mentioned was: look, there have got to be a couple of ports open for agent deployment and/or communication back through 23891. Again, not a network person. Anytime you're doing the one-off like that, you may be rolling into an "I don't know what's going on in this network" situation. And people do that. People have done Ty's model before and they call us up going, "hey, I can't get this to work."
Then it's a big exploration of, "okay, tell us about the environment." And, you know, people get kind of frustrated. It's like, well, "what about the environment?" "Well, what firewalls are up? What security things are in place? What ports are not allowed to talk?" We've got to work through it. I mean, I can tell you the ones that have got to be open by default.
And then we have to start troubleshooting network connectivity and, you know, access. And it's kind of a…you can do this. You might not be able to do it immediately until you sort those issues out, because people want the instant gratification, but they're going into an unknown environment. So you just need a little bit of expectation management to learn that environment. Because once you learn it (back to Ty's scenario, and other people who have done this), whenever that client needs another phone four weeks from now, you talk to him and you say, "hey, remember that machine we set up? Keep that set up that way. Go back to that machine. Hook up the device." Because you've got everything, now it's static, and you're not unknown every time. So if you've got repeat business from that person, you're in good shape, right?
Okay, Dan had said 20 endpoints, correct. Thank you, Dan. And the next question: are you able to change those 20 endpoints? "In our corporate environment, I'd only connect an endpoint, PC or phone, for a collection, then remove it again." No, you can't change those. Once they're in there, they're in there. But I believe there is a way, if you are retiring equipment out of your environment because of age, and these machines are no longer going to be used…I think there might be a way to do that so those endpoints in your license pool can be reused for new stuff.
Maybe Dan can write back to that as well. Because the question goes on to say, does that use up one of the 20 even after you release it? We can't release it. So, yes, it uses up that one of the 20. Or can you add a new endpoint repeatedly, as long as you have 20 or less at any one time? Yes. So…I mean, no. Those endpoints, once they're licensed, they are licensed in the environment for good. That is the decrement from your endpoint count. Adding new endpoint licenses is cheap and can be priced in bulk. Good answer, Dan. Cheap is a good word. Bulk's a better word.
Okay, so going back to our points for discussion. Ah, those were our points for discussion. These are good questions, so I'll keep watching here. This is the point, right? And, matter of fact, let me see. So I've got many machines and many devices on. While people are putting questions in, as they continue to do that, I'm going to get gutsy and just try to come over here to this…this is a Google Pixel that I just have emulated on the screen here.
Let’s see if I can start the agent here. And I’m just going to do one of the magic ones. I’m going to run the Android targeted profile task against that. (Oh, where’d it go?) Okay, so at no time do my fingers leave my hands, but that agent is out there on that device now pulling WhatsApp data. And you can…I’m just emulating it so you can watch it do its thing. And you can see down below here that job is started and it’s a targeted collection against just WhatsApp, which is super, super cool, by the way. So a couple other points: this collection…any collection is encrypted for transport, right? We want to secure it. So, you know, it’s not out in the wild for people to steal.
Oh, there's a question: is there Oxygen software tooling to do forensic analysis of color laser printouts, since digital printers work from computers and mobiles? So, if it's a printed document (I'm not sure if I understand that question completely), a printed printout we could scan and import and potentially OCR, optically character recognize, what's in that printout. If it's a print spool, we could possibly analyze that just in the interface as a print spool to see what might have been printed. Doctor, I'm not quite sure if I'm interpreting that question appropriately. So maybe you can add a little bit more to it.
Okay. The jobs that are created that say “add data back to the interface.” (Oh, gosh, are they still going? Oh, wow.) So, that workstation completed from our scheduled job. All of our workstation jobs completed there. I let them go. Oh, boy! Let me just go…having said that, let me go look over here. I don’t know if I had those selected to import into the interface or not. And maybe I turned that off. Oh, that’s a good thing. Because I’m just…as I realized this morning, I’m kind of low on space on this machine right now, and it does not look like they imported. Good.
I could go get them, but that's fine. This one, however, I believe will import when it's done, but I don't know when it's going to be done, so that's fine too. But again, targeted collection. It shouldn't be too big, and it's within scope because it's targeted. To be clear, we're not exploiting a file system with this. We are getting what we can get, and we have a lot of access with the agent. But this is not, you know, some physical export of the phone.
Oh, Dan and I did talk about a really interesting point, though, that I will bring up. So here's a PC. I can right click on this PC and do some really interesting things. Grab memory; get a file list, kind of a field mode extraction of what's on there; or, even today, I can capture…I just have a hard time recommending this, but I know there are people for whom it's policy…I can capture a full disk image if I really have to, in the form of E01 or DD, if that means anything to you. Because if you know what that is, great.
If that doesn't, that's just capturing the entire content of a hard drive, or partition by partition if we want to, as you can see from our menu up here, and collecting it back into an evidence file format for processing into Detective or the ORE Explorer interface. So that's kind of a recent development due to customer requests. You know, I'm like, "wow, I would not be gathering up 10 terabytes over the network from afar," but maybe your network can stand that. And again, maybe that's your policy. So I get it, and it's in there now.
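That hesitation about 10 terabytes is easy to quantify with back-of-envelope math, ignoring protocol overhead and assuming the link is dedicated to the collection:

```python
size_tb = 10
size_bits = size_tb * 1e12 * 8  # decimal terabytes to bits

for label, mbps in [("gigabit LAN", 1000), ("100 Mbit WAN", 100), ("12 Mbit office link", 12)]:
    hours = size_bits / (mbps * 1e6) / 3600
    print(f"{label}: ~{hours:,.0f} hours")

# gigabit LAN: ~22 hours
# 100 Mbit WAN: ~222 hours
# 12 Mbit office link: ~1,852 hours
```

Which is why a full remote disk image is something you plan around your network, not something you kick off casually.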
So the Android agent extraction from anywhere, will it be able to get data via cellular? Is it Wi-Fi only? Should Android be on corporate VPN to work? Excellent question. And the answer is: anywhere. That Android agent, once deployed to a phone and, you know, not today, but in the future via MDM, that can go anywhere. Wi-Fi or cellular. Excellent question. Glad you brought that up.
Should Android be on corporate VPN to work? Does not have to be. Could be, you know, if that’s your corporate solution, that’s fine too. But as long as it knows where to call home, right, whether it’s a VPN, whether corporate…whatever tunnel you make (I don’t want to mispronounce your name) but to your question, you could connect any way you want, but it does do cellular as well. Very cool.
Okay. Well, gosh, it's 47 minutes after the hour. I can tell you guys are getting the conversation based on the questions you're asking. That's great. Oh, here's more. Can this be deployed in a covert setting without the end user being aware of the agent running? So Aaron, what do you do for a living, Aaron? No, kidding! So Aaron's question: can you deploy this covertly? Here's a conversation point I'm going to have with you…one moment while I pull up a picture online.
So right this minute, Aaron, and I’m just going out to Amazon to find my favorite OTG device. And I’ll tell you what I mean by OTG in just a second when I can show you this picture. (Really, they’ve moved everything around. I’m just getting cables right now.) Can you deploy it covertly?
Yeah, I use a James Bond storyline, Aaron. Let's pretend, Aaron, you and I are working together. We're at the bar with Dan. Dan leaves his phone on the counter, walks away, goes to the bathroom, does whatever, and we very quickly plug an OTG device into it, which is basically a USB device with an SD card or some storage in it, like the ones you offload all your videos to when you're out of space filming a concert.
The same way, we can upload and install the agent to the phone. And then connect it to you, Aaron, sitting in a booth across the bar with your laptop and a wireless access point. And we connect Dan's phone to that. So you're just pulling data from it while we're all sitting there having a beer or something and nobody knows! So you could literally do something like that. One caveat once you deploy that Android agent: a third party application collection, like the WhatsApp one you just saw, requires human-style interaction with the phone, Aaron.
It wants to swipe. It wants to start the app up, as you saw. I wasn't touching it. It was doing all that on its own. And you can see that we have not yet built a dark mode into it for something like that. However, Aaron, if you do standard collection, you know, messages, media, calendar appointments, contacts, all that stuff, nobody is the wiser. So once it's deployed, the collectee may be clueless. Is that the wrong word for that, Aaron? But I hope that answers your question.
So, okay, Jacques: I may have missed this, but can you throttle the speed of the collection to address endpoints with a slow connection, so you don't bring the network down? That's a great point. There's no throttling right now. Like, I just kicked off three at a time and, you know, internally I've got gigabit cables running everywhere and my router's gigabit, but beyond that, I could very easily probably blow everything up in my house like that. There's no throttling that I'm aware of right now.
Dan, maybe you can put in the chat window, something about…if you know anything about a way to throttle those. I don’t. The only things I know on that scheduling conversation that might come close to that were, you know, if it takes so long, stop it, or if it fails, you know, restart every so often and only for a set number of attempts. So your network doesn’t get bogged down with that maybe. You can see I had turned off the export to ORE automatically. So it’s a great question. Matter of fact, that’s a great kind of feature point…and let’s…okay, so Dan says no throttling, Jacques, so there you go for that. But I like that request from a future perspective, and that’s part of my job, is to collect stuff like that, so I love it.
Jeffrey says: can this reasonably be used by forensic service providers who have the occasional one-off need to image a remote phone, or is this geared for corporate? So that's kind of like the question earlier from Ty. Yeah, Jeff, you can do that. Completely. And there are people that do that now. You know, one off, maybe two off. They have repeat customers; it's not a corporate environment collecting everything, it's a service provider. So Jeff, absolutely.
Dan, any follow up direction? Oh, Dan is saying, so you can send questions to the sales team to discuss 1-on-1. Great.
Hi, we had a case last year that required collecting WeChat. WeChat is complicated, as it can only be installed on one computer and one phone per user. Yeah. I mean, WhatsApp collection, generally, depending on how you're collecting WhatsApp, you log in to collect WhatsApp and you log somebody else out. Maybe a lot of applications get into that kind of complication.
So, yeah, that's kind of a statement. There's not necessarily a question. Oh, there's a question right after. Is this able to collect WeChat data? Not third party from Android. I don't know if WeChat is in the iOS list as a targeted collection thing. I'd have to investigate to see, Ty, if WeChat is something that comes from an iTunes backup, because that's how we're accessing the iPhone right now. So, I don't know the answer right now, Ty. Dan, give the contact link.
Jacques: we have staff in remote offices on a 12 megabit shared connection; without throttling, we'd potentially kick off the rest of the staff. So, you know, what I would suggest for that is: schedule your jobs after everybody's left work, right? Run them…you know, maybe you do the red group at midnight, the green group at 2am and the blue group at 4am. Throttle yourself using some of the capability in the tool already, and schedule those jobs to maybe not do that.
Dan agreed: good feature request, probably from the one above about throttling, same conversation. Dan says yes, service providers are a big chunk of our early adopters. So yes, I was hopefully answering that correctly earlier, too.
Can it collect Telegram or Signal data? Not currently from a third party perspective, but that will have growth spurts, Joe, from the Android perspective. And for Telegram and Signal, I don't know if they're targetable on an iOS backup right now or not. I'd have to look and see. Excellent questions! Gosh.
And the ones I don’t know right off the top of my head, I’ll get back to you guys from this list here, after I get a chance to go look. What else as we come up on the hour? Dan, anything else you want to throw into chat from the points we were talking about before we logged in here?
What happened to my Pixel job? Okay, I'm not sure if that went into ORE or not. You know, while we're sitting here coming up on the end of our hour, I can maybe go kick off a job on the Burnett phone. Run task, target iOS collection. I will be sure to export that at the end. I'll run it here. It's queued on the server. And if I go over here to this machine…so here's my RDP window to the machine that has a remote device collector with the Burnett phone, and it says, "oh, look, there's an available task: target iOS collection." That's what I just selected from my AMC. And if I go to extract it, now it's trying to connect to the phone to do that. And you can see it just changed to connecting to the device for that target iOS collection; it did the same thing earlier before we started. So it'll do that on this machine, but that's looking at an iOS version of it. This one, though, requires the friendly hands right now to hook up the phone and start the remote device collector, just FYI.
Okay, throttling is on the roadmap for next year. Oh, very cool. Can I send the APK via email or some other way? Or is the remote agent deployable only via USB? So, can I send the APK via email? Yes, you can send it…oh, and to a phone? I don't know why not. If you can get the attachment via email onto the phone, saved as a file and installed, I don't know why not. So I'll say yes to that.
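For the USB path, the generic Android sideload uses standard adb tooling; this is a minimal sketch, not an ORE-specific procedure, and the APK filename is hypothetical. The phone needs USB debugging enabled and the connection authorized.

```python
import subprocess

APK = "ore-android-agent.apk"  # hypothetical filename

subprocess.run(["adb", "devices"], check=True)             # confirm the phone is visible
subprocess.run(["adb", "install", "-r", APK], check=True)  # -r replaces an existing install
```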
Telegram and Signal are up next for targeted collection. Excellent, Dan. Thank you for that reply. Yes, growth spurt, I'm going to call that, Dan. Our next growth spurt includes Telegram and Signal! Good. But yes, however you can get that agent on there: email, USB, MDM someday soon in a growth spurt? Yes. Good question.
And you can see my iOS job has kicked off on the iPhone X over here that's sitting there hooked up, and now it's processing data sources.
Interesting tool. Got a jump. Look forward to follow up. Excellent. Okay. Any other questions we can answer while we're here? Again, the grand scale conversation here is: we've taken local collection of devices from a Detective model and grown it up to include PCs (Windows, Mac, Linux) and mobile devices (Android and iOS), all collected remotely. The remote journey we're on! Remote journey with growth spurts, Dan. That's what we have to call our subsequent follow ups to this.
Okay, terrific. Listen, thanks for joining us today. Really appreciate that. I got everything on my list out of the way that I wanted to get out, so I'll bug out of here. Have a great day everybody, and get in touch with us. We love to talk about this stuff all day long. Thank you so much. Bye bye.
Jordan: So good afternoon, everyone, and welcome to today’s fireside chat. We’re super excited to be here. Today’s fireside chat is called ‘Navigating the Cloud: Expert Insights on Emerging Cloud Threats and Complexities’. My name is Jordan with the team here at Cado Security, so I’ll be moderating a little bit at the beginning and at the end, but James will be our main moderator today.
I’m really excited to welcome our presenters to you today. So we have James Campbell, who is Cado’s CEO and co-founder and Robert Wallace, Senior Director at Mandiant. So thank you both for joining us today. But before I pass it over to you guys to do some brief introductions and to kick us off, I wanted to note a few things at the front of today’s webinar.
So first, for all attendees, we are going to leave time for an open Q&A at the end of today's session, so you can feel free to post questions inside the Q&A function at the bottom of the Zoom screen and we'll address as many of those as we can at the end of today's session. Also, this webinar is being recorded and will be available on demand later as well.
Okay, I think that’s it from my side to kick things off. So, without any further ado, I’ll hand it over to James and Robert. So, why don’t you guys just do some brief introductions and then James, you can kick off the conversation.
James: Yeah, sure thing. Robert, you are the guest, so I’m going to let you go first. And then I’ll go after you. Not a problem.
Robert: All right, sure. So, hey, everyone. Happy to be here. Robert Wallace, Senior Director at Mandiant. I'm an incident response consultant. Been doing digital forensics for a long time now. I'm hesitant to quote the number because, we've got…
James: I like how you hesitated when you were trying to figure out how long it’s been.
Robert: We've been at this, we'll just say, north of 15 years, and we'll cut it right there because we're splitting hairs at that point. We've been doing this a long time. I've been at Mandiant for nine years, and prior to joining Mandiant I worked at PwC doing computer forensics and incident response, which is interesting, because that's how James and I first met, working cases together at PwC. What, 10 years ago? It's been a minute.
James: I’d say it’s been close to 10 years, which is pretty scary. But yeah, I think you’re about right.
Robert: Yeah, absolutely. So James and I go way back. We've been in the trenches for a long time now. So I'm happy to be here and working with Cado. We're partners now, right? And it's a fascinating partnership from our perspective, because James, like you, you've been doing this work for the majority of your career, and now you're building tools for investigators to leverage. And, from my perspective, there aren't a lot of tools that are born out of actual incident response and forensics. And so, you know, what you're building just caters to a very specific need.
James: Born out of a need.
Robert: Absolutely. I have to ask you, you know, going back in the day, did you ever envision this for yourself? When we were working cases together, always fighting with our tools, so frustrated with them, and always imagining 'It'd be so much better if someone did this.' I feel like that's your origin story.
James: Oh, yeah, to be honest, my origin story is…so for those on the line, and thank you for joining: my background, as Robert said, is cyber incident response as well, starting out with the Australian Signals Directorate, better known now for the Australian Cyber Security Centre, doing incident response for national government. I then moved to London, working as a consultant as well, doing incident response for PwC across Europe, and that's how Robert and myself met.
And I guess the one thing, and I think we'll touch on some of it today, talking about some of the threats and how we're dealing with them now versus how we were dealing with them then…a nice segue to that is, a lot of what we're doing at Cado just came off the back of my own daily grind trying to help customers deal with incidents. And I think one of the things that really fascinated me about our space is that even today, right, some of the tooling we use in incident response has the exact same interface I had when I was a graduate in 2007.
It's the same kind of workflow, the same everything, and largely hasn't changed. And so, I think today it would be great for the audience here to hear from yourself, Robert, about what's changed in our environment, how attackers have changed, and, I think more importantly, why it's so important that the space of incident response and forensics starts to modernize in a way which is going to keep up with the challenge. Because at the pace we're going right now, it is a struggle, right? So it is a struggle.
Robert: Yeah, absolutely. I mean, it's a nice segue there, teeing up into what we're seeing and how the space is evolving. So I have to put a plug out there: today we've released the Mandiant M-Trends 2024 report. We put that out annually, and it tracks all the trends that we're seeing over the past year. A lot of stats, a lot of metrics and some of the stories that go with that, but there's a section in there just dedicated to what we're seeing in the cloud.
The report just came out, so I'm probably going to butcher the stats, but I was studying up on it the past week. I think in our customer base, 90% of the customers that we're doing incident response for leverage the cloud in some capacity, right? So, from our perspective, the cloud's ubiquitous; pretty much everyone's using it. And that's also the same for attackers, right? That's where the data is.
There are also a lot of keys that are stored in the cloud. And a big trend within the M-Trends report is targeting the cloud; it's not just, like, stealing data from publicly-exposed S3 buckets. We're seeing a lot of supply chain attacks where it's like, let's see how many AWS IAM keys I can steal and then pivot into all those customers' environments. And you just see them hopping in and out of everyone's clouds.
And that type of activity has been around, right? We saw it in data centers back in the day. Now we just see it in a new form, and we still see the data centers and on prem and all those other components of an investigation playing a part in it, right? The cloud's just one other source of evidence. It's not to say that all the threats are just in the cloud. We see a lot of what we call 'vertical movement', right? You may be familiar with the traditional lateral movement of attackers moving in and around the data center doing internal reconnaissance. Now we'll see them socially engineer a user, get onto an endpoint, typically a developer's, and then move vertically into the cloud once they've stolen the keys.
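One hedged illustration of basic hygiene against the key-theft pattern Robert describes: enumerate IAM access keys and flag stale or never-used ones, using standard boto3 IAM calls (read-only permissions required). This is a minimal sketch of the idea, not a Mandiant tool.

```python
import boto3

iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
            last = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
            used = last["AccessKeyLastUsed"].get("LastUsedDate", "never")
            print(name, key["AccessKeyId"], key["Status"], "last used:", used)
```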
James: This is exactly why we have you on here, Robert: getting it from the horse's mouth, so to say. I think there's no better place than Mandiant to talk about the experience of what's genuinely happening every day across multiple enterprises.
And I guess, when you come across these kinds of incidents that involve some element of cloud, and no doubt involve on prem, a bit of SaaS platforms like Office 365 or whatever it might be, and potentially multiple cloud environments, you've got multi-premise-style situations that you have to investigate as one thing. But do you find most customers are prepared, particularly when it involves cloud? Are they even aware of what the threats are and how it's different from how they deal with the on-premise side of things? Or do you think there's a lot more education to be had in there?
Robert: There's a lot more education to be had, but that's also a biased point of view, right? A lot of organizations we're working with weren't previously thinking about all this, which is largely why they've engaged reactive consulting services to help them respond.
So, yeah, we're definitely seeing folks just being unaware of, like, here's how attackers are bypassing multifactor authentication to take your keys and then pivot into the cloud environment. For a lot of people, once they look at it, they're like, 'oh, that was relatively trivial to pull off. How do we secure that going forward?' It's a different part of the equation, beyond the investigation piece of it, but it's also an opportunity to help folks harden their environments. And just also, make sure you get the logs, right? I mean, how many times are you going to show up and, oh man, there's no evidence, right?
James: Well, I guess on that point as well, I think some of the things we've seen, and I'm sure it resonates with you, I'm interested if it does, is people's perception around the shared responsibility model. It's like, we're using cloud, so it's the cloud provider's responsibility for security. Which isn't usually the case, actually; it's a shared responsibility model, which I think a lot of people struggle with getting their heads around.
Do you feel like people taking on cloud services and the like maybe think, oh, actually, some of that security should be baked in? How does that come across on your side?
Robert: Yeah, not necessarily on the shared responsibility piece; I know there have been a lot of major cases in the news that Microsoft's been dealing with, and Google has this 'shared fate' philosophy in terms of how they approach it. But still, organizations need to be getting logs, right? They need to have visibility, they need to manage their secrets, right, and manage those keys properly. I would also say, on where we see a lot of it, especially in the cloud space: we recently worked a Web3 case together, right? And if you look at a lot of Web3 organizations, they are obviously very security-conscious, right? Because of the area they're working in.
But they also build really fast, right? They go fast and they just forget, like, hey, we need to be logging this. We'll call it the Web2 stuff, right? Some of their internal infrastructure stuff, not just the on-chain stuff. And so, I mean, we just see really smart people who just weren't even aware of all these configuration settings, how to harden environments, and how you also need to monitor those components and those pieces of it.
James: That's something I'd like to dive into a little bit: that complexity and, I guess, that problem space. And wrapping up your opening statements there, which were really useful, thank you: it's kind of clean cut, right? It sounds like you guys are quite regularly dealing with attackers leveraging cloud. That's pretty straightforward. And it's quite interesting; we do come across customers who don't necessarily feel like that's where a lot of the threat is. But certainly it seems, even from our point of view and your own, that it is just a daily occurrence, given that data is shifting to the cloud, and operational activities for organizations rely on cloud as well. And so why wouldn't attackers be there too?
My final thought on that, and then I'd love to touch on why this is a little bit complicated and what complexities cloud brings to the table: why do you think it is that a lot of people don't think there's a lot of stuff happening in cloud? I would say there's a bit of an understatement, from a public perspective at least, that not a lot is happening in cloud. But in fact, for someone like yourself, who's at the coalface of it every day, there is a lot going on.
Robert: Yeah, absolutely. I think people are just really focused on building, and cloud enables you to build really fast, scale really quickly. I’m sure even as a small business owner, right, you guys are building rapidly, you’re iterating, you’re constantly releasing stuff, right? And I think for really lean organizations they maybe sometimes overlook that need for monitoring and investigation capabilities, right?
I think they’re just moving so fast and they have the right mindset, they’re just overlooking some of these, I’m going to say, maybe they’re perceived as historical hacking problems, but it’s not. It’s the same threat actors. It’s every threat actor, right? From espionage to criminal to scammers, you name it. Everyone is in the cloud, from customers and clients to threat actors.
James: It was quite interesting. I quite often have the conversation where an organization is doing a bit of a lift and shift of a lot of their data center capabilities into the cloud whether it be any one of the major vendors. And I often have the conversation of like, ‘Oh, cool. Okay. That’s quite a risky period, right? You’re moving things, which are traditionally on prem, which have had a firewall and kind of a gateway in the way straight into cloud. Have you guys profiled that risk?
Have you got a way of detecting new threats or even responding to them as well? Like, if you found something as part of that transition, what do you do next?’ And they’re like, ‘Oh yeah, we’re definitely on our roadmap on our program, but we’re going to wait until the data is into the cloud and the systems are in the cloud. And then we’re going to look at our security program.’ which seems a little bit backwards to me.
Robert: It does, right? And you’re right. That’s a really vulnerable point in time when you’re doing that shift. And oftentimes what we see is like post-shift, people keep that data center running for a little while just because it makes them feel secure because they still have all that. Those data centers, that legacy stuff actually gets hacked a ton and then they just pivot right into the cloud.
So once you do that lift and shift, really make a concerted effort to sunset all of that old legacy infrastructure because that stuff’s just hanging out there as an avenue right into your cloud.
James: Yeah, no. Nice. And I guess from your perspective, what sort of complexities, and I’ll give us some examples myself, but I’d love to hear from you first, why do you think people are finding dealing with cloud challenging? It’s almost as if they know they have to do this, but it’s quite a task, it’s quite complex. What sort of challenges, like, what’s different between on prem and cloud in that respect?
Robert: I think one of the things that's really different is the containers and ephemeral architecture, things that aren't persistent per se. How do you grab evidence from those things? How do you know if keys were taken from them? I think the ephemeral nature of it is one of the big ones, right?
James: Things are scaling up and down all the time.
Robert: Yeah, absolutely, right? There's that component of it. And then I think the other one is just really taking the time to read the manuals, if you will, to figure out how to tune your logging just right. There are plenty of cases we've gone into like, 'oh, yeah, we have logging, CloudTrail logs,' for example, right? But they don't have object-level auditing on. It's just like, I didn't turn this feature on. We know the attacker is here, but we can't see what they're doing because it's not logged. Let's go turn that feature on, right?
So I think those are the two main ones, right? The ephemeral nature of it, and just the logging. I think cloud actually makes things a lot easier, though. And this is what I love about Cado: instead of it being a challenge, you flipped it on its head. It's like, no, we'll use the cloud in order to investigate the cloud and speed this whole thing up.
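The "object-level auditing" Robert mentions corresponds to CloudTrail S3 data events, which are off by default. A minimal sketch of turning them on with boto3 (trail and bucket names are hypothetical):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="my-org-trail",  # hypothetical trail name
    EventSelectors=[{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # A trailing slash covers every object in the bucket.
            "Values": ["arn:aws:s3:::my-sensitive-bucket/"],
        }],
    }],
)
```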
James: Yeah, and I think we'll touch on that in a second. I think one of the really important things around cloud is using its strength to also deal with a little bit of that weakness, and that's through that automation component. But resonating with the ephemeral infrastructure point: I've talked to a few customers out there where they're like, hey, we've got detections in our Kubernetes environment in, say, AWS, just to pick one; it could be Azure or Google, of course, in Kubernetes as well.
And we have roughly a 15-minute life cycle on the containers that serve our external customers. We get a detection in there, and the data's gone by the time we get anywhere near trying to work out what's going on. You could be at lunch, get a detection, data's gone. That's a scary position to be in. On premise, right, if it was a server and you had a detection, the server's not really going anywhere; you can go and grab it, right? But with cloud, you've got auto-scaling groups, you've got ephemeral infrastructure, you've even got serverless like Lambda and the likes of that, so, functions. And there isn't even a server or something to query in that sort of case.
So, a lot of people just say, ‘Okay, cool. Now I have a hundred detections this month, which I had no way of triaging or saying I did the right things.’ And so that’s a hugely different challenge from that perspective.
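A minimal sketch of the "grab it before it's gone" reflex for a short-lived pod, using only standard kubectl commands (pod and namespace names are hypothetical, and the tar step assumes the container image ships a tar binary). This races the pod's teardown by design.

```python
import pathlib
import subprocess

NS, POD = "prod", "web-5d9f7c"  # hypothetical namespace and pod
out = pathlib.Path("triage")
out.mkdir(exist_ok=True)

def grab(args):
    return subprocess.run(args, check=True, capture_output=True).stdout

# Pod spec and status: image digests, node, restart reasons.
(out / "pod.yaml").write_bytes(grab(["kubectl", "-n", NS, "get", "pod", POD, "-o", "yaml"]))

# Logs across all containers in the pod.
(out / "logs.txt").write_bytes(grab(["kubectl", "-n", NS, "logs", POD, "--all-containers"]))

# Tarball of a writable path inside the container.
(out / "tmp.tgz").write_bytes(grab(["kubectl", "-n", NS, "exec", POD, "--", "tar", "czf", "-", "/tmp"]))
```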
Robert: Huge, right? Like, for me, I’m like, man, I wish all my customers had Cado deployed, so when we show up, we can just hop into Cado and evidence is being preserved.
James: That’s the idea. Yeah, all through automation. To give you an idea, we have some honeypotting infrastructure, of course, to keep an eye on some of the threats. And last time I was at an RSA conference, I was just actually demonstrating live compromises of the honeypotting infrastructure. And it was a container running in a cloud service provider. And we only had one vulnerable service, just one vulnerable service. We didn’t run a hundred of them, just one at a time, and it would get popped on average about every 15 to 20 minutes which is crazy, right?
Like, could you imagine just having an exposed service of some variety available? Even for half an hour, the chances of it being compromised are relatively high. What's even worse is that most people don't have the ability to actually see that they were compromised in the first place, which is a little scary, and which is where we've seen small issues unravel into bigger issues.
So, as an example, developers and the likes of that can obviously have a container running; seems like not a big deal, it only lasts 15 minutes, right? So security is maybe not a top priority. But let's say it stores the keys to access another system, or to spin up another cloud resource, or even credentials to a database that might be available elsewhere.
And attackers can suck that down and then use it to go somewhere else. And in fact, we did a SANS presentation once where we showed the example of a hacker actually breaking out of a container onto a node in Kubernetes and then creating their own console account for that cloud provider. Which, again, is a different level, isn't it? Because you're going from runtime, and now you're on the control plane. So you're on the console, and nothing you're running from an agent or anything like that from a runtime perspective is going to spot that activity.
Robert: Yeah, it's a challenge, right? But I'd say, taking a holistic picture of a security program and bringing all those things together, whether it be through a SIEM or whatever folks are into, having the ability to respond to it is the next step, right? Largely, people are blind to it, and then there's no way to respond to it. Now it's like, hey, you need that visibility, right? I can't protect what I can't see. And then you just need the ability to respond when you do see something, to triage those alerts.
James: Absolutely. And I guess that brings us to a nice little segue. So, it's not necessarily all doom and gloom. We've established that there's a lot going on, plenty of stuff that people aren't aware of as well; I can 100% say that from our perspective, too. And the cloud brings a new level of complexity through resources just spinning up and spinning down. Also complexity in the sense that it's so easy to leverage the technologies and just spin things up and down; maybe you won't have an agent or logging or any of that. So a bit of shadow IT, so to say, just on steroids, from that perspective.
So, moving on to some of the strategies: how does automation play a role here with cloud? And I guess you've had some experience with Cado, and I don't want to make this about Cado necessarily, Robert, but what do you think? What are some things that people could be thinking about, or should be thinking about, when it comes to protecting themselves in the cloud? And I think logging was one of the first ones you hit the nail on the head with earlier.
Robert: Yeah, logging's a big one. Hardening the environment; I know that's a generic term, but there are a lot of guides out there, especially from the three big cloud providers, right? You can find all this documentation and turn that on. That part of it's key. I think it's a component of an overall security program in general, right? It's hard to say you should only focus on cloud or only on endpoint. You should focus on security and find the right balance around risk and things of that nature.
And then, I know this isn't about Cado per se, but I would say, for Mandiant customers that have IR retainers, I would encourage them to consider having something like a Cado in their environment if they're operating in the cloud. I mean, you have a retainer, right? You're ready to respond. In an event, you're going to have your evidence, too. It's just going to make for a much faster response time. The average dwell time is down to like 10 days now for attacks, right?
I think that also corresponds to why incident response times are down as well, right? Having to go faster means we've got to work more cases and do them in a shorter amount of time, and having these types of capabilities just reduces the impact to an organization when something bad does happen.
James: Absolutely. I'd say 100% I would recommend people do embrace cloud. Cloud is an amazing technology, and it allows your organization to move quickly as well. But obviously that comes with new risks. It used to take a while to provision a server and then make sure you punched a hole through the firewall and all that sort of thing. But those things don't exist anymore. It's a very different playing field, right?
Robert: The security risk doesn't go away. The bad guys don't go away.
James: The bad guys don't go away, yeah; it's just shifted. Exactly. And I think one of the things, and obviously it's something that Cado is designed to do out of the box, is to help solve a lot of these challenges. But focusing on the core thing beyond Cado: well, how do you deal with things like resources spinning up and down? How do you deal with things disappearing?
And also random assets all over the place. Like, let's be honest, I think a lot of cloud networks tend to have a Wild West component to them. So how do you deal with all that and also keep your sanity? Because, as an incident responder, okay, yes, you've got to have high-level knowledge of all the different cloud technologies, but there are whole job roles just for a Kubernetes expert, right? That's a day job for someone. And you can't know the ins and outs of all the different flavors of containerized and Docker systems, etc. You can't know all the flavors of the various different cloud technologies. So it's good to have the high-level knowledge, of course, and understand the threat and risk, so you can advise customers on what's next.
But really, I think what I found resonated the most, and this is playing to the cloud's strength, is automation. So: automating a lot of those components and getting that data to yourself or the team, or especially to Robert off the back of a retainer, having that data ready to roll to dive into as and when you need it. And automation plays a key role there for you to be able to adopt cloud the way you should, right? You shouldn't have cloud just be an expensive data center; that's going to get very expensive very quickly. You should be using all those scaling groups. You should be using a fair amount of ephemeral infrastructure. And this is also how you save loads of money with the cloud, if you're using it right.
But that brings the challenge. And so automation is like: okay, cool, I now have that detection in that container, I'd better go automate that data capture from that container at that moment in time and have all that forensic information ready to roll. Even though that container spins down, you've still got that forensic data. That's super important, and so is making sure it's all automated. I guess, Robert, to your point, really, it's about being prepared: making sure you have not only retained providers such as yourselves, but also talking to them about what sort of logging you should enable if you're not sure what's available.
How can we get ahead of that? How do we turn that on? And then how do we actually get this automation in place so when something does fire, I don’t need to worry about how to connect to a container running in Kubernetes and grab all the forensic data before it spins down? You shouldn’t be stressing about that, because cloud can actually solve its own problem there, I think.
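A minimal sketch of the kind of capture automation James describes, assuming an AWS environment and the boto3 SDK; the detection hook, instance ID, and case tag below are hypothetical placeholders, not Cado’s actual implementation:

```python
import boto3

ec2 = boto3.client("ec2")

def snapshot_instance_volumes(instance_id: str, case_id: str) -> list:
    """On a detection, snapshot every EBS volume attached to the affected
    instance so the evidence survives even if the instance spins down."""
    snapshot_ids = []
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" not in mapping:  # skip instance-store devices
                    continue
                snap = ec2.create_snapshot(
                    VolumeId=mapping["Ebs"]["VolumeId"],
                    Description=f"IR evidence: {instance_id}",
                    TagSpecifications=[{
                        "ResourceType": "snapshot",
                        "Tags": [{"Key": "case", "Value": case_id}],
                    }],
                )
                snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids

# Hypothetical hook: call this from whatever raises the detection,
# e.g. an EventBridge rule or a SOAR playbook.
# snapshot_instance_volumes("i-0123456789abcdef0", case_id="IR-2024-042")
```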
Robert: Yeah, absolutely. And we were cobbling together scripts all these years, right? And we still do, right? Like, I’ve got a script for this and a script for that. But you took all that, because I’d be like, oh, I need a GCP one and I need an AWS one, and it gets confusing. And you have experts around the world, and each one, like you said, has their own particular job function, right? You just put them all together and it’s like, hey, we’ve got it all right here. It’s like we have cloud and hybrid all covered now.
James: And as the incident responder, right? You guys need to do what you do best, and that is working out what the bad guys are doing, and doing that quickly on behalf of the customer, stemming the risk and the flow there. And I think the last thing you want to do is be playing around with hacky scripts and all sorts while there are attackers running around. I did a job once, it was an Iranian APT group, where we had to go and investigate about 80-odd systems, right? This was before Cado, in my consulting role. And it took us nearly a month just to collect those systems, a month to go and collect 80 systems. That’s a crazy long time just to collect the data. But with cloud, we can do that within an hour now.
Robert: Oh yeah, I love it. And also no deployment either, right? Think of the pain of deploying technology during an investigation, pushing out agents if you don’t have them.
James: Not that you have experience with that, Robert.
Robert: It can be painful for everyone involved.
James: Pushing tech out in a crisis situation. It’s never easy.
Robert: How long does it take to deploy Cado, right?
James: Minutes, because it’s all API. So no agents or anything like that, which is pretty cool. And I know you guys like it for that particularly because you deal with quite a lot of XDR as well on deployments. And doing that in a live fire situation is, I’m sure, very tricky for you guys.
Robert: Yeah, but now we can show up and clients a lot of times have EDR technologies in place, which is great. We can operate on top of what they have deployed and bring in other tools where necessary, right? So we can get endpoint coverage, we can get cloud coverage, and all the other different places where we need visibility. It’s great. And for us, Cado is an important tool in the tool belt.
James: That’s awesome. No, we’re happy you guys are involved. I guess one last final thought and then we’ll go to questions. I think Jordan came online just a moment ago to give us the hurry-along, but we could talk all day long, I think. I’d be silly not to ask this question of you, Robert. I know we talked a lot about automation, but how do you think AI and LLMs are playing a role in our space right now? Does it have a home in incident response? Do you think it’s a beneficial thing or a bit overhyped?
Robert: A little of both, but it is beneficial overall, yeah. I think it’s going to be key in automating a lot of the work within the SOC; if you look at security operations in general, it’s hugely beneficial there. I still look at it as an assistant. However, we’re in the early stages of this, right? And I think it’s going to evolve from assistant to analyst. So, we’ve been doing some really interesting research on our side, James.
And I was reading some stuff the other day about how the VirusTotal team has been able to use Gemini 1.5 as a malware analyst. They trained it, and they can submit up to a 1MB binary and within seconds get a malware triage report, the same way you would get one from a malware analyst you work with on a case, right?
So, it’s evolving. It’s coming along. That’s just one use case.
James: Like leveling up your sandbox to the max.
Robert: I think there’s so much more coming, but the way they did it was they basically fed it the decompiled code. I think it can also handle disassembled code. So someone went through the steps to decompile it, and then they drop it in there and let it run. But again, we’re just at the beginning. So, I want to see it go from assistant to analyst. I don’t see it replacing jobs; I see it augmenting what we’re doing.
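The pattern Robert describes, feeding decompiled code to a large-context model and asking for an analyst-style triage report, can be sketched with Google’s google-generativeai client; the prompt, model name, and API key handling here are illustrative assumptions, not the VirusTotal team’s actual pipeline:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you have a Gemini API key
model = genai.GenerativeModel("gemini-1.5-pro")  # large context window

def triage(decompiled_code: str) -> str:
    """Ask the model for a malware-analyst-style triage of decompiled code."""
    prompt = (
        "You are a malware analyst. Review the decompiled code below and "
        "produce a triage report: likely capabilities, persistence mechanisms, "
        "network indicators, and an overall verdict with confidence.\n\n"
    )
    response = model.generate_content(prompt + decompiled_code)
    return response.text

# Hypothetical usage with a decompiler's output saved to disk:
# print(triage(open("sample_decompiled.c").read()))
```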
James: Yeah, augmenting, getting good information fast. You’ve still got to get the right information into it, right? Otherwise you get a crap output at the end of the day. Okay, great. I think we’re getting close to the time where we can take some questions. So if people have some questions, Jordan, how do they do that?
Jordan: Yeah, they can post them at the bottom. We’ve received some questions already, but feel free to add yours if you haven’t. We probably won’t get to all of them today, but we can definitely follow up on any we miss. I think we should take a few and then close out in a few minutes, if that makes sense for you guys.
Cool. Okay, so I have two questions that are similar, so I’m going to combine them and throw it out to either of you to answer. And I think it’s very timely with the M-Trends report coming out, because it’s around threat trends.
So, someone asked what the biggest cloud threat trends are that you see moving forward. And there’s another similar question about crypto mining being a popular objective in the cloud; they wanted to hear your thoughts on whether you see that being an ongoing trend, or if you can elaborate on any diversification there.
Robert: Yeah, absolutely. So, I’d say the biggest threat is really around credentials, right, attackers accessing infrastructure. There are all types of ways to get in, but basically that identity and access management layer is like the firewall for the cloud nowadays. Being able to manage who has access to what, I think modern security is really about hardening those ACLs.
And then as it relates to the threat of crypto mining, that’s also a huge problem, right? One way organizations detect it is through expensive bills. I don’t like that as a technical solution to detecting evil in my environment, but that is one way organizations pick up on it. It’s like, why do we have 1,200 extra EC2 instances deployed? Things like that. It’s definitely a big trend, but I think the detection part goes back to your standard detection and response capabilities.
The crypto mining thing is a real threat. And I know, James, you run into it all the time as well. It’s a nuisance and a resource drain, but it also leverages a lot of the same TTPs that you would see in other types of attacks. So all that is to say: building out robust detection and response capabilities and understanding your threat profile, who’s coming after you, helps inform your risk-based decisions on where to harden, how to harden, and where to monitor.
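Robert’s point about 1,200 unexpected EC2 instances can be turned into a crude technical check rather than waiting for the bill; a minimal sketch with boto3, where the per-region baselines and the alert threshold are hypothetical numbers you would replace with your own inventory:

```python
import boto3

# Hypothetical baseline of expected running instances per region.
EXPECTED = {"us-east-1": 40, "eu-west-1": 12}

def count_running(region: str) -> int:
    """Count EC2 instances currently in the 'running' state in one region."""
    ec2 = boto3.client("ec2", region_name=region)
    total = 0
    for page in ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            total += len(reservation["Instances"])
    return total

for region, expected in EXPECTED.items():
    running = count_running(region)
    if running > expected * 1.5:  # arbitrary threshold for illustration
        print(f"[ALERT] {region}: {running} running vs ~{expected} expected")
```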
James: Yeah, that makes sense to me. And one of the interesting things is we tracked a couple of groups which do a lot of crypto mining, particularly in things like containers, and there’s definitely been a pivot there. To be honest, most people don’t notice this because they don’t have the ability to, or haven’t got that pre-deployed ability to investigate. But they’re not only running crypto mining, they’re also stealing credentials, as you say, and leveraging that for other purposes.
In some cases, we’ve seen them use those credentials to leverage attacks on other infrastructure, using your own cloud network. We’ve definitely seen loads of that happening, and a lot of those credentials can gain access to some pretty sensitive information. So, I won’t go through the examples, just in case it embarrasses anybody in particular. And what’s quite interesting is that you get this crypto mining detection, but you don’t get anything around what else they did on that system. What else did they grab? What was the risk?
And so people were like, ‘Oh, okay. Crypto mining. Okay. Not a big deal. I’m going to close this ticket.’ And then next thing you know, you find out a month later something horrible actually happened. So yeah, it’s quite an interesting situation.
We tracked one group, actually, that even posts online how many cloud systems they’ve compromised at any one moment. It’s like 20,000 or something; it’s been a long time since I checked. That’s how brazen it is, and yet it’s so successful. And I would say, I won’t shout it out too loud, that’s on the lower end of the sophistication scale, more opportunistic stuff. So imagine all the stuff that’s happening on the targeted side, which I’m sure is what you’re dealing with more and more.
Robert: Yeah, absolutely.
James: Thank you. Any other questions, Jordan?
Jordan: Let’s take one more question and then we’ll close out, and any unanswered questions we’ll follow up on after the webinar. I like this question because, as you guys talked about a little bit, this organization is definitely not alone. So this is the question: their organization has just migrated a lot of resources to the cloud, and they’re just starting to ramp up on the security side of things. They’re asking if you guys have any suggestions on a good first step.
Robert: On the security side of a cloud migration, I’d say the first step, for whichever cloud provider you’re utilizing, is to read through the documentation on the logging capabilities and turn on each of the relevant features. And then make sure you’re actually analyzing those logs and monitoring them. So get your logging right.
And then second, I would say check out some of the hardening guides. Mandiant publishes these on a regular basis; if you scour our website and our blog, you’ll find them. I know that isn’t a great answer, but there are definitely hardening guides out there that I think people might find useful, and I know it hurts as you start the journey. You’re not going to be perfect on day one, but if you have visibility and you have logs, you’re ahead of most organizations.
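To make Robert’s first point concrete on one provider: on AWS, the core audit log is CloudTrail, and a multi-region trail can be turned on with a few boto3 calls. A minimal sketch; the trail and bucket names are placeholders, and the S3 bucket must already exist with a bucket policy that lets CloudTrail write to it:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder names; replace with your own trail and logging bucket.
trail = cloudtrail.create_trail(
    Name="org-audit-trail",
    S3BucketName="my-cloudtrail-logs-bucket",
    IsMultiRegionTrail=True,          # capture API activity in every region
    IncludeGlobalServiceEvents=True,  # IAM, STS and other global services
)
cloudtrail.start_logging(Name=trail["Name"])  # trails are created stopped
print(f"Logging started for {trail['TrailARN']}")
```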
James: From my perspective, just to add to that, it’s really about understanding what’s in your cloud: what sort of data is in there and how it relates to your day-to-day business operations. That will help you profile the risk a little bit better.
Also, make sure you can access things when it comes time to respond or investigate. Quite often people will be like, ‘Oh man, we’ve got this really crazy detection on this system, we need to look at it now,’ and it’s a system in another region that they don’t really know about. They don’t even know who the owner of the system is, and it takes them a while to find out and a while to get access to that information. All the while, everything’s on fire and they’re freaking out. So, if you prepare well, cloud can be a real strength there, making sure you get quick access to information and data as and when you need it. I think it’s game-changing from that perspective.
Robert: And be militant with those keys, right? Whether it’s IAM access keys, SSH keys, things like that. Hackers love stealing keys. And I’ve been surprised, because in a lot of our investigations there are just keys everywhere, all over the place, and they all get stolen. In any event, be aware of those keys and how they’re accessed, and be sure you’re tracking them. We’ll see cases where it’s like, oh, this key is four years old, no one’s used it, but somehow the attacker had it and just logged into the environment. And you’re just like, oh, great.
James: It’s the simplicity of creating them. It’s great, you can create a key and get instant access to systems to do your job, but it also creates a minefield from a risk perspective.
Robert: Yeah, absolutely.
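Robert’s four-year-old-key story suggests an audit worth automating; a minimal sketch with boto3 that flags IAM access keys that have never been used or have sat idle past an arbitrary 90-day threshold:

```python
from datetime import datetime, timedelta, timezone
import boto3

iam = boto3.client("iam")
THRESHOLD = timedelta(days=90)  # arbitrary cutoff for illustration
now = datetime.now(timezone.utc)

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
            key_id = key["AccessKeyId"]
            last = iam.get_access_key_last_used(AccessKeyId=key_id)
            # LastUsedDate is absent if the key has never been used.
            last_used = last["AccessKeyLastUsed"].get("LastUsedDate")
            age = now - key["CreateDate"]
            if last_used is None or now - last_used > THRESHOLD:
                print(f"{name}: key {key_id} is {age.days} days old, "
                      f"last used {last_used or 'never'} -- rotate or remove")
```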
Jordan: Thank you so much, Robert. Thank you so much, James, for your time and for sharing your knowledge today. I really enjoyed listening in on the conversation. So, thank you so much. I hope everyone who joined as well enjoyed and found today’s content valuable. We will follow up with any questions that went unanswered. Thank you so much for submitting those. And with that we will close out today and have a great rest of your day. Thank you so much.
James: Hey, thank you everyone for joining. Thank you, Robert. Thank you, Jordan.
Ryan: Hey everyone, it’s Ryan here from your Oxygen Forensic training team. Today we’re going to talk about one of the newest modules included in Oxygen Forensic Detective at no additional cost. This is going to be our built-in translator. If you’re having trouble analyzing data in a different language, then this new built-in translation module is meant for you.
It can easily translate data to help reduce the overall timeline of an investigator’s case. With the rapid globalization of technology, we cannot always rely on the extracted data being in a language known to the investigator. To solve this issue and reduce case time, we have made this translation module available for download within the customer portal. So let’s take a look at the customer portal and see how we can download and install this new translation module.
From the Oxygen Forensic Detective homepage, we’re going to navigate to our global options menu into our configuration options. From here, we see our translations tab on the left hand side. Once we select that tab, we see that I currently don’t have any languages installed for my Oxygen Forensic Detective.
This is an indication that we need to visit the customer portal in order to download and install additional languages for translation support. I’m going to click this link to visit my customer area. Once here, I can see my Oxygen TextTranslate add-on. It’s a rather large download, so if you’re intending to use this for casework, ensure that you have enough time to download and install it. I’m going to select my download option. Once my download is complete, we’re going to go ahead and install our TextTranslate add-on.
From my chosen download folder, I’m going to select the Oxygen TextTranslate ISO. When I double-click this, it should mount to my system, allowing me to initiate the install. Another option is to select your mount option here in Windows Explorer. Once the open-file splash screen appears, I’m going to select “open”, and now we can run our Oxygen TextTranslate setup executable.
From here, we’re going to navigate to where we’d like our destination to be, select our program language, and then hit our install option. This may take a few minutes to install. Once installed, you can close out of Oxygen Forensic Detective and relaunch it, and all of our available languages will be present. Or, while this is installing, we can just go ahead and close out of Detective.
Now that our TextTranslate module has been successfully installed, I’m going to select my “finish” option, and then I can go ahead and unmount that ISO file.
From here, I can go ahead and launch Oxygen Forensic Detective again. And now when I go into my global options menu, back into my translation tab, now I see all available languages for translation. At this point, we are ready to begin our investigations and our analysis based off the data that we see that may require translation. So let’s get into it.
As you can see, I have an iPhone 6 extraction loaded into Oxygen Forensic Detective. There are many places where translation support can be found. Notably, those places are our Apple Notes, calls, messages, user searches, WebKit data, OS artifacts, wireless connections, key evidence, timeline, and at the individual application level.
Now let’s take a look at some of these sections to see translations in action. Here in our Apple Notes, if I want to run a translation inline while I’m investigating or analyzing this data, there are a few ways I can accomplish this. Along my top toolbar, I see my translations toolbar button.
I’ll just select the drop down arrow. I can choose to toggle on to show any translations that I have currently enacted. I can adjust my translation options back from my global options menu. Or I can choose to have the highlighted item in column two translated from this option. If I know the language that I’d like to translate from, I can select it here, or I can choose my “autodetect” option.
If I’m examining a multitude of data, I can highlight more than one item, right-click, and have them all translated into English from a selected language. This language is going to be Spanish, so I’m going to find my “from Spanish” option, and then we’ll see the translations run automatically, as indicated both by my notification center and that spinning wheel icon.
Now, the translation icon will be apparent inline with the text that’s been translated. If I was to hover over this, we get to see what that translation looks like and the original text. This is also shown in our column three.
This same functionality is going to be apparent through each and every one of those sections that we previously discussed. So let’s jump to a different section now and see some additional translations on a larger scale.
So I’m going to go back to my device overview and we’re going to go into our messages. Once our messages tab has been successfully loaded, as we did in our Apple Notes, I’m going to go ahead and highlight all of my items, and I’m going to come to my “translations” toolbar button. I’m going to select “translate into English from Spanish” again. So this is going to be that secondary way, or additional way, in order to translate items within column two.
Now that our TextTranslate conversion is complete, as we’ve seen before, if I want to view the translation, I have a couple of options. I can hover over that particular line item in column two to review the translation and the original text. I also have that option in our details column, column three, where I can see the original text as well as the translation into English.
Let’s take a look at another section where translation support may be necessary. Back to my device overview, and let’s take a look at an individual application. So here I have my Apple messages. I’m going to view all of my data by selecting my top categories in column one. And the same deal here: I have my options for translation on an individual line item basis, both with my toolbar button, as well as my global right click options.
So if I wanted to do the exact same, I can right-click, translate from the language identified, or use my autodetect feature, and allow the translation service to run. Once complete, we’ll see those translations made immediately available to us in column three and at each individual line item. Now, as we’re seeing the translations, we can continue, of course, to identify items as key evidence by marking them with a key evidence star, or right-clicking and applying a tag.
So it’s the same functionality here that we see across any section we’ve worked in, just now with the additional supported translations included. And as we can see, those translations occur at a very rapid pace, so we have much quicker access to what particular text may say, to help us further our investigations.
Now that we have identified items and translated text within our case, I want to be able to report on that and visualize those translations within my reporting. For instance, if I’d like to report specifically on my Apple messages, I can select my “export” option and export this to a file. Now I can start to generate and customize my report as usual within Oxygen Forensic Detective.
But now we have the inclusion of our translation tab, and I can choose to show the source text only, the translated text only, or both the source, or original, text and the translated text within my reporting. So now all the translation work you have completed throughout each of the individual sections can be represented inline within your reports.
I can even choose to show the translation direction, for instance, English to Spanish, and so on. Once I’ve identified all the appropriate items within my reporting, I can go ahead and select “export”, and that report will then generate. Translations can be viewed from both within an individual section report, as we’re seeing here, or at a device level, or at a case level.
One of the best places to visualize any data I’m working with in a case is my timeline. This, of course, shows us all of the available timeline-based data from a particular device, or from devices within a case, and gives me a one-stop shop to visualize that data.
So I’m going to navigate back to my device overview with my extraction overview button, and I’m going to navigate to my timeline. From here, like in any other section, this is going to aggregate all of our data that’s associated with a timeline. Here I can select my “messages” tab and have the same translation support here within my timeline.
Now, if there’s a translation that I choose to not apply, or I want to delete translations, I can simply right click and delete translation on an individual line item basis, or come to my top toolbar button and delete translations. This is going to apply for anything that’s currently highlighted within column two.
If you noticed, when I selected translate on a single line item here, it brought in all the already-translated data from our messages section. So if I ran translations within a section, for instance messages or calls, when I go to that tab here in timeline, I simply need to right-click and rerun one translation on an individual line item, and the translations that had previously occurred will then be applied to our timeline.
For more information about translations or any of the other supported modules within Oxygen Forensic Detective, or to enroll in any one of our various training options, such as our Oxygen Forensic Bootcamp, please visit us at oxygenforensics.com. Hope to see you in class soon.
For further information and to sign up for a free trial, visit: https://go.exterro.com/FTKfreetrialsignup.
Lynne: Hello, everybody, hopefully you can hear us. We are just going to wait, I don’t know, 60 seconds to let some people file in. We have a nice big group today, so that’s exciting. 97 people have joined so far. Let’s see if we can get that to 100. Give it just a second, and we’ll be starting in just a minute.
Okay, we’re at 103. That’s good for me. So, good morning, good afternoon. I’m in the US, that’s why it is very dark. I’m so sorry that my lighting in here is terrible because it is nighttime here, but we’re joined by my colleagues internationally in the UK and Europe.
So, good morning. Welcome to this webinar. We’re so excited. We have released FTK 8.1 today. Our website should be live literally within the next minute or so, and the press release and everything will be launched. So today is a great day where we really want to show off all of the new features for you in FTK 8.1. We’ve been working very hard on this, and there are some very exciting features that our customers are literally going crazy over.
So, I just want to go over a couple of housekeeping items today. These are our presenters for today; I’ll leave you in their hands after these housekeeping items. Harsh Behl is here. Harsh is the VP of Product Management for our forensic product line, and he is going to walk us through the story of how the heck we built this product, why we built it, and what is coming in this release. He’ll also walk us through a lot of the new features on the internal investigation side of things.
Christine Hall is also here with us. She’s a senior international technical engineer. She’s amazing. She’s going to demo for us all of the mobile capabilities that FTK 8.1 has, in addition to the new reporting features that come along after you’ve done your mobile review. Let me just orient all of you: we’re using a new webinar platform called BigMarker. On your console, probably on the right-hand side of your screen, or however you have it oriented, there is a chat panel. You are welcome to chat with us throughout this webinar. If something looks strange, or you can’t hear, or something’s going wrong, let us know.
I see everyone saying hi. Oh, I’m so glad, everyone’s here from so many different countries: Jason from South Africa, Angelo from Italy. Welcome, welcome. So you can absolutely use the chat feature, and you can also use the Q&A tab. Harsh and Christine will be doing a lot of live demos today, and as they are demoing, if you have a question for them, please type it into the Q&A panel. We’ll save those until the end, unless it’s something really important, in which case I’ll probably interrupt them and have them answer it. So type those questions in and we’ll make sure to answer them.
There’s also a tab called ‘Handouts’, and in the ‘Handouts’ tab we have already loaded a couple of our brochures for FTK 8.1. There’s a product brief on the capabilities of 8.1 and most of the features that our law enforcement customers would be interested in, and then there’s also a whole separate brief in there just about our mobile features, focused on that particular part of the product. There’ll also be another one that we will send out to you later this afternoon about all of our internal investigation capabilities. So there’s tons of information getting released today, and we’ll make sure to get that to you.
So, I’m going to go ahead and pass it over to Harsh. Again, if you have any questions, chat them in the chat panel or put in the Q&A for us, and we’ll make sure to get to those. And then, yep, Harsh is our MC today. So, take it away, Harsh.
Harsh: Thank you very much, Lynne. Appreciate it. A very good morning, afternoon, or evening to all of our attendees joining from different parts of the world. Thank you very much for sparing the time from your day. We are very excited to unveil what we have built for 8.1. As customers, you have vested your interest and trust in the technology for all these years, and as Exterro, we are very proud to be bringing you the latest release, FTK 8.1.
So firstly, what were some of the biggest motivators for us to build our 8.1 technology and all the features that we’ve packed into the product? On the corporate side, we truly want you to stay in charge of your data. Irrespective of where your employees are or where your data resides, we want you to be fully in charge of it, to control your data, and to preserve and investigate at your own will.
This release firmly positions us at the forefront of corporate investigations. Along with the off-network Windows remote collection capabilities, we have now added the capability to collect from Mac devices remotely off the network. That is, even for employees who are not connected to your corporate network or to the corporate VPN, you will still be able to pull the data back from their devices.
I’m only going to show you some of these slides, and then I promise we’ll see the live product as well. So I’m going to quickly go through some of the slides that we have for you. Mobile data: mobile data has been an area of huge interest for Exterro. We want to become a leading mobile review platform where we allow our customers to bring all their computer, mobile, and cloud data into a single platform, and give you the additional context and insights into the data that you would typically miss.
So we’ve put a huge amount of focus on enhancing your mobile investigative experience with this release and can’t wait to show you all the different things that we’ve got. We have enhanced mobile parsing by lots and lots of newer artifacts that you could find in our artifact guide that lists all different artifacts for all different operating systems that we support.
We have added a lot of things that could truly benefit you from a mobile investigations point of view. First, irrespective of whether they’re Android or iOS extractions, you can bring them straight into our product from whichever tool you created the image with; it doesn’t truly matter. But when it comes to review of the data, we want to enhance it, not just show the usual chats, calls, and SMS, but provide investigative analytics, those additional actionable intelligence points that you can act upon and take further in your investigations, making your investigations really easy for you and whoever you’re going to report to.
A lab-to-court report builder: we’ve put a huge amount of time and effort into enhancing the experience of a user when they have to report upon their findings. We have come across users who’ve told us again and again that, yes, they have been able to locate the smoking gun using our tool, but how do they easily tell the story? How do they paint the picture of how the events unfolded and have an easier way to communicate what they want using a piece of paper?
So, our lab-to-court report builder will help you with just that and Christine is going to show you a demo of that also. We’ve put a lot of enhancements on the multimedia side as well. Now you would be able to do object recognition, not only from just the pictures, but also from videos. You can identify similar images, similar faces, so on and so forth.
And for internal corporate investigations, beyond off-network Mac collections, the other big thing packed into this release is remote triage. We understand that during the crunch hour, when something unintended is happening in your network, time is of the essence, right?
So you want to limit the time you take to respond to an incident, and hence we have added the remote triage capabilities. You can quickly scan endpoints remotely, collecting only the data that gives you the system artifacts that help you determine whether you want to take further action on those endpoints or not. It quickly puts you ahead of the curve, helping you make informed decisions to later augment your investigation.
Splunk integration. We’ve built a whole new integration library. We already had an integration with Palo Alto, and you can find us in their marketplace as well. But this release brings the Splunk integration that allows people to orchestrate workflows between Splunk and FTK hand-in-hand. It is going to be very beneficial for organizations who want to further strengthen their cyber infrastructure: as soon as an alert is detected and it matches the thresholds that you have set, it can automatically trigger FTK to perform a series of actions, even when you’re not sitting in front of your endpoint or your computer.
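The alert-to-action flow Harsh describes can be sketched generically as a webhook bridge; everything below, including the route, the FTK endpoint URL, and the payload fields, is a hypothetical placeholder to show the shape of the pattern, not Exterro’s documented Splunk SOAR integration:

```python
# A hypothetical webhook bridge: a Splunk alert action POSTs here, and we
# forward a collection request to a forensic platform.
from flask import Flask, request
import requests

app = Flask(__name__)
FTK_API = "https://ftk.internal.example/api/collections"  # placeholder URL

@app.route("/splunk-alert", methods=["POST"])
def on_alert():
    alert = request.get_json(force=True)
    # Splunk webhook payloads carry the triggering event under "result";
    # the exact field names depend on your alert's search.
    host = alert.get("result", {}).get("host")
    if host:
        requests.post(FTK_API,
                      json={"endpoint": host, "profile": "triage"},  # hypothetical schema
                      timeout=30)
    return {"status": "ok"}

if __name__ == "__main__":
    app.run(port=8080)
```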
So, those are some of the really good things that we’ve put in here. And just very quickly on mobile: instead of you as customers using various different tools and the various different reporting templates you create out of them, with FTK you can now bring it all into the same platform, irrespective of which tool you are using for collecting mobile data, and create one single unified report to be presented to whomever you like. The reporting has enough capabilities to match any persona; it can be generated to the liking of anybody you need to report to.
All right. On mobile, as I said, we have support for the major chat applications and lots and lots of newer iOS 17 artifacts, with one of the key differentiators being extracting deleted data better than some of the other competitors in the space. We’ve added support for reply messages, so you can see the replies and how they were responded to. And like I said, we believe that everything you would need from a mobile device to help you solve your case, we’ve got in there.
And of course, given the dynamic nature of the mobile space, we want to listen to our customers. If there is something that you would like us to do better, or something you would like us to support, please do reach out; we’re always keen on customer feedback and addressing it.
Lots and lots of chat review tools make us stand out. I’m going to quickly skip these for now because I want to show you the live product itself. But again, language translation has also been built into our chat viewer, so that if you are reviewing documents, chats, or emails that are in a language other than your native language, you can use the language translator built into the product, all offline with no connection to the Internet, and you can then translate and even search on translated content.
You can do legal professional privilege review using our product. You can do Google Takeout and warrant returns, including iOS warrant returns, with this release as well. And of course, retrieval and display of deleted and edited chat messages is supported. A lot of enhancements have gone into the timeline as well: you can now look at all the chats on the timeline, and you can export the timeline view so it can be exported quickly as a PDF.
Our chat viewer, we believe, truly stands out. We’ve added the capability in this release for you to filter the chat by the relevant dates and times that you would like to review. So if you are reviewing a chat between two parties and you’re not interested in the whole chat, but in a specific time range, you can do that within our chat viewer now. And of course, you have the language translation that we just spoke about, so on and so forth.
Entity extraction and management: I am not going to steal Christine’s thunder, but this is one of the biggest features that we believe we’ve put in 8.1. We understand how important a role communication data plays within an investigation, and we truly understand the hidden objective, the hidden agenda, of the people you might be investigating when they’re committing a crime.
So, in order to make communication analysis easier and more insightful, we’ve brought in a lot of capabilities. First and foremost, we identify all the different entities that are present in the case by their names, their aliases, social media handles, phone numbers, and email addresses. We put those all together, so instead of you looking at five different chat handles or applications that are from the same person, we tie them all back to one entity and then let you review the data by entities. It helps you identify the communication patterns: who has been the most chatty on the device, which application is being used the most, and how people are communicating with each other.
So, Christine is going to take you through it, but we have had great feedback from some of our customers, and some of them have even called it a total disruptor in the industry. We’re very keen on getting your feedback when you see it and use it, but yes, we would love to show it to you today.
Of course, we’ve spoken about the reporting. As I said, reporting will help you generate reports in different formats. You can customize the templates. You can embed the files in line into the report so that you don’t have to keep printing files separately and then appending them all together. A timeline report can be embedded into your main report or you can have a separate timeline report, and you can have a conversational view coming into the report as well. Lots of multimedia review items that we just spoke about: image recognition on videos, pictures, and then similar face recognition, as well.
Off-network Mac collections: we’ve spoken about it, and we are so proud of what the team has put together. We are firmly putting you in charge and in full control of your data. No matter where your employees are and what time of day you want to collect the data, you have full control, right? We use standard Jamf for mass deployment of the remote Mac agents, playing very nicely with your IT teams there. And of course, we support a Zero Trust framework as well: if you have implemented Zero Trust, our tool is there to help and support you for remote Mac collections.
We support the logical and filtered collections from Macs. The product has the capability to resume the collection from the point of interruption. So if there is any interruption when the data is being collected, the next time we establish a connection with the agent, we will resume from the point of interruption itself.
Scalability is at the core of Exterro, whether it is supporting a large amount of data processing, a large number of reviewers or investigators on a case, or a large number of remote endpoints that you want to collect data from. Our technology is built to match your scale, whether it is five users, 10 users, or up to thousands of users; our technology can handle that scale if deployed correctly.
All right, rapid remote triage. As we just discussed, in the moments when it matters most, it’s very important to get the data that is going to put you ahead of the curve and give you a head start on your investigation. When you are looking at a threat that could be spreading enterprise-wide, you need to look at indicators of compromise.
You want to look at external devices that are connected to the endpoints, and at hidden processes that are running. So instead of collecting all of the user data just so you can look at these system artifacts, we have now embedded a system summary collection within the product that allows you to collect only the system-based data, giving you the most relevant system artifacts and deeper insight into whether you want to go ahead and collect more data from that machine or not.
So, the triage of remote investigations is firmly enhanced with our system summary collection capabilities. The collection size is very small; it is only the files we need to show those system artifacts, so you are eliminating a lot of noise in your first review.
Okay, Splunk integration; we are going to see all of this. Like we discussed, you can now orchestrate your cyber workflows between Splunk and FTK. We will have separate videos and separate webinars on the Splunk integration as well. But please reach out to us if you need help orchestrating your workflows and you want to benefit from the automation between Splunk and FTK.
I am going to quickly skip to my last slide because I’m already getting texts from my team to hurry up. So, we want to summarize 8.1, and this slide does it perfectly. We’ve built so many features that can help you in your day-to-day investigative lives: from lab-to-court reporting, where you can highlight certain keywords and embed the timelines within a report, to multimedia AI helping you with object recognition in videos and pictures, similar faces, offline language translation, and entity management revealing the complete communication pattern between entities and forming the individual entities themselves.
Remote collections, off-network Mac collections, live preview, endpoint triage: everything at your fingertips to help you with your internal and external investigations. Hundreds of newer mobile artifacts and new mobile data support. And our integration library has seen three new integrations this release: we allow you to orchestrate workflows with Splunk SOAR, we allow you to import your data from Griffeye, and, using our APIs, we have an OpenAI Whisper integration that allows you to transcribe your multimedia data. So instead of watching full videos or listening to audio, you can keyword-search across them as well, all deployed within your environment behind the firewall.
Truly excited about our 8.1 release, and within the next 10 minutes, I’m just going to show you some of the features that we’ve spoken about live within the product.
So here you see I’ve logged into the FTK Central interface, or, as some of you may know it, the SmartView interface. We’re going to look first at the off-network Mac collections. When you come into collections, you’ll see that we have broken it down by Windows, Linux, and other data sources, and you have your Mac investigation selection as well.
If you select Mac, it shows you the dashboard for all the previous collections that have been performed, and it also allows you to create a new collection. You can click on ‘Create’; because for remote Macs we can only do logical acquisitions, it defaults to logical acquisition. You can provide a name for the collection, put in a description, and choose which case you would like the collection to be collected and processed into. You can choose the processing profile that you would like to process the data with, and if you select ‘auto-process collection’, it’s going to process it automatically.
Save and next. It then shows you the Mac endpoints that are in the system today, and you can choose which endpoint you want to collect the data from. The one with the green dot shows that recent contact has been established with that machine, so most likely it is on the network. And if you look here, it shows you that these machines have not been contacted recently; the last contact was on this date and time, so these are the off-network machines that you may want to pull data from. You can select one of those as well, and then say whether you want to do a filtered collection or collect the logical drive. Of course, while the machine is off the network, we won’t be able to get the data from it right away; but when you’re going with an on-network machine, it will collect the data for you as well.
This is the machine we want to collect the data from. Here you can create your filters for the collection: I want to collect only files that are PDF, and probably XLSX as well. You can define a certain path if you want to collect only from that path, plus sizes, dates and timestamps, or any keywords that you want to specify. You can save the filter as a template so that you can come back any time and use one of the saved filters. So if this is one of those filters you created earlier, you can see what the filter is and apply it as your collection criteria. Or, at the same time, you can just click on ‘acquire logical drive’, and it will show you all the logical drives that are available for you to collect remotely.
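The filter Harsh builds in the UI, file types plus path, size, dates, and keywords, boils down to a predicate over file metadata; a toy sketch of that logic under assumed criteria, not FTK’s actual collection engine:

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath

@dataclass
class FileMeta:
    path: str
    size: int           # bytes
    modified: datetime

def matches(f: FileMeta,
            extensions=frozenset({".pdf", ".xlsx"}),
            under="/Users/jdoe/Documents",       # hypothetical scope
            max_size=50 * 1024 * 1024,
            after=datetime(2024, 1, 1)) -> bool:
    """Return True if the file meets every collection criterion."""
    p = PurePosixPath(f.path)
    return (p.suffix.lower() in extensions
            and str(p).startswith(under)
            and f.size <= max_size
            and f.modified >= after)

# Example: this file would be collected.
print(matches(FileMeta("/Users/jdoe/Documents/q2.xlsx", 12_000,
                       datetime(2024, 3, 5))))
```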
That is filtered collection on your on- and off-network machines. However, if you would like to do a live preview instead, you can totally do that as well. It shows you the same endpoints, and if this is the endpoint that you would like to do the live preview on, you select it and it’s going to do a live preview.
So it shows you the full file structure on the left, and you can traverse through it, looking at the folders and going to the folder of your choice. Here I have all these different folders; I can click on them and see all the different things that are part of that folder, previewing as normal. You can go to the documents folder here as well; like you see, I have a 49ers folder in here.
So here are all the different files that are in there. You can click on those and it’s going to generate a live preview of those files. As you can see here, if there’s a picture, it’s going to show you the picture as well, and maybe these are the files that I want to collect. You simply choose what you want to collect, or you can select everything in here, review your selection, and when you hit ‘acquire files and folders’, it’s going to do the remote collection for you. If you don’t want to do a selective collection and just want to acquire the logical drive, you can do that from this screen as well: simply select the whole drive and ‘acquire for collection’, and it’s going to do the full collection for you.
So those are some of the latest capabilities that we’ve added for off-network and on-network Mac collections within the product. Before I hand it over to Christine, I am quickly going to show you some of the capabilities we have built for similar face recognition as well. I’m going to switch to my case for similar face identification. In the thumbnail view, you see all these different pictures that we have, and if this is the picture that I want to find similar faces for, I simply right-click, search similar faces, and it will run a search through our AI server at the back, which is hosted in your environment, to find the similar pictures. You can now see all the different pictures that are there.
You can also import a picture from outside of your case. If you say, I want to select a picture that is outside my case, probably this one here; well, it’s the same picture, but I am just trying to show you the results. It’s going to bring those same results back to you when you import a picture from outside of the case as well.
That was our similar face and object recognition. As you may notice, in this release we have added this new button here, and that is our integration library. The integration library, we believe, truly helps you use best-of-breed solutions. We at Exterro firmly believe that an investigative lab and an investigator have to use best-of-breed solutions. You’ve got them validated, you’ve invested in them, so why not? But FTK provides the platform where you can use all of them together as we expand our ecosystem of integrations with other vendors. Very shortly we’re working on some really exciting things with vendors like Oxygen Forensics as well, and you will see this integration library growing.
But today you will see that we have the Splunk integration. It explains everything that the integration can do; you can click on the Splunk SOAR integration guide and it will download the relevant document for you. You can similarly look at Palo Alto; if you click for more information, it redirects to the Palo Alto Marketplace. And for Griffeye, it allows you to import the CSV that can be exported out of Griffeye with your grading. So if you use Griffeye as your preferred tool for grading, you can continue to use it: you export your CSV of graded images and bring it into the same case in FTK, and we will automatically map the categories that you graded in Griffeye to exactly the same categories in FTK, whether that’s Project VIC, CAID, or whatever you use. And then, of course, you can continue with the much deeper-dive investigation that you would expect FTK to help you with.
All right. I am now, at this time, going to hand it over to Christine, who’s going to show you all the amazing things that we’ve done for mobile forensics and how mobile forensics could be conducted with FTK 8.1 and some of the features that we have put in that could help you for mobile investigations, so on and so forth. I’m just going to stop sharing my screen and hand it back to Christine.
Christine: Thank you very much, Harsh, and thank you everyone for joining us today. So, we’re going to take a look at 8.1 and we’re going to go through a mobile investigation case that I have.
So, one of my roles here as a technical engineer is to go through our software the way that our customers would, using the experience that I’ve had over the last 16 years as an investigator and an operations manager. So when I look at the new features of 8.1, I look at how we can utilize them to make our investigations efficient, and how our customers would be using these features. The best way to demonstrate this is to do a case together. So, this is a mobile investigation case that I have. And the reason why I picked a mobile case is because my experience over the last few years as an investigator is that mobiles have been the most challenging, because mobiles are quite complicated. There are different ways to extract them and different tools to extract them, and because of that, one of the issues I used to have in my lab was mobile data being looked at in isolation.
Now, FTK allows me to bring in mobile data from different applications so that I don’t have to look at that data in isolation, I can look at the bigger picture. And one of the most popular services I offered was preparing mobile data to be reviewed by an officer, somebody that doesn’t have that digital forensic background, somebody who doesn’t have that training and experience of navigating through a forensic application.
So, if I were to use FTK in my previous role, how could I benefit from the features and functions of 8.1? In two ways, and that’s what we’re going to go through today. First of all, how can I bring so many different users into my case? Well, let’s start with the dashboard feature of FTK, because what this does is give me an insight into my data and my case within seconds. If somebody reviewing the data wants to focus on a particular aspect, they can use this dashboard as a filter and go straight to a particular set of data.
So, we’re going to look at any data that’s got location information for Zeebrugge. And straight away I can see this data in my filter and I can start going through and looking through it. So we have the ability for users to go straight to the data that they want, but equally, for those of you that haven’t seen FTK 8 before, as an investigator, we can also look at all of our data in this nice easy to review filter manager. So, if you want to look at your mobile data, you can click on the filters and go straight to the information that you want to view.
Now, remember, one of the benefits of FTK is bringing that data in from multiple sources. So this may be a case that has a computer in it, plus mobile data from Oxygen, mobile data from UFED, mobile data from XRY, and raw extractions from Grayshift. I’ve utilized one of the features of 8, bringing in raw extractions to save time on decoding, so it’s decoded within my tool. So I’ve already used that functionality to bring the data in, and now I can look at all of it together in one place.
FTK doesn’t just have this traditional grid view of the data, where I can sort and filter based on any of these columns we’re looking at here; it’s quite a traditional way of looking at data. We allow you to look at it in different interfaces so that, whether you’re an experienced investigator or an officer who wants to quickly get an overview of your data, you can look at it in the way that’s most convenient to you.
So, looking at our call data in SmartGrid, we can see things like how often calls are made and which number was called most often. And this may allow us to take information straight out of the case and conduct further inquiries. This may be a number that doesn’t belong to any of the handsets that I’m currently examining, and straight away, within seconds of being in my case, I’ve now got some intel that I can go away and do further checks on.
But 8.1 takes this a step further by bringing in entity management, and this is where we combine the technical aspects of being an investigator with the human elements; because we know that when people log into applications and into their user accounts, they may not use their actual or real name, they may have a number of aliases. And this is where entity management can really help make our case efficient.
We may have a number of entities in our data set, whether that’s email addresses, telephone numbers, or various user accounts, and FTK can combine and merge entities that it believes are the same person, regardless of the user accounts they’re using. It can cross-check things like email addresses and telephone numbers. So now, when we’re looking at this entity, we’re not just looking at one account, we’re looking at all the possible accounts that belong to this particular person. And we can do that for a number of our persons of interest. So we don’t have to worry whether they’re using their real name, a nickname, a pseudonym, or any other alias across multiple accounts; we can see it all merged here in our entity management dashboards.
And the system may be smart enough to identify where two email addresses or two telephone numbers match. But as an investigator, I might have additional intelligence about the case. I might know that two entities here are actually the same person and want to merge those myself, and we can do that by selecting them and then giving our entity a name. So we combine the smartness of the tool, which automatically says, ‘actually, did you know that Steph also goes by the name Queenie, and that Lena also goes by the name Knight’, with the intel that I have as an investigator, to be able to say these two I know are the same person, because I have intel outside of my investigation that puts them together.
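The automatic merging Christine describes, tying accounts together when they share an identifier such as an email address or phone number, is essentially connected components over shared identifiers. A small union-find sketch of the idea with made-up accounts (borrowing the aliases from the demo), not Exterro’s actual matching logic:

```python
from collections import defaultdict

# Union-find over accounts: any two accounts that share an identifier
# (email address, phone number, handle) end up in the same entity.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Hypothetical accounts and the identifiers seen on the device.
accounts = {
    "Steph":   {"steph@mail.example", "+44700900123"},
    "Queenie": {"+44700900123"},
    "Lena":    {"lena@mail.example"},
    "Knight":  {"lena@mail.example", "knight#77"},
}

first_seen = {}  # identifier -> first account observed using it
for account, identifiers in accounts.items():
    for ident in identifiers:
        if ident in first_seen:
            union(account, first_seen[ident])
        else:
            first_seen[ident] = account

entities = defaultdict(list)
for account in accounts:
    entities[find(account)].append(account)
print(list(entities.values()))  # [['Steph', 'Queenie'], ['Lena', 'Knight']]
```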
And I can also add further intel into my case by editing these entities and adding things like the person of interest’s picture. I might add information about them to the case, or other information that I have from a third-party OSINT tool or another source. So again, this is another way to keep not just my data sources together in a case, but intel on people together in a case.
We can search for entities, so if we have quite a few in our case and we’re looking for a particular person, I can do a search and have a look at any records associated with that person.
So, let’s take a look at one of our persons of interest: Lena. We want to look at all of Lena’s accounts in one place, so we’ll click on Lena, and now we can see how she interacts. Who does she speak to? What applications is she using? Now, bear in mind that my case here is just a small test case. Realistically, when I’ve done mobile phones, I can have thousands of messages, thousands of calls, different applications that people are using, different circumstances, where it is very efficient and ideal for me to get an overview of the case at the start, not after I’ve spent an hour reviewing messages and reading conversations and slowly getting an idea that, oh, I think this person is that person and they’re talking to the same person under different names and different accounts. I want to get a visual representation of what is going on in the case quite quickly, and entity management gives me that.
If you take, for example, a harassment case, we could look at someone’s interactions and see straight away that the interactions between the person of interest and the victim of the harassment are all one-way communication. I can see that visually, straight away. In this instance, though, I’m looking at an organized crime gang, so I want to look at the interactions between certain people, and I can do that too. I can narrow down my interactions to look at them in whichever way is best for my examination. Do I want to look at just particular applications? I can do that too: let me see all the interactions that were carried out on WhatsApp or Kik or via Messenger. Or I can look at interactions that happened within a certain time frame.
So in this instance, I want to look at the communications between a certain group of people, because these are the people that I have identified as the individuals of interest in my organized crime investigation. I can add that as a filter, and now, because of the flexibility that FTK gives us, I can move that data to the interface that is best suited to the information I want to look at, which is possibly grid view.
In this view, I can review my messages and bookmark anything of interest, so I can identify what's relevant to my case. And going back to sharing this mobile data with other resources: in my experience, there are a lot of cases where it is most useful for the officer to do the first review of the mobile data, especially in drugs and fraud cases, because that officer knows the slang terms for the drugs, knows people's names and the nicknames involved, and has intel about the case. The officer can go through the messages and bookmark any messages of relevance.
The officer may then require further assistance once they've gone through the messages, bookmarked what's relevant, and carried out keyword searches. You can run a keyword search across the whole case, or you can look for keywords within the messages themselves, which can be highlighted. All of this helps you identify the relevant data.
They may then also carry out a media review. So, let's take a look at the media in this case. Whilst doing a drugs investigation, an officer might come across some indecent or disturbing images. Those can all be marked in the case, which means any future reviewer who comes into the case will straight away get a warning saying the case contains indecent images.
Those images can be tagged, so we can build some welfare protection into our case. That ensures I'm not overexposing myself on a drugs investigation by looking at something indecent every time I go through my investigation; alternatively, I can simply hide those images from my case.
So I'm now doing my drugs investigation, and I might identify a picture of relevance. Now, if my officer is doing the review, the officer knows enough about the case to say this picture is relevant, that picture is relevant, that chat message is relevant. What the officer can do is bookmark the items for which they require further assistance from the digital forensic unit, from the investigator, from the person who understands the data.
So, we can work on this case together. As an investigator, I identify things I want to share with the officer; I might bookmark this image for the officer to review, so the officer can tell me: is that your person of interest? Is that the victim? Is this the spreadsheet you needed? And the officer can, in turn, identify data they want further analysis on, done by someone with the training to say how that data got on there, where it came from, and whether it was distributed.
So, we've gone through our case now, and we've identified messages of relevance and multimedia of relevance, and we can now look at our data in a number of different formats. Again, we can use the smart grid to do things like examine times and dates of relevance, identify applications of relevance, and look at our data in whatever format is relevant for us at the time. So let's take a quick look at the time view. Once we've identified a potential date of relevance, we can generate a report from the timeline that tells me everything that occurred on that day.
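The underlying idea of a one-day timeline is simple enough to sketch in a few lines of Python. The event records and function below are invented for illustration, not FTK's implementation.

```python
# Hypothetical sketch: pull every event on a date of relevance into a sorted timeline.
from datetime import date, datetime

events = [
    {"time": datetime(2024, 3, 26, 8, 15), "type": "call", "detail": "Lena -> Steph"},
    {"time": datetime(2024, 3, 27, 11, 0), "type": "chat", "detail": "ok"},
    {"time": datetime(2024, 3, 26, 21, 40), "type": "web", "detail": "maps search"},
]

def timeline_for(day: date, evts):
    """Keep only events that fall on the given day, ordered chronologically."""
    hits = [e for e in evts if e["time"].date() == day]
    return sorted(hits, key=lambda e: e["time"])

for e in timeline_for(date(2024, 3, 26), events):
    print(e["time"].strftime("%H:%M"), e["type"], e["detail"])
```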
Once we've identified the date of relevance and gathered the data that's relevant to our case, we can start looking at how to give that information to our reviewers, or how to get it ready for court. This is where one of the latest 8.1 reporting features really enhances the ability to get that data out in a format that is viewable and presentable to court and to your end users.
So, we've gone to create a report, and I've created a template previously. This allows me to have my logo and my title already prepared in my case. In a lot of organizations, including the lab I used to work in, you provide services to particular customers, so if I'm producing a report for a particular customer or case type, I may already have pre-configured columns. Being able to select the template I want saves time in generating the report: this report is for this customer and it's a CSAM investigation; that case is for that customer and it's a mobile investigation. Many of the settings I'm about to show you can be configured in advance so that you just select the right template for your case.
So we can give it a title and a name, and then we can decide what data we want to include. For this particular case, I want to include all the evidence I've identified as relevant: all the chat messages, all the multimedia, the internet history, everything about my case that I've identified during the analysis stage. We can also bring a timeline into the report view, so I can go to 'create timeline' and bring in all the data from a particular tag. Earlier in this case, I ran what's called a search report for keywords, so I can say 'generate a timeline based on any of the labels that I've already identified'. Or, as in my case, I had already looked at a specific date, the 26th of March, and created a timeline in the case. I can add either of those into my report. I can also configure the report in terms of formatting and columns.
How much data do I want in the report when I export it? For call logs, for example, let me show you one that I haven't changed yet. You may find that, for any particular object type, this is more information than you want in your report. So you can edit this to include just the subject header, the email name, and the to and from fields. You can choose which fields of each data set to include in your report. And this is the one I edited earlier for calls, where I just wanted the information that I feel is relevant for my report.
We can highlight information in our case. Do we want to flag a keyword? Wherever the suspect's name is mentioned, make it bold; wherever they mention a keyword relevant to my investigation, cocaine, drugs, guns, a search term associated with CSAM, highlight it and make it bold. These are all options I can add to my report to customize it. And where I've included evidence, do I want to include a thumbnail of that picture, or a full-size copy of the evidence itself?
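Pulling those options together, a report template of the kind described might look something like the following Python sketch. Every field name and value here is a hypothetical illustration of the pre-configured settings, not FTK's actual template format.

```python
# Hypothetical report template; field names are illustrative, not FTK's schema.
report_template = {
    "title": "Operation Example - Mobile Review",   # hypothetical case name
    "include": ["bookmarked_chats", "bookmarked_media", "internet_history"],
    "columns": {  # trim noisy fields down to what the report actually needs
        "email": ["subject", "from", "to", "date"],
        "calls": ["direction", "number", "duration", "timestamp"],
    },
    "highlight": {"bold_terms": ["cocaine", "guns"]},  # render keywords in bold
    "media": {"embed": "thumbnail"},                   # or "full_size"
    "output": ["pdf", "docx"],
}
print(report_template["columns"]["calls"])
```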
And then once you've generated your report, you can choose whether to have it in PDF format or Word format. I've done both just to show you the differences. In the PDF report, you can see here that I have the overview of my evidence, and the images are embedded so you can see them in my case with the details of the files themselves. One of the things I really like concerns embedded conversations. A piece of feedback we've had from a number of customers is that when you bookmark conversations and view them in FTK, we have this very nice viewer where you can see the conversation to and from, and it's visually very clear. But when you exported that into a report, it was quite hard to read.
In 8.1's reporting, we have that same easy-to-read visual output of the conversations, whether you've exported conversations individually, as you can see here, or exported a whole conversation, which you can see in the conversation view in your report. This is quite a nice way to present those conversations to court.
And then if you scroll down further, you’ll see where I’ve added in the timelines from my case. This is the timeline that I generated earlier for the 26th of March, and that’s all shown in my report. You can also have this as a Word document. Now, for me, the reason why this is a great feature to be able to bring into a Word document is because in the UK, our reports have to be in a certain format to be presented to court.
So now I have the option to present my reports in two different ways. I can have a PDF document that hasn't been edited since it came out of the tool, so I know it hasn't been changed or amended, and I can evidence that in a statement. Or I can export it as a Word document, which allows me to edit it: I can add paragraphs outside of the tool, such as my validation certificate or the background about myself as an investigator that I need to include in my reports. Or it may be that I've used a third-party tool for a different part of the forensics and I want to combine those reports together. So having an editable report is also an option within FTK.
Thank you very much for your time, and I hope you’ve enjoyed these features.
Lynne: Thank you, Christine. Harsh, if you can pop a couple of slides up at the end, we'll wrap up. I just want everyone to know where to go next, and we have some conferences coming up over in Europe, so we'll wrap up with that. I have also been answering the Q&A and some of the chat in the panel, so I'll read some of those out as well. Harsh has got some slides popping up here. I just want to make sure everyone knows we'll have a website up. It's literally going live right this second; I think they just hit the button. So if you want to learn more, you can go to the website, the link is right there. It's literally exterro.com/ftk81, super simple. You'll see everything on social media. We will have every single feature highlighted there, and tons of videos of people giving little demos of every single one of these features in addition to what Christine just showed you. Which is, she's amazing, right?
So that is all live now. And then just making sure, the next slide, if you visit that one quickly. There is an exchange conference coming up in October in Frankfurt, Germany. If you’ve been to one of these before, you know they’re amazing events. They are free to register. It is an amazing two days of sessions and thought leadership and networking. And just again, these sessions have been so highly rated and everyone who attends absolutely finds it to be a great use of their time.
So if you want to sign up for that, you can scan the QR code here. You can also search ‘Xchange’ on our website. There’s a whole page where you can sign up, but just making sure that conference is coming up in the Fall. Again, everything’s on our website right now for FTK 8.1, there’ll be social media going up all week and gosh, for the next couple of months if you have any other questions, let me know. I am just going to read a couple of these out. If you want to hang around, you’re welcome to.
So, in terms of questions: I think in the chat I posted the actual link to the download page on our website where you can get the downloadable version of standalone FTK. If you have FTK Central, FTK Enterprise, or FTK Lab, you will probably need the professional services installation team to help you with that. So again, just send us a note here. You can send it in the chat; I'll see it, and we'll make sure to have someone contact you. If you have a sales representative, you can let them know, and they'll hook you up with that professional services team.
But the FTK Standalone version you can download today, and it's ready to go. The upgrade from 8.0 to 8.1 is very easy; you can just install the update yourself for FTK Standalone. You do not need any help with that, so that is an easy one. This webinar is also recorded, so anybody who wants the recording will receive it automatically; be on the lookout for that. As for the questions about the entity recognition that Christine showed us today: as she demonstrated, you can manually edit the entities, edit whatever has been merged, and merge your own. So all of that is fully customizable and very easy to use.
Somebody also asked whether facial recognition and image recognition are available in FTK Standalone, and they definitely are; that's all available. The Whisper AI feature will require you to have FTK Connect. FTK Connect is the automation tool. There's a full-featured version for corporate and public sector customers that does all the API scripting, but there's also a much cheaper FTK Connect Lite version that you can purchase in our web store, or through your account rep. It's literally only a couple of thousand dollars, extremely inexpensive. So, the Whisper AI feature will need FTK Connect in order to work.
Just looking at some of the other questions that have come in. Somebody asked about Internet connectivity in order to use FTK Standalone. All the features in FTK are available even if you are not connected to the Internet. You can even download offline maps if you need some app and geolocation information while you're working, so that's all available in FTK Standalone all by itself. Let me just see if there are any other questions in here that we can answer quickly. Yes, definitely, the recording is going to be available; don't worry. There are FTK 8 training opportunities and content available to you, a lot of it free, and there will be an FTK 8.1 certified investigator class, so be on the lookout for that. We'll make sure to send you all this information so you can click the link and read about it. And the Frankfurt Xchange: I forgot to put the information up about that.
There is also a trial version of FTK. I'll post the link here again for you, no problem at all, and I'll make sure we include it in the follow-up information. Let me just grab the link right now. I'm going to type it into the chat right now, and there it is. So there is a free trial available for FTK. You just have to fill out a form so we can get you on the list, and then somebody will personally reach out to you and send you the information that you need in order to get that trial installed. Depending on what you're interested in, who you are, and what you're trying to do, we have a couple of different ways to deliver that trial to you. So, if you fill the form out, we'll get you in the queue and make sure we get you the correct version so you can try it out for 30 days for free.
Again, for training certifications, I will make sure to send you guys all of the information there. There are definitely free training videos that are available on demand. In terms of certifications, I’m not sure if they’re free or if there’s a small fee attached to that. So I will check on that and make sure to get you that information, I’ll see your question there.
Another question that just popped in: is the mobile portion included in the FTK 8 license, or is that an add-on? All of the features Christine showed you today for reviewing, processing, and parsing mobile data, using the timelines, the entity recognition, the alias merging, all of those are included as part of FTK 8.1. None of that is in a separate module whatsoever. So that's all included, which is good news.
Okay, the trial version? Yes, last question that just popped in. The trial version can only be used on one computer at a time. Once you activate the trial on a particular computer, it will run there for 30 days. If you need a different trial to run on a different computer, for you or a different user, you'll have to get that as a separate install. I'm sure you can understand we have mechanisms built into the trial to make sure nobody installs and downloads it like 37 times, right? We do have them assigned to one computer at a time. But if you fill out the form, then when we contact you, just let us know: hey, I have two computers, could you get me set up with that? And we'll get that coordinated for you. So that's no problem.
A question just popped in about which of the features shown today as part of the FTK suite belong to FTK Standalone. In a nutshell: anything that has to do with remote collection, whether that's collection from a remote Windows PC or the remote off-network Mac collection that Harsh talked about today, is only available in FTK Enterprise and FTK Central. FTK Lab and FTK Standalone do not have that remote collection capability, so those particular features are not available in Standalone or FTK Lab.
Someone asked about FTK Imager in terms of mobile data acquisition. None of the FTK tools do mobile data acquisition. We used to have a product a very long time ago called MPE, Mobile Phone Examiner, but we don't have that anymore. We are leaving mobile phone collection to the other parties in the space; as Harsh mentioned, Oxygen and Graykey are certified partners. And whatever other tools you are already using for mobile acquisition are great; we'll take an acquisition file from any of them. It doesn't matter which one: Cellebrite, XRY, Oxygen, Magnet, Graykey, whatever. Whatever raw native extraction you get out of those tools, you can immediately import into FTK to process, parse, and review along with all your computer data or whatever else you've collected. That's where the sweet spot is: getting all of it in there to process and review together.
Let me look and see what else is in here. There are a few questions I definitely can’t answer in here, and I’ve even asked them to Harsh over chat, and he said he’s going to have to check. Like Peter’s question about Microsoft chat and Microsoft Teams, Harsh is going to look into that for you. Let’s see what else is in here. We’re getting to the end. Anyone have any other questions?
I think I've answered everything I can answer. Again, I'll get a copy of all these questions and make sure to follow up with each of you individually if we did not answer your question, but I think I got most of them. I know a lot of people have asked about training certification, and I am not prepared with that information today, but I will make sure to get it to you; I apologize for that. I'm going to post the download link here one more time. There are two different pages, so I'm sending you the link; I'm pasting it into the chat right now. I just posted it. On this product download page, there are two tiles; you'll see they have red backgrounds. There's one for Lab, Central, and Enterprise, and that has all the documentation you need for those particular versions of FTK. Again, those are complicated installs, and we want you to work with the professional services team to make sure they're all configured properly.
But then there's also a tile there for FTK 8.1, FTK Standalone. If you click on that tile, it will take you to the download of the ISO executable file that you can use to install 8.1. And again, all of the documentation, the install guide, the user guide, the artifact guide, is all there too.
Okay. I am going to put together a training link so that I can email it to all of you afterwards, because again, there are a couple of different options for training. I want to make sure I send you the separate links for any of the free training that is available for FTK 8; there are definitely a bunch of free modules. And then I want to make sure I get you the right link for the FTK 8.1 certification class and everything that goes along with that. So I will send that to you all as a separate note after the fact. Any other questions? Thank you guys for hanging in and asking so many great questions. Harsh, is there anything else you want to wrap up with?
Harsh: No, I just want to say thank you to all the customers and partners for your support. On the mobile acquisition, I would say that is our current status as of today, but we're working on some really good things with some of the other vendors in the space. Very shortly you will see some great collaboration between Oxygen and Exterro technology as well. But until then, we do not acquire devices; as you said, we do allow the imports. Thank you.
Lynne: Of course, oh good. Henrik says, "see you in Frankfurt". Okay, good, sounds great. I hope all of you are able to make it to that; it's a really great event. Thank you again for everything, Harsh and Christine, amazing presentations. I'll send a follow-up for everybody with the training information and all the Q&A; we'll send a little document so everyone can see all the answers again.
Thank you for joining. Everything’s available on the website. You can go download the standalone version now, and you can read about everything you saw today. And again, there’s a handout tab. So if you click on the handout tab, you can download a couple of the product briefs that are ready for today.
Thank you again, everybody. We really appreciate all your time and we hope to talk to you again soon. Have a great rest of your day.
For further information and to sign up for a free trial, visit: https://go.exterro.com/FTKfreetrialsignup.
Michelle: Hi, everyone. And thank you for joining today’s webinar: Maximizing Data Collection with SaaS Innovations. I’m Michelle Durenberger, and I’m the field marketing manager here with Cellebrite Enterprise Solutions. Before we get started, there are a few notes we’d like to review. We are recording the webinar today and we’ll share an on demand version after the webinar is complete. If you have any questions, please submit them in the question window and we will answer them in our Q&A. If we don’t get to your question, we will follow up with you after.
Now, I'd like to introduce our speaker today, Monica Harris. Monica has decades of experience specializing in the development, implementation, and training of proprietary software for eDiscovery service providers such as KLDiscovery and Consilio. Before joining Cellebrite, she worked with the U.S. Food and Drug Administration, where she oversaw policy and procedure curation, enterprise solution rollout, and training for enterprise solutions. Monica is an active leader and mentor in the eDiscovery community, has lectured on trending topics in eDiscovery at American University and Georgetown University, and is the co-project trustee for the EDRM text message metadata project.
Monica has previously served as the assistant director of the DC chapter of Women in eDiscovery and as a board member of the Masters Conference. She currently serves as immediate past president of the ACEDS, Association of Certified eDiscovery Specialists, DC Chapter, and is a member of the EDRM Global Advisory Council.
Thank you so much for joining us today, Monica. If you’re ready, I’ll hand it over to you so we can get started.
Monica: Thank you, Michelle. And thank you everyone for joining us for today's webinar, Maximizing Data Collection with SaaS Innovations. Let's jump right in and talk about the critical nature of SaaS solutions for internal investigations and eDiscovery. I think SaaS solutions are ubiquitous in our industry, with platforms for everything from litigation holds to processing to discovery review, and now even collections, moving to SaaS. Pretty much any software solution you can think of is SaaS-based.
And there are several reasons for that, including the fact that organizations need the ability to adapt to constant change. Whether that change is coming from inside the organization, from outside, or both, organizations need to be as agile as possible. Think about traditional software deployment methods: you could be doing a proof of concept, then bringing in hardware, working with software that is beefier and more resource-intensive, or standing it up virtually. Traditional software deployments, as we have seen them, take careful planning, and they are not rapid. With a SaaS solution, the deployment step is removed completely, and in addition you get consistent innovation, which helps you keep up with the ever-changing, dynamic world of technology, and mobile technology in particular.
SaaS solutions are adept because they update consistently to keep pace with the constant updates on your mobile devices. Over the past year or two in these webinars, we have talked about how often mobile devices change, whether it's the device itself, which you may replace annually; the operating system, which could be set to auto-update once or twice a month; or the applications on the device. With SaaS solutions updating and evolving in real time, they are a perfect fit for the constantly updating mobile device market.
In addition to that, SaaS solutions are responsive to emerging threats; nefarious actors are always at play. I think we’ve all been through IT training to ensure that we understand what those threats look like. And any innovation that addresses those threats can be added to SaaS in a seamless manner, same for any regulatory changes that an organization could be facing.
And then, of course, there are the technical advances: the new features, the market differentiators, the things customers want as soon as possible. SaaS makes it possible to provide those features and solutions, and to add that armor against risk, in real time. By doing all of those things, SaaS solutions empower organizations to operate efficiently and effectively in the dynamic climates they face.
The benefits of SaaS are numerous, but we can cover some of them here, starting with risk mitigation. Oftentimes that risk can be as simple as silos within an organization: an eDiscovery team may be siloed from the forensics team but need to talk to them about what they need for their discovery or for their integration.
Or, speaking of integration, it could be the product integration itself, with one software platform hosted in one place and another on-prem, and the need for those platforms to talk to each other as you move through a process or workflow. Take the EDRM workflow, for example: it's important that there are fluid handoffs between those software products for investigations and cases to close seamlessly, if not be accelerated.
They're cost-effective. We talked a little on the last slide about traditional deployments; traditional deployments come with overhead costs, and SaaS solutions eliminate them, meaning your upgrades and any maintenance happen seamlessly. So it's cost-effective, reducing overhead and also making the software easier to handle for teams that may not be as robust.
And then accessibility, which is really the other side of the coin when we're talking about silos within organizations. SaaS solutions provide a transparency that removes those silos; that is really how we get the risk mitigation, and that accessibility is just great for the entire organization.
Because of the immediate nature of SaaS solutions, in terms of how quickly you can get to them and use them, when it comes to collecting data, and mobile data in particular, you can avoid data spoliation. We have seen a lot of cases in the news in recent years where data spoliation occurred for no other reason than how much time it took to get to the device.
It could be that a provider, a firm, or an organization came to understand that they needed to collect data; they then had to go through traditional procurement methods to get a solution that could handle the collection, including the deployment, and only then were they able to reach out to an employee or custodian. By that time, any number of things, some involving no malintent from the employee or custodian at all, could have happened to cause data spoliation. When you remove the need for those complex deployments, when your access to the software's capabilities is immediate, you can get to the data faster, reducing the risk of spoliation.
In addition to that, we talked about the ever-changing updates that come with mobile devices, whether to keep them innovative with the latest features or to keep them secure so that malware won't be present on the device. That is constantly evolving data and a constantly evolving landscape.
So having collection software that evolves with it is important. Really, that lends itself to collection readiness: regardless of the operating system, the device, or the application you want to collect from, and regardless of where the device is, because we now have a hybrid workforce and your employees could be at home, in the office, or in the field, ensuring that your SaaS collection solution is adept enough to handle any and all of those scenarios prepares you and gives you collection readiness.
So, with the benefits of SaaS established, what does that all mean for Cellebrite Enterprise Solutions? It gives us the ability to talk about our full end-to-end SaaS solution, enabling holistic collection management from end to end. And what does that mean for us? Well, it starts with preservation, which goes back to avoiding data spoliation.
So, from preservation of data, whether that's targeted or advanced collection: it could be that when you're preserving or collecting, what's most important to your case or investigation is just text messages. Or it could be that what's most important is a more fulsome look at what's on the phone, from text messages to installed applications. Say, for instance, it's an insurance use case and your insurance carrier is State Farm, and the carrier needs the phone to tell you things like whether the driver had an application open at the time of the accident, or how fast the individual was going; that's a more advanced look at the phone. So it's not one size fits all, but in an end-to-end solution you need the ability both to target responsive or important data and to get as much data as possible.
We'll also need the ability to do early data assessment, to determine as quickly as possible whether the relevant data you need is present on the phone or whether there are other sources you need to look at. And then searching and filtering, because in eDiscovery there are still cases where you need all of the data, since you may need to return to it later; but what gets promoted to downstream processes, whether that's review or investigation, requires searching and filtering.
And, of course, for downstream processes, you will need that reporting, and depending on what the next step is for the data, you may need conversion. So, a full end-to-end SaaS solution will encompass everything from preservation of the data all the way out to its reporting and conversion for other platforms.
Now, let's look at the SaaS solutions that Cellebrite Enterprise offers today, starting with Endpoint Mobile Now, our solution for remote targeted collection of mobile data for preservation and EDA. With Endpoint Mobile Now SaaS, you have immediate access to mobile data. When we talked earlier about the need for preservation: this is about as quickly as you can get to mobile data, with just three simple steps.
This is an online solution where an examiner can go into the product, enter the custodian's name and email address, and determine what data is needed from the phone. If you want to collect everything you see here and more, like a list of the applications installed on the phone, you can do that; but if you just want to target the text messages because you've got a couple of hours and need that data as soon as possible, you can do that as well.
And then the examiner determines where the data goes, whether that's an S3 bucket, an Azure blob, or even SFTP; the data comes back to the examiner for immediate examination. So that's real-time collection of mobile data without any of the overhead: no maintenance, no updates, and scalable pricing and usage for mobile collection.
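Conceptually, those three steps amount to a job definition like the Python sketch below. The structure and field names are hypothetical illustrations, not Mobile Now's actual configuration format.

```python
# Hypothetical three-step job setup: custodian, scope, destination.
# Field names and values are illustrative only, not Cellebrite's schema.
collection_job = {
    "custodian": {"name": "Tom Smith", "email": "tom.smith@example.com"},
    "scope": ["messages"],            # or a fuller advanced logical artifact set
    "destination": {                  # the examiner picks where the data lands
        "type": "s3",                 # could also be "azure_blob" or "sftp"
        "bucket": "evidence-bucket",  # hypothetical bucket name
        "prefix": "case-1234/tom-smith/",
    },
}
print(collection_job["destination"]["type"])
```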
Now, what we're looking at here on this screen is what you have the ability to collect. With our Mobile Now product, we are looking at advanced logical collections. So let's take a look at what that means, for an iOS device specifically, because while Mobile Now can collect from both iOS and Android, we see more iOS devices in the US.
An advanced logical collection for iOS is going to include some of the artifacts listed on the left: a list of the applications installed on the phone; calendar items; contacts; call logs; any media; messages, whether native messages such as iMessage and SMS or messages from third-party chat apps; and web history.
Now, that's really important for two reasons. One: you may need to collect or preserve text messages, in which case you can collect just that and your collection is done. Two: you may not know yet what you need to collect; you may not know what you don't know. An advanced logical collection can easily turn into early case assessment, because it will tell you what applications are on the phone and allow you to have informed conversations with the service provider who will do the more advanced collections for you. Or, if you are the service provider: imagine you've done this remote collection, you get back your advanced logical collection, and you see that there's a more complex mobile application on the phone, like Signal. That now means you have to send someone into the field to do a more advanced collection, but before you ever sent that person out, you had an understanding, a business case, for doing so. That is what Mobile Now can provide you with.
Now, everything we've looked at here is from the examiner's point of view, but we are talking about remote collection. For the custodian or employee engaged during the collection process, Mobile Now is about as straightforward as we could possibly design mobile collection. Once the examiner has set up the collection, the custodian gets an email telling them to install a lightweight application, along with a code to put into that application. The application was designed for either Windows or Mac, so the custodian or employee puts that lightweight utility on their machine, plugs their iOS or Android device into the computer with its charging cable, and enters the code from the email into the utility. That code carries all the instructions the examiner set up at the time of collection: it knows whether it's going to do a full advanced logical collection or collect just text messages, and it knows that when it's done collecting, it will send the data back to the repository the examiner designated. It does all of that with very minimal custodian engagement, so it's not disruptive to business at all. A custodian never has to give up their phone, and the collection can be done in minutes. That is the power of Endpoint Mobile Now SaaS.
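Seen from the custodian's side, the flow reduces to a few steps, sketched below in Python. Every function and name here is a stand-in for illustration; this is not Cellebrite's real agent API.

```python
# Hypothetical sketch of the custodian-side agent flow; all names are stand-ins.

JOBS = {
    # In reality the emailed code would resolve against Cellebrite's service,
    # not a local dictionary.
    "ABC123": {"scope": ["messages"], "destination": "s3://evidence/case-1234/"},
}

def collect(scope):
    # Placeholder for reading the selected artifacts off the USB-connected phone.
    print(f"collecting {scope} from the connected device")
    return {"artifacts": scope}

def upload(data, destination):
    # Placeholder for pushing the collected data to the examiner's repository.
    print(f"uploading {data['artifacts']} to {destination}")

def run_agent(code: str):
    job = JOBS[code]               # the emailed code carries the examiner's instructions
    data = collect(job["scope"])   # e.g. just messages, or a full advanced logical set
    upload(data, job["destination"])

run_agent("ABC123")
```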
Endpoint Inspector SaaS would be, I’d say, the big brother to Endpoint Mobile Now. Whereas Endpoint Mobile Now focuses specifically on the remote collection of mobile data, Endpoint Inspector SaaS not only can remotely collect mobile data very much in the same fashion that you just saw in Mobile Now, but it also has the ability to remotely collect data from computers and cloud sources, as well.
Endpoint Inspector SaaS has the same SaaS benefits: no deployment, so you're ready to hit the ground running without traditional software installation, and no need for any traditional maintenance or updates. One of the added features is that with Endpoint Inspector SaaS, the collection of cloud applications like WhatsApp or Telegram, for example, is agent agnostic.
The agent is that lightweight utility we talked about previously, which the employee or custodian installs on their PC. There's no need for that when you're collecting from cloud sources with Endpoint Inspector; it can be done with a QR code, for example. As for the computer collections that happen with Endpoint Inspector, those can be discreet: right now, as I sit here and present to you on my laptop, a collection could very well be taking place on it without being at all disruptive to what I am doing or working on.
You may want to ensure, for a more holistic collection or even EDA of a custodian, that in addition to collecting from their mobile device, their phone or tablet, and from cloud sources like M365, you also go out and do a targeted collection of files on the computer.
Let's say you're interested in any documents that could be generated from M365: email, Excel, PowerPoint, Word, and any PDFs, just to ensure that anything that might have been downloaded from M365 and placed on the laptop is part of the discovery for your collection or part of the evidence for your investigation. By going after not only the mobile devices and the cloud, like M365, but also what's on the custodian's or employee's computer, no stone is left unturned.
One of the great things about Endpoint Inspector is that when you're collecting from all of those sources, all the places where a custodian or employee might have data, you can initiate all of those collections as one job. I can say it's important to collect from Tom Smith's phone, Tom Smith's computer, and Tom Smith's WhatsApp, and I can initiate that with one click once I've set up all the parameters for the job, because I can do targeted collection.
It's not that I'm going to pull back everything from the computer; I'm just going to pull back the office files with the right extensions from the past month, and particularly anything in the documents folder. And I may know I want those specific files because I've used Cellebrite Inspector to triage the computer first, so I know exactly what's located there and exactly what I need to bring back before I ever initiate the collection.
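A targeted collection like that boils down to a filtered file walk, sketched below in Python. The extension list, time window, and folder are the hypothetical example criteria from the talk, not a real Endpoint Inspector configuration.

```python
# Hypothetical sketch of a targeted computer collection: office files with the
# right extensions, modified in the last month, under the Documents folder.
from datetime import datetime, timedelta
from pathlib import Path

TARGET_EXTS = {".docx", ".xlsx", ".pptx", ".pdf"}

def targeted_files(root: str, days: int = 30):
    """Yield files under root that match the extension and recency criteria."""
    cutoff = datetime.now() - timedelta(days=days)
    for p in Path(root).rglob("*"):
        if p.is_file() and p.suffix.lower() in TARGET_EXTS:
            if datetime.fromtimestamp(p.stat().st_mtime) >= cutoff:
                yield p

for f in targeted_files(str(Path.home() / "Documents")):
    print(f)
```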
Once I have all that data, using Cellebrite Inspector I can begin to examine it. And if I find data that I want to take to downstream processing, say a review platform, for example, I can create reports and convert data out of Inspector.
So I can create eDiscovery load files, for example. Here with preservation, you can do the advanced logical or logical collection we talked about in the Mobile Now product; that's also in Endpoint Inspector. You can target the collections, whether for mobile or for computer. You can even do triage as a form of early data assessment. Using Cellebrite Inspector, you can search and filter data from any source that Endpoint Inspector can collect from. And last but not least, you can report on and convert that data so it can go on to any downstream processing.
Now, thus far, we have talked about SaaS solutions that deal with or work with logical collections, whether mobile or computer, and we've talked about the type of cloud collection that can come from, say, an app like Telegram or WhatsApp. But the third and final SaaS solution from Cellebrite is not a single product; it is a collection of products. That is our Insights for Enterprise SaaS solution, which is our advanced collection offering.
It also provides quick insights, streamlined reporting, and conversion. It is really a coupling of products we've had previously, starting with our flagship advanced collection product, Mobile Elite. Along with Mobile Elite, we've combined our overall flagship product, Cellebrite UFED, and then Physical Analyzer, specifically our Ultra series, which has a database behind it, with Legalview as an option, which is what gives you the conversion and reporting for the Relativity short message format (RSMF) and eDiscovery load files.
We've added UFED Cloud to the solution, so that you have access to over 70 cloud applications you can retrieve data from. And for larger deployments, we have Commander, so that you can manage all of those applications. Those products combined, or rather their combined capabilities, are Insights for Enterprise SaaS.
Before it was Insights for Enterprise SaaS, it was Mobile Ultra. In a webinar we did earlier, I think about a month ago when we released Insights for Enterprise, which came out this year in 2024, we received quite a few questions about the difference between Mobile Ultra and Insights for Enterprise. The answer is that they are the same advanced mobile collection solution, rebranded, very much like G Suite and Google Workspace, or Skype and Lync. It is a fresh look at how to do advanced collections for internal investigations and eDiscovery.
One of the things I've emphasized, both previously and now, is that Insights for Enterprise is advanced collection, whereas when you heard me talking about Mobile Now or Endpoint Inspector, you heard me talking about logical collection, and there is a difference between the two. On the left-hand side of the screen, for logical collection, these are some of the artifacts you have the ability to pull back. The artifacts differ depending on whether you're looking at iOS or Android: archives, for instance, come from Android, not iOS, while iMessage, if you saw it here, would come from iOS, not Android. But together, these are the artifacts that come from a logical collection.
Now, when you look on the right-hand side of the screen, you're looking at advanced collection, and you can see we go from a few artifacts to far, far more. The difference really is in the data and also in how the extraction is done. Logical extraction, whether for iOS or Android, and whether through Mobile Now or Endpoint Inspector, can happen anywhere, anytime, all at once, with the power of SaaS and remote collection.
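The relationship between the two artifact sets can be modeled as a simple superset, as in the Python sketch below. The logical artifact names are paraphrased from the talk; the advanced extras are invented placeholders, since the slide's full list isn't reproduced here. The only point is that advanced collection is a strict superset of logical.

```python
# Illustrative only: logical artifacts paraphrased from the talk; the
# "advanced extras" are hypothetical placeholders, not the slide's real list.
logical = {"installed_apps", "calendar", "contacts", "call_logs",
           "media", "messages", "web_history"}
advanced = logical | {"app_databases", "deleted_records",
                      "location_data", "full_chat_app_content"}

print(sorted(advanced - logical))  # what a full file system adds over logical
```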
However, when we’re talking about advanced collection and Insights for Enterprise, that is a phone in hand for collection. So, either that phone has been sent to an examiner for collection, or there’s an examiner that has gone out into the field to collect. That is not something that we have the ability to do remotely.
So, why would you do one over the other? Well, we did talk a little about early case assessment. It can be as simple as planning and cost control: why send someone into the field if it's unknown whether the data is even there? You can do a remote collection first so you understand what you may potentially be sending an examiner out to collect.
But it's also a matter of data, so let's take a look at what that looks like. On this screen, we've brought up some of the more popular chat and social media applications, including Facebook, Discord, and Snapchat. What we're comparing here is an iTunes backup, which is very similar to a logical extraction, against a full file system extraction, which is what we call advanced collection, so you can see the numbers side by side.
We used iOS devices because you generally get more data from them thanks to the iTunes backup, and we're showing you what it looks like in Physical Analyzer. I know it can be a little challenging to see, but let's zero in on Snapchat, because it's a very popular application to collect from nowadays, and a really challenging one as well because of the ephemeral nature of the data in the application. But as I often say when I have the opportunity to speak at conferences, we can collect from Snapchat; how you collect, though, is important.
If you were to do one of those remote collections we talked about, using Mobile Now or Endpoint Inspector on an iPhone, then the data you would see in Physical Analyzer when you went to examine it looks like what's on the left. It's not that you don't get any data, you do get some, but it is not nearly as robust as the data on the right side of the screen, which comes from a full file system collection. The same can be said for Discord; you can see the differences in the numbers immediately, particularly in the messages, and the same goes for Facebook as well. So an advanced extraction, or full file system extraction, like what you get with Insights for Enterprise, gives you the most data and requires the phone to be in hand.
Insights for Enterprise is a collection of products, as we discussed. With Insights for Enterprise UFED specifically, you get all of the capabilities that were available in our flagship product UFED combined with all of the capabilities of the previous product, Mobile Elite. It is now one product, called Insights for Enterprise UFED, that is part of the Insights for Enterprise solution. Between the two, you've got the widest range of coverage for any iOS or Android device you could be collecting from, along with in-depth extraction and decryption.
But it's more powerful than that when you talk about the Insights for Enterprise solution as a whole: not just Insights for Enterprise UFED, but Insights for Enterprise PA, Insights for Enterprise Cloud, and the Legalview add-on. With Insights for Enterprise UFED, you get quick insights, which you can see on the left of the screen: before you initiate collection, you can determine things like the make and model of the device.
In addition, though it's not pictured here, you can pull out a list of the installed applications, very much like I showed you with Mobile Now and remote logical collections; you can see that in the product itself when you're working in Insights for Enterprise with the phone in hand. And not only can you get those quick insights to determine what's on the phone; when Insights for Enterprise UFED and Insights for Enterprise PA are sitting on the same machine, they can talk to each other.
At the point of collection, after you've taken a look at your quick insights, done some early case assessment, and understood that there are applications on the phone you want to collect from, you can decide that instead of going through the usual examination process in PA, you want to harness the power of PA's reporting, and perhaps even the power of conversion with Legalview. At the time of collection, you can tell Insights for Enterprise UFED that you would like to create a UFDR, and that UFDR can then be taken on to downstream review. Coming up in our next release of Insights for Enterprise, you'll even be able to dictate at the time of collection whether you want your data in RSMF format, if a Legalview license is present for Insights for Enterprise PA.
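That collection-time decision can be summarized as a small branch, sketched below in Python. The function and flag names are hypothetical; they just model the logic described: RSMF needs a Legalview license, otherwise a UFDR goes to review, otherwise the data is examined in PA as usual.

```python
# Hypothetical sketch of the collection-time export decision described above.
def export_format(create_ufdr: bool, want_rsmf: bool, legalview_licensed: bool) -> str:
    if create_ufdr and want_rsmf and legalview_licensed:
        return "rsmf"   # Relativity short message format, per the upcoming release
    if create_ufdr:
        return "ufdr"   # hand straight to downstream review
    return "raw"        # examine in Physical Analyzer as usual

print(export_format(create_ufdr=True, want_rsmf=True, legalview_licensed=True))
```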
All of that combined is the power of Insights for Enterprise: from preservation to advanced collection, early data assessment, and searching and filtering if needed, or straight into reporting and data conversion.
So, what are the key takeaways from today’s webinar, Maximizing Data Collection with SaaS Innovations? SaaS solutions are critical. I would say they are more than critical, they are paramount for internal investigations and eDiscovery because organizations are consistently working with change and SaaS is always adapting in real time to assist organizations with that.
In addition to that, SaaS solutions have the ability to break down silos by creating transparency throughout an organization, and they reduce overhead because there’s no traditional deployment, making them cost effective. It’s a faster way to get to your data, whether it be mobile or computer, and it always ensures that your collection team is collection ready with the latest innovation.
Here at Cellebrite Enterprise Solutions, we offer a full end-to-end SaaS solution for holistic collection management. Whether you need to preserve data, whether that be targeted or advanced, whether you want to perform early case assessment or early data assessment, and then need to move into reporting and conversion that could be automated, we have a SaaS solution for you. And that moves through the data sources of mobile, computer, and cloud.
And we do that with three solutions: Cellebrite Endpoint Mobile Now, for the quick and easy collection of mobile data for preservation and ECA, which uses logical collection; Cellebrite Endpoint Inspector, which also uses logical collection for mobile data and brings in the additional sources of computer and cloud; and Insights for Enterprise, our most advanced collection capability, paired with decoding, reporting, and conversion. Thank you for joining me for today's webinar. Michelle, do we have any questions?
Michelle: Absolutely. Thank you so much, Monica, for taking us through that. And we have had a few questions come in. So, the first question that we have is, “What happens if the employee disconnects their phone during remote collection?”
Monica: Oh, that's a great question. During the remote collection process, as the agent or utility works through the collection, if the employee disconnects the phone before the data has been uploaded to its final destination, they do have the ability to reconnect it. Things happen all the time: it could be a faulty charging cable that needs replacing, or an internet blip. So there is the opportunity to reconnect and re-establish that connection. Once the collection is done and the upload is taking place, there's no interruption; at that point the employee can disconnect their phone while the upload continues and the data goes to its final destination. We have resiliency built into the mobile agent so that it can cope with real life, because sometimes things happen. So there is resiliency in the mobile agent for custodians and employees.
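The resiliency described amounts to retry-on-disconnect during the collection phase, as in the Python sketch below. This is a hypothetical model of the behavior, not Cellebrite's agent code; the chunked reader and wait budget are invented for illustration.

```python
# Hypothetical sketch of disconnect resiliency during collection; illustrative only.
import time

def collect_with_retry(read_chunk, max_wait_s=300):
    """Accumulate chunks; tolerate a pulled cable until collection finishes."""
    chunks, waited = [], 0
    while True:
        try:
            chunk = read_chunk()
            if chunk is None:        # collection finished; upload can proceed
                return chunks
            chunks.append(chunk)
            waited = 0               # reset the wait budget after progress
        except ConnectionError:      # cable pulled or USB blip
            if waited >= max_wait_s:
                raise                # give up if the phone never comes back
            time.sleep(5)            # give the custodian time to reconnect
            waited += 5

# Simulated device: two chunks, then done.
stream = iter([b"chunk-1", b"chunk-2", None])
print(collect_with_retry(lambda: next(stream)))
```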
Michelle: Fantastic. Thank you so much for that. Okay, we do have some more questions. So Monica, can mobile collections be performed discreetly?
Monica: That is a great question. I know that when we were talking about Cellebrite Endpoint Inspector and computer collections, I mentioned that those can be performed discreetly. We have that ability for Windows, Mac, and, in our next release, Linux.
However, for mobile collections, and it will be mobile collections that are happening remotely, right? For mobile collections, those cannot be performed discreetly. You will need the interaction of the employee or the custodian, and that is why we designed a workflow that is as seamless and as non-intrusive to business as possible with just a few quick steps to get that data back to the examiner.
Michelle: Thank you. Okay, Monica, one more. Can full file systems be done remotely?
Monica: That's a great question. No, not at this time. In order to harness the power of Insights for Enterprise and really capture all of the data we saw in some of those comparison shots, and to get the quick insights and the streamlined collection-to-reporting workflow, you will need to have the phone in hand. Full file system collections, which is what we call advanced collection, need the phone in hand. Great question.
Michelle: Okay, Monica. Can Endpoint Inspector collect data from servers?
Monica: That’s a great question. Endpoint Inspector for computer collection has the ability to collect data from Windows, Macs, and Linux. So as long as your server has one of those operating systems, absolutely it can do so.
Michelle: Perfect. Thank you for clearing that up. And I think we have time for one more question. Is there a list of third party chat applications Mobile Now can collect from?
Monica: That's a great question. The mobile collection innovation in Mobile Now is also in Endpoint Inspector. We understand that the collection of mobile data is really trending, so we made it available on its own. When we were looking at that comparison between logical collections and advanced logical collections, that was our engineering team testing several applications to understand which applications we were pulling data back from and how much data we were pulling back, so that we could compare it and be consultative. To that end, we do have a list of several applications that we have seen present in advanced logical collections performed with Mobile Now and Endpoint Inspector, and absolutely, we can share that with the audience.
Michelle: Perfect, wonderful, Monica. Thank you so much. Well, unfortunately, we are running out of our allotted time here, so we will have to wrap this up. And we will reach out to you individually after the webinar to answer any of the questions that we did not get to.
I want to give you a big thank you, Monica. That was such a great discussion on how investigators and eDiscovery professionals can close investigations and get to review faster with Cellebrite's suite of SaaS cloud-based collection solutions.
Now, remember, for any additional questions or to learn how you can get started with any of our solutions, please reach out to us at enterprisemarketing@cellebrite.com. Thank you again, Monica, and thank you all for joining us today. Have a great day.
David: Hi everyone, I hope you're all okay. It's the afternoon here in the UK, late afternoon over in Italy, going to be evening time over in the East, and morning for you guys in the States and Canada. And whatever time it is, if this is going on YouTube, hello to you too. Investigating Video: The Vital First Steps.
Those final two words there, "First Steps", give you a hint of where we're going in this webinar. We're not going to be doing a massive deep dive into anything in Amped FIVE, and we're not going to be looking at any super fun workflows in FIVE or anything like that; we're going right back to the start and looking at those vital first steps.
As Michelle said, yes, my name is David Spreadborough. If you’ve never met me before, I’m a certified forensic video analyst through LEVA, and I’m one of the forensic analysts at Amped Software. A little bit about my history: I was a police officer in the UK for 24 years, and the last 12 years were purely dealing with CCTV evidence and forensic video analysis. Then in about 2015, I moved out from policing and went to work with Amped Software, originally as their international trainer, but for the past few years I’ve been dipping my fingers and toes into various pies, mainly now concentrating on improving the workflow in Amped FIVE and doing things like this, which is great. And a lot of research and development, as well.
Okay. So, the big thing though, is that it’s not just a few of us, we are now a pretty big team scattered around the world. In the US, in the UK and mainly Italy, obviously, being an Italian company, but also other parts around Europe. The whole point is that we are a big group of people that are extremely passionate about what we do and extremely passionate about improving image and video forensics within our community and improving the education and training and knowledge within our niche sector of forensics. And yes, we develop the software relating to it, but education is a massive part of everything that we do and it’s really important to us.
So, video, where do we start? Well, I think that says it all, to be honest, because a lot of the time it is pretty mind-blowing. I still get fascinated by some of the things that we find on a daily and weekly basis with video, especially from a proprietary and modified point of view. This is where companies have taken a standard and then done other things to it in order to facilitate whatever they need to do, usually within the surveillance industry.
And it is a pretty mind-blowing subject, and there’s not a lot of information out there from within our community. Yes, there’s a lot on the technical side and a lot to do with standards, but from within our community, within law enforcement and intelligence, policing, security, there isn’t a lot of that information. And the mind is the first thing that we need to look at. When we’re looking at these first steps, it’s having the right mindset.
There are mainly two types of people: there are the ones that are looking forward, and then there are the ones that are looking back. And it’s very hard sometimes to change that mindset, especially when it’s new information, and it can be quite scary. If you’re looking backwards, you’re saying, “Well, this is what I was just told to do.” There’s no questioning of what’s going on. There’s no questioning of, “Well, is it actually right? And should I be doing what I’ve been asked?” Because the person that asked you to do something may not know some of the technicalities involved. And if you’ve always done it in a certain way, it’s, “Well, that’s what we’ve always done.”
So you may be in an environment where it’s, “Well, we need to create some procedures, so we’ll just create procedures that are based on what we’ve always done or what we always know.” Then there are the ones that are looking forward, and they form and test theories. It’s, “Well, what’s going on here? Is this the reason that this is happening? Why have I got information perhaps in a proprietary player, and then I haven’t got that information in something else? Or why am I getting more data in Amped FIVE than I would do if I put it into this player or this analysis tool?” What are the differences, and why are there differences?
Have you got forensic rigor? Is it forensically sound, as some people would say? Can I backtrack all the way from what I’ve created to how it started on that DVR or MVR? Have I got a forensic pathway reversing all the way back to how it started, for integrity? And have you got a fact-based workflow, rather than a subjective one? It’s not “this is what we want to show”; it’s “these are the facts”. And if you stick with a fact-based workflow, then it’s very easy when you’re further down the line in the courtroom, because everyone would agree that a fact can be proved. There are no issues. It’s reliable. So, having the right mindset is one of the first things.
The second step, really, is having organizational responsibility. It is now massively important, because video and digital multimedia evidence touches so much of a law enforcement organization. All the way from identifying where that footage is, to acquisition, storage, handling, viewing, processing, and analysis, through to the presentation of it, there needs to be some organizational responsibility. You can’t cope with that yourself as one person or one unit, because there are so many other connected strands to it.
There are some places that don’t have that. And then, rather than getting bogged down with the correctness of it, they’ll miss out some of the processing and the analysis, and they’ll go straight from storage and handling to presentation. There’s no one involved in acquisition, so it’s just, “Well, can you send this to us? Can you send the footage to us?” The footage isn’t acquired forensically; it’s just viewed online, and the processing is done online according to whatever capabilities are in that system. Then it’s just presented, and there’s no analysis of it, either. So you’ve got gaps in that workflow. Having the entire organization look at, and take responsibility for, digital multimedia evidence, whether it be audio or video, is going to help you. So that’s another vital first step: to work towards having that overarching responsibility.
You can imagine if you didn’t have that. Think of a car in a car factory: when it comes out, it’s got everything working, it’s got four wheels, it’s got a steering wheel, it’s got everything that a vehicle needs. But there have been various different departments involved in the construction of that vehicle, and if they’re not talking together and getting everything right, the vehicle is not going to work when it gets to the end. It’s exactly the same, especially with CCTV: you want to avoid the problems at the end by making sure that that piece of CCTV evidence starts off correctly and goes through the entire criminal justice system correctly.
So, a bit of a wrap-up on organizational responsibility. Overcoming video challenges is not simply down to a video unit. Managing the acquisition, the handling, and the processing of DME is an entire organization’s responsibility. And I do feel for you, especially some of you in the smaller video units out there in the world: you can’t investigate that media properly and correctly if the data that you’re getting is already compromised because of how it’s been acquired, how it’s been handled, how it’s been processed. You can’t do your job if the person beforehand hasn’t done their job. And that’s why you need that sort of responsibility.
A bit of a reminder there to myself, and I think Michelle is going to help me out here. Martino recently wrote the Video Evidence Principles, published that, and presented it to the European Parliament. It is a fantastically written document. I’m not just saying that because it’s Martino, the CEO of Amped Software, if you don’t know, but he’s wrapped all of these issues up in a very, very understandable way. The first part of the Video Evidence Principles is all about the organizational side. So it’s not about the technical side of things, which is you and me in a video lab; it’s the organizational side of things, and so the document is split into two. I’m hoping that Michelle will have put the link in the chat, and you will see the link there for that document. It is really worth a read: share it, use it, send it out to your bosses, send it out to your colleagues. It’s there for you to use.
Step three in these vital first steps is controlled acquisition. It is so important, and it is getting more and more important now. In the last 6-12 months, we are seeing more and more issues. It’s not just to do with fakes: has it been manipulated? Has it been changed? Has it been edited? These things are now in more people’s minds. We’ve always had questions about whether it’s been edited and whether these things have been done to it, but the ease with which it can be done, and the ease with which it can be hidden, is increasing. So controlled acquisition, and then the controlled integrity of that evidence, is vital.
We haven’t got time today to go into all the issues surrounding acquisition. So last year, or it might’ve been the year before last, we did a blog series on CCTV acquisition. I think Michelle is going to help me out here as well and put a link to this in the chat. We start off all the way from the basics: what is CCTV? And we go through all the issues surrounding CCTV acquisition, from searching for it, to going out and collecting it and recovering CCTV, to the big issue of today, which is the public submission of CCTV and video evidence.
Mrs. Smith of the corner shop has no idea about video evidence. She has no idea that her system can export into five different formats. Which one is the correct one? Which is the one that’s going to give you the right timing information? Which is the one that isn’t going to be transcoded? She doesn’t know that. And why would she? And so putting that over to the public is a massive danger and we have to deal with that and understand that. So have a look through that blog series. Again, it’s there for you to use and it’s on the Amped blog.
There are quite a few benefits to controlled acquisition. One of the big ones is the overall speed of the workflow. I spoke to someone a few months ago and they said that now a lot of their detectives are just pulling drives. So even for one hour from one camera, the video unit just gets a hard drive, and their backlog is huge. The detectives are being told to go out and recover the data forensically by extracting the hard drive, but that’s putting so much work onto someone else. The detective’s job is quite quick, pulling that hard drive, but the video lab’s workload is then increased. By having control all the way through, and by managing that acquisition correctly and ensuring that, okay, I’m going to make a decision on-scene to acquire this correctly using this method and validate it, the video lab’s workload is reduced. So you’re balancing out the workflow, and you will speed up eventually. Yes, there are times when you need to extract that hard drive, that’s completely normal, but there’s often an imbalance.
You’re also going to reduce the risk on integrity and authentication. If there isn’t a controlled acquisition, then it’s, well, has it changed since the time it was created? Is it a true and accurate representation of that which it purports to be? That is authentication. And I’m not just talking about deepfakes there; I’m talking about the times when the timing isn’t authentic, the aspect ratio isn’t authentic, the color might not be authentic. That is what authentication is.
And you can then correct footage to become authentic in certain circumstances. You can correct the timing if the timing is off in your acquisition. But if you can control the acquisition, you can then go back and say, “This is a fact, this is what happened.” And if you have a look at the blog posts, you’ll see some of the ways that you can do that.
Also, it dramatically increases your chances of restoration and improved enhancement. If you’re starting off with your best evidence, which has gone through no changes whatsoever, then your chances of being able to correctly restore and enhance it to answer a question are dramatically increased. The moment something changes in that process, like a transcode, you’re going to reduce the quality of that data, and you’re then going to reduce your chances of that license plate being recovered, or a good shot of the logo, or the face, or what the telephone number was on the front of the phone that was captured by the CCTV camera in the lift. Yes, we’ve done it. But any slight change, and you reduce your chances. So having a controlled acquisition and starting off with the best evidence is your best chance to make all of this so much simpler. I know it’s difficult, and I know we haven’t got a million people out there doing this sort of stuff, but it is something to work towards.
If you haven’t, and all you’re left with is something that has been sent in by someone, and you’ve got no idea what’s happened, no idea what the device was, how it was acquired or anything like that, put it in your notes, put it in your report. Be open and transparent, like the sentence there says: “I have not received or been made aware of any documentation surrounding the video recording device, the acquisition of the received data, any processing of the data before submission as evidence, or the individualization of the data through a cryptographic algorithm.” So you’ve got to then start that by doing the hash at your moment in time, because you’ve never been given that before.
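As a rough illustration, a minimal Python sketch of that first hash at the moment of receipt could look like this (the file name is hypothetical; any standard algorithm such as SHA-256 will do, as long as you record which one you used):

```python
import hashlib

def hash_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video exports don't need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Record this digest, the algorithm, and the date/time in your notes:
# it individualizes the data from your moment in time onwards.
print(hash_file("received_export.avi"))  # hypothetical file name
```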
And there’s a keyword in that disclosure sentence, and that is data. What you are getting is data: you get the .abc file, the .whatever file, from such-and-such a CCTV system. Until you deal with that and manage it correctly, you’re just dealing with a big chunk of data. The video is going to be in there, the audio may be in there as well, the timestamp especially may be in there, and you may have other data as well, like camera information. But it’s just data. It is digital data. Deal with it as digital data, right from the start, and you can’t go wrong.
Step four: standard operating procedures. Let me just grab a quick drink. So, the vital first steps. Step four is having good SOPs: standard operating procedures. They don’t have to be massively structured, and that probably wouldn’t be a good thing either, but it’s a good idea to get some consistency in a unit. I’ve been lucky enough to visit video labs everywhere, and there’s often quite a big difference: “Well, I don’t know about this,” or, “I can’t do that, because Johnny’s not in the office.” That’s what you need to try and balance out.
And a good standard operating procedure should help with that and can bring up the team’s competency in dealing with video evidence. This is something I put together a few years ago and then Martino sort of jazzed it up a little bit for the video evidence principles. And it’s very very simple. You have to get those three things aligned. You have to be starting off with the best evidence. Remember on the previous one: controlled acquisition. So you’ve got to be starting off with that best evidence. If you’re not starting off with the best evidence, it doesn’t matter how good you are, it doesn’t matter if you’re using the best tools in the world, you’re never going to be able to get the answer that you need.
And Martino has put here, sorry, hitting my mic, the forensically sound result. The same goes for tools: you could have all the skills in the world and the best evidence, but if you haven’t got the right tool, you’re not going to get that answer.
So, having a good SOP to make sure that you are controlling the acquisition and the evidence, you’ve got good skills and knowledge across the board within that unit, and you’re using the right tools, that’s going to give you the best opportunity. And by having those standard operating procedures, you’ve got that forensic rigor and that structure, and you know that all of your evidence throughout the whole of your chain is going to stand up to scrutiny and questioning from other people, especially, as I know a lot of you are within law enforcement, from the defense standpoint. I can say: yes, that’s been done correctly, that’s been done correctly, I can understand what the person’s done there, I can understand that decision. There’s nothing to question. I do a lot of case reviews, and when I can see everything structured and it has got forensic rigor, there’s nothing to question. It’s perfectly fine.
The other thing is having a good structure within that standard operating procedure. Now, this is a bit of a modified workflow from another procedure that’s UK-based. This is my modification, because the official one doesn’t have two very important points; they’ve been missed out. And they are massively important, I can’t stress this enough. A working copy has to be a direct bit-for-bit copy of the original. A working copy can’t be another version of it, like a screen capture or a transcode or a clip or whatever you call it.
A working copy is a direct copy of the master, and you can hash-validate that. Very, very simple. Then, after you’ve done your acquisition, the other box there, Point 9, Produce Generated Exhibit, is a huge box. There is a massive amount that goes into that part, because that’s then: okay, here are my files that I’ve recovered. These are the questions that I have. These are the tasks that I need to do. How do I get to that point? What do I need to do to that multimedia to get to the next point, to answer the question or complete the task? If it’s just a timeline of a person moving between systems and cameras, well, how do you deal with each of those exhibits? How do you deal with each of those items? It is a huge part, and unfortunately it’s missing from this document. That’s why I’ve added it in.
So let’s look at that one part of that generated exhibit. Here it is. Here’s the sort of rough workflow. And it’s broken down into four main parts: analysis, processing, presentation, and then your conclusions. Your conclusions may just be the completion of the task, because it may be as simple as “Can you tell me what the frame rate is?” And so you may go directly from analysis, all the way to completion of the task, but you’ve got to do that analysis. And a lot of the time, the analysis is missed, unfortunately.
You can break down the analysis into three parts, and for those people that have seen any of my webinars or been on any of my training, you’ll have heard me say this before: exhibit, data, and then visual and aural. Exhibit is: who created that item? When did they create it? How did they create it? All the questions surrounding that item of data, and that could be a huge pot of data. It could be a hard drive, or it could be one file, and then everything in between. Then, what is the data? You may need to do some data extraction in order to be able to work out what that data is. Is it a set of ABC files, or is it a set of MP4 files? Are they standard MP4? What’s the codec? What’s the frame rate? What’s the resolution? You’ve got to analyze the data that’s in there. Are all my files the same? Do they have different frame rates? Are they variable? Are they constant? All the questions.
And then, what are you looking at? What is my visual representation? Can I see that there are some effects in the video that I need to correct? And sound as well, especially if there are car tires screeching away or gunshots or anything like that. And are there any sync issues? Very, very common.
So, that analysis is going to help you with your integrity and authenticity stage and with answering those questions for integrity and authenticity. Then you’ve got the processing: the restoration and the enhancement, and you’ve got to do that within the image generation model. Working within the image generation model means reversing the errors in the image. If you can reverse the errors, you’re going to get a higher-quality result and a more accurate representation, i.e. it’s going to be more authentic.
And then obviously measurement, but you can’t measure something until you’ve done the restoration. Think about aspect ratio. Think about some of the more modern HD, QHD, 4K, and 8K footage that actually gets recorded at half-width, and then the player or the system expands it. Sometimes there are some issues there. So, until you do the correct restoration, you can’t do the measurement.
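To make the half-width point concrete, here is a hypothetical sketch with ffmpeg (not the FIVE workflow itself, which has its own aspect ratio and resize filters): footage notionally stored at 960x1080 is scaled back to its true 1920x1080 geometry before any measurement is attempted.

```python
import subprocess

# Hypothetical example: a recording stored at half-width (960x1080) is
# restored to its true display geometry (1920x1080) before measurement.
# File names are made up; a lossless codec is used for the corrected copy
# so the fix doesn't introduce new compression.
subprocess.run(
    ["ffmpeg", "-i", "half_width_960x1080.mkv",
     "-vf", "scale=2*iw:ih",   # double the width back to full geometry
     "-c:v", "ffv1",           # lossless intermediate
     "restored_1920x1080.mkv"],
    check=True,
)
```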
Then we’re moving on to the presentation. What are you then going to create? And this is your new exhibit, so these are your generated exhibits. Think about the previous workflow. So it could be an image, could be a series of images. It could be a video, it could be audio, and it could be data, it could be a spreadsheet. All of these things as well, how am I going to present that data in order to complete my task or answer the question?
And the question may be: where did the person go? Is this person that person? Whatever the question is, it could be: well, is that reliable? Can I rely on that clothing representation? And if I can rely on the clothing representation, is there anyone else in the area wearing similar clothing?
So that’s your conclusion, and then you would have the report. This is your forensic video analysis workflow, but it’s not going to be just one exhibit, is it? It could be two or three or four or hundreds, depending on the size of the case.
But think about that standard operating procedure. And if everyone is doing the same sort of thing, then you’re going to not have the risk of people missing things. And so having a good departmental and personal workflow will help you because it does turn into muscle memory. And you do the same thing every time and you look for the same issues. And the more issues that you find, you remember those issues, and then you look for them in the next one and the next one and the next one. And there are always issues with proprietary video.
So in a basic evidence tree, you may have something like this. Just checking the time; okay, we’re a few minutes late, but we’re running all right. A basic evidence tree: you’ve got your initial exhibit, then you’ve got your working copy, and as I said, it could have four DAV files. I’m just using DAV here as a bit of an example, because I think most people recognize it as being a slightly proprietary video format. There’s the creation of that working copy, and then perhaps the creation of another copy that you’re then going to be working with, depending on what your internal structure is. There’s the Copy & Verify tool within Amped FIVE for that, and I will briefly show it. Again, we haven’t got time to go into it too much, and we’ve dealt with it in a couple of other videos and blog posts, as well. But we will have a brief look at it.
So there’s your Copy & Verify tool, so then you’ve got your working set of files that you can then work on. Now, we’re never going to overwrite those files, but it’s a good idea to have sort of a backup of a backup. So you’ve got your master and then you’ve got perhaps a working copy sat on a server somewhere and then you’ve got your files that you’re then going to work with on your workstation.
Then comes the Convert DVR bit: well, okay, from the DAV, what have we got? We could then end up with a standard container, and the SC in the Convert DVR output stands for stream copy. So I’ve just copied the stream, and then we’re going to have a log file for that formatting process.
So we’re going to have four new video files, we’ve got four new log files, and you’re probably going to have four new time files, as well. And you could have four new audio files all linked with those. But you’ve got a pathway back to your working copy of your DAVs. And from your working copy of your DAVs, you’ve got a pathway back to the master. And you should have then some documentation on how that master was created, where and when, by who, etc, etc.
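The stream-copy idea itself is easy to demonstrate outside FIVE. A rough sketch with ffmpeg, which FIVE also uses for the containerising step, as the log will show in a moment (file names are hypothetical): the compressed video stream is copied bit-for-bit into a standard container, with no re-encoding, and the console output is kept as the process log.

```python
import subprocess

# Stream copy ("-c copy"): remux the already-cleaned stream into a standard
# container without touching the compressed data, and keep a log of the run.
src, dst, log_path = "camera1_clean.h264", "camera1_SC.mkv", "camera1_SC.log"
result = subprocess.run(
    ["ffmpeg", "-i", src, "-c", "copy", dst],
    capture_output=True, text=True,
)
with open(log_path, "w") as f:
    f.write(result.stderr)  # ffmpeg writes its processing report to stderr
```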
All the way at the end of this workflow, so all the way at the end of the evidence tree, you’ve got what you’re creating, so what you are then presenting, and they’re going to be different exhibits, obviously. So, you might have then one MP4 video because you just need a video and it’s going to be half of perhaps MKV2 and half of MKV3 that you’ve then timelined together and done some stuff to it and some annotations and whatever you need to do in order to answer the question or complete the task.
And you may then have a PDF image sequence as well of whatever incident it is and it’s different from the video, so it’s going to be a different exhibit. And then you would have a report dealing with all of those.
But there’s a box in between these stages, and it’s a big box, and in that box is a billion things. And we are forever learning. I am forever learning to say: well, this is the question, this is the task, this is what I’m starting with (that bit is pretty easy, you can get all that sorted), and this is what I need to do at the end. Well, how am I going to get from that bit to that bit? You will never stop learning. And it can take a little while in those initial stages of your video investigation to just say, well, I’m just going to have a look at that, and I’m just going to have a look at this. If I’ve got a video that needs to be displayed in a courtroom, for instance, well, I’ve got 4K video and I may have this camera and this camera. So I’ve got a 4K video here and a 4K video here. How am I going to deal with all of that? That’s a lot of data. Do I need to crop? Do I need to resize? If I resize, would I then need to do a magnify? There are all these questions.
In those initial stages of your investigation, you’ve got to figure out your workflow, and that’s where your competency comes in, and your training with the software, and the time. It’s not a quick and easy job. It’s not something where you can just say, “Okay, I’m just going to do this.” A lot of people would just take the video down from 4K to full HD. But in doing so, are you then affecting the question that you’re trying to answer? Think about dark clothing at night, reflective objects, perhaps under IR. By reducing that, you’re limiting the evidence that you’re trying to get out. So you’ve got to think about the question and the task at hand, and then how you’re going to deal with that evidence.
Remember authenticity: is it authentic? Is it authentic that I’m showing this? And has my processing actually affected the result? You’ve got to answer that.
Okay, we’re tight on time. So, over to Amped FIVE, but I just want to wrap that lot up: having that structure, that base knowledge, and the base questions right at the start of the investigation will help you. So it’s, well, okay, this is what I’m always going to do, I’m always going to do it this way, I’m going to structure my working folder on my system, I’m going to structure my directory in a certain way, and this is how I’m going to work. It makes things easier, because then you’re not questioning that each time; you’ve got a good workflow and it will work. Everyone has a different way of doing it, but in general, it’s good to start off with a standard structure.
Okay. Quick drink, and let’s see if we can... how are we going to do this? There we go. Escape. Here we are in FIVE, and anyone that’s been on any of our webinars or watched any of our things before knows this is usually where things go wrong.
All right, first thing is, I tell you what, let me just go to “Details” here so we can see that a little bit better. Yes. And the first thing is, we just saw a couple of thumbnails. So, in your system, I’ve got a standard video container here. Look, I’m seeing the extension (remember you can turn extensions on and off in Windows Explorer), so I’m seeing the extension, .avi, and that one’s a .mp4, and then we saw the thumbnails, which would suggest to me that Windows and DirectShow have been able to decode that video information.
So again, first things first, we’ve got an AVI and we’ve got a big long file name here. We can see we’ve got NVR and then an IP address and then a name, and then we’ve got these numbers here: 2019, 0319, et cetera, et cetera. Okay, I’ve got a pretty good idea that that’s going to be the time to and from. This is really important if you’re using some automated system for members of the public to send in video evidence. Make sure that those automated systems are not changing the file name or structure in any way. This is forensic evidence that someone is submitting; it probably hasn’t been acquired in a very forensic way, but let’s not get bogged down with that for the moment. It is forensic evidence, and so it shouldn’t be changed. File names are vitally important. So if there is some system in place, make sure the file names are being retained, because they can help you.
Let’s just drag that into FIVE for the moment. Dragging it into FIVE brings up the video loader. Drag & Drop in Amped FIVE is different from loading a video. You can see we’ve got “Load video”, okay? Drag & Drop does some checks and will look at some other things for you, as well. So there are different ways of loading video into FIVE for different reasons. Try and use Drag & Drop or any of the automated processes as much as you can, because we do a few checks and we can then take care of a few things. We’ll always give you a question: if we need to do something, we’ll always ask you. So always consider Drag & Drop first.
I’ve loaded the video in and we can see some information, we’ve got visual information. The first thing I’m going to do is just see if I can scrub. Yes, I can. And you know, everything is looking fine. I just got some notes on my other screen here to keep my brain in gear and I’m now going to just open this up in “Advanced file information”.
So we’ve got this information here, I’ve got the video. The first thing I’m going to do is, well, okay, what am I dealing with? If I just close my assistant for the moment and then go to “View” and “Tools”, we’ve got basic file information here: it’s h264, 25 FPS, and the number of frames. But remember, you’ve got the advanced file information here. The first thing I’m going to do before we go into this is just start my frame analysis, because that’s going to take a few minutes, and then we’ll come back to it.
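For anyone wanting to cross-check that basic file information outside FIVE, a small sketch with ffprobe (the file name is hypothetical) pulls the codec, frame rate, and a decoded frame count:

```python
import json
import subprocess

# Cross-check basic stream info: codec, nominal frame rate, and the number
# of frames actually decoded (-count_frames walks the whole stream).
out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0", "-count_frames",
     "-show_entries", "stream=codec_name,avg_frame_rate,nb_read_frames",
     "-of", "json", "input.avi"],  # hypothetical file name
    capture_output=True, text=True, check=True,
).stdout
print(json.loads(out)["streams"][0])
# e.g. {'codec_name': 'h264', 'avg_frame_rate': '25/1', 'nb_read_frames': '...'}
```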
All right, so we’ve got visual information, I can see the scene, I can see when we’re talking about authenticity, is it where I think it is? Think about Google Maps, as well. I use Google Maps a lot. So yes, that matches, especially if you can’t get to it. And is it night? Is there any timestamp? Is there a pixel-embedded timestamp? And when we’ve loaded it, have we extracted any data timestamp? And then if we’ve got a pixel-embedded timestamp and we’ve got a data-embedded timestamp, and if we’ve got the timestamp in the file name, is everything matching? Is everything saying, “Yes, okay. It’s the same time. That’s the same date and time. That’s the same date and time.” Or are there differences?
If we have a look in “MediaInfo”, where are we going? Yes, MediaInfo. You’ll see that we’ve got our MAC times: file creation, modification. Let me just put that there for the moment. Now, some of you are probably thinking, “Oh my goodness, what’s this ‘Missing Coded Pictures’, and when did that come up?” I am using a new version of Amped FIVE that is coming out very, very soon, so I’m going to be giving you a bit of a sneak preview of some of the things that are coming, including this bit of analysis here, which is really handy. So there we go, you’re privileged to see some of this. But going back to MAC times: look, we can see here that we’ve got the file-last-modified time, and sometimes you get an encoded time, as well. If a player is using this as the start time, plus the duration of the video, in order to create a timestamp, this is saying that it was on the 2nd of April, 2019, at 8:26 in the morning.
But if you have a look, and you might not be able to see, but if you have a look at my file name, it’s the 19th of March, 2019, so a few days before. So if you’re using a system, a player, or another video tool that is using this to create a timestamp, it’s probably going to be wrong. Just be careful of that. Again, it’s the power of file names.
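That file-name check is easy to automate. A hypothetical sketch (the naming pattern is made up; adapt the parsing to what your NVRs actually produce) compares the timestamp embedded in the name against the file-system modification time:

```python
import os
from datetime import datetime, timedelta

# Compare the timestamp encoded in the file name with the file-system
# modification time; a gap of days is a warning not to trust players
# that build their clock from the MAC times.
path = "NVR_192.168.1.10_ch1_20190319T082600.avi"  # made-up name pattern
name_time = datetime.strptime(path.split("_")[-1].removesuffix(".avi"),
                              "%Y%m%dT%H%M%S")
fs_time = datetime.fromtimestamp(os.path.getmtime(path))
print(f"file name says {name_time}, file system says {fs_time}")
if abs(fs_time - name_time) > timedelta(days=1):
    print("Timing sources disagree: verify before relying on either.")
```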
One thing that we find a lot with network acquisitions is that the first file that gets written on the PC is written because the PC client of the CCTV system has said, “Okay, I want this bit of data,” and then it brings the video data over using a network transport protocol, and it gets containerized on the computer. So that’s when the file thinks it’s been encoded and written; well, actually, it was on the DVR a few days before. So always be careful of those dates and times and how it’s been acquired. And, I’m sure there are going to be some collection guys out there, but that’s another thing we’re seeing at about 50-50: 50% of the time the original timing data is retained when that network transfer has been done, keeping the original timing structure, and 50% of the time it isn’t. I know, it’s a minefield, isn’t it? Remember that emoji? Head blown.
So here we’ve now done our frame analysis, and we’ve got some issues. We can see that we’ve got PTS: not applicable, not applicable, not applicable. What’s all this going on here? AVI doesn’t allow for that data. The timing information lives in the container, not in the stream, and AVI doesn’t allow for it. So if you’ve got one of these raw streams inside an AVI, this is why you’re going to see “not applicable” here: it’s not in the container, so it hasn’t been able to be read from the container.
Another thing to point out, and it happens a lot in proprietary video systems, especially over a network, is that you do get missing coded pictures. Something has happened in the creation of this video file. And look at this: we’ve now got 51 missing coded pictures. FIVE will now do a check for you and tell you, and then you can copy this data out if it’s going to be relevant to a decision you need to make.
So, always have a look at that data analysis because it could affect your timing and it could affect, do we need to go back out and try and do another acquisition if it hasn’t been acquired correctly, et cetera, et cetera. Excuse me, hit my mic again.
All right. Let’s just drag in another one. I just want to show you something that I use a lot, and we can automate this now: Amped Authenticate has a filter in the video module for looking at this automatically. So if you’ve got a really big video and the effect is very, very slight, having it done automatically in Amped Authenticate makes it so much easier. You may not be able to see it, but I’m hoping you can see what I call a parquet-flooring effect here. You can see it here, and I’m just going to hit “Play”. Oh, what’s going on here? There we are. There we go. So I’m just hitting “Play”, and you can see the little black specklies that are going on. All right.
What I’m watching here is a transcode of this. So let’s say for instance everything looked as if this was original. Remember in our analysis stage, we looked at the exhibit, we then looked at the data and then we looked at visual, this is one of the things that I look at from a visual perspective. I’m looking at these little bits here and I can see, and you may be able to see it, there’s a bit of a sort of a flash, and it goes blurry, and then refreshes, and then blurry, and then refreshes. Oh, there’s a cyclist, but we’ll ignore them. And then there it goes. Well, that’s your GOP structure. And you can see it change to an I-frame.
Do you see the change at the same time as our frame changes to an I-frame? Does it match? If it doesn’t match, that’s your biggest clue that it’s been transcoded. So I’m just going to go to the change. There we go. So that’s a change, and that should be our I-frame. But it’s shown as a P-frame, and this is because this is a transcode. It’s been transcoded, it’s been changed, it’s not the original. And so your chances of getting a license plate, your chances of finding other very, very small pixel-level detail, will be reduced because of that transcode. And because of that transcode, have you lost timing information, et cetera, et cetera? So, your three stages.
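That visual GOP check can also be backed up from the stream itself. A rough ffprobe sketch (hypothetical file name) lists every frame’s picture type, so you can see whether the visible refreshes line up with actual I-frames:

```python
import subprocess

# List each frame's picture type (I/P/B). If the visible "refresh" in the
# footage does not line up with the decoded I-frame positions, that is a
# strong hint the file has been transcoded.
out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "frame=pict_type",
     "-of", "csv=p=0", "suspect.mp4"],  # hypothetical file name
    capture_output=True, text=True, check=True,
).stdout
frame_types = out.split()
print("".join(frame_types[:60]))  # e.g. IPPPPPPPPPPPI... shows the GOP shape
print("I-frames at:", [i for i, t in enumerate(frame_types) if t == "I"][:10])
```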
All right, let’s move on. I knew this was going to be tight. Again, people know me and I can waffle for hours. Where are we going? Next one. Okay, just dragged it into Amped FIVE again. Remember that sort of stage. Oh, hang on. We need analysis here. What’s going on? If you want to, you don’t have to go directly to Convert DVR. You can have a look at the file first. And look, you can see that we’re just analyzing the file as it is and I can see some information.
I can see that this is the size of the video, I can see that this is the codec, I can’t do any frame analysis because it can’t read how many frames are in there because it’s in a proprietary container, but I can read some of the information that’s going on inside. If I go to ffprobe, I can see that we’ve got a lot of unknowns, not a lot of detail.
And one of the things, and it came up in a support question, I think it was earlier in the week, is, I wonder if it’s going to be in this one. Yes, it is: this probe score. You can see you’ve got a probe score of 51. If that was 100, then everything is standard: we’ve got a standard container, we’ve got standard codecs, and everything is structured according to the world of standards. No problems at all; probe score 100. 51 is in the middle. And if you get something like a raw h264 stream, which has got a few little modifications in it, you’ll probably get a probe score of something like 1, because it’s thinking, “Well, I think it’s h264, but I’m not quite sure.”
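This probe score comes from ffprobe’s container detection, and it can be read directly; a small sketch, with a hypothetical file name:

```python
import subprocess

# Read ffprobe's probe score for a file: 100 means the container format was
# recognised with full confidence; a middling value like 51 means partial
# recognition; a raw or lightly modified stream may score close to 1.
out = subprocess.run(
    ["ffprobe", "-v", "error", "-show_entries", "format=probe_score",
     "-of", "default=noprint_wrappers=1", "export.dv4"],  # hypothetical name
    capture_output=True, text=True,
).stdout
print(out.strip())  # e.g. probe_score=51
```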
That’s what the probe score means; it only came back to my memory because, as I said, it was a support question earlier in the week. But we’ve got some information here, and I think, okay, I know the sorts of things that I’m looking for. I’m now going to go to Convert DVR, open it up, put it into an MKV container, and press “Ok”. And I’m going to close that for the moment.
So, we’ve done a process in order to get that in, and this is what’s now alongside my file. We’ve got the clean video stream: we’ve cleaned the video out, and that is just the raw video stream. I want to highlight the file size. The original .dv4 was 79,450 kilobytes, and the clean is 64,739 kilobytes. There’s a bit of a difference there. Keep that in mind; consider these things, what’s going on. Then we’ve got the .time file, so that’s all our timing information. We can deal with that, as well; you can see it’s loaded automatically for us as a timestamp. Then we’ve got our converted MKV, which is our clean video stream placed into a standard container. And then we’ve got a log of that process.
If you contact us through support and there are issues for whatever reason, we may ask you to send us the original file and the log of what’s gone on, just so we can have a look and make sure. And you will see from the log that it says, well, success. Yes, that’s a good start. But you can also see what it’s taken in, and that we’re using ffmpeg, no problems here, to place the clean stream into a container, since we chose the MKV container.
We’re not going directly from dv4 to MKV, we’re doing a process in between, and that is the cleaning, and that is again so important and I’m going to show you why in a moment.
If I then just close that and drag in the dv4 again, this time I’m going to “Attempt direct loading”. And you’re probably thinking, “Oh, this is new. I haven’t seen that before.” It’s another new thing, yes. And this is so you can say: don’t worry about any cleaning, don’t worry about anything else; if you direct-load it with FFMS, what will happen?
And as you can see, I’m now reading the .dv4 with FFMS. I haven’t got the timestamp, because we haven’t done that important cleaning process, but we’ve got more frames. The original had, where are we, down the bottom here, 11,379 frames. The dv4 has 11,394 frames. Well, those few extra frames don’t account for all that data. Remember the difference in data size? So, what was going on there? Let’s deal with the data size first. There are always reasons, and it might take you a little bit of time to find out the reasons, but there are always reasons for stuff.
Here’s my clean file. Here’s my direct-loaded file. You can analyse any of those, and if I went to my clean file up here and then to the original file, I can analyse the original file as well. So even if I didn’t have to load it, I can analyse it there too. So, here’s my clean in my MKV. I can right-click that and go “Advanced File Info”. The buttons are everywhere for a reason: so you can do whatever you need to do, whenever you need to do it. That’s why you see Convert DVR boxes everywhere and Advanced File Information boxes everywhere, because you’re always thinking, and there’s always the question, well, what’s going on there?
So, here’s the MKV, and I can see all my hex data here. And let’s do it from here, so let’s do it from the original file, so Advanced File Info. So now we’re going to open up the dv4 and then I’m going to go “Hex View”. Okay. So this is the dv4 file. I’m just going to scroll down a bit and I can usually spot the big bits. Let me just make it a bit bigger. My eyes are going to go in a minute, I can see. Where is it? Let’s scroll down a bit more. Oh, there we go. Oh, wait, let’s go down a bit more there. There we go.
Look, see all this data here? It’s throughout the whole file. What that is, I’d need to do some more digging. I don’t know. Have I got a dv4 player? Is that a sort of a holding space for audio? Did the system have capability for audio? Remember that controlled acquisition? Has anyone taken a picture of the back of the DVR?
Has it got a mic input in? Has it got any capability in the system in the GUI for recording audio? But throughout this entire file, I’ve got these massive areas of this, and I’ve also got a data timestamp in there, remember? Our data timestamp has been extracted. So, that’s obviously affecting the data, but how is all this affecting the video footage, as well? And none of that is in the MKV, by the way. It’s all clean.
But what about these extra frames? Let’s just close that. So let’s go to our directly-loaded video. How are we doing on time? Not too bad. And I’m just going to scrub to the end because I know there’s a good time to see it. Oh no, I’ve missed it. Oh, there he is. All right. There we go. All right. Look at this. And remember the advanced file info for missing coded pictures. Well, if we do an advanced file info on our clean file now, on our MKV, it’s going to have some missing coded pictures because it hasn’t brought across the damaged frames.
Those damaged frames could be important to you for timing information. If you’re dealing with timing of a vehicle down on this street, you may need those frames. But everything is there, and whether it’s missing coded pictures, or what the differences are, or what you’ve dealt with for whatever reason, all the information is there for you to use.
So, our MKV with our timestamp, that is clean. There is no damage. Everything is there, and in the Advanced File Info there’ll be a list of the missing coded pictures, and it’s because of that damage. That damage is probably caused somewhere in how that data has been structured with that information. Just as a bit of a point for you: in the player for this format, you don’t see these frames. The player cannot present those frames because of the damage, so you see exactly the same as our clean. But if you play the dv4, you actually can see the damage that is there. So there are pros and cons, you see. And that is the reason why, well, okay, I’m going to do the clean, I’m going to get the timestamp, and I’m also going to look at this, as well. Because you never know.
All right, anything else in that one? This is just to briefly tell you: yes, look, if you haven’t seen that before, that’s interleaving, so you’ve got the top field at the top and then the bottom field at the bottom. What you’ve got to do is interlace them back together again, then deinterlace, then deal with the timestamp, because you’re dealing with a data timestamp as well, and then deal with the timing on the frame rate. It’s all possible within FIVE, but we could probably spend about two hours just on this file alone.
All right, next one. Let’s just close this. Have a drink. All right. So, we’ve dealt with a couple of standard files, the AVI and the MP4, and we’ve dealt with a single proprietary file. Now we’ve got an executable. This could be a file that has lots of cameras in it, and it could be an executable that has the player in there and all the video streams as well. Some of these executables are pretty good; this one is not too bad. Some of you may recognize this as being a lot like the IDIS clip player. It’s branded as many, many different things, with all sorts of different capabilities as well. Some do some very weird things, some not too weird.
There’s always some weirdness going on, but anyway, you’re limited by what the player is giving you. So you want to override that, and we want to get to the video that’s inside this executable. We can just drop that in, and again, it’s going to come up: “Do you want to deal with the data that’s inside it?” Yes. I’m going to choose MKV again, and then I’ll quickly say why I’m going to use MKV, excuse me, after I have a drink.
MKV in practice is probably the most flexible. MP4 has some restrictions; AVI has many restrictions, and I don’t tend to use AVI unless absolutely necessary. MKV seems to be the most flexible. Sometimes MKV doesn’t work and it needs to go into MP4. You really have to be flexible. You’ve got to have that questioning approach, not, “Oh, it just doesn’t work, so I’m going to screen capture.” It’s, well, okay, why can’t I do that? What’s going on here? What is restricting that? And learning from it.
So you can see now that we’ve loaded our four video streams. And these are the four video streams that were in the player. Now we can deal with that. So let’s have a look and see what is now in our folder. And we’ve got the original executable and we’ve got our clean file, remember, our video stream and we’ve got our .time and then we’ve got our working MKV and then we’ve got our log file. And you’ve got these for all the streams.
Now, this is probably a good time to explain that we’ve written these into the same folder that the executable was in. We haven’t overwritten anything, but we’ve written them into that directory. Remember what I was saying about Copy & Verify and perhaps having a backup of a backup? It’s a good reminder for Copy & Verify. Copy & Verify allows you to select a source: so, where’s your master? That could be a USB drive, a network drive, an external hard drive, or another drive. That’s your source. Then: where do you want it to go? And where do you want to save the log of that process? You copy the source to the destination and then compare them, and you’re going to get a log file of that process with all your hashes; you can select your hashes here.
And then there’s what type of process you want to run. You can either say, I want that directory and all the sub-directories and all the files in them, or don’t include any of the sub-directories, or just take a single file. So if you’re just hashing and copying a single file, you can choose that there as well.
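In the same spirit as the Copy & Verify tool (this is not its implementation, just a minimal sketch with hypothetical paths), a script that copies a source tree, hashes both sides, and writes a log might look like this:

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Chunked SHA-256 so large evidence files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

src_dir = Path("E:/master")        # hypothetical source (e.g. the USB drive)
dst_dir = Path("D:/working_copy")  # hypothetical destination
with open("copy_verify.log", "w") as report:
    for src in src_dir.rglob("*"):  # the directory and all sub-directories
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 preserves timestamps as well
        digest = sha256(src)
        status = "VERIFIED" if digest == sha256(dst) else "MISMATCH"
        report.write(f"{status}  sha256={digest}  {src} -> {dst}\n")
```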
A lot of the time, it may be two processes, because someone may bring in a hard drive and say, “Okay, I need to take that hard drive back with me to my department. Can you copy all the data over?” Yes. Do your initial copy, and then do another one to perhaps your NAS or your internal server, because then you’ve got a backup of the originals and a log of that process. So if anything goes wrong with your system, and you know how dodgy some of these computers can be, a hard drive crash or some corruption or whatever, within your department you’ve still got a forensically sound working copy that you can go back to.
And so that’s your Copy & Verify process. Then we were talking about your folder structure. When you’re dealing with an investigation, right from the start have an idea of what your folder structure is going to be. I have my reports, which will be a PDF, and that’s my analysis report, and then I have another folder linked to that report, which is linked documents, and I have everything in my linked documents. Everything is in that folder: all the files I started with, all the files that I’ve created, all my log files, PDFs, image sequences, everything is in there. Everything stays the same because, remember, when you start using hyperlinks, and you’ve got links in your FIVE projects as well, it’s all relative file paths. So as long as you keep your working directory the same, you’re never going to lose any of those links. If you start moving things around afterwards, it’s just going to be a headache for you. So try and make it simple for yourself.
All right, so we’ve got our executable here, and we can see all our files. What was I going to show in this? We were going to look at a public release of an image, very, very quickly. Let’s just drag this down. So our bad guy comes in and, let’s say, we wanted this just for clothing. We can’t really see his face because it looks a bit like an alien, right? But I think he’s wearing ski goggles or something. Anyway, if you want to just deal with that, it’s very, very simple, and I could probably keep this here and not have to deal with much of this, but I’m just going to drag my video loader down. What are we on? Stream seven. Yeah, so I’m just going to drag my video loader down here, say “Yes”, and rename that to “Wanted”. Oh, can’t spell. And I already saw what frame I was on. There we go. And it takes a few seconds: I know everything’s been dealt with correctly from the start, I’ve got my full acquisition, I’ve dealt with all my files, I’ve extracted my image here, and now you can do something to it.
Very, very quickly. Edit, where are we going? Can’t see. Rotate. There we go. It wasn’t badly rotated in the original, but a lot of the time they are, and I see them in the press all the time. It doesn’t take very long to just hit the rotate button and get the person straight. Otherwise you end up trying to do this with your phone or whatever; it’s crazy. So, a bit of a rotation, a bit of a crop, and let’s just make sure that’s nice.
And if you notice, I’m still dealing with the whole of the video here; I haven’t selected just that single frame yet. Then just deal with that. A lot of the time you see stuff that’s really washed out; you can put in the levels and do perhaps some colour work, and it sometimes takes less than a minute to come up with a much better image for your wanted appeal. Then you could put some text over the top for whatever reason, but we haven’t really got time to go into that because we’re running out of time. So I’m going to leave that, but it’s super quick and easy.
And then I could stick that in a folder and say, “Okay, well, that was my wanted image.” Done, finished with. Then, when they get identified, you’re going to go out and prepare all the video evidence for after the arrest, etc., and then you move on to something else.
So hit “No”. Where are we? Always goes super fast. I’ve got another two examples to try and get through. It’s never going to happen, is it? All right. Let’s see. Nearly there. Quick time for a drink.
Okay, here we go. So, here now we’ve got our full folder structure and obviously there would be another folder here, which would be the wanted bit. Oh, that was the bit that I started off at and then you can then carry on working. And we can see this is from the gas station. So there could be other robberies here. I could have gas station and something else and something else. I could have four or five incidents here because then you can start linking those together and say, “Well, this was incident 1 on this date. This was incident 2 on this date.” Do a timeline, etc, etc.
If you’ve got a big timeline and you’ve got quite a lot of incidents in various places, I tend to use one Amped FIVE project for one location and then another Amped FIVE project for another location. So I’ll do my analysis and all my restoration and enhancement and measurement, whatever needs to be done, in that Amped FIVE project, then I’ll write out videos from that. And remember, in FIVE you’ve got the FFV1 codec. It goes into an MKV container and it is 100% lossless, but it takes about a tenth of the file size that uncompressed would. And it scrubs super fast and renders super fast as well. So it’s a brilliant codec and container for saying, “Well, okay, that’s going to be my interim file from this location, and that’s my interim file from this location.”
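Writing such a lossless interim file can also be sketched with ffmpeg (file names hypothetical; in FIVE you would use its own writer):

```python
import subprocess

# FFV1 in an MKV container: mathematically lossless, but far smaller than
# uncompressed video, and quick to scrub and render.
subprocess.run(
    ["ffmpeg", "-i", "location1_timeline.mkv",  # hypothetical input
     "-c:v", "ffv1", "-level", "3",  # FFV1 version 3: multithreaded, has CRCs
     "location1_interim.mkv"],
    check=True,
)
```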
You can then have an Amped FIVE project for all of your interim files to create your final timeline. So rather than having a massive history of exhibits from all over the place, I tend to tidy mine up and say, “Well, okay, this is this location project, this location project, this location project.” And I’ll usually run that in something like a spreadsheet with some notes as well and stuff like that. So if I’ve got a big incident on the go, that’s how I’d do it.
And you can see here that we’ve got our originals here and you can see that we’ve got file information and the hash code of our cameras. And then under the restoration enhancement, you can see we’ve got aspect ratio, then an undistort, then a resize, because I’m thinking about how I’m going to display this information. So what do I need to resize it to? Automatic colour equalisation. Then we’ve got the timestamp, but look, I haven’t rendered it on the screen. You see, I’ve gone “Disable rendering”. So I haven’t displayed the timestamp there, but I’ve got it as a data source to use.
Then I can choose my range, and all the way at the end you can see we’ve now got the timestamp over the top, and we’ve got our file data here, so that’s just for that camera, and then you can just copy them down. You don’t have to redo each one from scratch; don’t forget these are all the same cameras. Again, controlled acquisition: you’ve been out to the scene, you know that it’s all the same cameras, all the same settings on the DVR, et cetera, et cetera. So it speeds up. This takes no time at all, because you just copy and paste down. No worries.
Then you can join all of them together. Where’s our multi-view exhibit? Let’s just close that. DS1, there we go. And you end up with something like that. And you’ll see as he comes in, how are we doing on time now? Not too bad. You’ll see as he comes in that, because of the work you’ve done in sorting out and restoring and enhancing each one, where is he? There he is. Each one is now accurate. So let’s just leave that there. Okay.
And even though that’s frame number 1452 from that file and this is frame 730 from that file, look, you see it’s the same moment in time. These frame numbers are really important, because if that is being questioned, then everyone is talking about the same frame from the same file.
I get some compilations sometimes and you’ve got no idea how that compilation has been made. Nothing’s frame-referenced, nothing’s referenced back to the file. I know that that’s the file that has been used, that’s the frame number, and I’ve got a document that explains where that file came from and the integrity of that file.
So, there we go, and you can see the timing: we’ve got 533 on that one, and then, oh, that’s 533 as well. 566, there we go. So we can see now we’ve corrected all the timing and it’s all in sync. Remember, some of these systems can’t capture the same image at the same moment in time on every camera and then record it, so very often you’re going to get that millisecond difference.
So it does take a bit of work, and this is just on the same system. Don’t be worried about having real difficulties and having to explain that some things aren’t frame-accurate. Because if you’re using different systems with different time bases, different frame rates, different recording capabilities, getting them all synced up is nigh-on impossible. So you have just got to explain that perhaps you may have a one- to one-and-a-half-second error in your syncing, but you have then done a visual sync at certain points. Just explain it; it’s perfectly natural, because it’s really hard to do. Okay. So, wanted person all the way through to court and presentation.
Let me just, new project, do you want to start a new project? No. Let’s see how we’re doing. Let’s see if I can do these. Right, oh yeah, that’s a bit of a shame. Now, I tell you what, I’m just going to mention this. The test card, which is here, and I’m hoping that Michelle will put the link to the test card in the chat. It’s really worth it. This is one of the Amped test cards. I am in the middle of building another one. These colours and the greyscale ranges, you can validate yourself, because each of these individual components has been created to a known value. So, if I go back to here, you can see as I’m hovering over here that we’ve got a 255-0-0, so my pixel reading is showing that RGB value. And you can see, yes, 255-0-0. So 0-0-255, yes, 0-0-255, I’m just reading over here.
My greyscale ranges, I can see, yes, we’ve got 38 luminance, yes, 115 luminance, yes, 217 luminance, et cetera, et cetera. So, this is all in an Amped FIVE project, so you can then write the file out, and you can write it as an MKV with FFV1, completely lossless. And you can confirm and validate that it is completely lossless. And then you can have a look at some of the transcoding options, MP4, h264, and the difference in the values. And then you can see those values change. If I just quickly load another one into this, let’s go test card MP4 rather than the uncompressed MKV. And you’ll see now, and this is what I’ve used, and I used the default MP4 h264 values, you can see now that yes, okay, we’ve got good luminance, so our greyscales are all fine. If I go over red here we’ve got 254-0-0. Well, that was set for 255. So we’ve lost a tiny little bit of red: one value out of 255.
Down the bottom here, 180-16-16, 180-15-15. So we’ve got a little bit of a change here. So you can see what the differences are, and then remember video mixer. Whenever you’re doing testing to say, “Well, okay, I’m always going to go out to this”, remember video mixer and your similarity metrics and have a look at that.
Video mixer, remember: “Link” and then “Video mixer”, and then have a look at your similarity metrics. So then you can identify how different certain things are, like the original versus what you’re presenting in court. Yes, there are going to be some differences, but is there a dramatic difference, or is it just a very slight difference? If you need that data, it’s there for you.
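If you want to run the same kind of check yourself in code, here is a minimal sketch, assuming you have exported one frame from the lossless reference and one from the transcode; the file names are placeholders, and it needs Pillow and NumPy:

```python
# Compare a reference test-card frame against its transcoded counterpart:
# per-pixel differences plus PSNR as a simple similarity metric.
import numpy as np
from PIL import Image

ref = np.asarray(Image.open("testcard_ffv1_frame.png").convert("RGB"), dtype=np.float64)
enc = np.asarray(Image.open("testcard_h264_frame.png").convert("RGB"), dtype=np.float64)

diff = np.abs(ref - enc)
print("max per-channel difference:", int(diff.max()))  # 0 means truly lossless

mse = (diff ** 2).mean()
psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
print(f"PSNR: {psnr:.2f} dB")  # higher means closer to the original
```

A max difference of 0 confirms a lossless write; a 255-set patch reading 254, as in the h264 example above, shows up immediately as a non-zero difference.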
How are we doing? We’re nearly there. Frame timing. All right. I’m going to quickly show you this and I’m going to do it in hopefully less than five minutes. New project, no. Here we go. I want to show you, oh, yeah, don’t worry. That’s because it’s got audio, but I’m in a virtual machine here which hasn’t got audio. So, I’m just going to turn the audio off so I don’t see that all the time. There we go.
Now then, when we’re dealing with frame timing and frame analysis, remember, you’ve got to compare what data you’ve got. I’m just going to go for PTS playback down here. Remember, there are several different ways you can play back video. There are, what, six to seven different possible ways that durations and frame rates can be calculated. You’ve got to verify which one is the most reliable. You may not be completely 100% accurate, but which one are you going to use and which one is the most reliable? Make a decision on that.
If I go to Advanced File Info now, you’ve got lots of different ways that the data is being presented, because the different tools have different capabilities in reading that data. Think about Mediainfo and ffprobe: they’re reading the same file at the back end, but they’re parsing it differently. So you can end up with slight differences, or you can end up with pieces of information in one but not in the other. Sometimes ffprobe may be able to read that there’s an audio stream, but Mediainfo won’t detect the audio stream. Things like that.
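As a rough illustration of comparing those two views yourself, here is a sketch that asks both tools for their take on the same file; it assumes both the ffprobe and mediainfo CLIs are installed, and the exhibit file name is a placeholder:

```python
# Put two tools' views of the same file side by side. The point is not
# which tool is right, but that their reports can be compared field by field.
import json
import subprocess

def ffprobe_streams(path: str) -> list:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["streams"]

def mediainfo_tracks(path: str) -> list:
    out = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["media"]["track"]

path = "incident.mp4"  # hypothetical exhibit
print([s.get("codec_type") for s in ffprobe_streams(path)])  # e.g. video, audio
print([t.get("@type") for t in mediainfo_tracks(path)])      # may differ!
```

If one list shows an audio stream and the other doesn’t, that is exactly the kind of difference you would document.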
Timing information. What is the frame rate? It’s variable. I’m just going to go to the frame analysis, press “Yes”. Obviously this is an MP4 now, so we have got PTS data within the container and it’s just sort of churning through.
We do a number of calculations now, so it takes a few minutes to churn that information through because then it’s retaining that information at the back end. Remember the coded pictures, as well? But, what I want to show you is this.
First of all, I’m going to go to Mediainfo. You’ve got to sometimes see and report on differences. Now, I’m going to show you something that I haven’t got an explanation for yet, because I only found it a couple of hours ago, and I’ve been using these files for ages, but the more you look, the more you find. Under format settings, GOP, Mediainfo is saying 30. There are 30 frames between each reference frame, a 30-frame GOP.
If I go to GOP analysis now: M=1, N=60. It’s a 60-frame GOP. What is it? Everything in this file is saying that it’s a 60-frame GOP. I haven’t got time now to go into the macroblock analysis, difference analysis, and even the hex data for the h264; everything is saying that it’s a 60-frame GOP. Why is Mediainfo saying it’s a 30-frame GOP? I have not got the answer, but I am going to try and find out. I’ve got some clues, I’ve got some hunches, but anyway, these are the things that you find. And it’s like, right, okay, is that an issue? Do I just need to document it for my investigation or does it affect my investigation? Does it affect the integrity? Has it been transcoded? Has there been something going on in this? So as it’s brought the data over, perhaps it was originally a 30-frame GOP, and as it’s brought the data over the network, it’s turned into a 60-frame GOP. I don’t know, and so we’ve got to do a little bit more digging.
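If you want to cross-check a GOP claim with a third, independent tool, here is a minimal sketch using ffprobe (assumed installed; the file name is a placeholder) that measures the spacing between I-frames directly from the decoded frame list:

```python
# Sanity-check a GOP length claim: pull every frame's picture type with
# ffprobe and measure the distance between consecutive I-frames.
import subprocess

out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "frame=pict_type", "-of", "csv=p=0", "incident.mp4"],
    capture_output=True, text=True, check=True,
).stdout

pict_types = [line.strip().strip(",") for line in out.splitlines() if line.strip()]
i_positions = [n for n, p in enumerate(pict_types) if p == "I"]
gop_lengths = [b - a for a, b in zip(i_positions, i_positions[1:])]
print("observed GOP lengths:", sorted(set(gop_lengths)))  # e.g. [60]
```

If this prints 60 while Mediainfo reports 30, you have the discrepancy documented from your own measurement rather than from either tool’s summary.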
Now that we’ve done the frame analysis, I can just turn off the audio and the other frames. And you can see, and this is going to be coming out tomorrow, that we’ve now got some calculations going on: PTS duration computed. So what it’s doing is saying, “Well, what’s the difference between that one and that one?” And there’s my frame duration. Remember, you can display all of that over the top of your screen. That’s one of the quick ways of doing it.
Oh, one last thing, two minutes: don’t ever get tricked into thinking that packet duration is the same thing as PTS duration. It’s not, even though it’s the same in this one. Sometimes your packet duration will be based on your time base, and if you’ve got a constant time base, you can have the same value there, but you’ll have differences in your PTSs. So just be aware of that. So, if you’re doing any sort of speed work or motion body mechanics, use of force, things like that, always be careful about exactly what your frame durations are.
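Here is a small sketch of that distinction, again assuming ffprobe is installed and using a placeholder file name: it derives durations from consecutive PTS values and prints the container’s claimed packet durations next to them, so you can see where the two disagree:

```python
# Derive per-frame durations from PTS deltas and compare them with the
# packet durations the container reports.
import json
import subprocess

out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "packet=pts_time,duration_time", "-of", "json",
     "incident.mp4"],
    capture_output=True, text=True, check=True,
).stdout

packets = json.loads(out)["packets"]
# Sort by PTS: decode order differs from presentation order with B-frames.
pts = sorted(float(p["pts_time"]) for p in packets if "pts_time" in p)

pts_durations = [b - a for a, b in zip(pts, pts[1:])]          # implied by PTS
pkt_durations = [float(p["duration_time"]) for p in packets
                 if "duration_time" in p]                       # claimed per packet

print("first PTS-derived durations:", [round(d, 4) for d in pts_durations[:5]])
print("first packet durations:     ", [round(d, 4) for d in pkt_durations[:5]])
```

For speed or motion work, the PTS-derived values are the ones that tell you how far apart two frames actually sit in presented time.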
Remember we were talking about presenting all of this on the screen? One of the things I have, I’m just going to close that and then go to “Assistant”. This is only in my virtual machine; I’ve just done this to show you. I have a quick-text assistant, and you can see I’ve just got this quick text here and I can just grab this, and you can create these depending on what you want. So if you are a collision investigator, you can say, “Well, okay, I want to have this information.” You can use this in Annotate, as well. Obviously, in Annotate, you can have it frame-accurate. So you can say, “Well, okay, I want a countdown between this frame and this frame and have other things going on as well.” The world is your oyster, to be honest.
I’m just going to copy that and then do an add text. So this is one of the things that I would do straight away if I’m looking at a file. I’m just going to then do an Advanced File Info and then whilst that’s churning perhaps I’ll do this. And I’ll go, “Presentation”, “Add text”, just going to ctrl+v and then I’ll stick that left top and then we’ll do a back colour text. And now, I’ll tell you what, let me just do a range so you can see, let’s have alt+i for my input and then when the vehicle goes out, we’ll have that as out, then my add text, and now you can see my sequence number there. That’s my original frame number. That’s my file.
The sequence time. So I’m four minutes into the sequence time, not the original time. Ah, CCTV time, I didn’t put the CCTV time in. You can use add timestamp or you can load the timestamp time, but if you haven’t got any time within the data, then you can add it in if you know it from another source, and then you can put all the details on there.
And the macros, there are just loads now, and we’re always adding macros to give you all the information that is available. All the data that’s available, you can present on the screen, and you can use that data in any way. So you could use countdowns, chronometers now, and all of that information.
And it all goes toward the final point of this: it’s all for the purpose of you being able to explain that you understand the video. I understand the video, I analysed the video, this is the data that I found in relation to the questions and tasks that I was asked, this is the information, and this information is all reliable. If you’ve had to interpret the information, how you’ve done that. Then you’ve got your presentation, your PDF image sequence, your video, your data, your report. Done. And it proves straight away, the moment someone can read that, it says, “This person knows what they’re doing. This person has competency.”
And so those vital first steps are ensuring that you can prove your competence in understanding the video and understanding its limitations. And especially with CCTV, there are all sorts of limitations. You look at the GOP structure there on that one: why is it showing twice? Something’s gone on somewhere, and it might just be a fault in how that system presents its data to Mediainfo, but we don’t know. Thank you. Bye bye.
Jordan: Hello everyone, Jordan Portfleet here with the Oxygen Forensic training team. Today’s topic is going to be KeyDiver. Now, what is KeyDiver, you ask? Well, the time has come to introduce our new brute force module, the Oxygen Forensic KeyDiver. This is available to you at no additional cost inside of Oxygen Forensic Detective.
So, this module is designed to find passwords to decrypt partitions with BitLocker protection, partitions protected with FileVault 2, encrypted zip files, encrypted Apple Notes, and the passcode-locked Telegram Desktop app. With Oxygen Forensic KeyDiver, you can create hash-cracking attacks in a format supported by the Hashcat utility, using various dictionaries and masks. You can run sequential attacks until the password is successfully guessed. You can also export the obtained password to decrypt the original data or use it for other purposes.
To access Oxygen Forensic KeyDiver, we can simply go to the homepage in Oxygen Forensic Detective, and it’s going to be listed under “Tools”. Now that I’m inside of the attack manager, I can see there are no entries assigned to the selected filter. I can also select the tabs at the top and look at “All”, “In progress”, “In queue”, “Paused”, “Success”, “Error”, and “Fail”. To create a new attack, all I need to do is click the “Create a new attack” button at the top.
Now we have a message here, “To create a new attack, upload an attack settings file (.json) or select a supported object type from the list.” So if I wanted to load, I could go ahead and just grab my .json file, and now I want to take a look at my list.
Different attack types here; type number one is going to be Encrypted ZIP file. “An archive in .zip format containing compressed files. When creating an archive, the user can set a password. This password can be bruteforced by extracting the encryption information from the archive.”
Next we have 7-zip encryption. “When creating an archive, the user can set a password. This password can be bruteforced by extracting the encryption information from the archive.”
Next, I have iTunes backup. “To protect the iTunes backup, the user can use a password and encryption. This password can be matched by extracting the encryption information of the backup. You can load the manifest.plist from your iTunes backup to automatically calculate the hash or paste the hash manually.”
Next we have Apple Notes Encryption. “Apple secret notes Brute Force. Secret notes are protected with end-to-end encryption using a user-generated password that makes it impossible to view the notes on iOS devices, iPadOS, macOS, and the iCloud website. Each iCloud account can have a different password.”
FileVault 2 volume encryption. “A partition of a disk or an entire disk can be protected using FileVault encryption. Data decryption requires a password set by the user when encryption was enabled. This password can be guessed using the encryption information extracted from the disk.”
Next we have BitLocker volume encryption. “A partition of a drive or an entire drive can be protected using BitLocker encryption. Data decryption requires a password set by the user when the encryption was enabled. If the disk has not been additionally protected with TPM, it is possible to guess the password using the encryption information extracted from the disk.”
Last but not least, we have Telegram Desktop encryption. “Telegram Passcode Lock brute force. This Passcode lock is local and used to protect a specific device, not an account. If a user uses a Telegram account on different devices, he/she needs to set up an individual code password on each of them.”
Now that we’ve seen some of our attack options, let’s take a closer look at creating a new attack. Like all Oxygen Forensics products, we provide a user-friendly interface to export a file with all the necessary information already stored in it. Once you have that file, you can transfer it to another PC, you can transfer it to another user, or you can run the password-cracking attack outside the PC where the disk image is stored and processed. All you have to do is click the “Import” button in the “Create new attack” window to upload the .json file into KeyDiver. Then you can proceed to specify the method of attack for the imported hash.
So, now we know all we have to do is either grab our .json file, or we can select from a list. One thing I did want to point out, though, is that if we do know the hash, what we have to do is specify the hash in hashcat format. You can see an example of exactly what that looks like in the gray bar.
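As a tiny illustration, here is a sketch of pulling that hash string out of an exported settings file programmatically. The "hash" key name is an assumption for illustration only; inspect your own export (as we do with Notepad later in this demo) for the real field name:

```python
# Read an exported attack-settings .json and print the hash string so it
# can be pasted into the hash field. The "hash" key is a hypothetical
# field name; verify against your own export.
import json

with open("BitLockerWebinar/attack_settings.json", encoding="utf-8") as f:
    settings = json.load(f)

print(settings.get("hash"))  # e.g. a string beginning "$bitlocker$..."
```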
Now, let’s take a look at how we can use Oxygen Forensic KeyScout to leverage KeyDiver. What I did was I went ahead and set up BitLocker encryption on my USB Drive (D:) here. Now what I’m going to do is go to Oxygen Forensic KeyScout, because I want to acquire the external drive. Now that I have my KeyScout open, I’m going to go to “New search” and I’m going to select “Drive”. Now all I have to do is point to the drive that I want, and I’m going to select my USB flash disk.
Now I want to make sure I’m choosing the right partition here. I can see I have a BitLocker container and I need to enter the password. When I double-click on that particular partition, I get this BitLocker decryption dialog. I have a couple of different options here. If I know the password, I can put the password in. I can also start an attack or export my attack. Now we are in our attack window.
There are a couple of other things that I can do here before we get into the specific type of methods. I can grab a .json file and the hash for that encrypted BitLocker. If I just close this out, again, I could choose to decrypt if I wanted to. I’m going to select “Export attack”. I’m just going to go ahead and create a new folder. I’ll just call this folder “Bit Locker Webinar”. And that is where I’m going to be saving my .json file. You can see right there that .json.
So, I could either grab that .json file now and pull it into KeyDiver just by hitting “Load”; I could go in and grab the .json. Or I can grab the hash, and because this is going to be an encrypted BitLocker, what I’m going to do is select from “List”, go to my “BitLocker” option, and I’ll grab the hash.
So now what I’m going to do is just select start attack. And before we get into “Mask”, “Dictionary” and “Extraction-based dictionary”, we are going to take a look at how to grab the .json file and how to grab the hash. So, I’m going to open up my Oxygen Forensic KeyDiver and select “Create a new attack”. I could load that .json that we just saved to our new folder, or I could select from “List”. I want to go into my .json file now and grab the hash that’s in there. So I’m just going to open up my .json file with Notepad, and now I can see I have my hash at the very top of my screen. I’m just going to copy that and paste it into the little gray bar, where we have to list our hash in the hashcat format. Get rid of my windows here. And now I can just right-click and paste and select “Next”.
Now I’m back in my BitLocker attack window. Now that we’re in our attack window, we have a couple of different attack methods that we can choose from. Method number one is a mask attack. So this is going to try to guess the target password by generating all possible character combinations of a specified length that matches a predefined pattern.
Method number two is going to be a dictionary attack. This uses a dictionary file that contains a pre-selected list of passwords, arranged one per line. Each password from the file is tested sequentially.
If a user is going to select the mask attack method, they have two different options: they can set attack parameters or set a custom mask rule. If going with option one, the user should specify the number of characters and choose the language and character set that should be used. The password length can be anywhere between 4 and 60 characters, and if the same number is assigned to both the “from” and “to” parameters, only passwords of the specified length will be generated.
Here we can see some of our parameters for the mask attack, such as the character count from and to. We can specify the language we want and select character sets, so I could choose a symbol character set or a custom variant, and I could choose between lowercase, uppercase, as well as numbers.
If a user wants to select option two for the mask attack, it’s going to be the custom mask rule. They can create their own rules for, just like it says, a custom mask attack. In order to do that, though, they must set a mask in the hashcat-supported format.
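For context on that hashcat-supported format: hashcat masks use built-in placeholders such as ?l (lowercase), ?u (uppercase), ?d (digits), ?s (symbols) and ?a (all printable ASCII). Here is a small sketch, with a made-up hash rate, of how quickly a mask’s keyspace grows, and why mask choice matters so much:

```python
# Estimate the keyspace of a hashcat-style mask and the worst-case time
# to exhaust it. The hash rate is a made-up illustrative number; real
# rates depend on your GPU and the hash type.
CHARSETS = {"?l": 26, "?u": 26, "?d": 10, "?s": 33, "?a": 95}

def keyspace(mask: str) -> int:
    """Candidate count for a mask like '?u?l?l?l?l?l?d?d'."""
    tokens = [mask[i:i + 2] for i in range(0, len(mask), 2)]
    total = 1
    for t in tokens:
        total *= CHARSETS[t]
    return total

rate = 500_000_000  # hypothetical 500 MH/s
for mask in ("?u?l?l?l?l?l?d?d", "?a?a?a?a?a?a?a?a"):
    n = keyspace(mask)
    print(f"{mask}: {n:,} candidates, ~{n / rate / 86400:.2f} days worst case")
```

Scale that up to the full 4-to-60-character parameter range and you can see why the demo later estimates days rather than minutes.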
If the user chooses to run a dictionary attack instead of a mask attack, they can check their password candidates without any modifications using a dictionary file, or they can apply additional password modifications. To do this, all you need to do is hover over the dropdown in the dictionary field. The selected dictionary will be marked with a tick and will appear in the field to the right. The user can also add their own custom dictionary. After selecting the dictionary, the user can set these additional parameters: “Characters case”, “Prefix”, “Suffix”, “Characters order” and “Characters skip”.
Now we can take a look at these system settings. This is where a user can set a temperature threshold to stop an attack. They can also choose one or more video cards. There are four different efficiency modes: “Low”, “Average”, “High”, and “Maximum” that the user can choose from.
Like many of the Oxygen Forensic tools, the system settings are going to be based on preference. So if you choose, say, maximum efficiency, that’s going to provide you with the best hash rate and maximize your hardware utilization, but it can also lead to higher temperatures and higher power consumption. So KeyDiver does monitor the GPU temperature, and if the temperature rises above the threshold you had set, KeyDiver will stop the attack. And this is just to prevent the GPU from becoming damaged.
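To make that safeguard concrete, here is a rough sketch of the kind of temperature watchdog that behaviour implies, using NVIDIA’s nvidia-smi CLI (assumed installed, NVIDIA-only); this illustrates the concept and is not KeyDiver’s implementation:

```python
# A rough sketch of a GPU-temperature watchdog: poll temperatures and
# stop work when any GPU crosses a user-set threshold.
import subprocess
import time

THRESHOLD_C = 85  # hypothetical user-set threshold

def gpu_temps() -> list:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

while True:
    if any(t >= THRESHOLD_C for t in gpu_temps()):
        print("Temperature threshold reached: stopping the attack.")
        break  # a real watchdog would signal the cracking process here
    time.sleep(5)
```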
So now we want to start our attack. As soon as you have selected your attack settings and at least one GPU, all you need to do is select “Start”. You can only run one attack at a time, and the attack will be in “In progress” status. You can also add the current attack or a new attack to a queue. They’ll be processed automatically, and the next attack will start as soon as your previous attack is finished. Once you start your attack, you can monitor your progress in a separate window. This separate window is the attack manager window, and this is where a user can view all ongoing, finished, paused, and queued attacks, manage them, and sort them by various parameters.
So when managing an attack, here are the possible statuses. “In progress” means the attack is currently running. “In queue” means attacks that will be launched automatically once the current task is finished. “Paused” means attacks that were paused by the user; these attacks do not change their status automatically and should be resumed manually. “Success” means attacks that have finished with the password guessed successfully. “Error” means attacks that encountered an error during their execution; user action is usually required before such an attack can be resumed. “Not guessed” means attacks that have tried all the passwords possible for the selected attack method, but none of the password candidates matched the hash.
So if a user selects the three dots at the right end of the line on the “In progress” tab, the user can view details of a particular attack, pause the attack, queue the attack, rename the attack, use the attack as a template, or delete the attack. Once attacks are deleted, it is not possible to recover them.
Say you ran your attacks and the password was not guessed successfully. All you have to do now is run another attack and adjust your settings. You could try another dictionary or change your attack method. KeyDiver enables the user to utilize any existing attack as a template for a new attack. All you have to do is click that little “Template” button at the top of the screen.
If you were successful in your attack, the password is going to be displayed in the corresponding field. It can then be copied by clicking the corresponding icon in the password field. It is possible to save the password on the current device in .txt format using the “Save” button, or you can export the attack results in .json format using the “Export” button.
Let’s take another look at our BitLocker attack here. So I’m going to open up my KeyDiver, I’m going to go to “Create new attack”, and I’m just going to load up the .json file. You can see here my .json file. Go ahead and select “Open”, and now I can choose my method. So if I click on “Mask”, “Dictionary” or “Extraction-based dictionary”, I’m going to have a couple of different options below. So here I’m going to select “Mask”, pick “Parameters”, “Next”. For the character set, I’m going to choose “Lowercase”, “Uppercase”, and “Numbers”. My character count can be anywhere from 4 to 60 characters. Now I’ll select “Next”, I’m going to change my hash rate and GPU load to “Maximum”, and I’m going to select my GPU. Now I can go ahead and select “Start”.
So with this mask attack that I had chosen, this is going to take a considerable amount of time. You’re going to see over on the right-hand side the number of passwords that KeyDiver is going to attempt. So you’ll see here that it’s going to take something like two days. Yeah, two days it looks like.
So now I’m actually going to go ahead and pause this, because we’re not going to wait two days to get the password for my BitLocker. So I’m going to go ahead and pause this attack. Now, once I’ve paused this attack, I can start taking a look at some other attack methods here.
Okay, so I can see that it’s on pause. If I wanted to export, all I’d have to do is hit that little button at the top of the screen to export and I can also delete, as well. So I can make a template, delete and export.
So now I’m going to explore some additional attack methods. I can see here that my attack has been paused by the attack manager. So now I’m just going to load up my .json file again, and I am going to take a look at some additional options here.
So let’s try a dictionary. I just selected the most popular dictionary attack here, to try for some uppercase, and I’m going to take a look at my other options: characters case, prefix, suffix, characters order, and we have the option to skip characters as well. I’m going to choose not to mess with the prefix, suffix or character order, and I don’t really want to skip anything right now. So I’m just going to select “Next” and now I’m running this dictionary attack. This attack is going to take a considerable amount of time, but with some internet magic, we will get through this quickly.
So now I’m inside of my attack window. I can see the progress here currently at 0%. It should take about 20 minutes and 38 seconds. So we will speed this up a little bit and check in in just a moment.
So if I take a look back at my attack, I can see we’re at 98%, and now I get a notification that my attack was not successful. So now all I have to do is get back into my KeyDiver and explore some additional attack options. The first thing I’m going to do is load up my .json file. Now I can pick my parameters and let it take a little bit longer, or I can use a custom variant. I also have the option, when I select the method and “Mask”, to either pick parameters like we just saw or, through social engineering, obtain the password and write out the password myself. There’s absolutely nothing wrong with pausing an attack and taking a guess at a password based on the information you’ve already gathered on a person.
Now that I’ve done my social engineering, I have all this biographical data on this individual: I have street names, children’s names, likes and interests, job information, so I can start putting together a password list that I can try to run in KeyDiver. Once my attack is finished, I can see it was successful, and I can see my password is listed out as “Password123”. I can also create a template, delete this attack or export this attack. I can save it out, as well.
Thank you everyone for joining me. Again, my name is Jordan Portfleet with the Oxygen Forensic training team. If you have any questions or concerns about anything that you’ve seen today, please reach out to us at Training@OxygenForensics.com. Thank you.
Julie O’Shea: Hi everyone. Thanks for joining our webinar today, Leveraging SaaS to Power Mobile Data Collections and Advanced Collections. I’m Julie O’Shea and I’m the product marketing manager here with Cellebrite Enterprise Solutions. Before we get started, there are a few things that we’d like to review. We’re recording the webinar, so we’ll share an on-demand version after the webinar is complete. If you have any questions, please submit them in the questions window and we will answer them in our Q&A. If we don’t get to your question today, we will follow up with you directly after.
Now, I’d like to introduce our speaker today. We have Monica Harris. Monica has decades of experience, specializing in the development, implementation and training of software for eDiscovery service providers such as KLDiscovery and Consilio. Before joining Cellebrite, she worked with the U.S. Food and Drug Administration, where she oversaw policy and procedure curation, enterprise solution rollout and training for enterprise solutions. She is an active leader and mentor in the eDiscovery community, has lectured on trending topics in eDiscovery at American University and Georgetown University, and is the co-project trustee for the EDRM Text Message Metadata project. Monica has previously served as the assistant director of the DC chapter of Women in eDiscovery, and as a board member of the Masters Conference. She currently serves as immediate past president of the Association of Certified eDiscovery Specialists (ACEDS), DC Chapter, and is a member of the EDRM Global Advisory Council. Thank you for joining us today, Monica. If you’re ready, I’ll hand it over to you now so we can get started.
Monica Harris: Thank you, Julie, and good morning, good afternoon, and good evening everyone. Welcome to Leveraging SaaS to Power Mobile Data Collections and Advanced Collections. For the next 30 minutes or so, I’m going to tell you a story. A story that’s going to focus on what is happening in our industry, or what’s called the why. And then we will talk a little bit about some mobile forensics education, the how, and then we will launch into our big reveal. So let’s get started talking about some industry trends or just some things at Cellebrite Enterprise Solutions that we have noticed are going on in the industry, often called pain points.
So let’s start with infrastructure, because we have noticed for enterprise level mobile data collections, when we’re talking to our community at large, infrastructure could be a major challenge. So what do we mean when we talk about infrastructure for mobile data collections? So some of you may have seen The Lincoln Lawyer on Netflix, particularly season one because I think there might be two seasons out now. In episode eight, the main character uses a UFED Touch to conduct a mobile collection. So he connects his device right in the car and does the collection on the spot. It’s absolutely amazing, and when we’re talking about mobile data collection and infrastructure challenges, that is not what we’re talking about.
Specifically, what we’re talking about when we talk about the infrastructure behind enterprise-level mobile data collection is something similar to what you see here. This is the infrastructure behind Endpoint Inspector, which is Cellebrite Enterprise Solutions’ flagship remote collection product for remote collection of computers and phones, and in addition to that, it also has the ability to collect from cloud. It’s a single pane of glass for several collection sources. Although this is specific to one product and one solution at Cellebrite, most of the architecture behind mobile data collection, when you’re talking about enterprise level, large corporations, service providers, works pretty much the same.
So let’s dive into a little bit of the busyness that could be happening in the architecture behind an enterprise mobile data collection solution. So starting with the investigator. The investigator could be you or I. It’s the investigator, it’s the examiner, it’s the eDiscovery practitioner. It is the person that sits down at a workstation and begins the collection process. They are looking to collect data. Once they set up that collection, there’s generally a server. There is a server someplace, somewhere. That server is going to receive the request that comes from the examiner, the investigator, you or I, and then it’s going to begin talking to the endpoints that the collection is for. The server is kind of your project manager in your entire infrastructure setup. And when it starts talking, it could potentially, in this particular scenario, be talking to an endpoint that could be located at the office.
It could be talking to an endpoint that could be located at home, or it could be talking to an endpoint that’s going to connect to a mobile device, that is then going to send back data to a storage repository. That storage repository and the server, they could be the same thing, they could be separate. But basically, the takeaway from this slide is that the architecture for enterprise level collection of data, not just mobile data, but data in general, is complex. It’s very complex, and that’s one of the trends that we’re seeing in the industry today. One of the reasons why the infrastructure behind enterprise level collection, mobile data, is so complex has to do with the way that mobile data evolves. It is an emerging data source.
So for those servers, if you think about that server that had the box around it before, in that previous infrastructure slide, whether that server is Windows or Mac, the first set of updates that would have to be applied to the server comes from the fact that it could be a Windows or Mac server. Right? So you’ll need all of the updates that go with those operating systems to make sure that they’re up to date and secure. Then there is the fact that those servers need to talk to endpoints, and those endpoints need to have the latest and greatest in terms of innovation to be able to collect from whatever device they encounter. But not just the device; it could be an iPhone 15 that is looking to be collected, in case you are fortunate enough at this early date to have one. Perhaps you’re like me and you’re waiting for Black Friday. Maybe you’ve got one at your desk right now, and that’s what the examiner or the investigator is looking to collect from.
Maybe you don’t have an iPhone 15. Maybe you’ve got an iPhone 14, but your iPhone 14 has iOS 17 on it. iOS 17 came out about a month ago. By the time that a lot of you will be taking a look at this webinar, it could be that it came out about a month and a half ago. So right now, my phone is on 17.0.3, and that means about five weeks ago, 17.0 came out. So that’s, what, three updates in five weeks. That’s how often an operating system on an iPhone can update. Then add to that, Android 14, that’s also recent, and with Android, although we don’t see as many consistent updates to the operating systems, the devices themselves that can work with the Android operating system, there is a plethora of those, whether it’s Samsung, Google, Huawei, the list goes on.
There are several Android manufacturers. So when that server is talking to endpoints, it needs to have all the intelligence that goes along with all of the various types of devices and all of the various types of operating systems that could be on devices, because not all of us always have phones that are up to date. It could be a matter of when your phone connects, when you’re connected to Wi-Fi, when you have power and are connected to Wi-Fi, or a combination of both. And then last but not least, add to that the applications that could be on the operating system, on the device. On the right-hand side of the screen is a list of chat applications. Some of them might be familiar. Perhaps you’ve heard of Telegram, perhaps you’ve heard of Line or WeChat, or maybe even Discord; I think we used that for capture the flag not too long ago here at Cellebrite. But all of the applications that are listed on the right side of the screen are existing applications today, and if you’re not familiar with them, they could be applications that you could potentially run into in the near future.
So when we think about some of the challenges that come with maintaining servers that do enterprise-level mobile data collection, there are several things to consider. The server itself, the operating system that it’s running, and making sure it’s up to date and secure. And then what the server does, its functionality, its purpose, the number of devices it can interact with, the number of operating systems that could be on those devices, and then the number of applications that could be on the operating systems, on the devices, that are touched by the server. All of that complexity is built into the trends that we see for server maintenance and server upgrades, particularly when, depending on who you are, these servers could be maintained by smaller groups. Let’s say close groups of collection practitioners, forensic users. You may be in a company that’s in telecommunications, healthcare or insurance, and you’re on a legal team, an eDiscovery team, a forensics team. You may be running a skeleton crew, but yet part of your duties could be to maintain this infrastructure, to conduct this level of collection for hundreds of people.
Maybe you’re not inside a corporation, maybe you’re part of a service provider, right? So in a service provider, whether it’s a forensic service provider or an eDiscovery service provider, you are going to be dedicated to this work, but then it’s a different scenario there. Now you see hundreds and hundreds of customers, and you are much more likely to run into the different variations of phones, the different variations of operating system and the myriad of applications. So all of this factors into the complexity of infrastructure for enterprise level solutions and mobile data collection.
Then there are the data volumes that are increasing in eDiscovery. What you are looking at here are stats from a Logikcull blog that came out about a year ago, and I would be very interested, for reasons that we’re going to talk about in a moment, to understand if this is still the case. I think when we begin to talk to the community at large about what they’re seeing, on average per case or per investigation, we’re going to find that these numbers have gone up. But right now, the community at large is aware of the fact that for a case, the average volume of data is about 130 gigs, and I wonder if that size has gone up. That means that’s about 6.5 million pages of data, because there wasn’t always an E in front of the discovery, and the average number of custodians per case is 10 to 15.
So what does that mean and why may those volumes be going up? Well, that is due, in part, to the nature of the workforce. I think now we can call it the post-pandemic era. I feel comfortable saying that, but about three years ago, we did start the trend towards remote work, and then we began to come back in hybrid fashion. But as of right now, and these stats are from September, about 40% of us are either full-time remote or hybrid, a combination of both. 40% of the workforce in the U.S. 65% of workers report wanting to work remote all of the time, and 98%, that is the highest stat in this entire presentation, 98% of workers want to work remote at least some of the time. And when we are not in the office, when you don’t have the ability to have those water cooler chats, when you don’t have the ability to walk up to someone in the office and begin to have a quick chat, what do you do?
You sit down at your computer, you pull out your phone, you pull out your tablet, and you start a conversation. Depending on what the nature of the conversation is, it may not be email, it could be text message, it could be any of the ways that we can continuously communicate with each other. So then you begin to have that short message, that continuous message, that text message conversation. But this is the reason why I think those stats that we saw in the previous slide, they’re going to go up because this slide indicates that we’re going to see more of a trend towards remote work, and as long as that’s happening, we’re going to communicate more, and we’re going to communicate more within emerging data sources. That is my prediction. But when we’re communicating with each other through those emerging data sources, we’re often using our own devices that are enabled for work and for remote work at that.
Right now, 80% of organizations support bring your own device as opposed to company-issued devices. And that brings with it challenges that we’ve been looking at throughout the past three years, and we’re still considering; they’re still front of mind. First and foremost being employee privacy. When you are using your own device to have these conversations, because you may be part of the 40% that are full remote or hybrid, or part of the 60% that would like to be, then when there’s a need to collect for an investigation or a case, there is a real concern about ensuring that only the data that’s relevant to the matter is being collected. That is still at the forefront of everyone’s mind and still a trend in the industry that we’re seeing today. But the biggest trend, potentially, is the most sensational.
About a year ago, I used to ask: what would it take for mobile data collection to hit its tipping point? Actually, I’d say maybe about two years ago I was asking that question. I think the answer revealed itself last year in all of the sanctions that we began to see in case law, whether civil or criminal. If you are interested in some of those cases that carry sanctions, Lubrizol and Pork Antitrust, for example, there’s a wealth of information that can be found about case law that involves text message data in sources like Doug Austin’s eDiscovery Today blog. That’s a wonderful source for those cases. And then in addition to that, Kelly Twigger’s eDiscovery Assistant. Her Case of the Week dives into a lot of case law as well, and a lot of it, because of the trends we’ve been talking about during this webinar, deals with text message data. But when you’re talking in your own organizations, you can go either route.
You can take a look at eDiscovery Today’s blog or eDiscovery Assistant Case of the Week, and you could pull those cases. I believe these same cases are included there as well. The difference is, these cases are sensational because of who they involve. It may be a little bit more challenging to remember what happened in Lubrizol. It may be a little bit more challenging to remember what happened in Pork Antitrust, but it’s pretty straightforward to remember Bob Dylan, Johnny Depp and Brett Favre. As long as you can remember those three folks, then you automatically will have at your disposal three cases, three civil cases, that involve text message data.
So let’s jump in starting with Bob Dylan, who dodged a bullet, because a case that was brought against him was dropped. In the case, he was accused of sexually abusing a twelve-year-old girl in the ’60s, and as it turned out, it was not chronologically possible based on what the opposing party said they had as discovery. And part of what they said they had as discovery was in text message format. When that data was requested, the case was dropped. And as a result, sanctions were then requested from the party that brought the allegations, over the fact that the case was even brought against Bob Dylan. So this is definitely a case where, when text messages weren’t able to be produced, the case was dropped. Bob Dylan and text message data definitely go together in civil cases.
But not just Bob Dylan: Johnny Depp. And this may be a case that you’re familiar with because, while this happened last year, Netflix strikes again. Netflix has made it relevant for us with a recent documentary that came out earlier this year. What’s the takeaway from the Johnny Depp case? The difference that text messages can make. I was fortunate enough to be joined by Kenya Dixon at Relativity Fest this year, who reported on this very specific case during our session, and we do have a difference of opinion. So here, the text messages were admissible in the UK, which is where Amber Heard won the defamation case that Johnny Depp actually brought against her. But the same text messages were not admissible in the U.S., and therefore Amber Heard lost.
Now, I’m sure an attorney will tell you that there were several reasons why Amber Heard lost, but text messages, as you can see in these headlines, factored into the case. If you ask Kenya, she’ll tell you that it had a lot to do with the ability of the legal team. If you ask a legal technologist that works at a digital intelligence company, I’m going to tell you it had to do with the evidence. It had to do with the evidence and whether or not it was admissible. This is a second case that was won or lost, or in this case, since it was tried in more than one country and more than once, it was actually won and lost. And a lot of that had to do with the evidence that was presented, and a lot of that was text message data. Not a lot, but it factored in and it was important.
And last but not least, Brett Favre. This case is actually ongoing, so we can look forward to more information about it in the upcoming months, and I’m sure that I’ll be including it in another webinar. At a very high level, in this case, Brett Favre was looking to build a volleyball arena at the University of Southern Mississippi. And in conversations that happened via text, it was decided, excuse me, that welfare funds were going to be used to build that arena, which sounds like a big no-no. Marcin Krieger, who also joined me at Relativity Fest in my session, did an amazing job of presenting this case. But at a 50,000-foot view, Brett is being asked for text messages, which he’s saying not only does he have, but he cannot necessarily verify that the text messages came from him. He can’t authenticate them.
Of course, through triangulation, we can see that several other people have these text messages, but Brett seems to be answering a question that we’re not asking, in addition to not having evidence that has already been collected from other sources. So how this unfolds, this should be interesting, and I look forward to continuing this conversation with you all in future webinars. But these are three cases, three cases of celebrities, celebrities who have had civil cases brought against them. Text message data was part of the smoking gun, to be determined in some cases. We’re still trying to understand what’s happening with Brett Favre and the University of Southern Mississippi, but every headline that I have seen about this case involves text messages. So I’ll definitely be following this closely.
So for our industry trends, how does that summarize or what is our summary there? Enterprise solution architecture is server-based, that’s the most important takeaway from that really complicated diagram that we were looking at before. It doesn’t necessarily have to be a Cellebrite product. It doesn’t necessarily have to be our flagship product, Endpoint Inspector. When you’re talking about enterprise-level architecture, somewhere there’s a server and that server’s going to need to be maintained, and that server’s going to need to have updates because of the emerging and evolving technology that that server works with, whether it’s the device, the operating system or the application, or the applications, excuse me, that are on the mobile device.
Data volumes are growing. Data volumes are growing, and while we’re looking forward to new stats coming out in the new year, because we’re in Q4 of 2023 now, we know that’s happening because of trends within the workforce itself. We are leaning in or fully embracing remote work and hybrid work, and as a result of that, we still have to keep an eye towards privacy because while we’re at home, we are using our BYOD devices to communicate. And it’s not just the workforce, it’s celebrities too. It’s celebrities too, and there is plenty of civil case law to illustrate that, if you want to have these communications or if you want to start these conversations in your organizations about why text message data should be a part of all of your cases and all of your investigations.
So now that we’ve set the why, or talked a little bit about trends, let’s talk a little bit about mobile data collection itself. When I have an opportunity to talk to our community at large, there are some questions that consistently come up, and one of the most frequent questions that I encounter is: when you receive a Cellebrite report, regardless of what format you might receive it in, how do you know what was collected? How do you know what was done? Maybe it’s a UFDR and you’re using Physical Analyzer, or our Reader, to take a look at it, or maybe you have a CSV, the dreaded CSV, and you’re taking a look at the data in that. So we have this slide here. If you’ve been with me in webinars before, you may have seen this slide. So I’ll try to walk through this and explain what mobile data extraction is for us here at Cellebrite.
So really this slide boils down to three extraction types: logical, full file system and physical. For logical, you can break that up into two types: logical and advanced logical. Logical is going to be the most basic form of extraction. You’ll get text messages, you’ll get contacts, call history, and then you’ll get some photos that go along with that. Well, just generally media, so it’s not just photos, it could be your audio as well. Your advanced logical usually means we are working with a backup. You usually see this when you’re working with iPhones. This is not an absolute, but it is very, very common when you are looking at iPhones. So you get everything that I just talked about, but because we are doing a forensic extraction, you get a little bit more than the text messages, the contacts, the call history and the media. You can also get a manifest of applications.
What does that mean? That means that if I were to do an advanced logical extraction of my iPhone 14, I would be able to see every application that’s on my phone. I wouldn’t be able to see what I use those applications for or how long I use them, but I could still see a manifest. For companies, let’s say for those who are in the financial industry, and you are regulated, and you’re looking to understand if your employees are talking to each other through unsanctioned applications, if you had an application manifest, you could then say, “Okay, I’m looking at about 10 employees. Those 10 employees all share applications,” maybe one of the applications I showed you in an earlier slide, maybe they all have Viber. If you’re noticing that all of your employees have Viber, it may be worth doing a full file system extraction to understand how long they’re using it, how often they’re using it, and then actually to collect the data from Viber itself. That is the difference between the advanced logical and the full file system.
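As a rough illustration of what an application manifest can look like in practice, here is a sketch that reads the Manifest.plist of an iTunes-style backup with Python’s standard plistlib. Treat the exact key names as assumptions to verify against your own data:

```python
# List the installed-application bundle IDs recorded in an iTunes-style
# backup's Manifest.plist. The "Applications" and "IsEncrypted" keys are
# the ones commonly seen in these backups; verify on your own evidence.
import plistlib

with open("Manifest.plist", "rb") as f:
    manifest = plistlib.load(f)

apps = manifest.get("Applications", {})
for bundle_id in sorted(apps):
    print(bundle_id)  # e.g. "com.viber" would flag Viber's presence

print("Backup encrypted:", manifest.get("IsEncrypted", False))
```

A listing like this answers the manifest question (is the app there?) without telling you anything about usage, which is exactly the gap a full file system extraction fills.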
The advanced logical is going to give you those applications and some device info, but the full file system extraction is going to give you a lot more. So what do we mean by device info? For example, device info will tell you things like whether or not automatically deleting text messages after 30 days was set up on your phone. That’s a very important question. In some of the case law that we see, not the sensational case law that we went over earlier, there have been times when sanctions have not been imposed because custodians had their phones set up to wipe their text messages every 30 days, and that was done far in advance of when the case actually began. So understanding information like that, that’s part of device info. That is not something you’re going to see in a logical extraction, but you can see it in some cases in an advanced logical, and you can absolutely see it in a full file system.
The full file system is the new holy grail, I call it, in terms of the amount of information that you’re going to be able to collect, because it allows you to get into some of the more sensitive parts of the phone. That would be secured folder locations if you have an Android, or maybe even an iOS keychain, but more importantly, that deleted data. We know with iPhones that was made easier than ever, if for no other reason than text messages being retained for 30 days after deletion. You don’t need a forensic extraction for that, but then what happens to the data after that? Full file systems have the potential to collect that data, though there’s still the eDiscovery “it depends”. It depends. And then of course there is physical extraction, that bit-by-bit copy or image of the phone.
As our phones have evolved, think about what we were talking about with server infrastructure and all of the complexity there. As the devices themselves have evolved and we started to see things like full disk encryption, file encryption, and just several different layers of encryption that can be found on a phone, physical extraction, a physical imaging of phones, has become more challenging with newer devices. So that is why we call full file system the new holy grail, because you really see physical imaging of phones in older models, but not necessarily some of the newer ones like the iPhone 15 that some of us may be clamoring to get. So at a 50,000-foot view, that is mobile data extraction explained. But to condense it down: physical imaging we usually see for older phones, and the logical is contained within the advanced logical.
So let’s take a moment to talk about the advanced logical versus the full file system and expand that out. In the previous slide, I’m showing you maybe about five or six things per extraction type, but this drills in far more in depth in terms of what you can get with an extraction type. So for instance, if you’re looking at one of those Excels that Cellebrite has the ability to generate after you do an extraction, and your Excel has, oh, I don’t know, one, two, three, four, five, six, seven, eight, nine, 10, 11... so that’s 22, 24. If you see about 24 tabs in your Excel, then more than likely you are looking at a full file system extraction. That is not something you’re going to see in an advanced logical extraction, I can tell you that much. Now, whether it was a logical or advanced logical extraction, there are a few things that go into that. For instance, you’re not going to see browser history in a logical extraction; that was not on the previous slide. You’re not going to see documents in a logical extraction; that was not on the previous slide. You’re not going to see a list of applications in a logical extraction; that was not on the previous slide either.
So hopefully this slide, together with the slide that you saw before, will help you identify what happened when you’re looking at the various reports that could be handed to you. If you are not the collecting party, if you’re the receiving party and you’re wondering what happened, well, we encourage you to have the conversation. That’s always the best way, but there are also ways, when you’re looking at the actual report that’s part of the extraction, to understand what happened, or, when you’re requesting a mobile data collection, to know what to ask for. If it’s not enough to know that the application exists on the phone, if you actually want to know if it’s being used, then that’s a different conversation that you’re having with your collection practitioners. The same goes if it’s important to you to understand some of the more complex things that you see in the green box, like locations, for example, like cell phone towers. Those are all the power of advanced extraction. That’s not what you’re going to see in a logical or advanced logical extraction.
Also, what’s important to note on this slide is that a logical or advanced logical extraction can happen remotely. You do not need to be in the same room as the custodian or the employee, and you do not need to take their phone away and disrupt their business. Advanced extraction, on the other hand, which is akin to imaging or a physical copy of the phone, requires you to have the phone in hand; that cannot happen remotely. And due to the nature of that technology, as intuitive as it is, it is not something I would walk a custodian through. That would not be a great custodian experience.
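For reference, here is a compact, hypothetical restatement of the taxonomy covered so far: the extraction types, whether each can run remotely, and representative artifacts. The lists summarize the talk; they are not a Cellebrite specification.

```python
# A compact, illustrative restatement of the extraction taxonomy just
# described: which types can run remotely and representative artifacts.
# The lists summarize the talk; they are not a Cellebrite specification.
EXTRACTION_TYPES = {
    "logical":          (True,  ["messages", "contacts", "call logs"]),
    "advanced logical": (True,  ["messages", "contacts", "call logs",
                                 "browser history", "documents", "app list"]),
    "full file system": (False, ["all of the above", "keychain / secure folder",
                                 "deleted data"]),
    "physical":         (False, ["bit-by-bit image (mostly older devices)"]),
}

for name, (remote, artifacts) in EXTRACTION_TYPES.items():
    where = "remote" if remote else "phone in hand"
    print(f"{name:>17}: {where:13} | {', '.join(artifacts)}")
```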
All right, iCloud versus iTunes backup: what’s the difference? Bearing in mind conversations I’ve had in the field with the community at large, oftentimes when I’m talking to folks, I get the impression that although the word iTunes is being used, they’re actually talking about iCloud, and they’re not the same thing. So I thought we would take some time to dive in and talk about the difference. For iTunes, the takeaway from this slide is that you need a cable, the proprietary charging cable the phone came with (it’s iTunes, so we’re talking about Apple), and you need to install iTunes on a computer. With those, you can move data back and forth between the computer and the device. iCloud, by contrast, automatically backs up everything on your phone over a Wi-Fi connection, assuming you don’t go into your phone and change what iCloud backs up.
You do have the ability with iCloud to say, “No, I actually do not want certain applications on this phone backed up.” But if you don’t, by default iCloud will back up almost everything on your phone, and it does it automatically as long as you’re connected to Wi-Fi. So there is a difference. iTunes backups don’t happen automatically; they have to be initiated. It’s a very manual process that requires a download and requires hardware, whereas iCloud happens automatically and contains quite a lot more data. It’s almost the difference between a logical and an advanced extraction. Almost, but it’s a good mnemonic if you want to keep the two straight.
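As a practical aside, local iTunes (or Finder) backups live in well-known default folders, so checking a computer for their presence looks something like the sketch below. The paths are the standard defaults; the second Windows entry is used by the Microsoft Store build of iTunes.

```python
# Default local backup folders for iTunes (Windows) and Finder/iTunes
# (macOS). Paths are the standard defaults; the second Windows entry is
# used by the Microsoft Store build of iTunes.
import os
import platform
from pathlib import Path

def itunes_backup_dirs() -> list:
    """Return the default iTunes backup folders that exist on this machine."""
    home = Path.home()
    if platform.system() == "Darwin":   # macOS
        candidates = [home / "Library/Application Support/MobileSync/Backup"]
    else:                               # Windows
        candidates = [
            Path(os.environ.get("APPDATA", "")) / "Apple Computer/MobileSync/Backup",
            home / "Apple/MobileSync/Backup",
        ]
    return [p for p in candidates if p.is_dir()]

for backup_root in itunes_backup_dirs():
    for device_dir in backup_root.iterdir():
        print(device_dir.name)  # each subfolder is one device backup (UDID)
```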
So then what’s the difference between an iTunes backup and an advanced logical extraction? They’re very similar in terms of what you can bring back, but there definitely are some differences. At the heart of it is the fact that with an advanced logical extraction you are getting a forensically sound collection, and a little more data comes across in that extraction: your applications, your device information, and partial file system data. That matters; the application manifest, the ability to understand some of the settings on the phone, and the forensic container the advanced logical extraction comes in all make a difference. If you think that for any reason you may find yourself testifying in court, then the hashed forensic container, which can prove that the extraction is forensically sound, is the way to go.
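To make the “hashed forensic container” point concrete, here is a minimal sketch of verifying that a container still matches the hash recorded at collection time. The container name and recorded hash are placeholders.

```python
# A minimal sketch of verifying a hashed forensic container: recompute
# the digest and compare it with the hash recorded at collection time.
# The container name and recorded hash are placeholders.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large extractions don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

container = Path("advanced_logical_extraction.zip")  # placeholder name
recorded = "<hash recorded at collection time>"      # placeholder value
if sha256_of(container) == recorded:
    print("Container verifies: contents unchanged since collection.")
else:
    print("Hash mismatch: the container may have been altered.")
```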
That is the way to go, for sure. But sometimes the smoking gun is in the picture that gets painted, and that is the difference an advanced logical extraction makes. A little over 60% of us in the United States have iPhones; we are a nation of iPhone users, so advanced logical extractions are normally what we see. There are Androids, of course, but most of the time when I talk to folks like the ones attending this webinar, we’re talking about iPhones and iTunes, and that brings advanced logical extraction to the forefront.
So, the industry education summary. In the past couple of minutes we have talked about mobile data extraction types, and they really boil down to three: logical (whether that’s logical or advanced logical), full file system, and physical extraction. Logical or advanced logical extraction is what we at Cellebrite call remote collection. Full file system is our advanced collection. And then there’s physical extraction, which we have seen begin to sunset as devices evolve. There are significant differences between iTunes and iCloud; please do not use the two interchangeably. There are also differences between iTunes backups and the three device extraction types we covered, primarily in how forensically sound the container is, and in the more pertinent information that can come through in an advanced logical extraction that you may not see in an iTunes backup.
So now that we have talked a little about trends, and had a little education about mobile device collection, let’s talk about innovation. This is what we call the SaaS Trilogy, and this is why I’m so excited about today’s webinar: this is our big reveal. In the SaaS Trilogy, we’ve covered trends and industry education, but it wouldn’t be a trilogy unless we added the innovation piece. So: trends, education, and innovation, because after all, who doesn’t love a good trilogy? We all love a good trilogy. With that, we also talked a little about advanced logical remote collection, and a little about full file system.
Earlier, you may remember, when we were looking at that very complex diagram of enterprise-level architecture for on-prem and cloud mobile device collection, I mentioned that it’s not just mobile device collection; it can also be computer collection, and with computer collection you might be doing incident response, so conceptually that would be a trilogy too. But today, specifically, we are talking about the release of our very first SaaS product, the first in a line of SaaS products, and it is called Endpoint Mobile Now. With Endpoint Mobile Now, we have taken into account all of the trends we discussed at the beginning of this webinar. The why: it is a SaaS solution that eliminates the complexity of the diagram you saw earlier. There is no need for deployments with SaaS, and no need for maintenance or updates, the maintenance and updates you know are constant because the operating systems on our devices are consistently updating, the devices themselves are consistently updating, and the applications on those devices are constantly updating.
Imagine how many times you would have to update a server for an enterprise-level mobile data collection solution just to stay current with all of that. Now you don’t have to worry about it at all, because there is SaaS. Real-time mobile data collection means you can collect when you need to collect. I wonder, out of all of those sensational cases we talked about, would Brett Favre still have his text messages if we could have collected from him in real time? It’s a question; we don’t know. Real-time mobile data collection is not yet as ubiquitous as we hope it will be a year from now, but right now you have the ability to collect mobile data, always up to date for the latest technology you may encounter, in real time. And it is scalable.
It is scalable: whether you collect 50 phones a year or five, this solution will work for you. In fact, it’s tailored less toward the 50 and more toward the five: toward organizations that are still absorbing the trends and education I’ve presented in this webinar, and that are seeing growth in mobile data collections across their cases and investigations. For those organizations that see that growth and want to bring mobile data collection in-house, but perhaps don’t have large teams to handle server maintenance, updates, and all of the evolving technology, we designed this to be as simple and easy as possible, so that when you get that investigation or that case, you can get to the data as quickly as possible.
Now, I’ve done a lot of talking, so let me show you what this actually looks like. This is Endpoint Mobile Now, as the examiner first sees it when accessing the SaaS solution. Think back to that complex architecture diagram: with SaaS there are really only two pieces. One is the user, that’s me, you, the examiner, the investigator, the collection practitioner, who knows they need to initiate a collection. So what do I do? I come into the product and click Start Collection. The application then asks me for maybe three or four pieces of information. I can give the collection a name, and I can set up a storage repository, because the data you collect will go to a storage location you designate here.
No collected data is stored in the SaaS solution. I’m going to say that again: no collected data is stored with Cellebrite, none of it. The SaaS server’s only job is to send emails and route the data back to a storage repository designated by the examiner. The examiner can choose a network location behind their VPN or firewall, an Amazon S3 bucket, Azure Storage, or SFTP, and that list is growing. And for anyone who does not want to send data over the network, you can also save it locally. So there are several options for where the data goes, but the most important takeaway with Mobile Now is that the data is not stored by Cellebrite. We are not hosting any data; it can stay within your network.
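As an architectural illustration of that routing, here is a minimal sketch of delivering a finished collection archive to an examiner-designated repository: an S3 bucket, an SFTP server, or a local path kept off the network. This is not Mobile Now's implementation; the function name and destination values are invented for the example.

```python
# A minimal sketch (not Mobile Now's implementation) of routing a
# finished collection archive to the repository the examiner chose:
# an S3 bucket, an SFTP server, or a local path kept off the network.
# Bucket, host, and path values are placeholders.
import shutil
from pathlib import Path

import boto3      # pip install boto3
import paramiko   # pip install paramiko

def deliver(archive: Path, dest: dict) -> None:
    """Send the collection archive to the examiner-designated repository."""
    if dest["type"] == "s3":
        boto3.client("s3").upload_file(str(archive), dest["bucket"], archive.name)
    elif dest["type"] == "sftp":
        transport = paramiko.Transport((dest["host"], 22))
        transport.connect(username=dest["user"], password=dest["password"])
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(str(archive), f"{dest['path']}/{archive.name}")
        transport.close()
    else:  # "local": keep the data off the network entirely
        shutil.copy2(archive, dest["path"])

# Example: the examiner chose an S3 bucket (placeholder name).
deliver(Path("collection_001.zip"), {"type": "s3", "bucket": "examiner-evidence"})
```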
You’ll need the name of the custodian you’re collecting from and the custodian’s email address. If you choose, you can add an extra layer of security with a password on the collection, and you can add notes, anything that will be helpful with the collected data later on. That’s it. From there, you choose the data to be collected. This may look very similar to the slides we showed when discussing advanced logical collection, because that’s what this is: targeted, forensically sound advanced logical collection that brings back data to help you paint the picture of the case. You can bring all of it back, or target specifically what you want to see. Targeting keeps employee privacy at the forefront of the custodian’s mind when you’re working with a BYOD device, and it also brings data back to you faster, since the scope of the collection is smaller. And then that’s it.
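Putting the setup steps together, a collection request conceptually amounts to a small record like the hypothetical sketch below: a name, custodian details, a storage target, an optional password, notes, and the targeted data categories. The field names are invented for illustration; this is not Mobile Now's actual schema.

```python
# A hypothetical sketch of the collection request assembled in the web
# interface: name, custodian details, storage target, optional password,
# notes, and targeted data categories. Field names are invented for
# illustration; this is not Mobile Now's actual schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CollectionRequest:
    name: str
    custodian_name: str
    custodian_email: str
    storage: dict                   # e.g. {"type": "local", "path": "..."}
    data_types: list = field(default_factory=lambda: ["messages"])
    password: Optional[str] = None  # optional extra layer of security
    notes: str = ""

request = CollectionRequest(
    name="Case 2023-10 / J. Doe device",
    custodian_name="Jane Doe",
    custodian_email="jane.doe@example.com",
    storage={"type": "local", "path": "/evidence/case-2023-10"},
    data_types=["messages", "call_logs"],  # targeted, not everything
    notes="BYOD device; messages and call logs only, per counsel.",
)
print(request.name)
```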
From there, you start the collection, and just that quickly you’ve set up a mobile data collection. For the custodian, the process is also very straightforward. The custodian receives an email telling them to download the mobile collection agent, very similar to how you would install iTunes on the custodian’s computer for an iTunes backup. The email also contains an activation token the agent will need. The agent is operating system agnostic; it works on either a Mac or a Windows computer. Once the custodian launches it, it asks for the activation code from the email. The custodian enters the code and grabs the charging cable that came with their phone, whether iOS or Android. Again, it’s very similar to an iTunes backup collection, with more data returned.
From here, there are instructions about the phone. The custodian tells the agent whether they have an iOS or an Android device, and the agent gives some very simple instructions: make sure your display doesn’t turn off or lock, and connect your phone. From there, we’re off, because the examiner, back in the Mobile Now web interface, already told the agent what to collect, whether that’s text message data or the full advanced logical collection. The agent already knows what to do. Once the phone is connected and the agent has the activation code, that’s all the information it needs. It starts the collection and sends the data back to the storage repository the examiner set up: that SFTP server, network location, Amazon S3 bucket, or Azure blob, or maybe it’s even stored locally. And that’s it. It’s that simple; collection done.
So, the summary for industry innovation. Cellebrite Enterprise Solutions is proud to announce that we have launched our first SaaS solution: remote, targeted collection of mobile data as quickly as possible, whenever you need it, wherever you need it, and it is called Endpoint Mobile Now. It takes into account the infrastructure trends we have seen, allowing you to reduce your tech debt. It is always up to date with the latest innovation so that you can stay on top of evolving technology. And it scales with you: whether you need one, three, or five mobile collections, they are readily available. There are two steps to setting up this very simple process for patent-pended targeted remote collection. That was a tongue twister; let me say that again. There are two steps for setting up this patent-pending targeted remote collection. There you go.
It’s logical collection for Androids and advanced logical collection for iOS devices, so you have a more holistic view of the data. For the custodian or employee, it was designed to be as minimally invasive as possible: you keep your phone, you will not be parted from it. It’s targeted collection, so we are only looking for the relevant evidence on the device, and with a small download, similar to what you would do with iTunes, you just connect your phone; it does all the work for you and sends the data back to the requesting party. No data is kept by Cellebrite. And that is all we have. Thank you very much for joining today’s webinar. Julie, do we have any questions?
Julie O’Shea: Yes, we sure do. Thanks, Monica. Let’s start with: how do you filter out business versus personal text messages on a BYOD device?
Monica Harris: That’s a great question. When we were looking at Mobile Now and at advanced logical extraction, you saw that we did not have to collect, for example, contacts, call logs, or pictures that were not attachments to a message; you could target just the messages. But I take this question to be more specific: how do you collect only the messages between, say, myself and Julie, where we’re talking about a webinar leveraging SaaS, in October 2023? I think that’s what this question is about.
During this webinar we focused on the collection process, but there’s a second piece to collection: decoding and analysis. All of that happens before you are ever in a review platform; rather, you’re in an investigative platform at that point. In an investigative platform like Cellebrite’s Physical Analyzer, you can do exactly what I just described. You can say, “I’m only looking for the text messages between Monica and Julie where they’re talking about today’s webinar, in the timeframe of October 2023.” That way, before you ever convert that data to load it into a review platform, you ensure you’re only looking at the data relevant to the case. Great question.
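As a concrete illustration of that participant-and-timeframe filter, here is a minimal sketch applied to decoded message records. The record layout is assumed for the example; it is not Physical Analyzer's export format.

```python
# A minimal sketch of the participant-and-timeframe filter described
# above, applied to decoded message records. The dict layout is assumed
# for the example; it is not Physical Analyzer's export format.
from datetime import datetime, timezone

def filter_messages(messages, people, start, end):
    """Keep messages whose parties include `people`, within [start, end)."""
    wanted = {p.lower() for p in people}
    for msg in messages:
        parties = {msg["sender"].lower(), *(r.lower() for r in msg["recipients"])}
        if wanted <= parties and start <= msg["timestamp"] < end:
            yield msg

sample = [
    {"sender": "Monica", "recipients": ["Julie"],
     "timestamp": datetime(2023, 10, 18, 14, 5, tzinfo=timezone.utc),
     "body": "Webinar slides leveraging SaaS are ready."},
    {"sender": "Monica", "recipients": ["Dave"],
     "timestamp": datetime(2023, 9, 2, 9, 0, tzinfo=timezone.utc),
     "body": "Unrelated chat from September."},
]
october = (datetime(2023, 10, 1, tzinfo=timezone.utc),
           datetime(2023, 11, 1, tzinfo=timezone.utc))
relevant = list(filter_messages(sample, {"Monica", "Julie"}, *october))
print(len(relevant))  # 1: only the Monica-Julie message from October 2023
```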
Julie O’Shea: Thanks for clarifying that. Let’s see here. How about, is it possible to collect without an installation or cables?
Monica Harris: That’s a great question, because we went through multiple collection methods, whether iTunes, logical, advanced logical, full file system, or even physical collection. I think what you’re asking about is wireless collection, and from Cellebrite’s standpoint the answer is no. Or rather: you could do a wireless collection, but it likely wouldn’t have the desired effect, because of the level of security and encryption on a phone and what would have to happen to the phone in order to collect from it wirelessly. So assuming you don’t want an undesired effect on the phone after collection, the answer is no. You are going to need the proprietary cable for the phone, and that’s across the board, regardless of the type of collection that takes place.
I know that’s a very frequent question; sometimes it even comes across as a request for covert collection. But regardless of the type of collection, whether it’s remote, with the custodian conducting the collection, or advanced, with a practitioner on site working with almost all the contents of the phone, there will need to be a cable connection of some type. Great question.
Julie O’Shea: Got it. And this one: we have a need to collect from some of the applications that were on an earlier slide. Can you collect from Discord and WhatsApp?
Monica Harris: Oh, those are great questions. Yes, you can. We focused a lot in this webinar on text message data specifically, but those are third-party chat applications that can be found on the phone, and the two are a little different from each other. WhatsApp is, last time I checked, statistically the most commonly used business messaging application in the U.S., and we often see WhatsApp data in advanced logical collections of iOS devices. It is also available in our flagship product, Endpoint Inspector. Then there are some of the newer applications like Discord, which is gaining in popularity; we actually used it for our Capture the Flag event to communicate with participants and with our dream team here at Cellebrite.
We can collect from those too, but those are different collection types and different solutions, for good reason. Thinking back to the education we went through in this webinar: when you’re talking about Discord, Snapchat, or any of the applications I call complex, ones that come with end-to-end encryption or are ephemeral, you’re looking at advanced collection methods, and potentially advanced collection methods with a SaaS architecture, very much like what we just discussed with Mobile Now, because of how frequently those applications update. That was a very long answer; the short answer is yes, we can, but you typically see that with our advanced collection solutions, as opposed to our remote or targeted collection solutions. Great question.
Julie O’Shea: Good, thank you. And the last one we’ll have time for today: does Mobile Now store any collected data?
Monica Harris: No, Mobile Now does not store any collected data. The data is sent directly back to the examiner. We are not in the hosting business here at Cellebrite; that is not what we do. There are other companies that do it, and do it well, but we do not keep data. We are committed to the extraction of data, and once we extract it, we want to get it to you as quickly as possible so it can be used for investigations, analysis, and downstream processes. But Mobile Now does not store any collected data. Great question.
Julie O’Shea: Wonderful. Thank you, Monica. Well, as I mentioned, we are running out of allotted time for today, so we’re going to wrap up, and we will reach out to you individually after the webinar to answer the questions we didn’t get to. I want to give a big thank you to Monica. That was a great discussion on how investigators and eDiscovery professionals can leverage SaaS to power their mobile data collections and utilize advanced collections as well. I never thought I would hear Bob Dylan, The Lincoln Lawyer, and Netflix mentioned so much in a webinar, but it was very informative, and I’m sure our audience loved it as well. So thank you. For any additional questions, or to learn how you can get started with any of these solutions, you can reach out to us at enterprisemarketing@cellebrite.com or visit our website. Thank you, Monica, and thank you everyone for joining us today. Hope everyone has a great rest of their day.