Presenters: Chet Hosmer, Founder of Python Forensics, Inc. and author of Python Forensics; James Habben, Master Instructor, Guidance Software Training; Robert Bond, Product Marketing Manager, Guidance Software
Welcome everyone, and thank you for attending Guidance Software’s webinar, EnCase & Python: Extend Your Investigative Capabilities. We have two fantastic guests with us today. We have Chet Hosmer, who’s the founder of Python Forensics. He’s also the author of two Python Forensics books. He’s an HTCIA Chapter President – as a matter of fact, we’re at an HTCIA conference right now, so you might hear a little music in the background. We’re also going to welcome James Habben. He’s a master trainer here at Guidance Software, the author of 14 apps on EnCase App Central, and an excellent EnScripter.
Before we get started with Chet and James, my name is Robert Bond. I’ll be your host and moderator for this session. If you have any questions during the session, please put them in the bottom right portion of your WebEx screen, and we’re going to answer your questions at the very end of the presentation. If you do not get a chance to ask your question today, go ahead and either plug it in the WebEx screen or send it to me at [email protected]. Or, if you’re watching an archived version of the webinar, you can send it to me as well.
Before we get started with Chet and James again, I’m certain all of you have received this email. We have a great limited-time offer, a V6 to V7 upgrade offer. It’s half off the upgrade price, plus SMS. That package includes a free three-day transitions training, that transitions training would retail before for about $2500. That is the first time we’ve ever discounted that upgrade, and of course that comes with any additional upgrades, including 7.11, which will be available in the November timeframe.
So just to give you some resources – we always give you a list of resources to go to. This one’s a little bit different. Chet and James teamed up to create a two-part digital blog series. It’s called EnScript and Python: Exporting Many Files for Heuristic Processing. You can read that on Digital Forensics Today. And a lot of this webinar is going to be based, or at least tied into, that blog post. You can also go see a webinar that was conducted about a year ago, it’s called Increasing Functionality and Efficiency in EnCase with EnScripts. It’s specifically about how to get started with EnScript. It’s excellent if you’ve never written in EnScript before.
In terms of Chet’s Python resources, he has two books. One is Python Forensics, the other is Passive Python Network Mapping. You can get them anywhere where you buy books.
And again, we always talk about this resource – it’s free startup training. We have EnCase Forensic, EnCase Enterprise, and EnCase Cybersecurity training. It’s all on demand; if you go to our training page, then Resources, you can find the list of those separate trainings on demand. Just click on them, and you can see a menu, and you can actually traverse through the menu for whatever type of training that you’re looking for.
And then finally, we have our certified training classes. We have an EnScripting class. Today we’re going to highlight our CFI and CFII training. They are the cornerstone training for EnCase Forensic. They’re also the cornerstone training for folks who want to get their EnCE. What we always recommend is the annual passport – that gives you unlimited training, whether that’s on demand or in the classroom. We encourage you to call up and ask about that if you have any questions. With that, let’s go ahead and get started with Chet. Chet, please take it away.
Chet Hosmer: Thank you, Robert, for that introduction, and let me add my welcome to the many participants in the webinar this morning. We’re going to cover several aspects today about integrating EnCase and Python. Some background information, then we’re going to step right into some examples. We’re going to start by talking about why we might integrate EnScript with Python, and then we’re going to provide a set of practical examples, starting very simple, and moving into more complex examples, where you can do a lot between the two languages.
Before I dive right in, I thought I’d give you just a quick background on Python Forensics Inc. We’re a non-profit research institute, and we’re focused on the development of open-source Python-based technologies for investigative purposes. We encourage you to join us in this pursuit. Everything we do is free and open-source, and we want to encourage you to participate in whatever level you can. In other words, if you’re a developer and you want to add scripts to the repository, and provide those out to other investigators, that would be great; if you have ideas or challenges that you’re currently facing that you’d like to see a Python solution for that’s integrated with EnCase, let us know; and if you see a solution that we’ve put out there that you think could be improved or you’d like to see new capabilities, let us know, and we’d be happy to try to accommodate that and get that out to the community.
But anyway, welcome, and hopefully we’ll hear from you in the near future.
Your first question might be “Why should we consider integrating these two environments at all?” Today we work in a rapidly changing investigative environment. These changes require that our investigative and incident response toolkits be easily extensible and adaptable to new situations and challenges. They should also be inclusive and open-source, such that a wide range of investigators, developers, and practitioners can really participate. These extensions should be able to provide immediate value to our investigations and the challenges that we face. And finally, it would be nice if they were easy to use, understand, and explain, and could even be evaluated for soundness.
From my vantage point I see a lack of overlap, if you will, between the social science and the computer science aspects of what we do during investigations. As the Venn diagram depicts, this overlap is somewhat limited, or at least growing slower than we would like. I could see us all benefitting if investigators, examiners, and behavioral scientists, for example, would interact more closely with researchers, developers, and software engineers in the process of actually creating new solutions to the problems that we face and advancing the toolkits that we actually work with every day. I think we can all see a distinct advantage to strengthening the connection between the social and computer science sides of things, and I think Python and EnScript can help broaden that interaction.
The key to this process is collaboration between both sides to make sure that we are actually communicating. I think we also want to create an on-ramp for investigators to participate, and those examiners that are facing the challenges that they have, so that we can actually develop new solutions as the situations arise. I think this can create meaningful analytics regarding the kinds of cases that we’re looking at; it can automate the behavioral analysis of the technologies versus just format and display; and it can capture investigator knowledge as they bring in their views regarding the case or a set of cases or the type of case that they’re working, and produce a better semantic understanding of the cases that they’re actually working on; and finally, link cases together using heuristics for both local and global cases, in order to connect things, since we live in a very global world and it’s becoming much more flat as we go day by day. I think this is going to become very, very important.
Now I’m going to actually turn the briefing over to James Habben from Guidance Software to provide a little background on both EnCase and EnScript, and where he sees this actually moving forward.
James Habben: Thanks, Chet. A little bit about EnScript, for those that aren’t aware. EnCase provides enhancements with EnScripts. They’re very powerful, they give you a lot of capabilities to do further enhancements that EnCase doesn’t have natively inside. Anyone can write them. So it gives us a lot of capability to utilize EnCase as a platform to help us out with our investigations, to automate a lot of things.
The catch though is that EnScript can’t quite handle everything. So a couple of things that I’ve run into before is compression, for the different compression algorithms that are out there, EnScript has a very standard, generic… I’m not sure which one it is, but there’s a very standard, generic function called Compress and Decompress, and it’s used for storing files by EnScript. But for extracting data out of files that have been carved out of data, of files, or unallocated or something like that, the compression algorithms are going to be different, depending on what kind of data it is.
So EnScript doesn’t have the ability to use these different types of compression, so using something like C# or Python gives us the capability to utilize some of these compressions that are out there. There are libraries already built for them, so we don’t have to go and implement this in EnScript, we can just utilize those libraries that are built into C# and Python.
Another thing that EnScript can’t handle is encryption. A lot of different encryption methods are out there, and EnScript, again, it’s got a very generic one, but we can’t deal with the different types out there, and the different sized keys and all that kind of stuff. So utilizing C# like we’re… well, we’re going to talk about Python here.
Utilizing C# and Python gives us really a lot more power over the top of EnScript because we can bring in all the capabilities that C# and Python have to offer, and all the libraries that they have available to them.
So then, a couple more questions that we commonly get asked in the training classes: “I want to learn EnScript. How should I start?” This is a very common question, and the problem is there’s no books to buy off the shelf. There is a help file inside of EnCase, and there’s this free fundamentals training manual available now. This is… It’s approximately a day and a half of material from our EnScript course that was removed from the course now, and is available for free in a PDF that you can download, whether or not you’re planning on taking the course. The course was modified to build on top of this, so that this is available for free, and it’s expected or implied that you have gone through this to where you can then learn the more specifics and the more advanced things on EnScript.
So there’s the training manual, then there are the examples in the help file and in the support portal, and then lastly there is actually a training course, 32 hours, where we walk you through all the different areas of EnScript that are commonly used, including building GUIs and making them advanced, like we’re going to show you in the webinar here today.
Then another question is: “Where can I get these EnScripts?” There’s an EnCase App Central that is the central repository for all these EnScripts. So this is designed to be available to all users of EnCase, so that you can download scripts, you can search for them, you can download them. The majority of them are free. So go on to the site and check it out, do some searching, do some browsing, and see what’s available there.
Next, then, just a couple of examples of scripts that are available in EnCase App Central. These are the most downloaded EnScripts in the store. You’ll see there’s a [ShellBag] parser, there’s a Volume Shadow Service Examiner, the VSS examiner. These two are built by the training department. The Volume Shadow Examiner allows you to analyze the volume shadows and extract files from there that are different from the main evidence file that you have. There’s a couple of others – the SEEB and the Memory Analysis are written by third parties, people that don’t work for Guidance. The Link File and Jump List Parser is written by Training, the Parse setup is not, the Prefetch Parser, that particular one is not, the Copy Web Browser Files is not, and the Exit Viewer Plugin is.
So you can see the most downloaded, some of them come from Guidance Software training department, and a good amount of them actually come from examiners that have explored the world of EnScript and wanted to get into doing their own development and enhancing of EnCase. Now, I’m going to hand it back over to Chet, to give you a little bit of history on Python and how that got around, and then more into the integration between the two that he and I have been working on.
Chet: Thanks, James. I appreciate it. A little history and foundation of Python – it’s always nice to know where these things came from. And Python is the brainchild of Guido van Rossum, who grew up in the Netherlands, and received various degrees in computer science and mathematics from the University of Amsterdam. He is considered, even today, by his peers, as the benevolent dictator for life for Python, and driving force for where this technology will actually go in the future.
“Who uses Python today?” might be a question. In other words, are you going to be the only ones using this? And the answer is absolutely not. Companies like Google, Industrial Light & Magic, NASA, and Dropbox use it… Google and Dropbox being two companies that van Rossum works for, or has worked for. Los Alamos National Laboratory, Pixar, NOAA, and many, many other organizations use this technology on a daily basis because of how easy it is to be productive with it.
Google and Dropbox use it as the infrastructure behind the technology they develop at those two companies, and many other companies are using this every day. Many of the weather models that we see out there today are actually prototyped initially in Python, and maybe even continued in Python, because of its ability to deal with massive amounts of data.
What platforms today support Python? This is one of the really interesting aspects of Python. You can develop applications on a Windows platform, port them directly over to Linux, and then to a Mac, and then mobile device platforms that are out there, whether it’s phones or tablets, and the reason is that the scripts are portable. There are over a hundred specific operating systems today that support Python. Many of them are shipped directly. Most Linux boxes and Macs today actually come with Python pre-loaded. So you can start developing and accessing and utilizing scripts within those environments immediately.
The March issue of the Communications of the ACM, one of the foremost magazines in computer science, noted that at schools including MIT, Carnegie Mellon, and Cal-Berkeley, Python has emerged as the leading language to teach novices. In other words, it’s going to be the first language that new computer science students are going to be using. I personally teach a graduate level scripting class at Champlain College in Vermont that uses Python and EnCase as part of that class. These are students that have very little computer science background but are great digital investigators. And this is catching on quickly, so that they can try their hand at developing enhancements to the investigative environments that they’re involved in.
So then, what is Python Forensics? Well, a couple of years ago, when I was actually developing the first book on Python Forensics, I decided to come up with a definition in order to help define this not only for me but for everybody else. And I consider Python Forensics to be a free and open environment for the collaborative development of new methods and techniques for investigating cybercrime. And I think this kind of frames the problem that we’re after, and what we’re trying to do with Python and forensics, and now the integration of Python and forensics with EnCase.
So now, let’s actually dive in and look at the possible ways that we can integrate Python with EnCase, and examine some of the specific details about how we go about doing that. I’m going to walk through three methods – well, two basic methods and one a little bit more advanced. The first is going to execute a Python script against an EnCase object, specifically a file that you’ve selected. Next, we’re going to run that same Python script, but in this case we’re going to return the results from the Python script back to EnCase.
Then we’re going to move to a little bit more advanced example, where we’re going to do two things – one, we’re going to exchange multiple files from EnCase, through an EnScript, with Python, and two, we’re going to do a lot more processing on the Python side and deliver those results back to EnCase. So let’s start with the simplest method – executing a Python script against an EnCase object, for example, a file. And for this particular example, we don’t actually have to use an EnScript. We’re going to use a file viewer in order to be able to access the Python script and specify a specific file for that script to operate on.
So the way we do this in EnCase is quite simple. We’re going to create, in this case, the pyBasic script that we have developed in Python, which I’ll show you in a second, and what we’re going to do is we’re going to select a specific file within EnCase, and we’re going to execute the file viewer pyBasic. When we do that, we’re going to actually get the pyBasic script to execute as the file viewer against that specific EnCase object. So in order to do that, we have to actually create, first, the actual file viewer. In order to do that, we actually go into Edit File Viewers within EnCase. And the first thing we do is we specify the name of the new viewer. In this case, the name is pyBasic.
The second step is to specify what we’re going to execute on the Windows box. In this case, we’re simply going to execute cmd.exe – in other words, the command prompt – in order to be able to launch the Python application. And finally what we’re going to do is we’re going to specify the Python script that we want to execute. So we’re going to basically create a command line that we’re going to execute on the Windows box, and we’re going to pass in a file. So in this case, we’re going to specify the Python 2.7 executable, and we’re going to specify the specific script, in this case, pyBasic.py is the name of the script, and we’re going to pass in the object file that was selected within EnCase.
So to take a closer look at what we actually do within EnCase in order to perform this operation is we first want to select the file that we want to actually operate on within the Python script, and once we get there, we’re going to actually right-click on that particular file, which is going to bring up the EnCase option, and we’re going to select ‘Open With’. Remember, we just created the pyBasic file viewer that we’re going to select in order to execute this operation. So again, we’re going to select the Cat file, we’re going to right-click on it, we’re going to specify ‘Open With’, and then we’re going to select the pyBasic file viewer that we just created within EnCase.
What this is going to cause is the script to execute against that file object – in this case, Cat.jpg is going to be used – and all we’re doing within the Python script, it being the simplest one we’re going to develop, is reporting the file name that we were given, the file size attribute that we collected from the file, and the last modified, accessed, and created times of that file. But we generated that information from within the Python script, not from within EnCase. And I know this is a very simple script – getting that metadata out of the file wasn’t that interesting. But it allowed us to provide that connection between EnCase and the Python script, and produced meaningful results.
The next step is to take a deeper look at the Python script that actually performs this simple operation. And I’ve broken the Python script down into three separate slides, just to kind of build and narrow in on the different things that I’m doing. Many of you might not have seen a Python script before, so I’ll take my time and go through this a little bit more carefully.
Python scripts typically start with a set of import commands, where they’re importing libraries. In this case, I’m only using libraries from the Python standard library which are available for all of the platforms. So these same libraries are duplicated on the different platforms, whether it’s Mac, Windows, Linux, mobile devices, etc. So you can use these across platforms. So some scripts, and some libraries that are out there have specific capabilities that are only available on certain platforms, but for the most part, if you stick with the standard library, you should be fine.
So the three standard libraries that I’m using are, first, argparse, which is used to parse the arguments that are provided on the command line. In this particular application, obviously, there’s only one argument that is passed on the command line, which is the name of the file that we’re supposed to process.
The second import is the time module, and in that particular case I’m using the time module in order to convert the MAC times into a human-readable form. MAC times are stored as epoch values, and the time method will allow me to convert those epoch values into human-readable form. And finally, I’m importing the os module, which allows me to get information, for example, the metadata that’s associated with specific files.
So in this first step, I’m going to create a parser object from the argument parser that allows me to actually perform methods that are available within the argparse. The first thing I’m going to do is I’m going to add an argument, and that argument is the only argument that’s going to be passed in this command line, called the file. That’s going to be the name of the argument that we’re going to use. And finally I’m going to go ahead and execute the parse command to parse the arguments from the command line.
So once this particular script is executed, it will execute these lines in sequence and parse the arguments. And finally, I’m going to set a variable, the File, equal to the argument that comes in from the parse command, and I’m going to use that filename in order to get at the file. So what the File is going to contain basically is the filename that was passed from EnCase into the script. So that’s step one: set up the import statements for the standard libraries that I’m going to use to perform the operation, and parse the command line.
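The first step can be sketched roughly as follows. This is a minimal reconstruction from the description above, not the actual pyBasic.py source; the argument name `theFile` and the function name `get_target_file` are assumptions, and the code is written for Python 3 rather than the Python 2.7 used in the demo.

```python
import argparse


def get_target_file(argv=None):
    """Parse the command line and return the single file path it carries."""
    parser = argparse.ArgumentParser(
        description="Basic Python script launched from EnCase (sketch)"
    )
    # One positional argument: the path of the file selected in EnCase.
    # The name "theFile" is an assumption made for this sketch.
    parser.add_argument("theFile", help="path of the file to examine")
    # With argv=None, argparse reads sys.argv[1:], the normal command-line case.
    args = parser.parse_args(argv)
    return args.theFile


# Simulate the command line a file viewer might build (hypothetical file name).
print(get_target_file(["Cat.jpg"]))
```

Passing an explicit list to `parse_args` is only there so the sketch can be run without a real command line; a script launched by EnCase would call `get_target_file()` with no arguments.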
The next step, I’m going to do a couple of very simple things – one is I want to print a message out to indicate this is the Test Python Application integrated with EnCase v7. The first thing I’m going to do is I’m going to use that OS module, and I’m specifically going to use the stat method of OS in order to take the information from the file that was passed on the command line, and retrieve the parameters that are associated, or the attributes that are associated with this particular file. So this one line of code will actually perform that operation for me.
Next, I’m going to go ahead and get the MAC times that were delivered back from the os.stat function into the file stat variable. And I’m going to do that by putting those MAC times within a list. So I’m simply going to create an empty list within Python called MAC Times. Now I’m going to start adding values to the list. I’m going to first add the modified time, then the access time, then the created time. For each one, I’m going to use the time method that I mentioned earlier in order to convert that value, which is an epoch value, into something that’s human-readable.
Finally, what I’m going to do is query the FileStat variable to get the size of the particular file, as returned from the os.stat function. So on this page, what I’m going to do is extract the information about the file. I’m going to get the MAC times, and I’m going to get the file size information as well. So that’s step two of the process.
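A rough sketch of this second step might look like the following. It is an illustration under assumptions rather than the script shown on the slide: the function name `collect_metadata` and the choice of `time.ctime` for the conversion are mine, and it targets Python 3.

```python
import os
import tempfile
import time


def collect_metadata(path):
    """Return (size_in_bytes, [modified, accessed, created]) for path."""
    file_stat = os.stat(path)

    # Each timestamp is an epoch value (seconds since 1970-01-01 UTC);
    # time.ctime turns it into a readable local-time string.
    mac_times = []
    mac_times.append(time.ctime(file_stat.st_mtime))  # last modified
    mac_times.append(time.ctime(file_stat.st_atime))  # last accessed
    # On Windows st_ctime has historically been the creation time; on
    # Unix-like systems it is the time of the last metadata change.
    mac_times.append(time.ctime(file_stat.st_ctime))

    return file_stat.st_size, mac_times


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"12345")
size, mac_times = collect_metadata(tmp.name)
print(size, mac_times)
os.remove(tmp.name)
```

Keeping the three timestamps in one list, in modified, accessed, created order, mirrors the description above and makes the later printing step a simple matter of indexing.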
Finally, step three is pretty simple – what I’m going to do is print out the information that I’ve collected, in this case, I’m going to print out the filename, then I’m going to print out the file size, then I’m going to print out the last modified, last access, and created times from the list that I created on the previous slide. So that’s basically step three – to print out the results.
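The printing step could be sketched like this. The label text, the function name `format_report`, and the sample values are placeholders invented for illustration; they are not output from the demo.

```python
def format_report(file_name, file_size, mac_times):
    """Build the text report from the values collected in the earlier steps.

    mac_times is expected in [modified, accessed, created] order.
    """
    modified, accessed, created = mac_times
    lines = [
        "Test Python Application integrated with EnCase v7",
        "Filename:      " + file_name,
        "File size:     " + str(file_size),
        "Last modified: " + modified,
        "Last access:   " + accessed,
        "Created:       " + created,
    ]
    return "\n".join(lines)


# Placeholder values only, to show the shape of the report.
print(format_report("Cat.jpg", 12345, ["<modified>", "<accessed>", "<created>"]))
```

Building the report as a single string, rather than printing line by line, also makes it easy to hand the same text to whatever captures the script's output.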
So three steps – one is set up, collect the information from the arguments; step two is to use the OS module in order to extract the metadata associated with the file; and step three, just simply print out the results.
Now we’re going to take a look at method two. Method two executes the same Python script that we just executed in method one. But it actually does it in a little bit different fashion. First of all, it is actually launched by an EnScript versus a file viewer. So we’re going to show you how that process works. And the second thing that it does – it will capture the results that are generated by the Python script, and return those results back to EnCase. So those are the two main differences – one, this Python script, the same one that we executed a minute ago, we’re going to execute that using an EnScript instead of the file viewer, and we’re going to report the results back to EnCase.
So this is the whole script, but I’m going to actually break it down into a couple of different parts here, so I can zoom in on the areas that are important to consider in developing this script. It’s fairly straightforward. James Habben developed this originally, and I made a couple of minor modifications to it, but it is essentially the EnScript that James had developed.
So to dive into the EnScript just a little bit, the important aspects of this are two. First of all, get the current item that is selected by the user. So in this example, the user is going to select a specific file within EnCase, and then we’re going to want to be able to take that file and actually convey it to the Python script just like we did in example one, but in a different method that we’re going to use there.
So the first thing we have to do is actually get that item that is selected by the user. The next thing that’s going to happen is we’re going to create a file that we’re going to transfer over. So what we have to actually do is copy the file from EnCase into a temporary folder, and write it there as the output file that we’re going to provide to Python. And that’s what this does – we get the input file, we create a temporary folder, and then we write the contents of the input file into the output file that we’re going to provide to Python as part of the setup here. So again, we’re using an EnScript to copy the file that was selected by the user, put it into a temporary folder, and write it into an output file that we’re then able to provide to the Python script.
So next, what we have to do is we have to set up some variables that are going to hold the information that’s necessary. First, we have to basically specify the path of Python. In this case, it’s C:\\Python27\\python.exe as the actual path to the actual Python program, and as in the previous example, the second aspect of this is to specify the script that we want the Python interpreter to execute, in this case, pyBasic. And finally, we want to actually provide the output file name as the argument to the program that we wish to execute within the Python environment.
So finally we’re going to use the ExecuteClass to basically execute something within the Windows environment. We’re going to set the folder, and we’re going to set the application as the Python path, and then we’re going to finally set the command line up to include the pyScriptPath, which is the path to the Python script we wish to execute, and finally we’re going to include the arguments, in this particular case the filename that we want the Python script to operate on.
The final step in the process is to actually execute the script on the Windows box, using the exe start method that is provided within the EnScript. And what we’re going to do is write out that line and actually execute that output. We’re going to give it a thousand seconds to complete – obviously it’s not going to take anywhere near that long, but we want to be able to give it some time.
Once that operation completes, exactly what we saw in the previous Python script, we’re going to go ahead and create a bookmark. In this case we’re going to call the bookmark “Python Basic”, and we’re going to write the output that came from the Python script into that bookmark, so that we can include those results that were generated by the Python script into EnCase using the bookmark method.
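For readers who do not have the EnScript in front of them, the same launch-and-capture flow can be sketched in Python. This is an analogy to what the EnScript does, not EnScript code: the function name `run_script_on_copy`, the child script, and the timeout value are all assumptions, and the bookmark step is left out because it exists only inside EnCase.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_script_on_copy(script_path, source_path, timeout_s=60):
    """Copy source_path to a temp folder, run script_path on the copy,
    and return whatever the script printed to standard output."""
    work_dir = tempfile.mkdtemp(prefix="py_launch_")
    try:
        # Step 1: place a copy of the selected file in a temporary folder.
        copy_path = os.path.join(work_dir, os.path.basename(source_path))
        shutil.copyfile(source_path, copy_path)
        # Step 2: build "interpreter script argument" and run it with a timeout.
        result = subprocess.run(
            [sys.executable, script_path, copy_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
        # Step 3: the captured text is what would be written into a bookmark.
        return result.stdout
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)


# Demonstration with a tiny stand-in script and a throwaway file.
demo_dir = tempfile.mkdtemp(prefix="py_demo_")
child = os.path.join(demo_dir, "child.py")
with open(child, "w") as fh:
    fh.write("import os, sys\nprint(os.path.getsize(sys.argv[1]))\n")
sample = os.path.join(demo_dir, "Cat.jpg")
with open(sample, "wb") as fh:
    fh.write(b"12345")
print(run_script_on_copy(child, sample))
shutil.rmtree(demo_dir, ignore_errors=True)
```

The generous timeout plays the same role as the long wait described above: the script should finish almost immediately, but the launcher should not hang forever if it does not.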
The next step of course is to put the pieces together. So going back into EnCase, we want to actually select the file that we want to actually operate on – in this case, Cat.jpg – and what we’re going to do after we select Cat.jpg, we’re going to go ahead and we’re going to bring up the EnScript menu, so that we can select the specific EnScript that we want to execute. In this case, we’re going to execute PythonBookmark, and that is the actual script that we just went through a couple of minutes ago from the EnScript side. So now we’re going to have to go ahead and execute that PythonBookmark, which is going to do everything that we talked about – collecting the file, copying the file, actually setting up those parameters, and actually launching the Python script from within EnCase.
Then we can actually go and take a look at our bookmarks, because one of the things that we added to this particular script was the ability to write the information back to EnCase in the form of a bookmark from the Python script. So we’re going to select the Python Basic bookmark – if you recall, that was the name of the bookmark that we established within the EnScript – and we’re going to have to go in and examine that particular bookmark that we looked at. So to kind of blow this up a little bit, we’re going to select the Python Basic bookmark from the Bookmarks list, and that’s going to basically generate this output, which is the name of the file, Cat.jpg, the specific filename that was placed in that temporary directory that was created, the file size that’s associated with that particular file, and the last modified access and created times that we generated.
As promised, we’ll actually take a little bit more advanced method we want to use that will create an EnScript that’s more advanced, being able to process and pass multiple files via a folder to the Python script. We’re also going to create a much more advanced and elaborate Python script that does some real work for us that will deliver results back to EnCase that are actually really meaningful.
The first two steps are important to lock in the different methods in order to do this, but this method actually, the advanced one, is actually more useful and practical to an examiner or an investigator that might want to use both his Python script and the EnScript together in order to produce some meaningful results to their case.
So this Python script uses heuristics to discover probably words from a set of input files selected by the user. So in other words, the user is going to select instead of one file, is going to select multiple files, and the Python script, using this heuristic method that I’ve created, will actually generate probably words. We’ll talk a little bit about what probably words are in the English language – for example, there are many, many dictionaries that exist, but sometimes users misspell words, words are not in a particular dictionary so we can’t actually identify them. This method actually uses a different method to do that, and I’ll describe that method in just a minute.
So in this case, the EnScript is going to provide a directory of files selected by the user to process instead of a single file. And the Python script is going to process each file and discover any possible words or probable words that were associated with that file. Now, this file can be a jpeg that has embedded data in it, it can be a memory dump, it can be a text file or a document. It doesn’t make any difference what the file is, so it’s not just restricted to text files – it can be any type of binary data as well.
It’s going to finally report the words found along with the number of occurrences of each word. It’ll also produce an alphabetic list, but the first list it’s going to produce is a ranked list by the number of occurrences of a particular word for each of the files that it’s going to process.
So how does this actually work? Well, the script basically creates a heuristic model based on a set of word dictionaries. So the concept here is to create a model that is based on the characteristics or the construction of words versus the words themselves. So instead of having a static dictionary, we can get a flavor or heuristic or rule of thumb of how English words, for example, are created, and what their construction is. And then, based on that, we can build a model. From that, we’re going to get files transferred from EnCase into the heuristic analyzer, the heuristic indexer, and then any contents of those files that match these heuristic models will be reported back to EnCase. So each of the files will be processed, all of the probable words will be found, and that result will be piped back to EnCase as part of the process on the Python side.
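To make the idea of modeling word construction concrete, here is a minimal sketch in Python 3. It is not the model used in the webinar, whose internals are not described here. It swaps in one simple, assumed technique: record every three-letter sequence seen in a dictionary, then accept a run of letters as a probable word only if all of its three-letter sequences were seen. The function names and the thresholds (`n=3`, `min_len=4`) are hypothetical.

```python
import re

def build_model(dictionary_words, n=3):
    """Collect every n-letter sequence that occurs in the dictionary words."""
    model = set()
    for word in dictionary_words:
        word = word.lower()
        for i in range(len(word) - n + 1):
            model.add(word[i:i + n])
    return model

def is_probable_word(candidate, model, n=3):
    """True if every n-letter sequence in the candidate was seen in the model."""
    candidate = candidate.lower()
    if len(candidate) < n:
        return False
    return all(candidate[i:i + n] in model
               for i in range(len(candidate) - n + 1))

def probable_words(data, model, min_len=4, n=3):
    """Pull runs of ASCII letters out of raw bytes and keep the probable ones."""
    pattern = ("[A-Za-z]{%d,}" % min_len).encode("ascii")
    runs = [run.decode("ascii") for run in re.findall(pattern, data)]
    return [run for run in runs if is_probable_word(run, model, n)]
```

Because the test looks at how a string is built rather than whether it appears in a list, a string such as "forence" can pass against a model built from "forensic" and "evidence" even though it is in neither dictionary, while a random run of letters usually fails. Working on raw bytes is what allows the same check to run over images, memory dumps, or documents.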
Next, I want to actually dive into the EnScript that makes this possible, and I’m going to give you just a little bit of an excerpt out of the EnScript that actually performs this process. The EnScript is very similar to the previous one, with the exception of some of the additional capabilities that James has put into this particular EnScript to give us this ability to process multiple files. The entire script will be available at this link down at the bottom, so if you want to actually look at the entire script and walk through it or study it, you’ll be able to do that afterwards, so you don’t have to worry about writing this down.
So what James has done is use the item iterator class to iterate through the files that were selected by the user, and then, from there, he creates a folder, which is kind of an export path. This is where all the files that were selected by the user are going to be captured, copied, and then stored, and that export path is then going to be provided to Python. So the result is the ability to provide a path or a folder that has multiple files in it that were extracted from the EnCase environment and provided to Python to process.
The final step is to create the same kind of arguments that we had before – the Python executable, the pyIndex.py, which is the actual Python script we wish to execute, and then finally, the argument that we’re going to pass to Python, which is the path of the folder that contains all of the files that were copied out of EnCase.
Now I want to take a quick look at the Python excerpt – in other words, what is going on in the Python script in order to be able to handle this? And this is certainly simplified, and again, you can get the entire script, if you want to study all of the things that are happening in this script, from this website. And you’ll be able to get all that information.
But I want to focus on a couple of specific things that relate to the integration between the EnScript and Python, and how we actually transfer these files that are going to be processed. So I use the same method as before in using a function called parse command line, and in this case, instead of getting back a file, I’m going to get back a target path. This is going to be the path where those files are stored, and now I have to extract the names of those individual files and then process them.
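A minimal sketch of such a command-line step in Python 3 follows. The name `parse_command_line`, the argument name `target_path`, and the use of `argparse` are assumptions for illustration; the actual function in the published script may be written differently.

```python
import argparse
import os

def parse_command_line(argv=None):
    """Return the folder passed on the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Index probable words in a folder of exported files")
    parser.add_argument("target_path",
                        help="folder containing the files copied out of EnCase")
    args = parser.parse_args(argv)
    # Stop early with a usage message if the caller passed something
    # that is not an existing directory.
    if not os.path.isdir(args.target_path):
        parser.error("not a directory: " + args.target_path)
    return args.target_path
```

Called with no arguments, the function reads `sys.argv`, so the same script works whether it is launched by hand or by another program that supplies the folder path.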
So once again I’m going to use the OS module from within Python, but I’m going to use a different method this time. I’m going to use something called list directory or listdir. What it’s going to do is actually create a list, a specific target list that contains all of the filenames that are included in the target path. This is the path that James had created within the EnScript in order to pass to the Python script. So now we’ll be able to get all of those filenames. I’ll point out it’s just the names of the files that listdir will give me, and not the full path.
So then I’m going to create a simple loop, and the loop is going to be for each file within that target list. So these again are just the filenames. For each one of those files in the target list, as you can see, Python is quite readable. Anyone who doesn’t even have a programming background should be able to see that I’m going to loop through each of the files within the target list just by the words that I’ve used.
Then it’s going to create a full path, so it’s going to join the target path that came from the command line with each of the files, building the full path name of that particular file. And finally, it’s going to call the magic function – again, you can look at this within Python-forensics.org – that performs this process of indexing all of the probable words within the file that we’re going to process. Using the heuristic method, it will identify probable words and print those out in order of occurrence. So the words that occurred the most within that file will be printed out first.
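The loop described here can be sketched in a few lines of Python 3. The indexing function itself is not shown in this transcript, so `index_file` below is only a placeholder that returns the file size; `process_folder` is likewise a name invented for this example.

```python
import os

def index_file(full_path):
    """Placeholder for the real indexing function; it only reports the size."""
    return os.path.getsize(full_path)

def process_folder(target_path):
    """Run index_file on every regular file in target_path."""
    results = {}
    target_list = os.listdir(target_path)      # names only, not full paths
    for each_file in sorted(target_list):
        # Join the folder and the bare name to get a usable path.
        full_path = os.path.join(target_path, each_file)
        if os.path.isfile(full_path):          # skip any sub-folders
            results[each_file] = index_file(full_path)
    return results
```

Since `os.listdir` returns bare names, skipping the `os.path.join` step would make the script look for each file in the current working directory rather than in the export folder.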
I should note that within the application of the script, I’m using stop words and I’m filtering those out. Stop words are words like “whenever”, “always”, etc, that really don’t have any [probative or… value]. Those will be filtered out, so you’ll only get words that potentially have meaning to your investigation as part of that process.
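As an illustration of stop-word filtering and the two report orders, here is a minimal Python 3 sketch. The stop-word set is a tiny made-up sample rather than the list used in the actual script, and `rank_words` is a hypothetical name.

```python
from collections import Counter

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"whenever", "always", "the", "and"}

def rank_words(words, stop_words=STOP_WORDS):
    """Count words, drop stop words, and return ranked and alphabetic lists."""
    counts = Counter(w.lower() for w in words if w.lower() not in stop_words)
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    alphabetical = sorted(counts.items())
    return ranked, alphabetical
```

Both lists carry the same counts, so the ranked list surfaces what dominates a file while the alphabetic list makes it easy to look up a specific term.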
As we did before, we’re going to jump now back into the EnCase environment, and you notice in this example the user has specified and selected multiple files within the case, with the blue checkmarks. Based on that, we’re going to go ahead and execute, in this case, the script called pyIndexHeuristic, which is the EnScript that we just looked at that will actually take the files that were selected by the user, and incorporate them into a folder that will actually be passed to Python instead of the individual files. Once that’s completed, we can actually then look at the bookmark that was created in the output, and you notice that all the words are specified here again, with their occurrence rank, for each of the files that we found.
So each of the files will have associated with it all of the probable words that were found within that particular file in their ranked occurrences. As I mentioned, if I were to scroll down here, you would actually see the alphabetic list as well, but this list is ranked by the number of occurrences.
That concludes the presentation portion of the webinar. At this point we’re going to move into the Q&A or the Question and Answer period. And as Robert Bond from Guidance mentioned at the beginning of this webinar, please submit your questions via the Q&A panel on the right side of the screen, so that we can actually handle and hopefully answer a bunch of your questions. I know we have a few minutes left, so hopefully you’ll ask some questions that will add to this conversation and move it forward.
I’ll put the screen up for you, so that if you need to contact me after this – this is Chet Hosmer from Python Forensics obviously, and you can access my website, send me email, or follow me on Twitter, at any time, and hopefully I will hear from you not only through the Q&A but also after the presentation.
Robert: Great. Thanks very much, Chet. That was an excellent, excellent presentation. For all the people that asked, yes, the presentation will be available. Go ahead and send us an email. You can either send it to [email protected] or you can send it to me directly at [email protected]. We’ll make sure you get a copy of Chet’s presentation.
Alright, right now we’ve collected several questions, but at this time I’d like to welcome the audience to submit any additional questions for either Chet or James through the Q&A panel on the bottom right hand corner of your screen. If you don’t get a chance to submit a question today or, again, are watching an archived version of the webinar, you can email me at [email protected]. I’ll make sure Chet and James do address your questions.
First, which version of Python were you using in your presentation?
Chet: This example used the 2.7 strain of Python, but the 3.4 strain would be fine as well. [Indecipherable] will support either one. The difference is… both strains are still supported by Python, and I have found that, in teaching this, the 2.7 strain is a little bit easier for folks to use if they’ve not done a lot of development before. 3.4 has some capabilities in it that make it just a little bit more difficult to utilize, but there are such small differences between the language elements, there are even converters that will convert your 2.7 script to 3.4 if that’s what you need to do.
But I’m using 2.7, to answer the specific question, and I think it’s a great place for folks to start if they haven’t. As a matter of fact, it’s just been extended, the life cycle. The life cycle of 2.7 has been extended by the Python group for a couple more years. So it’s not going away any time soon. And the reason for that is a lot of the libraries that are certainly relevant to forensics in other areas have yet to be converted to the 3.x strain, so you might not be able to use some of the libraries that you want to be able to use.
Robert: Okay, great. So there were a few questions about developing solutions inside of Python versus developing them with 100% EnScript. Can you speak to the differences there?
Chet: Yeah. I think James touched on this a little bit in his presentation, which I thought was really good. There are several reasons why you want to look at multiple languages when you’re trying to solve this problem. And I think there was even another question that popped up about whether or not you should be using open-source software to do this versus proprietary software, and I think there’s certainly a whole set of questions and answers that could arise from that. But fundamentally, when you’re developing… the EnCase and the EnScript capabilities are terrific at being able to extract and examine evidence that’s within the EnCase. And I think they do a terrific job and they’ve set up a lot of capability in order to be able to extract information from that, format it, display it in different forms.
But in some cases, you need to be able to do other types of processing – which you could develop, don’t get me wrong, in EnScript, but it may be more difficult and more time-consuming to do it within that environment and that language. The second answer is that… it goes to the third-party libraries that James touched on, not only the libraries that are built into Python, which there are over a hundred, but there are thousands of libraries outside of the [pro] libraries that can deal with specific issues. James mentioned a couple – [impression], and encryption, and some of those technologies that are out there that have been developed by a third party.
The second aspect of that is that there are literally thousands of developers out there that can help, and since it is an open-source community, if you have a question or if you need something, you can pretty much go to Google, and type “Python” and ask the question, and somebody has probably addressed it at least at some level, and you’ll be able to get some assistance to get started and what libraries are available to address that particular issue. So for example, if you’re trying to extract data from multimedia files, you could type “Python multimedia data extraction”, and you would definitely get a bunch of responses from that of folks that have developed specific libraries and additions to Python that would address those at different levels, depending on what you want to be able to do.
The third aspect of it is that there are so many developers out there that are doing Python, and as I mentioned in the brief, with the ACM basically talking about how this is now the first language being taught in computer science programs, and some of the best ones around the country, and now even being taught in colleges that are teaching forensics, like what I’m doing at [indecipherable] College and also at Champlain. There’s a whole cadre of folks out there that can actually help and accelerate the development of new solutions.
Because the problem that we face is that the data that we have to examine, analyze, and come up with solutions for is changing every day. So it’s very difficult to be able to do that from one source. So I think the combination of languages, using EnScript with Python or EnScript with C# or other environments, to be able to take advantage and leverage both the developers that are out there, the third-party libraries, and the solutions that have already been done, really makes a nice marriage between the two environments.
Robert: Okay, great. So I want to take a quick break and I want to go right to resources. We’ve had a lot of questions about where to get more and more resources on Python. First, I want to address the few questions that we’ve had on where to get resources about EnScripting and learning how to write EnScript code.
The first place I want to direct folks is to KB Forensic, which is Lance Mueller’s website. It’s an excellent website. We have nothing to do with it here at Guidance Software. But an excellent website to get your feet wet around EnScript.
The second is of course EnCase App Central. You can go there and download any number of the 160 EnScripts. Most of them are free, most of them are written by Guidance Software trainers, and they do any one of a number of different things.
And then, Chet, can you take over here, and just talk about some of the resources around Python? And you talked about Google here, and of course we talked about your two books, but maybe just delving into the books a little bit more, or something outside of your books as a resource.
Chet: Yeah. There are resources galore online, there are online training classes that are free that you can take, and go through and get a fundamental understanding of Python and the language and the environment. There’s just literally thousands of examples of virtually anything you want to try to solve within Python. There’s also that developer resource that I mentioned in there, literally hundreds of books. If you go to Amazon and search on Python, you’re going to find a lot of fundamental texts that introduce you to the language and others that deal with specific areas, whether it be multimedia or large data or whatever [indecipherable] that you’re interested in. You’ll be able to find books that will address those specific topics.
So there’s the fundamental text that will actually provide you with baseline… obviously, this is being taught at universities and colleges around the country too if you want to go that route. But there are a lot of great training materials out there that you can go to and find online.
If anybody has specific questions about that and wants a little bit more narrow definition, I don’t have those in front of me, because they’re links, but I’d be happy to send them the links for the free training courses that are online that folks can take, as well as recommend a few of the books that I have in my library that would address specific areas. But I need a little bit more information from them as to what they’re trying to solve, in order to be able to point them in the right direction for a specific book, for example.
Robert: Yeah, and any general resources, you can send them in. I’ll go ahead and package that up, with the PDF and the presentation, and send it out to people that are requesting it.
Chet: That’s fine. No problem at all.
Robert: Alright. So let’s get back into some of the more technical questions that we have here as well. We have a question around the file viewer. In the file viewer example, is there any way to pass multiple files to the Python script?
Chet: Both James and I have scratched our heads on that, and there isn’t at this time. But I think it is definitely something that could be fed back to the developers at Guidance. I think it would be a great idea. Right now you just type “(file)”, and it’ll take the single file and pass it, but if you typed “(directory)” for example, it could take all of the selected files and move them over. I think that would be a nice enhancement to EnCase at the core, in order to be able to do that. And then that would solve that particular problem that James and I solved using an EnScript in order to be able to do that.
There is an advantage to using an EnScript, however. If I wanted to do that with the file viewer, the file viewer is going to be fairly simple, and it’s just going to basically pass those files. But if we wanted, in the EnScript, to do some filtering so that we only pass files of a specific type – for example, if users have selected directories or folders within EnCase, but we only wanted to pass actual jpegs to the Python script – we could make that happen within the EnScript very quickly, and then only the files that were, let’s say, jpegs or pngs or tifs or whatever they were, image types, would be passed over, regardless of the global selection that the user made.
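In the workflow described here the filtering is done in the EnScript. Purely to illustrate the idea of selecting by actual file type rather than by the user's selection, here is a Python 3 sketch that checks the leading bytes of a file against the standard JPEG, PNG, and TIFF signatures. The names `IMAGE_SIGNATURES` and `image_type` are assumptions for this example.

```python
# Leading bytes ("magic numbers") for a few common image formats.
IMAGE_SIGNATURES = {
    "jpeg": [b"\xff\xd8\xff"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "tiff": [b"II*\x00", b"MM\x00*"],   # little-endian and big-endian TIFF
}

def image_type(header):
    """Return the image format whose signature starts the header, or None."""
    for name, signatures in IMAGE_SIGNATURES.items():
        if any(header.startswith(sig) for sig in signatures):
            return name
    return None
```

Checking the header instead of the extension means a renamed image is still recognized, and a text file renamed to .jpg is not passed along by mistake.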
So there are advantages to doing this with the EnScript versus with the file viewer beyond just passing multiple files, because you can put greater criteria on what you want to pass over. And also, the big issue in my opinion with moving to the EnScript, and why it’s more powerful, is the ability for the EnScript to capture the output from the Python script and store it as a bookmark associated with each individual file. So now you have a way to push information back to EnCase and store the results that came from the Python script within the case file itself.
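The capture itself happens in the EnScript, whose code is not reproduced in this transcript. As a stand-in, the same launch-and-capture pattern can be sketched in Python 3.7 or later with the standard `subprocess` module. The helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys

def run_and_capture(script_args):
    """Run the Python interpreter with the given arguments and return its stdout."""
    completed = subprocess.run(
        [sys.executable] + list(script_args),
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode the output to str
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Whatever the child script prints becomes a string in the caller, which is the piece a calling program would then store alongside the file it relates to.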
Robert: Got it. And Chet, I’m happy to say we do have James on a separate phone. We weren’t able to get him linked through WebEx, but I wanted him to kind of elaborate or extend that answer that you just provided. James?
James: [Indecipherable], can you hear me?
James: Okay. The EnScripts are great in that they can process… we collect a lot of evidence in [indecipherable] files, and the Python scripts, in order for them to access the [indecipherable] files, etc, you have to bring in other libraries [that are] available out there. So it just complicates some of the simple tasks that need to be done. So with the [decision] of allowing the EnScript to go in and extract the individual files, like Chet was saying, all we wanted were jpeg files for that task. So we can go in with the EnScripts, find the jpeg files, pass it on to the Python script, and it allows the Python script to remain kind of in simple form, to where [it can] process things on its own. So for those that aren’t using EnCase or maybe for that specific case you’re not using EnCase, [it’s simple] file collection or something like that, you can have the Python script directly [indecipherable] and then the EnScript can [additionally feed into] the Python script, to extract files from them, from the [indecipherable] case. So it just makes the Python side a lot more flexible to work with EnCase in combination.
Robert: We’re getting this question a lot – what is the general advantage of using Python over EnScript in most cases? Is it speed, is it flexibility, is it extensibility? Which word or words would you use – and Chet, I’ll start with you – to describe that advantage?
Chet: I would say that, again, my goal in doing this from the beginning was to be able to create this integration between the two environments that allowed [them] to be extended independently. I think James brought up a great point in the previous question, which was that you could then take Python scripts that are already written and already work, and perform a specific function, and you could develop an EnScript that will provide data to the Python script in the same way that the Python script is expecting. So you basically get this great… if I had to use one word, I would use ‘leverage’. See, you want to be able to leverage the base of solutions that are out there on the Python side from EnCase with limited modification of [indecipherable] on the Python side.
Robert: Do you have a different answer, James, or just elaborate on Chet’s?
James: I agree with that. That’s [indecipherable] I’m not sure [if there’s] a single word that I would use, but the biggest benefit I think is that there’s a much larger community of Python writers out there, and [Chet touched on this earlier]. We can take advantage of people who have already written these tasks in the Python community. EnCase is a little bit more specific, so by opening up our world to the Python, we can take advantage of the code that’s already been written, like he said, and we can, with minimal, maybe just [indecipherable] modifications, we can [indecipherable] data over to those Python scripts and spend very little time, and focus more time on getting the cases done.
Robert: Okay, great. And Chet, can the scripts that you created be used outside or independent of EnCase and EnScript?
Chet: Absolutely. Everything that I write can be used, and what we’re really trying to do between EnCase and the Python scripts that I’m developing is provide that [indecipherable], provide that interface between the two. I always write the scripts on the Python side, so that they can run standalone, but also, [indecipherable] make sure that the EnCase side is providing the input to the Python script in the same manner, and they can run together. So that’s kind of that [indecipherable] that James and I have worked out, to provide the demonstrations first, but also to provide the more advanced script, and being able to handle heuristic processing [indecipherable].
Robert: Alright, great. I know we have a ton more questions, but we always try and end at the top of the hour, and I want to give you all an opportunity to kind of fill in any gaps, because I know you’ve seen some of the questions come along. Chet, why don’t we start with you? Is there anything the audience needs to know before this webinar ends about Python and just getting started?
Chet: I think that the most important thing, and the reason we did this from the beginning, is to get people excited both about writing EnScripts and about integrating those with Python. And I think we have to integrate this community of developers and practitioners together in order to be able to solve this problem, because the changes in the environment are happening too quickly to be able to rely on only one solution to do that. So I think that Python offers that extensibility and that on-ramp for investigators that don’t have [indecipherable] background potentially to be able to participate directly and communicate with the developers.
Robert: Great. And what would you add to that, James?
James: Well, on the EnScript side, there’s a smaller community – so EnCase has a small community, and then on the EnScript side there’s an even smaller community. So resources for writing that kind of stuff are very limited. The support portal has a forum for it, and there are a lot of great people there that are willing to help. So the most important thing I think is to ask for help. If you’re looking to learn EnScript, [indecipherable] to help. Put some effort in to learn it. There’s a free PDF that I mentioned before, it’s a prerequisite to the EnScript course that is offered from EnCase training.
And the class is really great – it goes through a ton of the fundamental functionalities like making bookmarks and filtering for certain styles, so that we can only get the jpegs or only get the executables and things like that. So it’s really a benefit, for those wishing to do these kind of integrations, to attend the course. Because it’s not just how to do stuff inside of EnCase, it’s how to function [inside] of EnCase but still [feed] things out to things like Python scripts, and then take the data back in, like Chet was saying, and be able to bookmark it and prepare your reports, and give really a fully baked solution, as opposed to having to manually extract things, get these external reports, and bring them back in to integrate things all together.
Robert: Alright, great answer. Everyone, thank you so much for attending the webinar. If you’re an EnCase v7 or v6 user, thank you so much for using our product, and as always, please fill out the survey at the end of the presentation, which is right now. All your ideas are going to be used and taken into consideration when we invite folks like Chet and James to present to you. Thanks again, everybody, and have a great day.
End of Transcript