KS – an open source bash script for indexing data

KS – an open source bash script for indexing data

ABSTRACT:  This is a keywords searching tool working on the allocated, unallocated data and the slackspace, using an indexer software and a database storage .

Often during a computer forensics analysis we need to have all the keywords indexed into a database for making many searches on it in a fast way.

We could use strings and grep, for searching the keywords, but we cannot have a database and an engine, then we can’t search them inside many formats, like compressed files, including the ODT, DOCX, XLSX, etc..

So, I tried to solve this problem, first of all we need to extract, what I call “spaces”:


Get The Latest DFIR News

Join the Forensic Focus newsletter for the best DFIR articles in your inbox every month.


Unsubscribe any time. We respect your privacy - read our privacy policy.

1)      Allocated space;

2)      Unallocated space;

3)      Slackspace;

Then we can run the indexer against these three spaces and we can extract all the keywords inside them.

We must remember that we have two kind of unallocated spaces, the first is all the deleted files and the second is all the files those are not in the deleted set, but they are still on the memory device (hard disk, pendrive, etc.).

For extracting these file we need to use the data carving technique, that consists into the search for the file types by their “magic numbers” (headers and footers), this technique is filesystem-less, so we can gather all files, allocated and unallocated (including the delete files too), so we need to eliminate duplication generated by carving.

The slackspace can be extracted by the TSK (The SleuthKit ) tools and put into a big text file, we have to remember that slackspace is all the file fragments present into the unused cluster space.

Inception

We have to create a directory named, for instance, “diskspace”.

We can mount our disk image file (bitstream, EWF, etc) into a sub-directory of diskspace, e.g. /diskspace/disk and so we can have all the allocated space.

Now, we have to extract all the deleted files including their paths and put them into “/diskspace/deleted”.

We have to run the data carving and put all the results into “/diskspace/carved”, we can use the data carving only on the freespace of the disk and then we must delete the duplicates with the deleted files.

Finally we can extract all the slackspace, if we need it and put it into “/diskspace/slack”.

Now we got:

/diskspace
|_disk
|_deleted
|_carved
|_slack

We only need a “spider” for indexing all these spaces and to collect all the keywords into a database.
For this purpose there is a program in the open source world: RECOLL that indexes a content of a directory and allows various quests. (http://www.lesbonscomptes.com/recoll/)

After the indexing we have all to perform our researches.

All these operations are made by my bash script called KS.sh  http://scripts4cf.sourceforge.net/tools.html

KS – This is a keywords searching tool. sudo bash ks.sh for running it. It mounts a DD image file; It extracts all deleted files; slackspace; It makes a data carving on the freespace only; It indexes all by RECOLL.
You need:
The Sleuthkit (last release)
Photorec
MD5Deep
RECOLL

It stores the index DB and the recoll.conf in the chosen output directory.
NEW file formats added and README.txt for the HowTo expand the search range.
Website:
http://scripts4cf.sourceforge.net/tools.html

This is the bash script code:

#!/bin/bash
#
# KS – by Nanni Bassetti – digitfor@gmail.com – http://www.nannibassetti.com
# release: 2.2
#
# It mounts a DD image file or a block device, it extracts all deleted files,
# it makes a data carving on the unallocated space, the it runs recoll
# changing automatically the variables in recoll.conf.
#
# many thanks to Raul Capriotti, Jean-Francois Dockes, Cristophe Grenier,
# Raffaele Colaianni, Gianni Amato, John Lehr, Alessandro Farina

echo -e “KS 2.2 – by Nanni Bassetti – digitfor@gmail.com – http://www.nannibassetti.com \n”
while :
do
echo -e “\nInsert the image file or the device (absolute path): “
read imm
[[ -f $imm || -b $imm ]] && break
done
while :
do
echo “Insert the output directory (absolute path):”
read outputdir
[[ “${outputdir:0:1}” = / ]] && {
[[ ! -d $outputdir ]] && mkdir $outputdir
break
}
done

(! mmls $imm 2>/dev/null 1>&2) && {
echo “0”
echo “The starting sector is ‘0’”
so=0
} || {
mmls $imm
echo -e “\nChoose the starting sector of the partition you need to index”
read so
}

HASHES_FILE=$outpudir/hashes.txt # File output hash
DIR_DELETED=$outputdir/deleted # Deleted File’s Folder
DIR_SLACK=$outputdir/slackspace # Slackspace’s Folder
DIR_FREESPACE=$outputdir/freespace # Carved File’s Folder
BASE_IMG=$(basename $imm) # Basename of the image or device

[[ ! -d $outputdir/$BASE_IMG ]] && mkdir $outputdir/$BASE_IMG

off=$(( $so * 512 ))
mount -t auto -o ro,loop,offset=$off,umask=222 $imm $outputdir/$BASE_IMG >/dev/null 2>&1 && {
echo “Image file mounted in ‘$outputdir/$BASE_IMG'”
}

# recovering the deleted files
echo “recovering the deleted files…”
[[ ! -d $DIR_DELETED ]] && mkdir $DIR_DELETED
tsk_recover -o $so $imm $DIR_DELETED

# extracting slack space, comment if you don’t need it
echo “extracting slack space…”
[[ ! -d $DIR_SLACK ]] && mkdir $DIR_SLACK
blkls -s -o $so $imm > $DIR_SLACK/slackspace.txt

# freespace and carving

[[ ! -d $DIR_FREESPACE ]] && mkdir $DIR_FREESPACE || {
rm -R $DIR_FREESPACE
mkdir $DIR_FREESPACE
}

# using photorec to carve inside the freespace

photorec /d $DIR_FREESPACE/ /cmd $imm fileopt,everything,enable,freespace,search

# taking off duplicates from carving directory
echo “taking off duplicates from carving directory…”
[[ $(ls $DIR_DELETED) ]] && md5deep -r $DIR_DELETED/* > $HASHES_FILE
[[ $(ls $DIR_FREESPACE) ]] && md5deep -r $DIR_FREESPACE/* >> $HASHES_FILE
awk ‘x[$1]++ { FS = ” ” ; print $2 }’ $HASHES_FILE | xargs rm -rf
[[ -f $HASHES_FILE ]] && rm $HASHES_FILE

# RECOLL configuration to have a single recoll.conf and xapiandb for each case examined.
echo “RECOLL is indexing…”
rcldir=$outputdir/recoll
recollconf=/$rcldir/recoll.conf
mkdir -p $rcldir/xapiandb

cat > $recollconf << EOF
topdirs = $outputdir
dbdir = $rcldir/xapiandb
processbeaglequeue = 1
skippedPaths = $rcldir $rcldir/xapiandb
indexallfilenames = 1
textfilemaxmbs = -1 # for indexing txt files greater than 10Mb thanks to Alessandro Farina
usesystemfilecommand = 1
indexstemminglanguages = italian english spanish
EOF

recollindex -c $rcldir -z >/dev/null 2>&1
case $(tty) in
/dev/tty*) echo -e “\nStart on terminal from graphic interface the following command:”
echo -e “recoll -c $rcldir\n”
exit 1
;;
*) recoll -c $rcldir >/dev/null 2>&1 &
exit 0
;;
esac

recoll1

1- RECOLL in action.

The RECOLL allow the search for keywords also working in compressed files and email attachments in short, once indexed all the content you had to be able to search for keywords or phrases, just as you would with Google.
As all the open source projects I have to thank to the collaboration of some friends  and developers.

Author
Nanni Bassetti, Digital Forensics Expert, C.A.IN.E. Linux forensic distro project manager, founder of CFI – Computer Forensics Italy, mailing list specialized in digital forensics topics, codeveloper of SFDumper and founder of the web site http://scripts4cf.sf.net.
Personal website: http://www.nannibassetti.com – e-mail: digitfor@gmail.com

2 thoughts on “KS – an open source bash script for indexing data”

  1. in the release 2.2 I added this:
    textfilemaxmbs = -1
    after the line
    indexallfilenames = 1 for indexing text files greater than 10Mb
    Thanks to Alessandro Farina

Leave a Comment

Latest Videos

Si and Desi interview Emi Polito from Amped about how to become an Amped FIVE Certified Examiner (AFCE). They discuss the exam requirements, format, timeline for certification, and Amped’s future plans. Emi explains that the certification is aimed at demonstrating competency with the Amped FIVE video analysis software after completing training. The exam consists of multiple choice questions on theory and practical exercises using the software. Emi talks about the online exam format and process for passing or failing.

Emi also discusses the broader challenges many organizations face with validation and accreditation. He emphasizes Amped's commitment to developing tools that facilitate that process. The hosts reflect on the confusing accreditation landscape and Amped’s passion for improving training and certification in forensics. This episode provides an overview of Amped's new certification and perspective on challenges in the field of video forensics.

Show Notes:

Introducing The AFCE Certification (Amped FIVE Certified Examiner) - https://www.forensicfocus.com/news/introducing-the-afce-certification-amped-five-certified-examiner/

Video Evidence Principles With Amped Software - https://www.forensicfocus.com/podcast/video-evidence-principles-with-amped-software/

Digital Image Authenticity And Integrity With Amped Authenticate - https://www.forensicfocus.com/podcast/digital-image-authenticity-and-integrity-with-amped-authenticate/

File Analysis And DVR Conversion Training From Amped Software - https://www.forensicfocus.com/reviews/file-analysis-and-dvr-conversion-training-from-amped-software/

Amped FIVE Speed Estimation 2d Filter And Training From Amped Software - https://www.forensicfocus.com/reviews/amped-five-speed-estimation-2d-filter-and-training-from-amped-software/

Amped Software’s Martino Jerian on Key Challenges and Opportunities for Video Evidence - https://www.forensicfocus.com/podcast/amped-softwares-martino-jerian-on-key-challenges-and-opportunities-for-video-evidence/

LEVA 2023 Training Symposium - https://www.leva.org/

Forensic Collision Investigation & Reconstruction Ltd - https://www.fcir.co.uk/

Amped FIVE Certified Examiner - https://ampedsoftware.com/afce-certification 

Introducing the Amped FIVE Certification Program - https://blog.ampedsoftware.com/2023/10/04/introducing-the-amped-five-certification-program

Amped Software YouTube - https://www.youtube.com/ampedsoftware
How to Use the Validation Tool in Amped FIVE - https://blog.ampedsoftware.com/2023/03/29/how-to-use-the-validation-tool-in-amped-five

Si and Desi interview Emi Polito from Amped about how to become an Amped FIVE Certified Examiner (AFCE). They discuss the exam requirements, format, timeline for certification, and Amped’s future plans. Emi explains that the certification is aimed at demonstrating competency with the Amped FIVE video analysis software after completing training. The exam consists of multiple choice questions on theory and practical exercises using the software. Emi talks about the online exam format and process for passing or failing.

Emi also discusses the broader challenges many organizations face with validation and accreditation. He emphasizes Amped's commitment to developing tools that facilitate that process. The hosts reflect on the confusing accreditation landscape and Amped’s passion for improving training and certification in forensics. This episode provides an overview of Amped's new certification and perspective on challenges in the field of video forensics.

Show Notes:

Introducing The AFCE Certification (Amped FIVE Certified Examiner) - https://www.forensicfocus.com/news/introducing-the-afce-certification-amped-five-certified-examiner/

Video Evidence Principles With Amped Software - https://www.forensicfocus.com/podcast/video-evidence-principles-with-amped-software/

Digital Image Authenticity And Integrity With Amped Authenticate - https://www.forensicfocus.com/podcast/digital-image-authenticity-and-integrity-with-amped-authenticate/

File Analysis And DVR Conversion Training From Amped Software - https://www.forensicfocus.com/reviews/file-analysis-and-dvr-conversion-training-from-amped-software/

Amped FIVE Speed Estimation 2d Filter And Training From Amped Software - https://www.forensicfocus.com/reviews/amped-five-speed-estimation-2d-filter-and-training-from-amped-software/

Amped Software’s Martino Jerian on Key Challenges and Opportunities for Video Evidence - https://www.forensicfocus.com/podcast/amped-softwares-martino-jerian-on-key-challenges-and-opportunities-for-video-evidence/

LEVA 2023 Training Symposium - https://www.leva.org/

Forensic Collision Investigation & Reconstruction Ltd - https://www.fcir.co.uk/

Amped FIVE Certified Examiner - https://ampedsoftware.com/afce-certification

Introducing the Amped FIVE Certification Program - https://blog.ampedsoftware.com/2023/10/04/introducing-the-amped-five-certification-program

Amped Software YouTube - https://www.youtube.com/ampedsoftware
How to Use the Validation Tool in Amped FIVE - https://blog.ampedsoftware.com/2023/03/29/how-to-use-the-validation-tool-in-amped-five

YouTube Video UCQajlJPesqmyWJDN52AZI4Q_VKk-mhlae1c

Becoming An Amped FIVE Certified Examiner (AFCE)

Forensic Focus 1st December 2023 4:25 pm

Subscribe to the Forensic Focus Podcast: https://www.forensicfocus.com/podcast/

Si and Desi are joined by Brittany and Ailsa from digital forensics software company ADF Solutions. They discuss how ADF is addressing key challenges for digital forensics practitioners, including handling the massive volumes of data from mobile devices and the cloud.

The guests outline ADF's focus on developing their software as an easy-to-use onsite triage tool that can help quickly identify pertinent evidence. Key features include advanced handling of video files, AI-assisted classification of images, and new screen recording capabilities for mobile devices that allow suspects to safely share relevant data. 

The hosts and guests also explore ADF's ongoing research into areas like facial recognition, handling new device types like games consoles and smart watches, and identifying deepfake media.

00:00 – Introduction to Ailsa and Brittany
03:00 – The challenge of vast amounts of data
05:50 – Recovering data from Chromebooks
08:50 – Triaging using ADF tools
12:30 – Benefits of using ADF Solutions’ tools
15:50 – Limitations in types of apps
17:20 – Keeping up with technological advancements
19:15 – ADF customer base
21:00 - Artificial intelligence in classifying images
30:00 – ADF Solutions’ triaging kit
37:00 – Training with ADF
40:00 – Target user
44:50 – Roadmap of future devices to examine
51:30 – Main focus for ADF Solutions going forwards

Show Notes:
AI-generated CSAM article on Sky News - https://news.sky.com/story/thousands-of-ai-generated-child-abuse-images-being-shared-online-research-finds-12991727

Subscribe to the Forensic Focus Podcast: https://www.forensicfocus.com/podcast/

Si and Desi are joined by Brittany and Ailsa from digital forensics software company ADF Solutions. They discuss how ADF is addressing key challenges for digital forensics practitioners, including handling the massive volumes of data from mobile devices and the cloud.

The guests outline ADF's focus on developing their software as an easy-to-use onsite triage tool that can help quickly identify pertinent evidence. Key features include advanced handling of video files, AI-assisted classification of images, and new screen recording capabilities for mobile devices that allow suspects to safely share relevant data.

The hosts and guests also explore ADF's ongoing research into areas like facial recognition, handling new device types like games consoles and smart watches, and identifying deepfake media.

00:00 – Introduction to Ailsa and Brittany
03:00 – The challenge of vast amounts of data
05:50 – Recovering data from Chromebooks
08:50 – Triaging using ADF tools
12:30 – Benefits of using ADF Solutions’ tools
15:50 – Limitations in types of apps
17:20 – Keeping up with technological advancements
19:15 – ADF customer base
21:00 - Artificial intelligence in classifying images
30:00 – ADF Solutions’ triaging kit
37:00 – Training with ADF
40:00 – Target user
44:50 – Roadmap of future devices to examine
51:30 – Main focus for ADF Solutions going forwards

Show Notes:
AI-generated CSAM article on Sky News - https://news.sky.com/story/thousands-of-ai-generated-child-abuse-images-being-shared-online-research-finds-12991727

YouTube Video UCQajlJPesqmyWJDN52AZI4Q_4z-EgH54KZk

The Power Of Digital Forensics: How ADF Solutions Is Revolutionizing The Digital Forensics Industry

Forensic Focus 30th November 2023 2:57 pm

This error message is only visible to WordPress admins

Important: No API Key Entered.

Many features are not available without adding an API Key. Please go to the YouTube Feed settings page to add an API key after following these instructions.

Latest Articles