Hullo all,
Theoretical question
A number of applications create an index of a disk image up front to make keyword searches quicker during analysis. I've googled a bit, and the suggested method appears to be that the disk is indexed against an existing wordlist: each occurrence of a word (or file fragment, or whatever) from that list is recorded in an index of locations within the image file.
Can anyone confirm this? Are there other algorithms at play? Is there a forensically oriented wordlist anywhere? I know of plenty of online dictionaries/wordlists, having used them extensively in password cracking, but they obviously wouldn't include file headers or similarly useful keywords …
On the back of that, does anyone actually know which search algorithms are employed?
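To make the wordlist idea above concrete, here is a minimal Python sketch of it, assuming the whole image fits in memory as bytes and that matching is exact and case-sensitive. It is only an illustration of the approach as I understand it, not how EnCase, FTK or any other tool actually works; `build_index`, `search` and the sample data are hypothetical names made up for the example.

```python
from collections import defaultdict


def _to_bytes(term):
    # Accept plain words (str) as well as raw byte signatures such as file headers.
    return term if isinstance(term, bytes) else term.encode("ascii")


def build_index(image, wordlist):
    """Map each wordlist term to every byte offset where it occurs in the image."""
    index = defaultdict(list)
    for term in wordlist:
        needle = _to_bytes(term)
        pos = image.find(needle)
        while pos != -1:
            index[needle].append(pos)
            # Resume one byte later so overlapping matches are also recorded.
            pos = image.find(needle, pos + 1)
    return dict(index)


def search(index, term):
    """Look a term up in the prebuilt index; no rescan of the image is needed."""
    return index.get(_to_bytes(term), [])


# Made-up "image": a PDF signature, some text, then a JPEG signature.
sample_image = b"\x00\x00%PDF-1.4 secret invoice\xff\xd8\xff\xe0 more secret data"
sample_terms = ["secret", "invoice", b"%PDF", b"\xff\xd8\xff"]
sample_index = build_index(sample_image, sample_terms)

print(search(sample_index, "secret"))  # offsets of both occurrences
print(search(sample_index, b"%PDF"))   # offset of the PDF signature
```

The obvious limitation is that a term missing from the wordlist is never indexed, which is exactly why the contents of the wordlist matter so much for this approach.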
Thanks for indulging me :-)
Azrael
FTK 1.7x utilizes dtSearch.
How dtSearch works
Special dtSearch Desktop features include:
- a scrolling word list, for instant feedback as you type in a search.
- a look-up word feature, detailing the effect of fuzzy, phonic, wildcard, stemming and thesaurus search options.
- browse and customize thesaurus options.
- a field button, showing all indexed document fields.
- a search history display.
- search reports, showing hits in retrieved documents, along with the requested amount of context.
- clipboard options, file launching, and other tools for working with retrieved data.
- exporting of search results in various data formats, for easy use with other programs.
- special forensic indexing and searching tools.
At first I was confused by this post and didn't actually know how to respond, but since we are talking about how indexing works, I've found the following summary relating to Google Desktop Search, taken from
A system for searching an object environment … create a search database and one or more indexes into the database. A scoring application determines the relevance of the objects … One or more of the indexes may be implemented by a hash table or other suitable data structure … A ranking scheme sorts searchable items according to an estimate of the frequency that the items will be used in the future.
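As a rough illustration of the two ideas in that summary (an index implemented as a hash table, plus a ranking based on expected use), here is a small Python sketch. It is not Google's implementation: the class `TinyIndex`, its methods and the sample file names are all hypothetical, and using past open counts as the estimate of future use is purely my own simplifying assumption.

```python
import re
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index: a hash table from word to the items containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of items containing it
        self.use_count = Counter()        # item id -> how often it was opened

    def add(self, item_id, text):
        # Index every word found in the text; no predefined wordlist is needed.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[token].add(item_id)

    def record_use(self, item_id):
        # Assumption: past use stands in for the "estimate of future use".
        self.use_count[item_id] += 1

    def search(self, word):
        hits = self.postings.get(word.lower(), set())
        # Most-used items first; ties are broken by id for a stable order.
        return sorted(hits, key=lambda i: (-self.use_count[i], i))


idx = TinyIndex()
idx.add("a.txt", "Quarterly invoice for March")
idx.add("b.txt", "Invoice reminder")
idx.add("c.txt", "holiday photos")
idx.record_use("b.txt")
idx.record_use("b.txt")
idx.record_use("a.txt")

print(idx.search("invoice"))  # b.txt ranks above a.txt because it was opened more
```

Unlike the wordlist approach, this indexes whatever words actually appear in the data, so lookup cost stays roughly constant per query however large the indexed set gets.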
There's a paper on Google Desktop Search
BitHead - Thanks, that's very helpful.
Ronan - Sorry, not my finest moment of clarity; it's been a long week :-P When you load your acquired image into EnCase or FTK or whatever, it churns away for a bit "indexing" the image so that subsequent keyword searches are quick.
I guess the algorithms are not far removed from those of Google Desktop Search either. I hadn't considered that route of looking at it. Ta muchly 😉