Looking for help with a grep expression?
I want to search for "Town Hall Arson" and "Townhall Arson". I also want a "words near" version, i.e. "town hall" within 6 words of "arson".
What tool are you using to run this regular expression? Each implements regular expressions slightly (sometimes drastically) differently.
It is also important to have some details about the source document. Is it plain ASCII text, a PDF file, UTF-16? Does the text have line breaks in sensible places? How big are the file(s)?
Just FYI, the EnCase grep implementation would look like this:
town.?hall arson
Make sure the "case sensitive" checkbox is unticked, and any relevant codepages are ticked.
Edit: you can't do an "x words near" type search in EnCase without creating an index. You could do a rough approximation by trying to find "arson" within a number of characters of "town hall". It would look like this:
town.?hall.{1,36}arson
This would hit when "arson" appears up to 36 characters after "town hall" or "townhall".
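If you want to sanity-check the expression outside EnCase first, here is a minimal sketch in Python. It assumes Python's re module treats this particular expression the same way EnCase grep does, which I have not verified in every detail. The name PATTERN and the sample strings are made up for illustration.

```python
import re

# Same expression as above. re.IGNORECASE stands in for leaving the
# "case sensitive" checkbox unticked in EnCase.
# Note: in Python "." does not match a newline unless re.DOTALL is set.
PATTERN = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE)

# Hypothetical sample text, not taken from any real case.
samples = [
    "Town Hall arson",
    "Townhall Arson",
    "the townhall was destroyed by arson",
    "arson at the town hall",  # "arson" comes first, so no hit
]

hits = [PATTERN.search(s) is not None for s in samples]
print(hits)
```

The last sample is the interesting one: the expression only looks for "arson" after "town hall", never before it.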
X-Ways would be similar, although in my opinion it would be better to do separate searches for "town.?hall" and "arson" and then use the XWF search result filter to only show you data where both terms appear.
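The "two separate searches, then keep only what hits on both" idea can be sketched in Python as below. This is not how X-Ways implements its filter; it is only an illustration of the logic. The function name items_with_both and the sample dictionary are hypothetical.

```python
import re

# Two independent searches, as suggested above.
TOWN_HALL = re.compile(r"town.?hall", re.IGNORECASE)
ARSON = re.compile(r"arson", re.IGNORECASE)

def items_with_both(items):
    """Return the names of items whose text hits on both searches.

    `items` maps an item name to its extracted text (an assumption made
    for this sketch; a forensic tool works on its own evidence objects).
    """
    return [
        name
        for name, text in items.items()
        if TOWN_HALL.search(text) and ARSON.search(text)
    ]

# Hypothetical sample data.
sample_items = {
    "a.txt": "Minutes of the town hall meeting.",
    "b.txt": "Arson is suspected. The townhall roof collapsed overnight.",
    "c.txt": "Insurance claim following arson at a warehouse.",
}

both = items_with_both(sample_items)
print(both)
```

This ignores distance and word order entirely, so it is broader than the character-window expression and will need more manual review.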
I'll try "town.?hall.{1,36}arson".
I am using EnCase.
I do not have X-Ways. Does EnCase have any filters that will combine search results?
Not that I'm aware of - but then, I am using v6. Maybe in the mythical v7?
As your requirement was "within six words of", you must also try it with "arson" in front of "town".
Just be aware that 36 characters is not the same as six words.
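To illustrate both points (either order, and words rather than characters), here is a rough Python sketch. It will not run in EnCase as is, and it rests on two assumptions of mine: "within six words" means at most six intervening words, and a "word" is any run of \w characters, so "don't" counts as two words. The names TOWN_HALL, GAP, NEAR and near are made up for the example.

```python
import re

TOWN_HALL = r"town\W?hall"

# Up to six whole intervening words, each preceded by non-word characters,
# followed by the separator before the second term.
GAP = r"(?:\W+\w+){0,6}?\W+"

# Try "town hall ... arson" first, then "arson ... town hall".
NEAR = re.compile(
    rf"\b{TOWN_HALL}\b{GAP}arson\b|\barson\b{GAP}{TOWN_HALL}\b",
    re.IGNORECASE,
)

def near(text):
    """True if "town hall"/"townhall" is within six words of "arson"."""
    return NEAR.search(text) is not None

# Four intervening words but 48 intervening characters: the word-based
# search hits where the 36-character window does not.
example = "town hall, investigators eventually confirmed deliberate arson"
char_window = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE)

print(near(example), char_window.search(example) is not None)
```

Long words push a genuine hit outside a 36-character window, while short words let more than six of them fit inside it, so the two searches can disagree in both directions.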
Reconnoitre uses Lightgrep, so this is what I am most familiar with, although Lightgrep is a PCRE-compatible grep. You *may* need to adapt the expression below for EnCase grep.
Given that you can't do "within x words of" and are limited to "within x characters", you may want to give some thought as to what characters are allowed between arson and town.
I am by no means a grep expert, but AIUI the "." used above will match any character, or rather any byte, so you might want to modify your search to include only ASCII characters: a-z, A-Z, 0-9 etc. This may or may not result in a lot of spurious hits.
Lightgrep has two named character classes that help here:
\s which is ASCII whitespace: tab, linefeed, formfeed, EOL and space
\w which is a-z, A-Z, 0-9 and _
So in Lightgrep the search would become
town[\w\s]{1,36}arson (I have missed out hall as it is superfluous to your requirements)
Of course, restricting in this way may miss a match that contains other characters, such as punctuation or control characters, that do not fall into the character|whitespace specification. You need to decide what is acceptable.
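A quick way to see the trade-off is to run both variants over a few strings. The sketch below uses Python rather than Lightgrep, so treat it as an approximation: with re.ASCII, Python's \w is a-z, A-Z, 0-9 and _, and \s is space, tab, linefeed, carriage return, formfeed and vertical tab, which may not line up exactly with Lightgrep's definitions. The names ANY and RESTRICTED and the sample strings are made up.

```python
import re

# "." variant: anything at all may sit between the two terms.
ANY = re.compile(r"town.{1,36}arson", re.IGNORECASE | re.DOTALL)

# Restricted variant: only word characters and whitespace in between.
RESTRICTED = re.compile(r"town[\w\s]{1,36}arson", re.IGNORECASE | re.ASCII)

# Hypothetical sample text.
clean = "town hall destroyed by arson"
punctuated = "town hall: police suspect arson"  # the colon is not \w or \s
binary = "town\x00\x01\x02arson"                # junk bytes between terms

for label, text in [("clean", clean), ("punctuated", punctuated), ("binary", binary)]:
    print(label, ANY.search(text) is not None, RESTRICTED.search(text) is not None)
```

The restricted class drops the binary junk hit, but it also drops the perfectly readable sentence with a colon in it, which is the decision about what is acceptable.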