
Grep

(@dndschultz)
Eminent Member
Joined: 15 years ago
Posts: 24
Topic starter  

Looking for help with a grep expression.

I want to search for "Town Hall Arson" and "Townhall Arson". I also want a "words near" version, i.e. "Town hall" within 6 words of "arson".


   
(@twjolson)
Honorable Member
Joined: 17 years ago
Posts: 417
 

What tool are you using to run this regular expression? Each tool implements regular expressions slightly (sometimes drastically) differently.


   
Passmark
(@passmark)
Reputable Member
Joined: 14 years ago
Posts: 376
 

It is also important to have some details about the source document. Is it plain ASCII text, a PDF file, UTF-16? Does the text have line breaks in sensible places? How big are the file(s)?
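To illustrate why the encoding matters, here is a minimal Python sketch (not tied to any forensic tool; the sample text is made up). It shows that a byte-level search for the ASCII bytes of "arson" finds nothing in UTF-16 data, because every character there is followed by a zero byte.

```python
# Hypothetical sample text, encoded two different ways.
text = "Town Hall arson"
ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-le")

# Search term as raw ASCII bytes.
needle = b"arson"

found_in_ascii = needle in ascii_bytes   # ASCII needle against ASCII data
found_in_utf16 = needle in utf16_bytes   # ASCII needle against UTF-16 data

# Encoding the search term the same way as the data restores the hit.
found_with_utf16_needle = "arson".encode("utf-16-le") in utf16_bytes
```

This is the same reason a search tool usually needs to be told which encodings or codepages to try.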


   
Chris_Ed
(@chris_ed)
Reputable Member
Joined: 16 years ago
Posts: 314
 

Just FYI, the EnCase grep expression would look like this:

town.?hall arson
Make sure the "case sensitive" checkbox is unticked, and any relevant codepages are ticked.

Edit: you can't do an "x words near" type search in EnCase without creating an index. You could do a rough approximation by trying to find "arson" within a number of characters of "town hall". It would look like this:

town.?hall.{1,36}arson
This would hit when "arson" appears up to 36 characters after "town hall" or "townhall".
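As a rough way to check the behaviour of this expression outside EnCase, here is a small Python sketch. Python's `re` module is not EnCase grep, so treat it only as an approximation; the sample sentences are made up, and `re.IGNORECASE` stands in for unticking "case sensitive".

```python
import re

# The expression from the post, compiled case-insensitively.
# Note: in Python "." does not match a newline unless re.DOTALL is set.
pattern = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE)

samples = [
    "Police are treating the Town Hall fire as arson.",   # two words, close by
    "The townhall blaze was arson.",                      # one word, close by
    "Arson suspected at the town hall.",                  # 'arson' comes first
    "The town hall was closed for a week while the council "
    "debated budgets, then arson struck.",                # more than 36 chars apart
]

# One boolean per sample: did the pattern hit?
hits = [bool(pattern.search(s)) for s in samples]
```

The first two samples hit. The third does not, because the expression only looks for "arson" after "town hall", and the fourth does not because the gap is longer than 36 characters.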

X-Ways would be similar, although in my opinion it would be better to do separate searches for "town.?hall" and "arson" and then use the XWF search result filter to show only data where both terms appear.
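The "two searches, then filter" idea can be sketched in plain Python. This is not how X-Ways implements its filter; the function name `items_with_both` and the sample items are hypothetical and only illustrate the logic.

```python
import re

# The two separate search terms.
TOWN_HALL = re.compile(r"town.?hall", re.IGNORECASE)
ARSON = re.compile(r"arson", re.IGNORECASE)

def items_with_both(items):
    """Return the names of items whose text contains both terms anywhere."""
    return [
        name
        for name, text in items.items()
        if TOWN_HALL.search(text) and ARSON.search(text)
    ]

# Hypothetical items standing in for files or search results.
items = {
    "report.txt": "Fire at the Town Hall last night. Investigators suspect arson.",
    "minutes.txt": "Meeting held at the townhall.",
    "note.txt": "Arson reported at the warehouse.",
}

both = items_with_both(items)
```

Unlike the single expression, this does not care about the order of the terms or how far apart they are, so it casts a wider net.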


   
(@dndschultz)
Eminent Member
Joined: 15 years ago
Posts: 24
Topic starter  

I'll try the "town.?{1,36}arson"
I am using EnCase.
I do not have X-Ways. Does EnCase have any filters that will combine search results?


   
Chris_Ed
(@chris_ed)
Reputable Member
Joined: 16 years ago
Posts: 314
 

Not that I'm aware of - but then, I am using v6. Maybe in the mythical v7?


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

I'll try the "town.?{1,36}arson"

As your requirement was 'within six words of', you must also try it with 'arson' in front of 'town'.

Just be aware that 36 characters is not the same as six words.
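Both points can be illustrated with a Python sketch that counts words instead of characters and tries both orders. This uses Python's `re` syntax, which EnCase grep may not support, and it assumes "within six words" means at most six words in between; `MAX_WORDS`, `near` and the sample sentences are made up for illustration.

```python
import re

MAX_WORDS = 6                      # assumed meaning: at most 6 intervening words
TERM_A = r"town\s?hall"
TERM_B = r"arson"

# Up to MAX_WORDS whole words, each preceded by non-word characters,
# followed by the separator before the second term.
GAP = rf"(?:\W+\w+){{0,{MAX_WORDS}}}?\W+"

# Either order: town hall ... arson, or arson ... town hall.
near = re.compile(
    rf"\b{TERM_A}\b{GAP}{TERM_B}\b|\b{TERM_B}\b{GAP}{TERM_A}\b",
    re.IGNORECASE,
)

# The character-window expression discussed earlier, for comparison.
char_window = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE)

reverse_order = "Arson suspected at the town hall."
too_far = ("The town hall was closed for a week while the council "
           "debated budgets, then arson struck.")
long_words = "Town hall investigators eventually determined deliberate arson"

reverse_hit = bool(near.search(reverse_order))
too_far_hit = bool(near.search(too_far))
long_words_near = bool(near.search(long_words))
long_words_chars = bool(char_window.search(long_words))
```

The last sample has only four words between the terms but 48 characters, so the word-based expression hits while the 36-character window misses it.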


   
PaulSanderson
(@paulsanderson)
Honorable Member
Joined: 19 years ago
Posts: 651
 

Reconnoitre uses Lightgrep, so this is what I am most familiar with, although Lightgrep is a PCRE-compatible grep. You *may* need to adapt the expressions below for EnCase grep.

Given that you can't do "within x words of" and are limited to "within x characters" you may want to give some thought as to what characters are allowed between arson and town.

I am by no means a grep expert, but AIUI the "." used above will match any character, or rather any byte, so you might want to modify your search to allow only ASCII characters a-z, A-Z, 0-9 etc. This may or may not result in a lot of spurious hits.

Lightgrep has two named character classes:

\s which is ASCII white space: tab, linefeed, formfeed, EOL and space
\w which is a-z, A-Z, 0-9 and _

So in Lightgrep the search would become

town[\w\s]{1,36}arson (I have missed out hall as it is superfluous to your requirements)

Of course, restricting the expression in this way may miss a match that contains some sort of control characters that do not fall into the character|whitespace specification. You need to decide what is acceptable.
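To show the trade-off, here is a Python sketch comparing the restricted class with the "any character" version. Python's `re` is not Lightgrep; `re.ASCII` is used here as an assumption to keep `\w` and `\s` to their ASCII meanings, and the sample strings are made up.

```python
import re

# Restricted: only word characters and whitespace allowed between the terms.
restricted = re.compile(r"town[\w\s]{1,36}arson", re.IGNORECASE | re.ASCII)

# Broad: any character (except newline in Python) allowed between the terms.
broad = re.compile(r"town.{1,36}arson", re.IGNORECASE)

samples = {
    "plain": "town hall arson",
    "comma": "town hall, arson",        # punctuation between the terms
    "nul": "town hall\x00 arson",       # a control character between the terms
}

restricted_hits = {k: bool(restricted.search(v)) for k, v in samples.items()}
broad_hits = {k: bool(broad.search(v)) for k, v in samples.items()}
```

The restricted expression only hits the plain sample; even an ordinary comma breaks it, so the allowed class may need punctuation added depending on the data.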


   