
Grep

General (Technical, Procedural, Software, Hardware etc.)



Dndschultz

(@dndschultz)

Eminent Member

Joined: 15 years ago

Posts: 24

Topic starter 10/04/2014 12:38 am

Looking for help with a grep expression.

I want to search for "Town Hall Arson" and "Townhall Arson". I also want a words-near version, i.e. "town hall" within 6 words of "arson".


twjolson

(@twjolson)

Honorable Member

Joined: 17 years ago

Posts: 417

10/04/2014 5:07 am

What tool are you using to run this regular expression? Each tool implements regular expressions slightly (sometimes drastically) differently.


Passmark

(@passmark)

Reputable Member

Joined: 14 years ago

Posts: 376

10/04/2014 6:42 am

It is also important to have some details about the source document. Is it plain ASCII text, a PDF file, or UTF-16? Does the text have line breaks in sensible places? How big are the files?


Chris_Ed

(@chris_ed)

Reputable Member

Joined: 16 years ago

Posts: 314

10/04/2014 1:41 pm

Just FYI, the EnCase grep implementation would look like this:

town.?hall arson
Make sure the "case sensitive" checkbox is unticked, and any relevant codepages are ticked.

Edit: you can't do an "x words near" type search in EnCase without creating an index. You could do a rough approximation by finding "arson" within a number of characters of "town hall". It would look like this:

town.?hall.{1,36}arson
This would hit when "arson" appears up to 36 characters after "town hall" or "townhall".

X-Ways would be similar, although in my opinion it would be better to do separate searches for "town.?hall" and "arson" and then use the XWF search result filter to show only data where both terms appear.
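
For anyone who wants to check how the two expressions above behave before running them over evidence, here is a minimal sketch. It assumes Python's `re` module as a stand-in for EnCase grep, so the syntax and matching rules may differ in your tool; the sample strings and the `classify` helper are made up for illustration.

```python
import re

# Python's re module is only a stand-in for EnCase grep here.
# IGNORECASE mirrors leaving the "case sensitive" checkbox unticked.
PHRASE = re.compile(r"town.?hall arson", re.IGNORECASE)
# DOTALL lets "." also match line breaks, closer to a byte-oriented search.
NEARBY = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE | re.DOTALL)


def classify(text):
    """Report which of the two expressions hits the given text."""
    return {
        "phrase": PHRASE.search(text) is not None,
        "nearby": NEARBY.search(text) is not None,
    }


# Hypothetical sample strings, not taken from any real case data.
samples = [
    "Town Hall Arson",
    "Townhall arson",
    "The town hall was damaged in a suspected arson attack.",
    "Arson suspected at the town hall.",
]

for sample in samples:
    print(sample, classify(sample))
```

Running it on the last sample shows that neither expression hits when "arson" comes before "town hall".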


Dndschultz

(@dndschultz)

Eminent Member

Joined: 15 years ago

Posts: 24

Topic starter 10/04/2014 7:01 pm

I'll try the "town.?hall.{1,36}arson" expression.
I am using EnCase.
I do not have X-Ways. Does EnCase have any filters that will combine search results?


Chris_Ed

(@chris_ed)

Reputable Member

Joined: 16 years ago

Posts: 314

10/04/2014 8:16 pm

Not that I'm aware of - but then, I am using v6. Maybe in the mythical v7?


Anonymous 6593

(@Anonymous 6593)

Guest

Joined: 17 years ago

Posts: 1158

10/04/2014 8:43 pm

I'll try the "town.?hall.{1,36}arson"

As your requirement was 'within six words of', you must also try it with 'arson' in front of 'town'.

Just be aware that 36 characters is not the same as six words.
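
To make both points concrete, here is a rough sketch of a word-based proximity search that works in either order. It assumes a regex engine with `\b`, `\W` and `\w` (Python's `re` is used here; EnCase grep may not support all of this), and it assumes "within six words" means at most six words in between. The names `NEAR`, `CHARS` and `within_six_words` are illustrative only.

```python
import re

TOWN_HALL = r"town\s?hall"
# Up to six intervening words, then the separator before the next term.
GAP = r"(?:\W+\w+){0,6}?\W+"

# Either "town hall ... arson" or "arson ... town hall".
NEAR = re.compile(
    rf"\b{TOWN_HALL}\b{GAP}arson\b|\barson\b{GAP}{TOWN_HALL}\b",
    re.IGNORECASE,
)

# The character-based approximation, for comparison.
CHARS = re.compile(r"town.?hall.{1,36}arson", re.IGNORECASE | re.DOTALL)


def within_six_words(text):
    """True if 'town hall' and 'arson' are at most six words apart."""
    return NEAR.search(text) is not None


# Hypothetical sample: only three words apart, but more than 36 characters.
long_words = "town hall extraordinarily controversial investigation arson"

print(within_six_words("Arson suspected at the town hall."))
print(within_six_words(long_words), CHARS.search(long_words) is not None)
```

The second print shows the gap between the two measures: the word-based search hits on `long_words`, while the 36-character version does not.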


PaulSanderson

(@paulsanderson)

Honorable Member

Joined: 19 years ago

Posts: 651

11/04/2014 3:23 pm

Reconnoitre uses Lightgrep, so this is what I am most familiar with; Lightgrep is a PCRE-compatible grep. You *may* need to adapt the expression below for EnCase grep.

Given that you can't do "within x words of" and are limited to "within x characters", you may want to give some thought to which characters are allowed between arson and town.

I am by no means a grep expert, but AIUI the "." metacharacter used above matches any character, or rather any byte, so you might want to modify your search to allow only ASCII characters: a-z, A-Z, 0-9 etc. Leaving "." in place may or may not result in a lot of spurious hits.

Lightgrep has two shorthand character classes that help here:

\s, which is ASCII white space: tab, line feed, form feed, end of line and space
\w, which is a-z, A-Z, 0-9 and _

So in Lightgrep the search would become:

town[\w\s]{1,36}arson (I have left out "hall" as it is superfluous to your requirements)

Of course, restricting the pattern in this way means a match will be missed if the text between the two words contains punctuation or control characters that fall outside the character/whitespace specification. You need to decide what is acceptable.
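
As a rough illustration of that trade-off, the sketch below compares the any-character version with the restricted one. It uses Python's `re` rather than Lightgrep, with the `re.ASCII` flag so that `\w` and `\s` cover only ASCII; the sample strings and the `hits` helper are invented for the example.

```python
import re

# "." with DOTALL accepts anything between the two words.
ANY_CHAR = re.compile(r"town.{1,36}arson", re.IGNORECASE | re.DOTALL)
# [\w\s] accepts only ASCII letters, digits, underscore and white space.
RESTRICTED = re.compile(r"town[\w\s]{1,36}arson", re.IGNORECASE | re.ASCII)


def hits(text):
    """Return (any-character hit, restricted hit) for the given text."""
    return (
        ANY_CHAR.search(text) is not None,
        RESTRICTED.search(text) is not None,
    )


# Hypothetical samples: clean text, punctuation in the gap, binary junk in the gap.
samples = [
    "town hall arson",
    "town hall. Arson is suspected",
    "town\x00\x01\x02arson",
]

for sample in samples:
    print(repr(sample), hits(sample))
```

The restricted expression drops the binary junk hit, but it also drops the hit where a full stop sits between the two words.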
