
EnScript to search for PII or SSN numbers

noahb2868
(@noahb2868)
Posts: 50
Trusted Member
Topic starter
 

Does anyone know of an EnScript that can search for PII or SSNs? Please advise.

Thanks,
Noah

 
Posted : 31/07/2009 3:51 am
keydet89
(@keydet89)
Posts: 3568
Famed Member
 

PII?

 
Posted : 31/07/2009 4:11 am
hogfly
(@hogfly)
Posts: 287
Reputable Member
 

Lance Mueller released an EnScript to search for credit card numbers some time ago.

http://www.forensickb.com/2008/06/enscript-to-do-credit-card-luhn-test.html

SSNs are going to be messy to search for. From experience, they're stored in far too many formats and permutations. You're better off running something like Spider or IdentityFinder against a loop-mounted disk image (ImDisk, Mount Image Pro, SmartMount, Linux…).

 
Posted : 31/07/2009 4:16 am
(@kovar)
Posts: 805
Prominent Member
 

Greetings,

There is a blog post with a lot of useful comments here:

http://forensicir.blogspot.com/2008/06/finding-pii-data.html

Many of the comments are from people you'll recognize from this forum…

Edit … including keydet89, Noah.

-David

 
Posted : 31/07/2009 4:17 am
noahb2868
(@noahb2868)
Posts: 50
Trusted Member
Topic starter
 

Keydet89, it means personally identifiable information.

Hogfly, thanks for the info. I have been looking at that, and so far it seems the best option, since I have around 30 to 40 images to look at.

 
Posted : 31/07/2009 4:18 am
(@patrick4n6)
Posts: 650
Honorable Member
 

PII?

Personally identifiable information.

e.g. SSN, driver's license number, bank account and credit card numbers, DOB, and so on.

 
Posted : 31/07/2009 4:22 am
keydet89
(@keydet89)
Posts: 3568
Famed Member
 

I'm aware of what PII means, as well as how it's defined by a number of states. The fact is that many of the individual items you've listed are not PII on their own…they are considered part of PII in conjunction with other data.

For example, per CA SB 1386:
"(e) For purposes of this section, "personal information" means an
individual's first name or first initial and last name in combination
with any one or more of the following data elements…"
Some of what you've listed above is not PII on its own.
http://en.wikipedia.org/wiki/Personally_identifiable_information

Rather, that information must be in combination with other data to be considered PII.

So this takes me back to the original question. The OP never asked about credit card numbers…too bad. Investigating that would reveal shortcomings in what GSI (Guidance Software) defines as "credit card numbers" compared with the range of valid credit card numbers that PCI considers to fall within its purview. CCNs require three primary checks (length, BIN, and the Luhn formula), which reduce (but do not eliminate) false positives. Track data is even less prone to false positives, due to the additional checks that must be made.
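The three CCN checks can be sketched in Python. This is a minimal illustration, not Lance Mueller's EnScript or EnCase's own logic: the prefix table is a small, hypothetical subset of real issuer ranges (a real tool would load a maintained BIN list), and `luhn_ok` / `looks_like_ccn` are illustrative names.

```python
import re

# Hypothetical, deliberately incomplete prefix -> allowed-length table.
PREFIX_LENGTHS = {
    "4": (13, 16, 19),                                                # Visa
    "51": (16,), "52": (16,), "53": (16,), "54": (16,), "55": (16,),  # Mastercard (classic range)
    "34": (15,), "37": (15,),                                         # American Express
    "6011": (16,), "65": (16,),                                       # Discover (partial)
}

def luhn_ok(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_ccn(candidate: str) -> bool:
    """Apply the prefix (BIN), length and Luhn checks to one candidate string."""
    digits = re.sub(r"[ -]", "", candidate)
    if not digits.isdigit():
        return False
    # Try the longest matching prefix first.
    for prefix in sorted(PREFIX_LENGTHS, key=len, reverse=True):
        if digits.startswith(prefix):
            return len(digits) in PREFIX_LENGTHS[prefix] and luhn_ok(digits)
    return False  # unknown prefix
```

For example, the common test number `4111 1111 1111 1111` passes all three checks, while changing its last digit fails Luhn. As noted above, passing still only reduces false positives; it does not prove a number was ever issued.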

How is PII defined? How do you search for any person's first or last name across all of the space on a hard drive? Last name? In order for this to be considered PII, the person's name has to be included with something else…such as an SSN? So you need to search for all sequences of 9 digits, in all of the following formats:

123456789
123-45-6789
123 45 6789
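A minimal Python sketch of matching those three layouts (`SSN_PATTERN` and `find_ssn_candidates` are hypothetical names). It only finds SSN-shaped strings; it says nothing about validity or about whether a name is nearby.

```python
import re

# Matches 123456789, 123-45-6789 and 123 45 6789. The backreference \2 forces
# the same separator in both positions, so "123-45 6789" is not matched.
# The lookarounds stop it matching 9 digits inside a longer digit run.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")

def find_ssn_candidates(text: str) -> list[str]:
    """Return the 9-digit core of each SSN-shaped string found in text."""
    return [a + b + c for a, _sep, b, c in SSN_PATTERN.findall(text)]
```

Usage: `find_ssn_candidates("123-45-6789 and 123 45 6789")` returns `["123456789", "123456789"]`, while a 10-digit run such as `1234567890` yields nothing.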

Then, consider this…you need to search for names geographically close in the data to a valid SSN…so you need to define "close" and "valid SSN".
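One way to pin down "valid SSN" and "close", sketched in Python under stated assumptions: "valid" is reduced to the structural rules the SSA publishes (no area 000, 666 or 900-999, no group 00, no serial 0000), and "close" is an arbitrary character window. The name list, the 100-character window and the function names are all hypothetical.

```python
import re

SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")

def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Structural check only: area 000, 666 and 900-999, group 00 and
    serial 0000 are never issued. Passing does not mean the number was issued."""
    a = int(area)
    return a not in (0, 666) and a < 900 and group != "00" and serial != "0000"

def ssns_near_names(text: str, names: set[str], window: int = 100) -> list[tuple[str, str]]:
    """Return (name, ssn) pairs where a listed name occurs within `window`
    characters of a structurally plausible SSN (an arbitrary notion of "close")."""
    hits = []
    for m in SSN_PATTERN.finditer(text):
        area, _sep, group, serial = m.groups()
        if not is_plausible_ssn(area, group, serial):
            continue
        context = text[max(0, m.start() - window):m.end() + window].lower()
        for name in sorted(names):  # sorted for deterministic output
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", context):
                hits.append((name, area + group + serial))
    return hits
```

This still depends on having a name list at all, which is exactly the problem raised about first and last names.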

Say an organization does not store SSNs or CCNs, but instead uses account numbers. What format do you look for?

Searching for "Patrick" is relatively trivial. Searching for all possible first names is a daunting task to say the least…how do you do that? Do you include only American first names? Define that.

I'm not trying to be flippant here, or difficult. All I'm saying is that I've been down this road several times before, and each time it's ended up the same way…if you're using EnCase and EnScript, one has to assume that you're working with acquired images and that you have to search both allocated and unallocated space. Your customer may say, "we store this information in database files", but if you limit your search to ONLY database files, you do your customer a disservice. I have seen the contents of databases dumped to text files through SQL injection. I have seen credit card numbers pulled out of the virtual memory used by specific processes, as that was the ONLY place on the system where the data was not encrypted.
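Reading every byte of a raw (dd-style) image covers allocated and unallocated space alike. A rough Python sketch, assuming the evidence is available as a raw image file and reusing the SSN layouts above as a bytes pattern; `scan_image`, `SSN_BYTES` and the chunk/overlap sizes are hypothetical. It will not see text inside compressed or encrypted data, or text stored as UTF-16.

```python
import re

# Same three SSN layouts, as a bytes pattern: raw images are not text.
SSN_BYTES = re.compile(rb"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")

def scan_image(path: str, pattern: re.Pattern = SSN_BYTES,
               chunk_size: int = 1 << 20, overlap: int = 64) -> list[tuple[int, bytes]]:
    """Return (absolute_offset, match) for every hit in a raw image file.

    `overlap` must exceed the longest possible match plus one byte, so hits
    that straddle a chunk boundary are seen whole and with context.
    """
    hits = []
    base = 0          # absolute file offset of buf[0]
    buf = b""
    report_from = 0   # bytes before this index are already covered (context only)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            at_eof = not chunk
            buf += chunk
            # Defer matches starting in the last `overlap` bytes until more
            # data (or EOF) gives them full right-hand context.
            cut = len(buf) if at_eof else max(len(buf) - overlap, report_from)
            for m in pattern.finditer(buf):
                if report_from <= m.start() < cut:
                    hits.append((base + m.start(), m.group()))
            if at_eof:
                return hits
            # Keep the unreported tail plus one byte of left-hand context.
            keep = max(cut - 1, 0)
            base += keep
            report_from = cut - keep
            buf = buf[keep:]
```

The offsets returned are file offsets into the image, which is what you would hand back to the examiner for manual review.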

It's easy to say, "uh…Harlan…dude…'PII' means 'personally identifiable information'. Sheesh." Okay. Now, if it's so easy, write me a Perl script that locates PII in a file.

 
Posted : 31/07/2009 6:44 am
hogfly
(@hogfly)
Posts: 287
Reputable Member
 

keydet89 wrote:

How is PII defined? How do you search for any person's first or last name across all space of a hard drive? Last name? In order for this to be considered PII, the person's name has to be included with something else…such as an SSN? So you need to search for all sequences of 9 numbers, in all of the following formats

123456789
123-45-6789
123 45 6789

Then, consider this…you need to search for names geographically close in the data to a valid SSN…so you need to define "close" and "valid SSN".

If I may… (I expect you know all of this, but I'm putting it out there anyway.) Having done some rather large data cleanups and investigations, as I'm sure you have: the person's name is never searched for. It's as you described, nearly impossible and fruitless. PII data element searching is, quite honestly, a best-effort exercise with 'state of the art' tools, and whatever the state of the art may be differs from day to day, as it's a developing field.
As I said, searching for SSN data is messy. Yes, you have identified three of the popular storage formats. I've seen them padded up to 12-15 digits to fit a database field format, and let's not forget they can be stored in scanned documents saved as images. I'm sure you've seen your fair share of weird and stupid storage methods.
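If the padding is leading zeros (an assumption; other padding schemes exist), a 12-15 digit field can be unwrapped with a pattern like this hypothetical sketch. Regex backtracking lets a genuine leading zero in the SSN's area number survive.

```python
import re

# Assumption: a padded field is 3-6 leading zeros followed by the 9 SSN digits,
# e.g. "000123456789" (12 digits) up to "000000123456789" (15 digits).
PADDED_SSN = re.compile(r"(?<!\d)0{3,6}(\d{9})(?!\d)")

def unpad_ssns(text: str) -> list[str]:
    """Return the trailing 9 digits of each zero-padded SSN-length field."""
    return PADDED_SSN.findall(text)
```

For instance, `"000012345678"` yields `"012345678"`: the greedy `0{3,6}` first takes four zeros, finds only eight digits left, and backs off to three.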

The goal of any PII search or search tool is not to be 100% effective; there's never a 100% guarantee of finding it. The goal is to identify files containing hits, to give an approximation of potential data loss; those hits then require manual verification and validation. In addition, the organization whose data is being searched is vital to the search and cleanup process. They know their data storage formats.

An SSN by itself is just a string of digits. A valid SSN is well defined by the government, but I get what you're saying about storage formats. Again, this is why the process always requires manual intervention. Proximity, or geographical relation to the PII data element, is generally interpreted to mean a file, a record, or a series of fields that together contain an identity. This could be a single row in a spreadsheet, a customer database spread across tables that can easily be combined with SELECT and JOIN statements, or a Word document.
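Treating one record as the unit of proximity can be sketched for a CSV export in Python. This assumes the data owner tells you which columns hold names (the point above about the organization knowing its formats); `rows_with_identity` and the column names in the usage line are hypothetical.

```python
import csv
import io
import re

# An SSN-shaped field value in any of the three common layouts.
SSN_FIELD = re.compile(r"\d{3}([- ]?)\d{2}\1\d{4}")

def rows_with_identity(csv_text: str, name_columns: set[str]) -> list[int]:
    """Return 1-based data-row numbers where some field is SSN-shaped AND at
    least one of `name_columns` is non-empty. The row is the unit of proximity."""
    flagged = []
    for rownum, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        # Missing fields come back as None and extra fields as a list under
        # the key None, so only string values are tested.
        has_ssn = any(isinstance(v, str) and SSN_FIELD.fullmatch(v.strip())
                      for v in row.values())
        has_name = any((row.get(col) or "").strip() for col in name_columns)
        if has_ssn and has_name:
            flagged.append(rownum)
    return flagged
```

Usage: with columns `last,first,ssn,notes`, calling `rows_with_identity(text, {"last", "first"})` flags only rows that carry both a name and an SSN-shaped value. An SSN buried in free text (e.g. inside `notes`) is not caught, because each field must be SSN-shaped as a whole.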

The hardest data element to search for is the bank account number. There is absolutely no standard here. For this, you need to look for permutations of "bank", "account", "acct", etc., followed closely by a string of digits, and what's to guarantee it's digits?
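The keyword-then-digits approach can be sketched as a Python regex. Everything here is an assumption rather than a standard: the keyword list, the 20-character gap, and the 6-17 digit range are hypothetical, and, as noted above, an account "number" containing letters would slip through.

```python
import re

# Heuristic only: a keyword, then at most 20 non-digit characters, then a run
# of 6-17 digits that may contain single spaces or dashes.
ACCOUNT_HINT = re.compile(
    r"\b(?:bank|account|acct|a/c)\b\D{0,20}?(\d(?:[- ]?\d){5,16})(?!\d)",
    re.IGNORECASE,
)

def account_number_hits(text: str) -> list[str]:
    """Return candidate account numbers with separators stripped."""
    return [re.sub(r"[- ]", "", m.group(1)) for m in ACCOUNT_HINT.finditer(text)]
```

In `"Bank: First National, Acct # 0012-3456-789"`, the match comes from "Acct" (yielding `"00123456789"`), because "Bank" is more than 20 characters away from the digits. Every hit still needs manual review.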

 
Posted : 31/07/2009 8:07 am
(@bperk)
Posts: 24
Eminent Member
 

Harlan, that was a great post. One for my reference library. Time to break out the GREP manual.

 
Posted : 31/07/2009 10:26 pm
keydet89
(@keydet89)
Posts: 3568
Famed Member
 

Hogfly,

Okay, so after all that…do you have an EnScript for the OP? Do you have an EnScript that searches for PII?

 
Posted : 01/08/2009 3:55 am