EnScript to search for PII or SSN numbers  

noahb2868
(@noahb2868)
Member

Does anyone know of an enscript that can search for PII or SSN numbers? Please advise.

Thanks,
Noah

Posted : 31/07/2009 4:51 am
keydet89
(@keydet89)
Community Legend

PII?

Posted : 31/07/2009 5:11 am
hogfly
(@hogfly)
Active Member

Lance Mueller released an enscript to search for credit cards some time ago.

http://www.forensickb.com/2008/06/enscript-to-do-credit-card-luhn-test.html

SSNs are going to be messy search-wise. From experience, they're stored in far too many formats and permutations. You're better off running something like Spider or IdentityFinder against a loop-mounted disk image (imdisk, mountimagepro, smartmount, linux…).

Posted : 31/07/2009 5:16 am
kovar
(@kovar)
Senior Member

Greetings,

There is a blog post with a lot of useful comments here:

http://forensicir.blogspot.com/2008/06/finding-pii-data.html

Many of the comments are from people you'll recognize from this forum….

Edit: … including keydet89, Noah.

-David

Posted : 31/07/2009 5:17 am
noahb2868
(@noahb2868)
Member

Keydet89, it means Personally Identifiable Information.

Hogfly, thanks for the info. I have been looking at that, and so far it seems the best option, since I have around 30 to 40 images to look at.

Posted : 31/07/2009 5:18 am
Patrick4n6
(@patrick4n6)
Senior Member

PII?

Personally identifiable information.

e.g. SSN, driver's license number, bank account and credit card numbers, DOB, and so on.

Posted : 31/07/2009 5:22 am
keydet89
(@keydet89)
Community Legend

I'm aware of what PII means, as well as how it's defined by a number of states. The fact is that many of the individual items you've listed are not specifically PII…they are considered part of PII only in conjunction with other data.

For example, per CA SB 1386:

"(e) For purposes of this section, "personal information" means an individual's first name or first initial and last name in combination with any one or more of the following data elements…"

Some of what you've listed above is not individually PII.
http://en.wikipedia.org/wiki/Personally_identifiable_information

Rather, that information must be in combination with other data to be considered PII.

So this takes me back to the original question. The OP never asked about credit card numbers…too bad. Investigation into that would reveal shortcomings with respect to what GSI defines as "credit card numbers" and what PCI considers the range of valid credit card numbers that fall within their purview. CCNs require three primary checks (length, BIN, Luhn formula), which reduces (but does not eliminate) false positives. Track data is even less prone to false positives, due to the additional checks that must be made.
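As a rough illustration of those three checks (length, BIN, Luhn), here is a minimal Python sketch. It is not Lance Mueller's EnScript, and it is not how GSI or PCI define a card number. The function names are made up for this example, the 13 to 19 digit length range is an assumption, and the prefix tuple is a small illustrative stand-in for a real BIN table.

```python
def luhn_ok(digits: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


# ASSUMPTION: a few well-known leading digits, for illustration only.
# A real search needs the full BIN ranges relevant to the engagement.
ILLUSTRATIVE_PREFIXES = ("4", "51", "52", "53", "54", "55", "34", "37")


def plausible_ccn(candidate: str) -> bool:
    """Apply the length, prefix (BIN) and Luhn checks, in that order."""
    digits = candidate.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        return False
    if not 13 <= len(digits) <= 19:  # ASSUMPTION: accepted length range
        return False
    if not digits.startswith(ILLUSTRATIVE_PREFIXES):
        return False
    return luhn_ok(digits)
```

A candidate that passes all three is still only a candidate; as noted above, the checks reduce false positives but do not eliminate them.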

How is PII defined? How do you search for any person's first or last name across all space of a hard drive? Last name? In order for this to be considered PII, the person's name has to be included with something else…such as an SSN? So you need to search for all sequences of 9 numbers, in all of the following formats:

123456789
123-45-6789
123 45 6789

Then, consider this…you need to search for names geographically close in the data to a valid SSN…so you need to define "close" and "valid SSN".
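For the three formats listed above, a regular expression gets you candidates but nothing more. The sketch below is Python rather than EnScript, the names are invented for the example, and it deliberately does not try to answer the "close" or "valid" questions. It only finds nine digits written unbroken or split 3-2-4 with a consistent hyphen or space.

```python
import re

# Nine digits, unbroken or split 3-2-4 by the SAME separator (hyphen or space).
# The lookarounds keep the pattern from matching inside a longer run of digits.
SSN_CANDIDATE = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")


def find_ssn_candidates(text):
    """Return [(offset, nine_digit_string), ...] for every candidate in text."""
    hits = []
    for m in SSN_CANDIDATE.finditer(text):
        # Normalize to nine bare digits so all three formats compare equal.
        hits.append((m.start(), m.group(1) + m.group(3) + m.group(4)))
    return hits
```

Every hit is just a nine-digit string until someone checks it against the surrounding record, which is where the manual work starts.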

Say an organization does not store SSNs or CCNs, but instead uses account numbers. What format do you look for?

Searching for "Patrick" is relatively trivial. Searching for all possible first names is a daunting task to say the least…how do you do that? Do you include only American first names? Define that.

I'm not trying to be flippant here, or difficult. All I'm saying is that I've been down this road several times before, and each time it's ended up the same way…if you're using EnCase and EnScripting, one has to assume that you're working with acquired images and that you have to search both allocated and unallocated space. Your customer may say, "we store this information in database files", but if you limit your search to ONLY database files, you do your customer a disservice. I have seen the contents of databases bumped to text files through SQL injection. I have seen credit card numbers pulled out of the virtual memory used by specific processes, as that was the ONLY place on the system where the data was not encrypted.
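To make the "search everything, not just the database files" point concrete, here is a hedged Python sketch that runs a byte pattern over an entire image file and records every hit with its byte offset. It assumes a raw (dd-style) image rather than an EnCase evidence file, it only sees ASCII-encoded digits (UTF-16 text, compressed data and scanned images are all missed), and the function name is made up for the example.

```python
import mmap
import re

# Byte-level version of the three SSN formats: nine ASCII digits, unbroken or
# split 3-2-4 by the same separator, not embedded in a longer digit run.
SSN_BYTES = re.compile(rb"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")


def scan_raw_image(path, pattern=SSN_BYTES):
    """Return [(byte_offset, matched_bytes), ...] for every hit in the file.

    The file is memory-mapped read-only, so the regex walks the whole image
    (allocated and unallocated areas alike) without loading it into RAM.
    Note that mmap refuses to map an empty file.
    """
    hits = []
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in pattern.finditer(mm):
                hits.append((m.start(), m.group(0)))
    return hits
```

Because the scan ignores the file system entirely, a hit comes back as an offset into the image; mapping that offset back to a file, slack space or unallocated cluster is a separate step.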

It's easy to say, "uh…Harlan…dude…'PII' means 'personally identifiable information'. Sheesh." Okay. Now, if it's so easy, write me a Perl script that locates PII in a file.

Posted : 31/07/2009 7:44 am
hogfly
(@hogfly)
Active Member

How is PII defined? How do you search for any person's first or last name across all space of a hard drive? Last name? In order for this to be considered PII, the person's name has to be included with something else…such as an SSN? So you need to search for all sequences of 9 numbers, in all of the following formats

123456789
123-45-6789
123 45 6789

Then, consider this…you need to search for names geographically close in the data to a valid SSN…so you need to define "close" and "valid SSN".

If I may…(I expect you know all of this, but I am putting it out there anyway.) Having done some rather large data cleanups and investigations, as I'm sure you have…the person's name is never searched for; it's as you described, nearly impossible and fruitless. PII data element searching is, quite honestly, a best effort with 'state of the art' tools, and what counts as state of the art differs from day to day, as it's a developing field.
Searching for SSN data, as I said, is messy. Yes, you have identified three of the popular storage formats. I've seen them padded up to 12-15 digits to fit a database field format, and let's not forget they can be stored in scanned documents and saved as images. I'm sure you've seen your fair share of weird and stupid storage methods.

The goal of any PII search or search tool is not to be 100% effective - there's never a 100% guarantee of finding it. The goal is to identify files containing hits to give an approximation of potential data loss and it then requires manual verification and validation. In addition, the organization whose data is being searched is vital to the search and cleanup process. They know their data storage formats.

An SSN by itself is just a string of digits. A valid SSN is well defined by the government, but I get what you're saying because of storage format. Again, this is why the process always requires manual intervention. Proximity or geographical relation to the PII data element is generally interpreted to be a file, a record or series of fields that contain an identity. This could be a single row in a spreadsheet or a customer database spread across tables that can be easily combined with select and join statements, or a Word document.

The hardest data element to search for is bank account number. There is absolutely no standard here. For this, you need to look for permutations of bank, account, acct etc…followed closely by a string of digits, and what's to guarantee it's digits?
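A rough Python sketch of that keyword-plus-nearby-token idea follows. Everything in it is an assumption made for illustration: the keyword list, the 20-character gap, the 6 to 20 character token length, and the rule that a token only needs to contain at least one digit (precisely because nothing guarantees it is all digits).

```python
import re

# ASSUMPTION: keyword variants and token shape are illustrative, not a standard.
KEYWORD = re.compile(r"\b(?:bank|account|acct)\b", re.IGNORECASE)
TOKEN = re.compile(r"[A-Za-z0-9-]{6,20}")


def account_candidates(text, max_gap=20):
    """Return [(token_offset, token), ...] for tokens shortly after a keyword."""
    hits = {}
    for kw in KEYWORD.finditer(text):
        end = kw.end()
        # Only look at a short window after the keyword.
        for tok in TOKEN.finditer(text, end, end + max_gap + 20):
            if tok.start() - end > max_gap:
                break
            # Accept letters and hyphens, but insist on at least one digit.
            if any(ch.isdigit() for ch in tok.group()):
                hits[tok.start()] = tok.group()  # dict drops duplicate tokens
                break
    return sorted(hits.items())
```

This will happily flag a date that follows the word "account", so it is a way to build a review list, not a detector.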

Posted : 31/07/2009 9:07 am
bperk
(@bperk)
New Member

Harlan, that was a great post. One for my reference library. Time to break out the GREP manual.

Posted : 31/07/2009 11:26 pm
keydet89
(@keydet89)
Community Legend

Hogfly,

Okay, so after all that…do you have an EnScript for the OP? Do you have an EnScript that searches for PII?

Posted : 01/08/2009 4:55 am
jhup
(@jhup)
Community Legend

There are some SSN standards… Sort of…

There will never be an SSN in which any one of the three groups is all zeros, i.e. 000-nn-nnnn, nnn-00-nnnn, and nnn-nn-0000 are invalid.

The first two parts have to be below 772-80-nnnn, as of today. They represent a geographic location where the SSN was applied for or issued, depending on the issue time (pre and post March 1972), and a batch number (second set). The Social Security Administration publishes the issued batches monthly.

666-nn-nnnn is unassigned (albeit not officially acknowledged).

987-65-4320 through 987-65-4329 are used for example purposes.

The last four digits are just sequence numbers.

Here is a reference: http://www.socialsecurity.gov/employer/ssnvhighgroup.htm

Hope this helps with that grep.
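If it is easier to filter after the grep than inside it, the rules above can be sketched as a small Python check. This is a simplification under stated assumptions: the function name is invented, the area ceiling of 772 is only the "as of today" figure from this post and is exposed as a parameter, and the monthly high-group list is not consulted at all.

```python
def plausible_ssn(digits, max_area=772):
    """Return True if a nine-digit string survives the basic SSN rules.

    ASSUMPTION: max_area is a snapshot taken from this thread, and the
    per-area high group list published by the SSA is not checked.
    """
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # No part of the number may be all zeros.
    if area == "000" or group == "00" or serial == "0000":
        return False
    # Area 666 is not assigned.
    if area == "666":
        return False
    # Areas above the current ceiling have not been issued.
    if int(area) > max_area:
        return False
    # 987-65-4320 through 987-65-4329 are reserved for examples.
    if "987654320" <= digits <= "987654329":
        return False
    return True
```

Passing this check says only that the number could have been issued, not that it was, or to whom.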

Posted : 01/08/2009 5:18 am
hogfly
(@hogfly)
Active Member

Hogfly,

Okay, so after all that…do you have an EnScript for the OP? Do you have an EnScript that searches for PII?

*sigh* No, I haven't written one, though it's been planned by me and one other person. I suggested very valid alternatives that satisfy 90-95% of PII searches. The rest can be satisfied with a regex.

Posted : 01/08/2009 10:14 am
seanmcl
(@seanmcl)
Senior Member

How is PII defined? How do you search for any person's first or last name across all space of a hard drive? Last name? In order for this to be considered PII, the person's name has to be included with something else…such as an SSN?

It is even more complicated than that if you are in the United States. Of the 50 states, there are 43 different legal requirements for reporting loss/theft of PII. Name and SSN are the most obvious. Until recently, it was the default in Arizona that the driver's license number was the SSN, and the Blues (health insurance) used to use SSN as the identifier for the policy holder.

In some states, name and driver's license number or name and birthdate with one other identifier (address, phone number, mother's maiden name, etc.) is reportable.

Name, account number and PIN are reportable in many states, but you don't need to have the complete name. For example, if I have the user login and PIN for a brokerage account, that is PII. I had a case where someone had purchased a stolen laptop and examination of the disk revealed valid userids and PINs for a number of customers of a brokerage firm which could have been used to manipulate the account holders' holdings. These were discovered only because Internet history data showed signs that a single account had been accessed and using the account parameters (stupidly passed as parameters in the URL) we were able to find the file containing the rest of the accounts. But there were no names, per se.

Bottom line, as others have mentioned, is that PII can be very difficult to spot unless you know what you are looking for. You can try doing searches for things like "account" or "username" or "userid" or "passwd" or "pass"… well, you get the idea.
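For the URL case I described, where credentials were passed as query parameters, a sketch along these lines would flag them in recovered Internet history. It is Python, the function name and key list are assumptions for illustration ("pin" is added to the keywords above), and the matching is a plain substring test, so it will over-match.

```python
from urllib.parse import parse_qsl, urlsplit

# ASSUMPTION: illustrative key fragments; extend for the case at hand.
SUSPECT_KEYS = ("account", "username", "userid", "passwd", "pass", "pin")


def suspicious_params(url):
    """Return [(name, value), ...] for query parameters with suspect names."""
    query = urlsplit(url).query
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if any(fragment in name.lower() for fragment in SUSPECT_KEYS)
    ]
```

For a made-up URL such as https://broker.example/login?userid=jdoe&pin=1234&lang=en, this returns the userid and pin pairs and drops lang. A parameter called "shipping" would also match on "pin", which is the kind of noise you then have to review by hand.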

Posted : 01/08/2009 5:47 pm
keydet89
(@keydet89)
Community Legend

I'm not sure that some of what we're seeing posted in this thread is going to be beneficial.

Case in point…while I was a member of a team on the PCI QIRA list (I'm still on the team, but we're not on the list any longer), during one of my recertification sessions, a tool for locating PCI data was discussed. I was one of the few forensic responders in the session, as most of the attendees were assessors. I made the point that the tool mentioned was insufficient for use by QIRA teams, as it only notified you that a file had been found to contain PCI data. My point was that for assessors, only one valid credit card number needs to be identified on a system…for QIRA forensic responders, *ALL* possible credit card numbers need to be identified.

My point is that we have to take care in how we search for PII data, because the same conditions are true. All PII data needs to be identified for the purposes of notification. There can be serious consequences if only 80% of the PII data is revealed and only those individuals are notified.

Another issue that needs to be recognized is that not all PII (or PHI) is in an easily searchable format. I have run a variety of searches on a system for PII data, all of which came up negative…only to find significant amounts of PII in scanned images (.TIF, etc.).

I guess the overall revelation about this issue is that examining a single system (say, just a laptop hard drive image) can be an intensive, iterative, manual (and hence expensive) process.

Posted : 01/08/2009 6:17 pm
seanmcl
(@seanmcl)
Senior Member

My point is that we have to take care in how we search for PII data, because the same conditions are true. All PII data needs to be identified for the purposes of notification. There can be serious consequences if only 80% of the PII data is revealed and only those individuals are notified.

Absolutely. On the other hand, there is tremendous cost to offering credit protection to individuals on the basis that their data may have been compromised. In addition to the direct costs, there is the indirect cost of damage to the reputation of the client who was keeper of the data (look at the Heartland case as an example).

Once you find what may be a valid name/SSN pair, verification is not inexpensive, especially if you have a large number of names. Credit reporting agencies typically charge between $15 and $45 per name/SSN pair whether they ultimately prove to be valid or not.
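To put rough numbers on that, here is the arithmetic as a tiny Python sketch. The fee bounds are the $15 and $45 figures above; the function name and the 10,000-pair count in the usage note are purely hypothetical.

```python
def verification_cost_range(pairs, low_fee=15, high_fee=45):
    """Return (minimum, maximum) verification cost in dollars.

    Fees apply to every name/SSN pair submitted, valid or not, so false
    positives in the candidate list are paid for at the same rate.
    """
    return pairs * low_fee, pairs * high_fee
```

With a hypothetical 10,000 candidate pairs, verification_cost_range(10_000) gives (150000, 450000), i.e. between $150,000 and $450,000 before any credit protection is offered.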

In other words, false positives can be as costly as false negatives. Either way, closing the barn door after the cow has escaped is expensive.

Posted : 01/08/2009 7:09 pm