Identifying PHI and PII - keyword lists and regexp

(@theredmoose)
Active Member
Joined: 14 years ago
Posts: 17
Topic starter  

The first thing I usually get asked when investigating a healthcare system is to identify whether PHI or PII reside on the system. People ask this because if it does then HIPAA may require the owners of the system to send out notifications within a specific time period.

Does anyone have any techniques they use to determine if PHI or PII resides on a system?

We should be able to do a keyword search on a drive after indexing the contents. I found this short list of regular expressions here.

^.*(ssn|social|security).*$
^.*name.*$
^.*address.*$
^.*city.*$
^.*state.*$
^.*zip.*$
^.*county.*$
^.*precinct.*$
^.*(email|e-mail|mail).*$

Has anyone else compiled such a list, or have any other ideas on how to automate this task?
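To make the idea concrete, here is a minimal Python sketch of how the patterns above might be run over text exported from a drive index. The function name `flag_lines` and the sample lines are made up for illustration, and matching is made case-insensitive here, which the original list does not specify.

```python
import re

# Patterns copied from the list above. Case-insensitive matching is an
# assumption added for this sketch.
PATTERNS = [
    r"^.*(ssn|social|security).*$",
    r"^.*name.*$",
    r"^.*address.*$",
    r"^.*city.*$",
    r"^.*state.*$",
    r"^.*zip.*$",
    r"^.*county.*$",
    r"^.*precinct.*$",
    r"^.*(email|e-mail|mail).*$",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def flag_lines(lines):
    """Yield (line_number, text, matched_patterns) for lines hitting any pattern."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        hits = [p.pattern for p in COMPILED if p.search(text)]
        if hits:
            yield number, text, hits


# Hypothetical sample: two header-like lines and one free-text line.
sample = [
    "patient_name,ssn,zip",
    "invoice_total,currency",
    "E-Mail: someone@example.com",
]
results = list(flag_lines(sample))
for number, text, hits in results:
    print(number, text, len(hits))
```

These patterns are deliberately broad: "name" also matches "filename" and "state" also matches "statement", so the hits are candidates for manual review rather than confirmed PHI or PII.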


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

The first thing I usually get asked when investigating a healthcare system is to identify whether PHI or PII reside on the system. People ask this because if it does then HIPAA …

The first thing I usually do when seeing acronyms in a new post is to ask the OP to spell them out
http://www.acronymfinder.com/PHI.html
http://www.acronymfinder.com/PII.html
http://www.acronymfinder.com/HIPAA.html
even if they are identifiable from the context.

jaclaz


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
 

When I delve into a new cultural sub-category, I like to get samples of known data.

That is, in your case I would get a database of personally identifiable information (PII) as it is structured in the target system and extract keywords from that.

I would do the same for protected health information (PHI), and anything else covered by the Health Insurance Portability and Accountability Act (HIPAA).

In my opinion, it is easier and gives much better results to use a sample of known data to find similar data than to try to guess.
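One way this could be sketched in Python is to derive "shape" patterns from known sample values and then search other text for values with the same shape. This is only an illustration of the idea, not a tested forensic method: the helper names (`shape_regex`, `build_patterns`, `find_candidates`) and the sample values are hypothetical.

```python
import re

# Splits a value into runs of digits, ASCII letters, or anything else.
RUNS = re.compile(r"(?P<digits>\d+)|(?P<letters>[A-Za-z]+)|(?P<other>[^A-Za-z\d]+)")


def shape_regex(value):
    """Describe a known value by its shape: run lengths of digits and letters,
    with every other character kept as a literal."""
    parts = []
    for run in RUNS.finditer(value):
        text = run.group()
        if run.lastgroup == "digits":
            parts.append(r"\d{%d}" % len(text))
        elif run.lastgroup == "letters":
            parts.append("[A-Za-z]{%d}" % len(text))
        else:
            parts.append(re.escape(text))
    return "".join(parts)


def build_patterns(known_values):
    """Compile one pattern per distinct shape seen in the known sample."""
    shapes = sorted({shape_regex(v) for v in known_values})
    # Lookarounds keep a match from starting or ending inside a longer token.
    return [re.compile(r"(?<!\w)" + s + r"(?!\w)") for s in shapes]


def find_candidates(text, patterns):
    """Return every substring of text that matches one of the shapes."""
    return [m.group() for p in patterns for m in p.finditer(text)]


# Hypothetical known values, standing in for fields pulled from a sample
# of the target system's real records.
known = ["123-45-6789", "MRN0012345"]
patterns = build_patterns(known)

text = "SSN 987-65-4321 on file, chart MRN0099887, invoice 12-345."
found = find_candidates(text, patterns)
print(sorted(found))
```

Shapes learned this way are only as good as the sample, so anything they flag still needs to be checked by hand.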


   