Identifying PHI and PII - keyword lists and regexp

General (Technical, Procedural, Software, Hardware etc.)

Last Post by jhup 11 years ago

theredmoose

(@theredmoose)

Active Member

Joined: 14 years ago

Posts: 17

Topic starter 14/11/2014 4:14 am

The first thing I usually get asked when investigating a healthcare system is to identify whether PHI or PII reside on the system. People ask this because if it does, then HIPAA may require the owners of the system to send out notifications within a specific time period.

Does anyone have any techniques they use to determine if PHI or PII resides on a system?

We should be able to do a keyword search on a drive after indexing the contents. I found this short list of regular expressions here.

^.*(ssn|social|security).*$
^.*name.*$
^.*address.*$
^.*city.*$
^.*state.*$
^.*zip.*$
^.*county.*$
^.*precinct.*$
^.*(email|e-mail|mail).*$
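
A minimal sketch of how the list above could be applied, assuming the input is a sequence of text lines (for example column headers or lines exported from the index). Case-insensitive matching, the `flag_lines` helper, and the sample field names are assumptions for illustration, not part of the original list.

```python
import re

# Patterns copied from the list above. Case-insensitive matching is an
# assumption; the original list does not say how case should be handled.
PATTERNS = [
    r"^.*(ssn|social|security).*$",
    r"^.*name.*$",
    r"^.*address.*$",
    r"^.*city.*$",
    r"^.*state.*$",
    r"^.*zip.*$",
    r"^.*county.*$",
    r"^.*precinct.*$",
    r"^.*(email|e-mail|mail).*$",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def flag_lines(lines):
    """Return (line_number, line, pattern) for each line matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in COMPILED:
            if pattern.match(line):
                hits.append((number, line, pattern.pattern))
                break  # the first matching pattern is enough to flag the line
    return hits


# Hypothetical field names, made up for this example.
sample = ["Patient_Name", "dosage_mg", "SSN", "ZipCode", "room"]
hits = flag_lines(sample)
for number, line, pattern in hits:
    print(number, line, pattern)
```

Because the patterns are plain substring matches wrapped in `^.*` and `.*$`, they will also flag words such as "estate" or "filename", so the hits are candidates for review rather than confirmed PII.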

Has anyone else compiled such a list, or does anyone have other ideas on how to automate this task?

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

14/11/2014 5:52 pm

The first thing I usually get asked when investigating a healthcare system is to identify whether PHI or PII reside on the system. People ask this because if it does then HIPAA …

The first thing I usually do when seeing acronyms in a new post is to ask the OP to spell them out:
http://www.acronymfinder.com/PHI.html
http://www.acronymfinder.com/PII.html
http://www.acronymfinder.com/HIPAA.html
even if they are identifiable from the context.

jaclaz

jhup

(@jhup)

Noble Member

Joined: 16 years ago

Posts: 1442

14/11/2014 8:42 pm

When I delve into a new cultural sub-category, I like to get samples of known data.

That is, in your case I would get a database of personally identifiable information (PII) as it is structured in the target system and extract keywords from that.

I would do the same for protected health information (PHI), and anything else covered by the Health Insurance Portability and Accountability Act (HIPAA).

In my opinion, it is easier, and gives much better results, to use a sample of known data to find similar data than to attempt to guess.
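
As a rough sketch of the known-sample idea, assuming the sample can be exported as CSV text: collect the header names and cell values as keywords, then look for them in text taken from the system under investigation. The helper names, the `min_length` cut-off, and the sample record are all hypothetical and made up for this example.

```python
import csv
import io


def keywords_from_sample(csv_text, min_length=4):
    """Collect header names and cell values from a known sample as keywords."""
    keywords = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            cell = cell.strip().lower()
            # Skip very short strings; they match too much to be useful.
            if len(cell) >= min_length:
                keywords.add(cell)
    return keywords


def find_keywords(text, keywords):
    """Return the sorted keywords found in text (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(k for k in keywords if k in lowered)


# Made-up sample record; no real data.
SAMPLE = "patient_id,last_name,diagnosis_code\n10021,Example,J45.909\n"
keywords = keywords_from_sample(SAMPLE)
found = find_keywords("backup: LAST_NAME=Example; diagnosis_code=J45.909", keywords)
print(found)
```

The resulting keyword set could equally be fed into the keyword search of whatever indexing tool is already in use; the substring match here only stands in for that step.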
