Redacts personally identifiable information (PII). This package uses the Stanford NER package to identify and scrub PII data. It redacts email addresses, Social Security numbers (SSNs), driver's license numbers, and passport numbers. It aggressively removes any number with more than 4 consecutive digits. Use AddToWhitelist to whitelist any pattern.

Musfiqur01/PIIRedact

PIIRedact

Redacts personally identifiable information (PII). This package uses the Stanford NER package to identify and scrub names, organizations, and locations. It also redacts email addresses, Social Security numbers (SSNs), driver's license numbers, and passport numbers. It aggressively removes any number with more than 3 consecutive digits. Use AddToWhitelist to whitelist any pattern. To use this package, you must have Java installed.

Getting Started

Install the NuGet package to get started.

The usage is:

    var redactor = new PIIRedactor();
    var redactedData = redactor.GetRedactedData("My name is John Doe. My email is m@n.o");

The redacted string looks like: My name is xxxx xxx. My email is x@x.x
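The output above suggests that each letter or digit in a detected span is replaced with "x" while punctuation is kept. The following is a minimal sketch of that masking idea, not the package's actual implementation: the class MaskingSketch, the Mask helper, and the email regex are all hypothetical names and patterns chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MaskingSketch
{
    // Hypothetical email pattern for illustration; not the package's actual regex.
    public static readonly Regex Email = new Regex(@"[\w.+-]+@[\w-]+(\.[\w-]+)+");

    // Replaces every letter or digit inside each match with 'x',
    // leaving punctuation such as '@' and '.' untouched.
    public static string Mask(string input, Regex pattern)
    {
        return pattern.Replace(input, m =>
            new string(m.Value.Select(c => char.IsLetterOrDigit(c) ? 'x' : c).ToArray()));
    }

    public static void Main()
    {
        Console.WriteLine(Mask("My email is m@n.o", Email)); // prints "My email is x@x.x"
    }
}
```

Masking character by character keeps the length and shape of the original value visible, which matches the sample output shown for the name and email address.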

To whitelist a pattern, e.g. any number with 6-8 consecutive digits, call AddToWhitelist as follows:

    redactor.AddToWhitelist(new RegexFinder(@"\b\d{6,8}\b"));

Similarly, a new redactable pattern can be added; the following targets any word of 6-8 consecutive digits:

    redactor.AddToWhitelist(new RegexFinder(@"\b\d{6,8}\b"));
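To see which values the 6-8 digit pattern used above actually matches, it can be tried directly with the standard .NET Regex class. This is only a sketch: WhitelistPatternSketch and FindMatches are hypothetical names, and the snippet does not call the package itself. Note the verbatim string (@"..."), which is needed in C# so that \b and \d reach the regex engine unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitelistPatternSketch
{
    // Verbatim string (@"...") keeps \b and \d as regex escapes instead of C# escapes.
    public static readonly Regex SixToEightDigits = new Regex(@"\b\d{6,8}\b");

    // Returns every substring of the input that the pattern matches.
    public static string[] FindMatches(string input)
    {
        return SixToEightDigits.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // 12345 (5 digits) and 123456789 (9 digits) fall outside the 6-8 range.
        var matches = FindMatches("ids: 12345 123456 12345678 123456789");
        Console.WriteLine(string.Join(", ", matches)); // prints "123456, 12345678"
    }
}
```

The word boundaries (\b) matter here: without them, the pattern would also match the first 8 digits of a longer number.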

Prerequisites

To use this package, you must have Java installed. If you don't want to use Java, disable entity redaction by setting IncludeEntityRedaction = false;

Versioning

We use AppVeyor for versioning.

Authors

Musfiqur Rahman

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

This project uses the Stanford NER package.
