Using NLP to redact PII/PHI and protect patient information

Streamlining federal health agency provider inquiries using intelligent automation and protected data handling

Challenge

Each year, the federal health services quality platform has a set time when program participants can review and question their scores. During this period, providers can look over their results and send “inquiries” to the staff and research team if they see something they want reviewed.

In the past, this process was mostly manual. Staff and researchers had to reply through the quality system, keep track of each inquiry using JIRA tickets, and work together to figure out how to resolve issues.

Team Flexion decided to automate this process to make it faster and more efficient. While doing this, we found a serious risk: personal and health information (PII/PHI) could accidentally be shared across different tools. To fix this, we needed a way to remove or hide PII/PHI from the automation pipeline to keep sensitive data safe.

Approach

We used the spaCy framework to build a redaction system that learns from both its correct and incorrect results, so we could keep refining the model for our specific needs.

At first, we tried AWS Comprehend, but we quickly found a major limitation: we could only customize data that the built-in model already identified as sensitive. If it didn’t catch something, we couldn’t adjust it. Because of this, we switched to spaCy and used the ‘en_core_web_sm’ model for entity recognition.

spaCy offered flexibility and easy customization, which allowed us to design a privacy-first, context-aware solution. “Privacy-first” means that if there’s any doubt, the system marks the data as sensitive. “Context-aware” means the model looks at how a word is used. For example, “Jane Doe” in a signature refers to a person’s name and should be marked as sensitive. But “Jane Doe’s Medical Facility” refers to an organization and isn’t necessarily sensitive.
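One way to picture the "privacy-first" default is as an allowlist: only entity labels explicitly judged safe are left alone, and anything else is redacted. The sketch below is a minimal illustration, not the project's actual code. The `SAFE_LABELS` set, the `[LABEL]` placeholder format, and the function names are assumptions made for this example. `PERSON` and `ORG` are labels that spaCy's English pipelines emit, and `spans_from_doc` reads only standard attributes of a spaCy `Doc`.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

# Hypothetical policy: labels considered safe to leave in the text.
# Anything not listed here is redacted ("if in doubt, redact").
SAFE_LABELS = {"ORG"}


def spans_from_doc(doc) -> List[Span]:
    """Collect character spans from a spaCy Doc, e.g. the result of nlp(text)."""
    return [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace every span whose label is not allowlisted with [LABEL]."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        if label not in SAFE_LABELS:
            out = out[:start] + f"[{label}]" + out[end:]
    return out


# Hand-written spans standing in for model output on the two cases above.
signature = "Regards, Jane Doe"
facility = "Please contact Jane Doe's Medical Facility."
print(redact(signature, [(9, 17, "PERSON")]))
print(redact(facility, [(15, 42, "ORG")]))
```

With spaCy and the `en_core_web_sm` model installed, the same function could be fed real predictions via `redact(text, spans_from_doc(nlp(text)))`, where `nlp = spacy.load("en_core_web_sm")`. Whether the small model labels these two sentences as shown is not guaranteed, which is why context-specific training matters.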

To train the model, we created many mock examples and scenarios that covered different types of context. We then built a feedback loop to fine-tune the system and make the code more efficient after each live run.
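Mock examples of this kind can be generated from templates, so the same name appears in different contexts with different labels. The sketch below is an assumption about how such data might be produced, not the team's actual generator. The templates, names, and function name are invented for illustration. The output shape, a text paired with an `{"entities": [(start, end, label)]}` dictionary, is the annotation format that spaCy's `Example.from_dict` accepts.

```python
from typing import Dict, List, Tuple

Annotation = Dict[str, List[Tuple[int, int, str]]]

# Hypothetical templates: (sentence, entity text inside it, entity label).
# The same name is a PERSON in a signature but part of an ORG in a facility name.
TEMPLATES = [
    ("Regards, {name}", "{name}", "PERSON"),
    ("Sincerely, {name}, Practice Manager", "{name}", "PERSON"),
    ("Scores for {name}'s Medical Facility look wrong.",
     "{name}'s Medical Facility", "ORG"),
]
MOCK_NAMES = ["Jane Doe", "John Smith"]  # invented names, not real people


def build_examples() -> List[Tuple[str, Annotation]]:
    """Fill each template with each name and record the entity's character offsets."""
    examples = []
    for template, entity_template, label in TEMPLATES:
        for name in MOCK_NAMES:
            text = template.format(name=name)
            entity = entity_template.format(name=name)
            start = text.index(entity)
            span = (start, start + len(entity), label)
            examples.append((text, {"entities": [span]}))
    return examples


examples = build_examples()
for text, annotation in examples:
    print(text, annotation)
```

In a spaCy v3 training script, each pair could then be converted with `Example.from_dict(nlp.make_doc(text), annotation)` before being passed to the training loop.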

Outcomes

For the live period from August 2025 to September 2025, we cut the false negative rate by 50%. A false negative in this case means sensitive data, such as a Medicare Beneficiary Identifier (MBI), that should have been redacted but wasn’t because the old system didn’t classify it correctly.
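For readers who want the metric spelled out, the false negative rate is the share of sensitive items that were not redacted: FN / (FN + TP). The sketch below shows that arithmetic and how a 50% cut would be computed. The counts are hypothetical and chosen only to make the math easy to follow; they are not the project's figures. Reading "cut by 50%" as a relative reduction is also an assumption.

```python
def false_negative_rate(missed: int, caught: int) -> float:
    """FNR = FN / (FN + TP): share of sensitive items that were not redacted."""
    total = missed + caught
    if total == 0:
        raise ValueError("no sensitive items to evaluate")
    return missed / total


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which the rate dropped, relative to the old rate."""
    return (old_rate - new_rate) / old_rate


# Hypothetical counts for illustration only; not the project's data.
old_rate = false_negative_rate(missed=20, caught=180)  # 20 of 200 items missed
new_rate = false_negative_rate(missed=10, caught=190)  # 10 of 200 items missed
print(f"old: {old_rate:.1%}, new: {new_rate:.1%}, "
      f"reduction: {relative_reduction(old_rate, new_rate):.0%}")
```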

Government compliance requires that all PII/PHI be redacted in the JIRA tickets created by the inquiry project. When the system misses something, the quality control vendor team has to fix it manually. By using spaCy, we made the system more accurate and reduced the amount of manual work needed to catch errors.

The spaCy framework fully replaced AWS Comprehend’s built-in PII/PHI model, giving us more control. Adding real-time feedback to the AWS model and fine-tuning it was difficult, whereas spaCy made it easy to integrate a feedback loop that adjusts for false positives and false negatives as we went.
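A feedback loop like this can be sketched as a small correction step: reviewer findings remove false positives from the model's output and add the false negatives it missed, and the corrected spans become a new training example. This is a minimal illustration under stated assumptions, not the project's implementation. The function name, the custom `MBI` label, and the dummy identifier are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def apply_feedback(predicted: List[Span], missed: List[Span],
                   wrong: List[Span]) -> List[Span]:
    """Return corrected spans: drop false positives, add false negatives."""
    corrected = [span for span in predicted if span not in wrong]
    corrected.extend(span for span in missed if span not in corrected)
    return sorted(corrected)


text = "Jane Doe's Medical Facility asked about MBI 1EG4-TE5-MK73"
mbi = "1EG4-TE5-MK73"  # dummy identifier for illustration only
mbi_start = text.index(mbi)

predicted = [(0, 8, "PERSON")]  # model output for this ticket
wrong = [(0, 8, "PERSON")]      # reviewer: the name is part of an organization
missed = [(mbi_start, mbi_start + len(mbi), "MBI")]  # reviewer: should be redacted

corrected = apply_feedback(predicted, missed, wrong)
retraining_example = (text, {"entities": corrected})
print(retraining_example)
```

Corrected examples collected this way could be added to the training set before the next fine-tuning run.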

Client

A federal healthcare agency

Project Name

Health services quality platform inquiry process

Duration

2 months

Tech Stack

Backend Development

Python
spaCy

Cloud Services

Lambda
Comprehend

Version Control

GitHub
