See what our clients say about working with Bonami Software across 200+ projects for 18+ industries. EXPLORE NOW!
We don't just build software. We deliver results. EXPLORE NOW!
See why businesses choose Bonami Software for reliable, scalable solutions. EXPLORE NOW!
We turn ideas into scalable products with proven delivery across 18+ industries. EXPLORE NOW!

Patient Data Has Enormous Value for AI and Research. De-Identification Is How You Use It Without HIPAA Constraints.

We de-identify protected health information using HIPAA's Safe Harbor and Expert Determination methods. Our work covers structured data pipelines, clinical text NLP, limited dataset creation, and re-identification risk validation, so your analytics and AI programs can access the data they need.

Book a De-Identification Consultation

Talk to our team about your dataset and analytics use case. We reply within 24 hours, and every conversation is fully NDA-protected.

Trusted by startups and global leaders

BrowserStack
Persistent
Yatra
Kellton
Jade Global
Optum
PokerBaazi
Walmart

What PHI De-Identification Services Cover

Healthcare data is among the most valuable raw material for clinical research, analytics, and AI development. De-identification removes the elements that link data to individual patients, producing datasets that retain clinical value and, because they are no longer PHI, fall outside HIPAA's Privacy and Security Rules.

Safe Harbor De-Identification

Removal of all eighteen HIPAA-specified identifier categories: names, geographic units smaller than a state, all date elements other than the year, phone numbers, SSNs, medical record numbers, IP addresses, device identifiers, biometric identifiers, and more. The first three digits of a zip code may be kept only where the corresponding area holds more than 20,000 people. Ages over 89 are aggregated into a single 90-or-older category. After removal, the organization must have no actual knowledge that the remaining data could identify an individual.
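As a rough illustration of these removals, here is a minimal Python sketch for a single record. The field names and the three field sets are hypothetical, not taken from any real schema, and the sketch covers only a handful of the eighteen categories. It also drops zip codes entirely rather than keeping the first three digits.

```python
from datetime import date

# Hypothetical field names; a real schema needs its own mapping
# to the eighteen Safe Harbor categories.
DIRECT_IDENTIFIER_FIELDS = {"name", "ssn", "mrn", "phone", "email", "ip_address", "device_id"}
SUB_STATE_GEOGRAPHY_FIELDS = {"street_address", "city", "zip"}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def safe_harbor_transform(record: dict) -> dict:
    """Return a copy of the record with Safe Harbor style removals applied."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS or field in SUB_STATE_GEOGRAPHY_FIELDS:
            continue  # drop the field entirely
        if field in DATE_FIELDS and isinstance(value, date):
            # Keep only the year, under a renamed field.
            out[field.replace("_date", "_year")] = value.year
        elif field == "age":
            # Aggregate ages over 89 into one category.
            out["age"] = "90+" if value > 89 else value
        else:
            out[field] = value
    # A birth year would reveal an age over 89, so it is removed as well.
    if isinstance(record.get("age"), int) and record["age"] > 89:
        out.pop("birth_year", None)
    return out


example = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "age": 93,
    "birth_date": date(1931, 5, 2),
    "admission_date": date(2024, 3, 14),
    "zip": "02139",
    "state": "MA",
    "diagnosis": "I10",
}
print(safe_harbor_transform(example))
```

Field removal alone does not complete Safe Harbor: the "no actual knowledge" condition still has to be assessed by the organization.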

Expert Determination Method

A qualified statistical or scientific expert applies generally accepted principles to establish that re-identification risk is very small. More flexible than Safe Harbor — some identifiers may be retained if risk analysis supports it — but requires documented methodology and qualified expertise. Expert analysis and documentation are essential to a defensible determination.

Structured Data De-Identification

Systematic transformation of identifier fields in relational databases, HL7, and FHIR data — including generalization of exact dates to year-only, ages to ranges, and suppression of rare attribute combinations that function as de facto identifiers. Automated pipelines process large record volumes consistently and reproducibly, with documented provenance.
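The generalization and suppression steps described above can be sketched in a few lines of Python. The function names, the ten-year band width, and the `min_count` threshold are illustrative assumptions; the right threshold for a real dataset depends on the risk analysis.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age to a range; ages over 89 share one bucket."""
    if age > 89:
        return "90+"
    low = (age // width) * width
    high = min(low + width - 1, 89)  # never let a band extend past 89
    return f"{low}-{high}"


def suppress_rare_combinations(rows, quasi_identifiers, min_count=5):
    """Drop rows whose quasi-identifier combination occurs fewer than min_count times."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= min_count]


rows = [
    {"age_band": age_band(34), "sex": "F", "state": "MA"},
    {"age_band": age_band(38), "sex": "F", "state": "MA"},
    {"age_band": age_band(93), "sex": "M", "state": "VT"},
]
# With min_count=2, the single 90+ / M / VT row is suppressed.
print(suppress_rare_combinations(rows, ["age_band", "sex", "state"], min_count=2))
```

Dropping whole rows is the bluntest form of suppression. Blanking only the rare attribute values is a common alternative when record counts matter.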

Clinical Text De-Identification

Natural language processing techniques that recognize and redact or replace named entities in clinical notes, radiology reports, and discharge summaries — patient names, physician names, locations, dates, and other identifiers embedded in unstructured narrative text. The same identifier appears in many forms across clinical text, requiring trained NLP models.
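To make the redact-or-replace step concrete, here is a deliberately simple Python sketch that uses regular expressions in place of a trained NLP model. The patterns and placeholder tags are assumptions for illustration only. Pattern matching of this kind catches rigidly formatted identifiers such as phone numbers, but it cannot find patient or physician names, which is where trained models come in.

```python
import re

# Illustrative patterns only; real clinical text needs far broader coverage.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),
]


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed category tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Pt seen 03/14/2024, MRN: 445566. Call 617-555-0123."
print(redact(note))
```

Replacing identifiers with category tags, rather than deleting them, keeps the sentence readable and shows downstream users what kind of information was removed.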

Limited Dataset Creation

HIPAA's limited dataset framework removes direct identifiers while retaining certain indirect identifiers such as city, state, zip code, and dates — usable for research, public health, and healthcare operations under a Data Use Agreement. A practical middle ground between fully identified PHI and fully de-identified data for analytics use cases.
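A limited dataset transformation can be sketched as a field filter that drops direct identifiers and leaves city, state, zip code, and dates in place. The field names below are hypothetical and would need to be mapped to the actual schema.

```python
# Hypothetical field names standing in for the direct identifiers
# that a limited dataset must exclude.
EXCLUDED_DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}


def to_limited_dataset(record: dict) -> dict:
    """Drop direct identifiers; city, state, zip, and dates pass through."""
    return {k: v for k, v in record.items() if k not in EXCLUDED_DIRECT_IDENTIFIERS}


record = {
    "name": "Jane Doe",
    "street_address": "1 Main St",
    "city": "Boston",
    "state": "MA",
    "zip": "02139",
    "admission_date": "2024-03-14",
    "mrn": "445566",
    "diagnosis": "I10",
}
print(to_limited_dataset(record))
```

The output is still PHI, so it can be shared only under a Data Use Agreement.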

Re-identification Risk Assessment

Even after obvious identifiers are removed, combinations of demographic and clinical attributes can re-identify individuals in small populations or unusual clinical profiles. Re-identification risk assessment evaluates how remaining data could be combined with other available information — and guides suppression, generalization, or noise-addition decisions.
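One simple way to quantify this is to count how many records share each combination of quasi-identifiers, a k-anonymity style measure. The Python sketch below is an illustration under that assumption, not the method an expert would necessarily apply. The function name and output keys are hypothetical.

```python
from collections import Counter


def risk_summary(rows, quasi_identifiers):
    """Summarize re-identification risk from quasi-identifier group sizes."""
    if not rows:
        raise ValueError("rows must not be empty")
    sizes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    smallest = min(sizes.values())
    unique_rows = sum(1 for n in sizes.values() if n == 1)
    return {
        "k": smallest,                             # size of the smallest group
        "max_record_risk": 1 / smallest,           # worst-case chance of singling out a record
        "unique_fraction": unique_rows / len(rows),  # share of records that stand alone
    }


rows = [
    {"age_band": "30-39", "sex": "F", "state": "MA"},
    {"age_band": "30-39", "sex": "F", "state": "MA"},
    {"age_band": "30-39", "sex": "F", "state": "MA"},
    {"age_band": "90+", "sex": "M", "state": "VT"},
]
print(risk_summary(rows, ["age_band", "sex", "state"]))
```

A group of size one marks a record that is unique on those attributes, and such records are the first candidates for suppression or further generalization.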

De-Identification Is the Mechanism That Unlocks Healthcare Data for AI and Research

How PHI De-Identification Is Implemented

A structured process from data inventory to risk validation — each step with specific technical and compliance deliverables that determine whether a de-identified dataset can be used for analytics and AI development without HIPAA privacy constraints.

STEP 1 — Data Inventory and Identifier Mapping

Catalog all data elements in the dataset and map each against the eighteen Safe Harbor identifier categories and any additional indirect identifiers that could contribute to re-identification. For structured data, this means reviewing every field in the schema. For clinical text, it means identifying the categories of information embedded in free-text documents. The inventory determines the scope of transformation required.
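For structured data, a first pass at the inventory can be automated by matching field names against keywords for each identifier category. The Python sketch below assumes a hypothetical keyword table covering only a few categories. Name matching is a starting point for manual review, not a substitute for it.

```python
# Hypothetical keyword table covering a subset of the identifier categories.
CATEGORY_KEYWORDS = {
    "names": ("name",),
    "geographic": ("address", "city", "zip", "county"),
    "dates": ("date", "dob"),
    "telephone": ("phone",),
    "email": ("email",),
    "ssn": ("ssn",),
    "medical_record_number": ("mrn",),
}


def inventory(schema_fields):
    """Map each field name to the identifier categories its name suggests."""
    result = {}
    for field in schema_fields:
        lowered = field.lower()
        matches = [
            category
            for category, keywords in CATEGORY_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        # Fields with no keyword match still need a human decision.
        result[field] = matches or ["unclassified"]
    return result


for field, categories in inventory(
    ["patient_name", "admit_date", "zip_code", "diagnosis_code"]
).items():
    print(field, categories)
```

Fields marked unclassified are not automatically safe. They may be indirect identifiers, which is why the inventory also feeds the re-identification risk assessment.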

STEP 2 — Method Selection and Expert Engagement

Select the appropriate de-identification method based on the dataset's intended use and the flexibility required. Safe Harbor is straightforward to implement and audit. Expert Determination allows retention of identifiers where risk analysis supports it, but it requires a qualified statistical or scientific expert whose analysis must be documented. For many AI training datasets, Expert Determination can provide the data richness that Safe Harbor's blanket removals do not.

STEP 3 — De-Identification Pipeline Implementation

Build and run the de-identification transformation — structured data pipelines that replace, generalize, or suppress identifier fields; NLP models that detect and redact identifiers in clinical text; and date shifting or aggregation logic that handles temporal data. Automated pipelines process records consistently at scale and create documented, reproducible transformation logs that support provenance requirements for research use.
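Date shifting is one of the less obvious pieces of this step. A minimal Python sketch follows, assuming a keyed hash (HMAC) is used to derive one stable offset per patient so that intervals between that patient's events are preserved. The function names, the 365-day window, and the secret handling are illustrative assumptions. Whether shifted dates are acceptable depends on the de-identification method chosen and, under Expert Determination, on the expert's analysis.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset_days(patient_id: str, secret: bytes, max_days: int = 365) -> int:
    """Derive a stable offset in [-max_days, max_days] from a keyed hash."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def shift_date(original: date, patient_id: str, secret: bytes) -> date:
    """Shift a date by the patient's offset; intervals within a patient are kept."""
    return original + timedelta(days=patient_offset_days(patient_id, secret))


secret = b"replace-with-a-managed-secret"  # placeholder; keep real keys out of source code
admitted = shift_date(date(2024, 3, 1), "patient-001", secret)
discharged = shift_date(date(2024, 3, 11), "patient-001", secret)
print((discharged - admitted).days)  # length of stay is unchanged by the shift
```

Because the offset is derived rather than stored, rerunning the pipeline with the same secret reproduces the same output. Anyone holding the secret can also reverse the shift, so it needs the same protection as the source data.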

Why De-Identification Matters for Healthcare AI and Analytics

HIPAA's privacy constraints on identifiable PHI significantly limit how healthcare data can be used for model development and analytics. De-identification is the mechanism that unlocks that data, and the areas below show what changes.

AI Training
Training clinical AI models requires large volumes of patient data. De-identified data processed under Safe Harbor or Expert Determination can be used for model training and validation without the consent and access restrictions that apply to identifiable PHI — simplifying data governance significantly for AI development programs.
No Consent
HIPAA's Privacy Rule requires patient authorization for most uses and disclosures of PHI beyond treatment, payment, and healthcare operations. Properly de-identified data is no longer PHI, so it falls outside HIPAA's requirements and can be used without patient authorization or consented access. Research on such data also typically avoids the Institutional Review Board oversight that research with identifiable data requires.
Analytics
Digital health companies building analytics products can derive aggregate insights, population health benchmarks, and product intelligence from de-identified patient data that can be shared across health system customers and used to improve product functionality — business models that identified PHI's legal constraints would make impossible.
Research
Clinical research and population health studies routinely depend on de-identified datasets. Organizations that can produce well-documented de-identified datasets from their clinical operations become valuable research partners — an institutional relationship that generates both revenue and validation for healthcare products.
Real Risk
De-identification done incorrectly creates real liability. A dataset released as de-identified that can be re-identified exposes the organization to HIPAA enforcement, reputational damage, and potential civil liability. Proper implementation — with documented methodology and risk validation — is the difference between safe harbor and exposure.
Documented
The Expert Determination method requires that the expert's analysis be documented and available to demonstrate the basis for the de-identification determination. Undocumented de-identification is not defensible if a dataset's HIPAA status is ever questioned by OCR or a data recipient. Documentation is the compliance record.

The 18 Safe Harbor Identifiers: HIPAA Requires Every One to Be Removed

Under 45 CFR 164.514(b)(2), Safe Harbor de-identification requires removal of eighteen specific identifier categories. After removal, the organization must have no actual knowledge that the remaining information could identify any individual.

Personal Identifiers

Direct identifiers that link data to a specific named individual.

  • Names
  • Social Security numbers
  • Certificate and license numbers
  • Account numbers
  • Health plan beneficiary numbers
  • Medical record numbers

Contact and Location Data

Geographic and contact information that can locate or reach an individual.

  • Geographic data smaller than state
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Web URLs
  • IP addresses

Dates and Age Data

Temporal data requires special handling — dates beyond year and ages over 89 are identifying.

  • All date elements except year (the year is also removed when it would indicate an age over 89)
  • Ages over 89 aggregated into 90+
  • Admission and discharge dates
  • Birth and death dates
  • Date of service
  • Procedure dates

Device and Biometric Identifiers

Physical and technical identifiers unique to devices or biological characteristics.

  • Device identifiers and serial numbers
  • Vehicle identifiers and serial numbers
  • Biometric identifiers (fingerprints, voiceprints)
  • Full-face photographs
  • Comparable images
  • Any unique identifying number or code

Expert Determination Flexibility

The Expert Determination method allows retention of some identifiers when risk analysis supports it.

  • Qualified expert required
  • Statistical risk analysis
  • Very small re-id risk required
  • Methods documented
  • More clinical richness retained
  • Expert analysis on file

Limited Dataset Framework

Limited datasets remove direct identifiers but may retain indirect ones — usable under a Data Use Agreement.

  • Direct identifiers removed
  • City, state, zip retained
  • Dates retained
  • Data Use Agreement required
  • Research and public health use
  • Simpler than full authorization

The De-Identification Stack We Build With

NLP models, structured data pipelines, and risk validation tooling — selected to match the data formats, volumes, and compliance posture of clinical datasets used in healthcare AI and analytics.

  • spaCy NLP
  • AWS Comprehend Medical
  • Azure Health NLP
  • Google Healthcare API
  • BERT Clinical Models

Healthcare Data Has Enormous Value for Research and AI. De-Identification Is How You Access It.

Safe Harbor or Expert Determination, structured pipelines or clinical NLP, limited datasets or fully de-identified — we build the de-identification framework that matches your data formats, your analytics use case, and your compliance posture. Book a consultation and we will map the right approach for your dataset.

Book a De-Identification Consultation
AI Readiness

Award-Winning AI Development & Consulting

  • 2025: 100 Fastest Growth Companies
  • 2025: Global Spring Winner
  • 2025: Top App Development Company
  • 2024: AWS Partner Network
  • 2024: Google Cloud Partner
  • 2025: Highly Rated on Trustpilot
  • 2024: Verified Agency
  • 2024: Top App Development Company
  • 2024: ASSOCHAM Member

Frequently Asked Questions

[ 1 ]

Can de-identified data ever be re-identified?

De-identified data produced under HIPAA's Safe Harbor or Expert Determination methods is treated as non-PHI for legal purposes, but this does not mean it is impossible in practice to re-identify individuals from such data. Research has demonstrated that combining de-identified health data with other publicly available data sources can sometimes re-identify individuals, particularly those with unusual demographic characteristics or rare conditions. Organizations working with de-identified data should implement contractual and technical controls to prevent re-identification attempts and should treat the data with appropriate care even in the absence of HIPAA's legal requirements.

[ 2 ]

What is a Data Use Agreement and when is it required?

A Data Use Agreement is a contractual arrangement required by HIPAA when a covered entity shares a limited dataset with another party for research, public health, or healthcare operations purposes. The limited dataset recipient must agree to use the data only for the purposes specified in the agreement, to not attempt to identify the individuals in the dataset, and to implement safeguards to prevent unauthorized use or disclosure. Data Use Agreements are simpler to execute than full HIPAA authorizations but are still legally binding agreements with specific required provisions.

[ 3 ]

What is the difference between Safe Harbor and Expert Determination de-identification?

Safe Harbor requires removing all eighteen specified identifier categories from the dataset — it is straightforward to audit and implement but removes more data than may be necessary. Expert Determination requires a qualified statistical or scientific expert to apply accepted principles and document that re-identification risk is very small. Expert Determination is more flexible because it allows retention of some identifiers when risk analysis supports it, but it requires qualified expertise to implement and the expert's analysis must be documented. For AI training datasets where richer clinical data improves model performance, Expert Determination often provides better outcomes than Safe Harbor's blanket removals.

[ 4 ]

Can de-identified data be shared with AI vendors without HIPAA obligations?

Yes. Once data has been appropriately de-identified under 45 CFR 164.514 — through Safe Harbor or Expert Determination — it is no longer PHI and is not subject to HIPAA's Privacy and Security Rules. It can be shared with AI vendors, used in cloud model training environments, and incorporated into product development workflows without the Business Associate Agreement requirements that apply to identified PHI. However, organizations should implement contractual prohibitions on re-identification attempts and should evaluate the AI vendor's data handling practices independent of HIPAA requirements.

Global presence

Two offices. One team.
