Social Security Numbers exposed in the NDH bulk export
Independently verifies the 2026-04-30 Washington Post finding that the CMS National Provider Directory bulk export contains provider Social Security Numbers, which CMS attributed to "incorrect entries of provider or provider-representative-supplied information in the wrong places." AINPI scans the entire FHIR JSON of every Practitioner and Organization resource in the 2026-05-08 export for the dashed SSN format and classifies each hit by its JSON location.
Headline
41 of 50 flagged Practitioner resources in the 2026-05-08 NDH bulk export contain a Social Security Number, independently verifying the 2026-04-30 Washington Post finding. All 41 sit in the qualification[].identifier[].value slot (the state-license credential); none are embedded in the name[].given[] slot (entered literally as a name token), and none in name[].family. Another 5 matches are international phone-format false positives (e.g. Italy 39-XXX-XX-XXXX) and are filtered out. No Organization resources carry SSN-pattern strings.
41 / 7,441,211 ≈ 0.00055% (about 5.5 per million Practitioner resources)
unit: count
What this means
Regulators
The NDH bulk export has now shipped publicly with provider PII in two consecutive monthly releases. CMS attributed the leak to "incorrect entries... in the wrong places," which is consistent with our JSON-location breakdown: all 41 SSNs in the May release sit in qualification.identifier.value (the state-license slot). The 2026-04-09 (April) release carried 46 confirmed exposures; the 2026-05-08 (May) release carried 41, so CMS partially scrubbed the data but did not eliminate the problem. A dashed-SSN pattern check at NDH submission time would have flagged every one of them. Treat this as a directory-quality signal alongside the deactivated-but-listed and duplicate-organization flags AINPI tracks.
Provider data teams
If your provider data management platform pushes practitioner data to the NDH, audit the qualification identifier and name.given pipelines for SSN-pattern strings before serialization. The 41 exposures AINPI flagged in May (down from 46 in April) are a lower bound: undashed 9-digit SSNs are not detected by this scan because that pattern collides with too many other 9-digit identifiers (EINs, account IDs, claim IDs).
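A pre-serialization audit can be as small as a guard around the serializer. The sketch below assumes practitioner records are Python dicts shaped like FHIR Practitioner JSON; `find_ssn_paths`, `serialize_practitioner`, and `SSNPatternError` are hypothetical names, not part of any NDH submission API. It deliberately omits the phone-format guard, so it errs toward blocking, and anything it blocks goes to a human. Like the AINPI scan, it only catches the dashed format.

```python
import json
import re
from typing import Any

# Dashed SSN pattern only; undashed 9-digit values are out of scope here.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")


class SSNPatternError(ValueError):
    """Raised when an outgoing Practitioner carries a dashed-SSN-pattern string."""


def find_ssn_paths(resource: dict[str, Any]) -> list[str]:
    """List the audited paths whose value matches the dashed SSN pattern."""
    paths = []
    for qi, qual in enumerate(resource.get("qualification", [])):
        for ii, ident in enumerate(qual.get("identifier", [])):
            if SSN_RE.search(str(ident.get("value", ""))):
                paths.append(f"qualification[{qi}].identifier[{ii}].value")
    for ni, name in enumerate(resource.get("name", [])):
        for gi, token in enumerate(name.get("given", [])):
            if SSN_RE.search(str(token)):
                paths.append(f"name[{ni}].given[{gi}]")
    return paths


def serialize_practitioner(resource: dict[str, Any]) -> str:
    """Serialize to JSON, refusing any record that fails the SSN audit."""
    offending = find_ssn_paths(resource)
    if offending:
        raise SSNPatternError("SSN-pattern value at: " + ", ".join(offending))
    return json.dumps(resource)
```

Raising instead of silently cleaning keeps the fix at the source system, where the value was entered in the wrong field.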
Researchers
AINPI replicates the WaPo finding using the publicly distributed NDH bulk export CMS released. The value-add is a precise count, a JSON-location breakdown, and a per-state distribution, none of which the WaPo article published. As a responsible-disclosure measure, AINPI does NOT republish the SSN values themselves, even though they remain in the public NDH bulk file.
Everyone using NDH
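Any pipeline that consumes the NDH bulk export should run a PII scrub pass before downstream use. Specifically: match `\d{3}-\d{2}-\d{4}` in `qualification[].identifier[].value` and `name[].given[]` strings, skip matches that fall inside an international phone number (`\d{2}-\d{3}-\d{2}-\d{4}`, otherwise a number like 39-123-45-6789 loses its last nine digits), strip the remaining matches, and flag the affected resources for human review.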
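A minimal sketch of that scrub pass, assuming resources are already parsed into Python dicts. `redact`, `scrub_practitioner`, and the `[REDACTED-SSN]` placeholder are hypothetical; replacing matches with a placeholder rather than deleting them is a design choice here, so reviewers can see where a value was removed. A match is treated as a phone number only when it lies entirely inside a phone-format match, which is an assumption about how the guard should apply.

```python
import copy
import re
from typing import Any

SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
PHONE_RE = re.compile(r"\d{2}-\d{3}-\d{2}-\d{4}")  # e.g. Italy 39-XXX-XX-XXXX
REDACTED = "[REDACTED-SSN]"  # hypothetical placeholder token


def redact(value: str) -> tuple[str, int]:
    """Replace SSN-pattern matches not inside a phone number; return (text, count)."""
    phone_spans = [m.span() for m in PHONE_RE.finditer(value)]
    count = 0

    def _sub(m: re.Match) -> str:
        nonlocal count
        if any(s <= m.start() and m.end() <= e for s, e in phone_spans):
            return m.group()  # leave the phone number intact
        count += 1
        return REDACTED

    return SSN_RE.sub(_sub, value), count


def scrub_practitioner(resource: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Return a redacted copy of a Practitioner plus a needs-human-review flag."""
    out = copy.deepcopy(resource)  # never mutate the caller's record
    flagged = False
    for qual in out.get("qualification", []):
        for ident in qual.get("identifier", []):
            if isinstance(ident.get("value"), str):
                ident["value"], n = redact(ident["value"])
                flagged |= n > 0
    for name in out.get("name", []):
        given = name.get("given", [])
        for i, token in enumerate(given):
            if isinstance(token, str):
                given[i], n = redact(token)
                flagged |= n > 0
    return out, flagged
```

Resources where the flag comes back `True` are the ones to route to human review.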
Null hypothesis
Zero Practitioner or Organization resources in the NDH bulk export contain a Social Security Number anywhere in their FHIR JSON.
Denominator
7,441,211 Practitioner resources + 3,414,375 Organization resources in the 2026-05-08 NDH bulk export.
Data source
BigQuery scan of `cms_npd.practitioner` and `cms_npd.organization` for the regex `\d{3}-\d{2}-\d{4}` in `TO_JSON_STRING(resource)`, with classification by JSON location (`qualification[].identifier[].value` vs `name[].given[]` vs `name[].family`) and false-positive guard against international phone formats (`\d{2}-\d{3}-\d{2}-\d{4}`). Source: AINPI replication of the public Washington Post reporting (2026-04-30).
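The BigQuery scan can be reproduced on a single resource for local testing. The Python sketch below is not AINPI's query: it assumes each resource arrives as a FHIR JSON string, walks string values instead of regex-scanning the serialized text, and treats a hit as a phone-number false positive only when it lies entirely inside a `\d{2}-\d{3}-\d{2}-\d{4}` match (an assumption about how the guard is applied). `walk_strings`, `location`, `ssn_hits`, and `classify` are hypothetical helpers.

```python
import json
import re
from typing import Any, Iterator

# Dashed SSN pattern, plus the international phone format used as a guard.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
PHONE_RE = re.compile(r"\d{2}-\d{3}-\d{2}-\d{4}")


def walk_strings(node: Any, path: tuple = ()) -> Iterator[tuple[tuple, str]]:
    """Yield (path, value) for every string leaf in a parsed FHIR resource."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk_strings(child, path + (key,))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from walk_strings(child, path + (i,))
    elif isinstance(node, str):
        yield path, node


def location(path: tuple) -> str:
    """Collapse a concrete path (with list indices) into a location bucket."""
    keys = tuple(p for p in path if isinstance(p, str))
    if keys == ("qualification", "identifier", "value"):
        return "qualification[].identifier[].value"
    if keys == ("name", "given"):
        return "name[].given[]"
    if keys == ("name", "family"):
        return "name[].family"
    return "other"


def ssn_hits(value: str) -> list[str]:
    """Dashed-SSN matches in value, minus any lying inside a phone number."""
    phone_spans = [m.span() for m in PHONE_RE.finditer(value)]
    return [
        m.group()
        for m in SSN_RE.finditer(value)
        if not any(s <= m.start() and m.end() <= e for s, e in phone_spans)
    ]


def classify(resource_json: str) -> list[str]:
    """Return one location label per SSN-pattern hit in a resource."""
    resource = json.loads(resource_json)
    return [
        location(path)
        for path, value in walk_strings(resource)
        for _ in ssn_hits(value)
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "resourceType": "Practitioner",
        # Synthetic value for illustration, not a real SSN.
        "qualification": [{"identifier": [{"value": "123-45-6789"}]}],
        "telecom": [{"system": "phone", "value": "39-333-12-3456"}],
    })
    print(classify(sample))  # ['qualification[].identifier[].value']
```

The phone number in the sample contains the substring 333-12-3456, which matches the SSN pattern; the guard is what keeps it out of the count.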
Notes
Independently verifies the 2026-04-30 Washington Post finding by scanning the 2026-05-08 NDH bulk export (already loaded into BigQuery as `cms_npd.practitioner`/`cms_npd.organization`) for the dashed SSN format `\d{3}-\d{2}-\d{4}` in the full resource JSON. WaPo reported "dozens"; the AINPI scan identifies 41 confirmed exposures across 15 states. CMS attributed the leak to "incorrect entries of provider or provider-representative-supplied information in the wrong places," which the JSON-location breakdown bears out: all 41 SSNs sit in qualification.identifier.value (the state-license slot), with no cases of providers entering their SSN literally as a name token. Privacy posture: AINPI publishes counts, JSON locations, NPIs (professional IDs, not PII), and state breakdowns. The SSN values themselves are NOT republished in this finding's output, even though they remain in the public NDH bulk file CMS distributed. State Medicaid PI teams that want to validate or remediate should contact CMS NDH operations directly.
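The privacy posture amounts to a projection: internal scan results may hold the raw value, but the published schema has no field for it. A minimal sketch, assuming the scan already yields an NPI, a state, and a JSON location per hit (how the state is derived is not specified here); `RawHit`, `PublicHit`, and `publish` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawHit:
    """Internal scan result; ssn_value must never leave this layer."""
    npi: str
    state: str
    location: str
    ssn_value: str


@dataclass(frozen=True)
class PublicHit:
    """Published row: professional ID, state, and JSON location only."""
    npi: str
    state: str
    location: str


def publish(hits: list[RawHit]) -> tuple[list[PublicHit], Counter, Counter]:
    """Project raw hits onto the public schema and tally by state and location."""
    public = [PublicHit(h.npi, h.state, h.location) for h in hits]
    by_state = Counter(h.state for h in public)
    by_location = Counter(h.location for h in public)
    return public, by_state, by_location
```

Because `PublicHit` has no SSN field, a raw value cannot reach the published rows by accident; the two counters feed the per-state and JSON-location breakdowns.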