H9 H10 H11 H12 H13 · published · NPD release 2026-04-09

NPI and taxonomy correctness

Do NDH NPIs pass the Luhn check, exist in NPPES, and agree with NPPES on name and primary specialty? NUCC taxonomy validity + currency.

Headline

Of 10.9M NDH NPIs, 95.72% clear NPPES; 0.79% are ghosts (not in NPPES) and 3.49% are deactivated. Practitioner name agreement: 94.9% exact → 95.3% normalized → 97.9% Jaro-Winkler ≥0.85. Organization name agreement: 56.3% exact → 88.0% normalized → 98.8% Jaro-Winkler ≥0.85, which narrows the 44-point exact-match gap to about 1pp. NDH carries NUCC codes on Practitioner.qualification (99.83% valid) and Medicare Specialty codes on PractitionerRole.specialty (99.98% valid against the CMS-published crosswalk). Internal cross-system consistency: 85.8% of 3.3M Practitioner↔Role pairs agree via the crosswalk. External NUCC agreement, NDH↔NPPES: 93.7% match the NPPES true primary (switch='Y'), 99.7% match any of the 15 slots, and 6.0% match only a secondary. Slot_1 is not always the true primary: in 14.93% of rows the switch='Y' taxonomy sits in another slot.

2.9M / 3.3M = 85.79%

H10 NPPES match OK: 95.7%
H10 not in NPPES: 0.789%
H10 deactivated in NPPES: 3.49%
H11 Prac exact: 94.9%
H11 Prac normalized: 95.3%
H11 Prac JW ≥0.85: 97.9%
H11 Org exact: 56.3%
H11 Org normalized: 88.0%
H11 Org JW ≥0.85: 98.8%
H12 NUCC valid: 99.8%
H12 CMS code valid: 100.0%
H13 internal crosswalk: 85.8%
H13 NDH↔NPPES slot 1: 92.0%
H13 NDH↔NPPES true primary: 93.7%
H13 NDH↔NPPES any of 15: 99.7%

unit: percent

What this means

Payer data teams

When comparing NDH specialty to NPPES, match against all 15 NPPES taxonomy slots, not just slot 1. About 15% of NPPES records carry their true primary (switch='Y') outside slot 1, and 6% of NDH Practitioners match only an NPPES secondary taxonomy, which is legitimate for dual-specialists.
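A minimal Python sketch of switch-aware matching, assuming each NPPES record has already been unpacked into a list of (taxonomy_code, primary_switch) pairs in slot order. The function names and bucket labels are illustrative and are not taken from the report's pipeline.

```python
from typing import List, Tuple

Slot = Tuple[str, str]  # (taxonomy_code, primary_switch), in slot order


def classify_taxonomy_match(ndh_code: str, nppes_slots: List[Slot]) -> str:
    """Bucket one NDH NUCC code against all populated NPPES slots."""
    populated = [(code, switch) for code, switch in nppes_slots if code]
    primaries = {code for code, switch in populated if switch == "Y"}
    all_codes = {code for code, _ in populated}
    if ndh_code in primaries:
        return "true_primary"
    if ndh_code in all_codes:
        return "secondary_only"
    return "no_match"


def matches_slot_1(ndh_code: str, nppes_slots: List[Slot]) -> bool:
    """The older proxy: compare against slot 1 only, ignoring the switch."""
    return bool(nppes_slots) and nppes_slots[0][0] == ndh_code


# Example record whose true primary sits in slot 2, not slot 1.
slots = [("207RC0200X", "N"), ("207R00000X", "Y")]
print(classify_taxonomy_match("207R00000X", slots))  # true_primary
print(matches_slot_1("207R00000X", slots))           # False
```

In this example the slot-1 proxy misses a record that the switch-aware check counts as a true-primary match.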

FHIR implementers

NDH uses two specialty code systems on two resources: NUCC on Practitioner.qualification and CMS Medicare Specialty codes on PractitionerRole.specialty. A consumer filtering on one will not interoperate with a consumer filtering on the other. Apply the CMS-published Medicare/NUCC crosswalk (updated quarterly) to bridge them.
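A hedged Python sketch of the bridging step. It assumes the crosswalk has been loaded into a dict from Medicare specialty code to a set of NUCC codes (the crosswalk is 1-to-many), and that the 'NN-' prefix noted for NDH specialty codes is two digits followed by a hyphen. The single crosswalk row below is an illustrative placeholder, not the CMS file, and the function names are hypothetical.

```python
import re
from typing import Dict, Set

# Assumption: the NDH prefix is two digits plus a hyphen, e.g. '14-50' -> '50'.
_NDH_PREFIX = re.compile(r"^\d{2}-")

# Illustrative placeholder row only; load the real CMS crosswalk in practice.
SAMPLE_CROSSWALK: Dict[str, Set[str]] = {
    "08": {"207Q00000X"},
}


def strip_ndh_prefix(raw_code: str) -> str:
    """Recover the canonical Medicare specialty code from an NDH value."""
    return _NDH_PREFIX.sub("", raw_code.strip())


def role_agrees_with_qualification(
    role_specialty: str,
    qualification_nucc: str,
    crosswalk: Dict[str, Set[str]],
) -> bool:
    """True if the NUCC code is among the crosswalk targets for the role code."""
    medicare_code = strip_ndh_prefix(role_specialty)
    return qualification_nucc in crosswalk.get(medicare_code, set())


print(strip_ndh_prefix("14-50"))
print(role_agrees_with_qualification("14-08", "207Q00000X", SAMPLE_CROSSWALK))
```

Using a set per Medicare code keeps the 1-to-many shape of the crosswalk, so a role agrees when any mapped NUCC code matches.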

Regulators

0.79% of NDH NPIs (86K) do not exist in NPPES at all, and 3.49% (379K) are deactivated in NPPES but still live in NDH. NDH’s update cadence lags NPPES by the interval between releases.

Researchers

With 99.98% CMS structural validity and 99.83% NUCC validity, the underlying code quality is excellent. The interesting signal is inconsistency between code systems for the same practitioner (14% fail the crosswalk check), not invalid codes.

Null hypothesis

NPI structural validity is ≥99.9% and NDH-to-NPPES agreement on name and primary specialty is within documented drift thresholds.
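For reference, NPI structural validity can be sketched in Python as follows. This assumes the standard NPI check-digit scheme, in which the Luhn algorithm is applied to the 10-digit NPI prefixed with the constant 80840; the function name is illustrative and this is not necessarily how the report's pipeline implements the check.

```python
def npi_luhn_valid(npi: str) -> bool:
    """Return True if a 10-digit NPI passes Luhn with the 80840 prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(npi_luhn_valid("1234567893"))  # True
print(npi_luhn_valid("1234567890"))  # False
```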

Denominator

All `Practitioner` and `Organization` resources with an NPI identifier.

Data source

CMS NPD bulk export joined against the NPPES monthly full dissemination file (V.2) and the current NUCC quarterly code set.

Notes

Source: bigquery-public-data.nppes.npi_raw (updated 2026-02-09, 9.37M NPIs) + .healthcare_provider_taxonomy_code_set_170 + CMS Medicare Provider and Supplier Taxonomy Crosswalk (2025-10, 565 rows, 1-to-many).

H11 v2 methodology, three tiers: (1) exact match on UPPER(TRIM); (2) normalized match that strips suffixes (LLC/INC/CORP/PC/PA/PLLC/LLP/LTD/CO/COMPANY/THE for Orgs; JR/SR/II–V/MD/DO/PHD/RN/NP/PA-C/FNP-BC/DMD/DDS/DVM/PHARMD for persons), drops non-alphanumeric characters, and collapses whitespace; (3) Jaro-Winkler ≥0.85 via a BQ JS UDF. Practitioner name: 6,893,725/7,139,700 family exact, 6,805,038 normalized full match, 6,990,597 at JW≥0.85, 6,833,522 at JW≥0.95. Organization name: 1,840,638/3,270,089 exact, 2,878,882 normalized, 3,229,845 at JW≥0.85, 3,119,323 at JW≥0.95.

H12: NUCC codes on Practitioner.qualification (7,112,042/7,124,017 valid in NUCC v17.0); Medicare codes on PractitionerRole.specialty (3,344,800/3,345,518 valid in the crosswalk). NDH PractitionerRole._specialty_code carries a leading 'NN-' prefix (e.g. '14-50'); stripping it recovers the canonical Medicare code.

H13 internal: 3,337,053 Practitioner↔Role pairs, 2,862,934 agree via crosswalk.
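The three H11 name-matching tiers can be sketched in Python as below. This is an approximation under stated assumptions, not the report's BigQuery UDF: the Jaro-Winkler here is the textbook formulation (prefix weight 0.1, prefix capped at 4 characters), similarity is computed on the normalized strings, suffix removal is token-based, and all function names are hypothetical.

```python
import re

ORG_SUFFIXES = {"LLC", "INC", "CORP", "PC", "PA", "PLLC", "LLP", "LTD",
                "CO", "COMPANY", "THE"}
PERSON_SUFFIXES = {"JR", "SR", "II", "III", "IV", "V", "MD", "DO", "PHD",
                   "RN", "NP", "PA-C", "FNP-BC", "DMD", "DDS", "DVM", "PHARMD"}


def normalize(name: str, suffixes: set) -> str:
    """Drop suffix tokens, strip non-alphanumerics, collapse whitespace."""
    kept = []
    for token in re.split(r"[\s,]+", name.upper().strip()):
        if not token or token.replace(".", "") in suffixes:
            continue
        cleaned = re.sub(r"[^A-Z0-9]", "", token)
        if cleaned:
            kept.append(cleaned)
    return " ".join(kept)


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    hit1, hit2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len(s1)):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


def match_tier(a: str, b: str, suffixes: set, threshold: float = 0.85) -> str:
    """Return the first tier at which two names agree."""
    if a.strip().upper() == b.strip().upper():
        return "exact"
    na, nb = normalize(a, suffixes), normalize(b, suffixes)
    if na == nb:
        return "normalized"
    if jaro_winkler(na, nb) >= threshold:
        return "jaro_winkler"
    return "no_match"


print(match_tier("Smith Medical Group, LLC", "SMITH MEDICAL GROUP", ORG_SUFFIXES))
```

Because each tier is tried only after the stricter one fails, the tier counts are cumulative in the same sense as the exact → normalized → Jaro-Winkler progression reported here.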
H13 confusion matrix, top 10 inconsistent (Medicare → qualification-NUCC) pairs:
• C6 (PRACTITIONER - HOSPITALIST) ↔ 207R00000X (Internal Medicine /): 39,150
• 80 (PRACTITIONER - CLINICAL SOCIAL WORKER) ↔ 104100000X (Social Worker /): 21,026
• 30 (PRACTITIONER - DIAGNOSTIC RADIOLOGY) ↔ 2085R0204X (Radiology / Vascular & Interventional Radiology): 17,599
• 29 (PRACTITIONER - PULMONARY DISEASE) ↔ 207RC0200X (Internal Medicine / Critical Care Medicine): 12,819
• 08 (PRACTITIONER - FAMILY PRACTICE) ↔ 207P00000X (Emergency Medicine /): 11,904
• 68 (PRACTITIONER - CLINICAL PSYCHOLOGIST) ↔ 103T00000X (Psychologist /): 8,637
• 06 (PRACTITIONER - CARDIOVASCULAR DISEASE (C) ↔ 207RI0011X (Internal Medicine / Interventional Cardiology): 8,133
• 26 (PRACTITIONER - PSYCHIATRY) ↔ 2084P0804X (Psychiatry & Neurology / Child & Adolescent Psychi): 7,084
• 05 (PRACTITIONER - ANESTHESIOLOGY) ↔ 390200000X (Student in an Organized Health Care Education/Trai): 6,835
• 50 (PRACTITIONER - NURSE PRACTITIONER) ↔ 207Q00000X (Family Medicine /): 6,628

H13 external (v3, switch-aware): NPPES stores 15 (taxonomy_code, primary_switch) pairs per NPI; exactly one should have switch='Y' (the true primary). Match counts follow. "True primary", "only a secondary", and "disagree entirely" are mutually exclusive and sum to 7,124,017; "any slot" and "slot_1" are overlapping views of the same rows.
• Match NPPES true primary (switch='Y' slot): 6,672,407 (93.66%)
• Match any slot: 7,099,905 (99.66%)
• Match slot_1 specifically: 6,555,738 (92.02%)
• Match only a secondary (switch='N'): 427,498 (6.00%)
• Disagree entirely (not in any slot): 24,112 (0.34%)

Slot-ordering observation: 1,063,861 rows (14.93%) have the NPPES true primary in a slot other than slot_1, so the prior 'slot_1' proxy for 'primary' was slightly wrong. 0 rows (0.00%) have no switch='Y' at all (NPPES data-quality edge).

Known caveats: NPPES vintage is 2026-02-09 vs NDH 2026-04-09; the 8-week gap means taxonomy changes in that window show as disagreement. Jaro-Winkler ≥0.85 is a permissive threshold that recovers common variations (whitespace, DBA suffixes, casing) but also accepts some false positives (e.g. 'Smith Medical' vs 'Smith Medicare'); the 0.95 column is the strict signal.

v2 upgrade candidates: pinned quarterly NUCC; NPPES secondary-taxonomy match; phonetic fallback (Soundex / Metaphone) for names where JW misses transpositions.
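The H13 external counts can be cross-checked with a few lines of Python arithmetic. The counts are copied from these notes; the variable names are illustrative, and the denominator is taken to be the 7,124,017 rows with a NUCC code on Practitioner.qualification, since the mutually exclusive counts sum to that figure.

```python
TOTAL = 7_124_017          # rows compared against NPPES
true_primary = 6_672_407   # match the switch='Y' slot
secondary_only = 427_498   # match only a switch='N' slot
disagree = 24_112          # not in any slot
any_slot = 7_099_905       # overlapping view: true primary or secondary
slot_1 = 6_555_738         # overlapping view: match slot_1 specifically
primary_not_slot_1 = 1_063_861


def pct(count: int, total: int = TOTAL) -> float:
    """Percentage rounded to two decimals, as reported in the notes."""
    return round(100 * count / total, 2)


# The exclusive buckets partition the denominator.
assert true_primary + secondary_only == any_slot
assert any_slot + disagree == TOTAL

for label, count in [
    ("true primary", true_primary),
    ("any slot", any_slot),
    ("slot_1", slot_1),
    ("secondary only", secondary_only),
    ("disagree", disagree),
    ("primary outside slot_1", primary_not_slot_1),
]:
    print(f"{label}: {pct(count)}%")
```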