{
  "slug": "npi-taxonomy-correctness",
  "title": "NPI and taxonomy correctness",
  "hypotheses": [
    "H9",
    "H10",
    "H11",
    "H12",
    "H13"
  ],
  "status": "published",
  "release_date": "2026-04-09",
  "generated_at": "2026-04-21T17:43:05+00:00",
  "methodology_version": "0.6.0-draft",
  "commit_sha": "pending",
  "headline": "95.72% of 10.9M NDH NPIs clear NPPES (0.79% ghost, 3.49% deactivated). Practitioner name agreement: 94.9% exact \u2192 95.3% normalized \u2192 97.9% Jaro-Winkler \u22650.85. Organization name: 56.3% exact \u2192 88.0% normalized \u2192 98.8% Jaro-Winkler \u22650.85 (closes the 44-point exact-match gap to 1pp). NDH carries NUCC on Practitioner.qualification (99.83% valid) AND Medicare Specialty codes on PractitionerRole.specialty (99.98% valid against the CMS-published crosswalk). Internal cross-system consistency: 85.8% of 3.3M Practitioner\u2194Role pairs agree via the crosswalk. External NUCC agreement NDH\u2194NPPES: 93.7% match NPPES's switch='Y' TRUE primary, 99.7% match any of the 15 slots, 6.0% match only a secondary. Slot_1 is NOT always the true primary (14.93% of rows).",
  "numerator": 2862934,
  "denominator": 3337053,
  "chart": {
    "type": "bar",
    "unit": "percent",
    "data": [
      {
        "label": "H10 NPPES match OK",
        "value": 95.72
      },
      {
        "label": "H10 not in NPPES",
        "value": 0.789
      },
      {
        "label": "H10 deactivated in NPPES",
        "value": 3.49
      },
      {
        "label": "H11 Prac exact",
        "value": 94.9
      },
      {
        "label": "H11 Prac normalized",
        "value": 95.3
      },
      {
        "label": "H11 Prac JW \u22650.85",
        "value": 97.9
      },
      {
        "label": "H11 Org exact",
        "value": 56.3
      },
      {
        "label": "H11 Org normalized",
        "value": 88.0
      },
      {
        "label": "H11 Org JW \u22650.85",
        "value": 98.8
      },
      {
        "label": "H12 NUCC valid",
        "value": 99.83
      },
      {
        "label": "H12 CMS code valid",
        "value": 99.98
      },
      {
        "label": "H13 internal crosswalk",
        "value": 85.8
      },
      {
        "label": "H13 NDH\u2194NPPES slot 1",
        "value": 92.0
      },
      {
        "label": "H13 NDH\u2194NPPES true primary",
        "value": 93.7
      },
      {
        "label": "H13 NDH\u2194NPPES any of 15",
        "value": 99.7
      }
    ]
  },
  "notes": "Source: bigquery-public-data.nppes.npi_raw (updated 2026-02-09, 9.37M NPIs) + .healthcare_provider_taxonomy_code_set_170 + CMS Medicare Provider and Supplier Taxonomy Crosswalk (2025-10, 565 rows, 1-to-many). H11 v2 methodology \u2014 three tiers: (1) exact match on UPPER(TRIM), (2) normalized match that strips business suffixes (LLC/INC/CORP/PC/PA/PLLC/LLP/LTD/CO/COMPANY/THE for Orgs; JR/SR/II\u2013V/MD/DO/PHD/RN/NP/PA-C/FNP-BC/DMD/DDS/DVM/PHARMD for persons), drops non-alphanumeric, collapses whitespace, (3) Jaro-Winkler \u22650.85 via a BQ JS UDF. Practitioner name: 6,893,725/7,139,700 family exact, 6,805,038 normalized full match, 6,990,597 at JW\u22650.85, 6,833,522 at JW\u22650.95. Organization name: 1,840,638/3,270,089 exact, 2,878,882 normalized, 3,229,845 at JW\u22650.85, 3,119,323 at JW\u22650.95. H12: NUCC codes on Practitioner.qualification (7,112,042/7,124,017 valid in NUCC v17.0); Medicare codes on PractitionerRole.specialty (3,344,800/3,345,518 valid in the crosswalk). NDH PractitionerRole._specialty_code carries a leading 'NN-' prefix (e.g. '14-50'); stripping recovers the canonical Medicare code. H13 internal: 3,337,053 Practitioner\u2194Role pairs, 2,862,934 agree via crosswalk. 
H13 confusion matrix \u2014 top 10 inconsistent (Medicare \u2192 qualification-NUCC) pairs: C6 (PRACTITIONER - HOSPITALIST) \u2194 207R00000X (Internal Medicine /): 39,150; 80 (PRACTITIONER - CLINICAL SOCIAL WORKER) \u2194 104100000X (Social Worker /): 21,026; 30 (PRACTITIONER - DIAGNOSTIC RADIOLOGY) \u2194 2085R0204X (Radiology / Vascular & Interventional Radiology): 17,599; 29 (PRACTITIONER - PULMONARY DISEASE) \u2194 207RC0200X (Internal Medicine / Critical Care Medicine): 12,819; 08 (PRACTITIONER - FAMILY PRACTICE) \u2194 207P00000X (Emergency Medicine /): 11,904; 68 (PRACTITIONER - CLINICAL PSYCHOLOGIST) \u2194 103T00000X (Psychologist /): 8,637; 06 (PRACTITIONER - CARDIOVASCULAR DISEASE (C) \u2194 207RI0011X (Internal Medicine / Interventional Cardiology): 8,133; 26 (PRACTITIONER - PSYCHIATRY) \u2194 2084P0804X (Psychiatry & Neurology / Child & Adolescent Psychi): 7,084; 05 (PRACTITIONER - ANESTHESIOLOGY) \u2194 390200000X (Student in an Organized Health Care Education/Trai): 6,835; 50 (PRACTITIONER - NURSE PRACTITIONER) \u2194 207Q00000X (Family Medicine /): 6,628. H13 external (v3 \u2014 switch-aware): NPPES stores up to 15 (taxonomy_code, primary_switch) pairs per NPI, one per slot; exactly one should have switch='Y' (the TRUE primary). Five measures (true primary, secondary-only, and disagree-entirely are mutually exclusive and sum to 100%; any-slot and slot_1 overlap them):\n  \u2022 Match NPPES true primary (switch='Y' slot): 6,672,407 (93.66%)\n  \u2022 Match any slot:           7,099,905 (99.66%)\n  \u2022 Match slot_1 specifically: 6,555,738 (92.02%)\n  \u2022 Match only a secondary (switch='N'): 427,498 (6.00%)\n  \u2022 Disagree entirely (not in any slot): 24,112 (0.34%)\nSlot-ordering observation: 1,063,861 rows (14.93%) have the NPPES TRUE primary in a slot other than slot_1, so the prior 'slot_1' proxy for 'primary' understated true-primary agreement by 1.64pp (92.02% vs 93.66%). 0 rows (0.00%) have no switch='Y' at all, so that NPPES data-quality edge case does not occur in this vintage. 
Known caveats: the NPPES vintage (2026-02-09) predates the NDH release (2026-04-09) by about 8 weeks, so taxonomy changes made in that window show up as disagreement; Jaro-Winkler \u22650.85 is a permissive threshold that recovers common variations (whitespace, DBA suffixes, casing) but also accepts some false positives (e.g. 'Smith Medical' vs 'Smith Medicare'); the JW\u22650.95 counts above are the strict signal. v2 upgrade candidates: pinned quarterly NUCC; NPPES secondary-taxonomy match; phonetic fallback (Soundex / Metaphone) for names where JW misses transpositions."
}
