the main dissemination file is a monster - currently weighing in at a beefy four hundred megs. the structure is one record per covered entity, so don't mindlessly run these scripts and think you've got one record per doctor. even after you've eliminated all of the organizations like hospitals and community health centers, you'll still be left with nurse practitioners, respiratory therapists, and other non-physician medical providers.
if you prefer to learn about the contents of your data sets amidst a jungle of stock photos, read the cms npi handbook. this microdata seems most useful to tally or geocode medical specialists by geographic area, but perhaps you can conjure up cool new uses.
so long as you've followed my monetdb setup instructions, the code below will grab the latest file and import everything seamlessly. after that, you can follow my examples and pull specific columns directly into active memory - as always, doing so will not overload a computer with at least four gigabytes of ram. from there, maybe follow my merge example or construct your own with some missouri census data center (mcdc) geographic files. whatever you do, promise me you'll do it well.
this new github repository contains three scripts:
download and import.R
- determine and then download the most current npi data file
- initiate a monet database in the current working directory
- import the nppes into monetdb
- create a well-documented block of code to re-initiate the monetdb server in the future
merge taxonomy ids.R
- construct a multi-level table of medical provider taxonomy ids
- initiate the same monetdb server instance, using the same well-documented block of code as above
- pull a subset of columns - a skinny file - directly into working memory
- merge the nppes with taxonomy id codes, then run a quick crosstab or two
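to make the merge step concrete, here's a minimal sketch in base r. it is not the code from the repository: the column names (`npi`, `entity_type_code`, `taxonomy_code`, `provider_type`) are hypothetical stand-ins, the npi values are made up, and the four-row lookup is a toy substitute for the real multi-level taxonomy table. it also assumes that an entity type code of 1 marks an individual and 2 marks an organization.

```r
# toy stand-in for a skinny nppes extract already pulled into memory
# (hypothetical column names, made-up npi values)
nppes <- data.frame(
    npi = c( "1000000001" , "1000000002" , "1000000003" , "1000000004" , "1000000005" ) ,
    entity_type_code = c( 1 , 1 , 2 , 1 , 1 ) ,
    taxonomy_code = c( "207Q00000X" , "363L00000X" , "282N00000X" , "207Q00000X" , "227800000X" ) ,
    stringsAsFactors = FALSE
)

# toy taxonomy lookup, for illustration only
taxonomy <- data.frame(
    taxonomy_code = c( "207Q00000X" , "363L00000X" , "282N00000X" , "227800000X" ) ,
    provider_type = c( "physician" , "nurse practitioner" , "hospital" , "respiratory therapist" ) ,
    stringsAsFactors = FALSE
)

# drop organizations (assumption: 1 = individual, 2 = organization)
individuals <- subset( nppes , entity_type_code == 1 )

# left join: keep every individual, even if a code is missing from the lookup
merged <- merge( individuals , taxonomy , by = "taxonomy_code" , all.x = TRUE )

# sanity check: the merge should neither drop nor duplicate records
stopifnot( nrow( merged ) == nrow( individuals ) )

# quick crosstab of individuals by provider type
provider_counts <- table( merged$provider_type , useNA = "ifany" )
print( provider_counts )
```

notice that even after the hospital is gone, the nurse practitioner and the respiratory therapist remain. that's the one-record-per-covered-entity structure at work, and the reason a raw row count is not a physician count.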
replicate cms state counts.R
- initiate the same monetdb server instance, using the same well-documented block of code as above
- pull a variety of columns into working memory using both dbGetQuery and monet.frame techniques
- construct a table that - in may of 2013 - almost precisely matched this example output from cms and came close to this enumeration report from their data contractor
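the state-level tally boils down to a filter, a count, and a total row. below is a rough base r sketch on toy data, not the repository's code: the column names (`entity_type_code`, `state`) are hypothetical, the npi values are invented, and it again assumes an entity type code of 1 means an individual. on the full file, the same count could presumably be pushed down to the database with a sql `group by` sent through dbGetQuery, but that is a guess about approach rather than a description of the actual script.

```r
# toy provider records (hypothetical column names, made-up npi values)
providers <- data.frame(
    npi = c( "1000000001" , "1000000002" , "1000000003" ,
             "1000000004" , "1000000005" , "1000000006" ) ,
    entity_type_code = c( 1 , 1 , 1 , 2 , 1 , 1 ) ,
    state = c( "MO" , "MO" , "CA" , "CA" , "NY" , "CA" ) ,
    stringsAsFactors = FALSE
)

# keep individuals only (assumption: 1 = individual, 2 = organization)
individuals <- providers[ providers$entity_type_code == 1 , ]

# one row per state with the number of individual providers
state_counts <-
    as.data.frame(
        table( state = individuals$state ) ,
        responseName = "providers" ,
        stringsAsFactors = FALSE
    )

# largest states first
state_counts <- state_counts[ order( -state_counts$providers ) , ]

# append a total row so the table can be eyeballed against a published count
state_counts <-
    rbind(
        state_counts ,
        data.frame(
            state = "total" ,
            providers = sum( state_counts$providers ) ,
            stringsAsFactors = FALSE
        )
    )

print( state_counts , row.names = FALSE )
```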
click here to view these three scripts
for more detail about the national plan and provider enumeration system, visit:
- the centers for medicare and medicaid services' npi homepage
- the health resources and services administration's workforce data sources comparison, or perhaps fast-forward to the npi page
- the cms frequently asked questions topic page on hipaa simplification
notes:
if your analysis won't be compromised by using county-level instead of provider-level data, also consider hrsa's area resource file (arf). the health workforce statistics in the arf come from the american medical association's physician masterfile. unfortunately, the ama masterfile is not publicly available, so if your budget is zero dollars, your choices are the nppes (individual-level records, but less detail about each one) and the arf (more detail, but aggregated to the county level). here's what the director of the national center for health workforce analysis told me:
We use the AMA Masterfile for the ARF (and most of our studies involving physicians) because it has extensive data on each physician, including demographic, education/training and practice information. While there are a number of shortcomings of the MF, it is one of the best sources of data available nationally. We use the NPI data from some professions where we don’t have a solid source of national data. While all practitioners who bill Medicare, Medicaid or private insurers should have an NPI, in the case of physicians, the NPI does not have the same depth of data as the AMA MF.
If you have not seen it, I recommend our new Compendium of Federal Data Sources to Support Health Workforce Analysis on our web site that describes 19 sources of data that can be used for health workforce analysis. It describes the data source, how it can be used and accessed and guidance on potential use.
one more trade-off: the nppes is never more than a month old.
confidential to sas, spss, stata, and sudaan users: you are working with the larry, curly, moe, and shemp of statistical languages. time to transition to r. :D