Assignment 3 Health Data Review
Q 2: Examining Administrative Claims Data.
Please access and examine the CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF). The DE-SynPUF files are useful datasets for giving students experience with healthcare data. Although these data are synthetic, they were created from real Medicare claims data and therefore retain many features of the original data.
Moreover, the data samples are very large, so there is ample opportunity to query and explore “big data.” Here are some of the key phrases that CMS uses to describe the data:
“The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information. The tables contain five types of data – Beneficiary Summary, Inpatient Claims, Outpatient Claims, Carrier Claims, and Prescription Drug Events.”
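As a starting point for exploring how these tables relate, the following is a minimal sketch, not a prescribed method. It assumes the beneficiary and claims files share a beneficiary key named DESYNPUF_ID and that claims carry a payment column named CLM_PMT_AMT; confirm both against the DE-SynPUF codebook before relying on them. The rows are made-up toy values, not actual DE-SynPUF records, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Toy rows standing in for two of the five file types (values are invented).
# Column names are assumptions; verify them against the DE-SynPUF codebook.
beneficiary_csv = """DESYNPUF_ID,BENE_BIRTH_DT
A001,19400101
A002,19350615
"""

inpatient_csv = """DESYNPUF_ID,CLM_ID,CLM_PMT_AMT
A001,1001,4000
A001,1002,1500
A002,1003,7000
"""


def total_paid_per_beneficiary(beneficiary_text, claims_text):
    """Sum claim payments per beneficiary, joining on the shared ID column."""
    totals = defaultdict(float)
    for claim in csv.DictReader(io.StringIO(claims_text)):
        totals[claim["DESYNPUF_ID"]] += float(claim["CLM_PMT_AMT"])
    # Keep every beneficiary, including any with no claims (total 0.0).
    return {
        bene["DESYNPUF_ID"]: totals.get(bene["DESYNPUF_ID"], 0.0)
        for bene in csv.DictReader(io.StringIO(beneficiary_text))
    }


totals = total_paid_per_beneficiary(beneficiary_csv, inpatient_csv)
for bene_id, paid in totals.items():
    print(bene_id, paid)
```

With the real files, the same join would read from the downloaded CSVs instead of the inline strings.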
Upload a PDF of your answers to the following questions:
For Part 1: The Star Schema
1. What are some practical advantages to using a star schema data model over a relational model? A few are mentioned in the lecture, but challenge yourself to identify others. (Describe at least three advantages.)
2. What are some of the disadvantages of using a star schema versus a relational model?
3. What criteria might you use to evaluate which approach to use in a given situation?
For Part 2: Examining Administrative Data
1. In 5-6 sentences, describe the data: Why was it originally collected? What were the sources of the data? Who collected it (e.g., doctors, nurses, pharmacies, self-report)? Where was it collected? How is the context identified?
2. What sorts of things could this data measure? How might these measures identify opportunities to improve patient care or healthcare operations? Who would benefit from this information?
3. What are some of the limits to the use of this data?
4. How might you verify and validate this data?
5. How difficult would it be to map this data for comparison with other data?
6. What issues might these data have in contrast to richer clinical data?