This task requires you to fit a Markov chain model to simulated insurance claims data. The data are in the file ‘Classification Scheme Data.csv’ (posted on Blackboard).
The Mastodon Insurance Company studies a cohort of 600 drivers, all of whom were under 25 years old at the start of the study. In each year, the number of claims made by each driver was recorded.
Mastodon operates a classification scheme with six discount levels, from level 0 (no discount, ie the driver pays the full premium) to level 5 (50% discount), with the discount increasing by 10 percentage points at each step. A policyholder who makes no claims in a year moves up one level (unless already at level 5, in which case they stay at level 5); a policyholder who makes one or more claims moves down one level (unless already at level 0, in which case they stay at level 0).
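The movement rule above can be sketched in Python as a starting point. This is only an illustration, not a required method: the function names and the parameter p_claim (the probability that a driver makes at least one claim in a year) are hypothetical, and treating p_claim as the same at every level and in every year is an assumption that the task does not state and that should be checked against the data.

```python
import numpy as np

N_LEVELS = 6  # discount levels 0, 1, ..., 5


def discount_percent(level: int) -> int:
    """Discount at a given level: 0% at level 0, rising by 10 points per level."""
    return 10 * level


def next_level(level: int, n_claims: int) -> int:
    """Level for the following year, given this year's number of claims."""
    if n_claims == 0:
        return min(level + 1, N_LEVELS - 1)  # move up, capped at level 5
    return max(level - 1, 0)                 # move down, floored at level 0


def transition_matrix(p_claim: float) -> np.ndarray:
    """One-step transition matrix between discount levels.

    p_claim is a hypothetical parameter (assumption): the probability of
    one or more claims in a year, taken as constant across levels and years.
    """
    P = np.zeros((N_LEVELS, N_LEVELS))
    for i in range(N_LEVELS):
        P[i, next_level(i, 0)] += 1.0 - p_claim  # claim-free year
        P[i, next_level(i, 1)] += p_claim        # year with at least one claim
    return P
```

For example, transition_matrix(0.2) gives a 6 x 6 matrix in which each row sums to 1, with the probability of staying put appearing only at levels 0 and 5.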
Before the study began, the drivers were categorised using variables such as age, gender, zip code, and miles driven per year. The categories reflect Mastodon’s expectation of the level of risk associated with that driver:
• Category A — very low risk, ie the best drivers
• Category B — low risk
• Category C — medium risk
• Category D — high risk, ie the worst drivers