Actuarial and Data Analytics

Providing information and advice on real problems



Integrating Machine Learning into your Reserve Estimates

Introduction

Two hundred years ago, a captain may have had only a sounding line and his experience to navigate through uncharted waters. Today, a captain has access to many other data sources and tools to aid in navigation, including paper charts, online charts, engineering surveys, a depth sounder, radar, and GPS. These new tools don't make the old tools obsolete, but any mariner would be safer and more accurate in their piloting by employing all the tools at their disposal.

In the same vein, actuaries who rely solely on traditional reserving techniques, such as triangle-based methods, aren't capitalizing on new technologies. Actuaries should start adopting other techniques such as Machine Learning (ML). ML is a field of predictive analytics that focuses on ways to automatically learn from data and improve with experience. It does so by uncovering insights in the data without being told exactly where to look.

ML is the GPS for actuaries. As GPS improved navigation, ML has the potential to greatly enhance our reserve estimates. It is important to note, though, that ML is not just about running algorithms; it is a process. At a high level, this process includes defining the problem, gathering data and engineering features from the data, and building and evaluating the models. As in the actuarial control cycle, it is important to continually monitor results.

[Figure: overview of the ML process]

Through our research, we have found significant improvements in the prediction of reserves by employing this ML process. Overall, we have found a reduction of roughly 10 percent in both the standard and worst-case errors. To assist actuaries in testing the value of ML for themselves, this paper provides an outline of the ML process.

Define the Problem

Similar to the Actuarial Control Cycle, the first step is to define the problem. In our context, we are interested in efficiently calculating the unpaid claims liability (UCL). We want to calculate this quantity accurately, in a manner that minimizes the variance of our estimation error.

Actuaries often use various triangle-based methods, such as the Development method and the Paid Per Member Per Month (Pd PMPM) method, to set reserves. These methods in principle attempt to perform pattern recognition on the limited information contained within the triangles. Although these methods continue to serve actuaries well, information is being left out that could enhance the overall reserve estimate. To make up for the lack of information used to estimate the reserves, an actuary relies heavily on his or her judgment. Although judgment is invaluable, biases and other elements can come into play, leading to large variances and the need for higher reserve margins.

As described in our prior article (Cap, Coulter, & McCoy, 2015), the range of reserve estimate error in company statements pulled from the Washington State Office of the Insurance Commissioner's website was -10 percent to 40 percent. This represents a wide range of error and has significant implications, which can include an impact to the insurer's rating status, future pricing and forecasting decisions, the calculation of Premium Deficiency Reserves, or even unnecessary payouts of performance bonuses.

Data and Feature Engineering

Gathering data is something actuaries are already good at. Leveraging their expertise, along with that of other subject matter experts, will be helpful in identifying all available sources for use. A common saying in ML is that more data often beats a more sophisticated algorithm.

Once the data has been gathered, it will need to be engineered to improve the predictive power of the model. This is referred to as feature engineering, and can include transforming, creating, converting, or otherwise editing the data in ways that benefit the process. As an example, suppose we were estimating the cost of a house with only two fields: the length and the width of the house. We could improve the model by engineering a new feature called square footage, calculated by multiplying the length by the width.
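As a minimal sketch of this idea (hypothetical data and column names, using pandas), the engineered feature is simply the product of the two raw fields:

```python
import pandas as pd

# Hypothetical housing data with only two raw fields plus the target price.
houses = pd.DataFrame({
    "length_ft": [40, 55, 30],
    "width_ft": [25, 30, 20],
    "price": [250_000, 410_000, 180_000],
})

# Feature engineering: derive square footage so the model can learn from a
# single, more predictive field instead of two weakly related ones.
houses["square_feet"] = houses["length_ft"] * houses["width_ft"]
print(houses)
```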

The gathering and engineering of the data can be a difficult stage to get through, and without the right people on the team, it could lead to wasted effort. Having domain knowledge on the team enables a more thoughtful consideration of which sources and features are important. In our research we have found many features that have predictive power for reserve setting. The following is a sample list of features that could provide value (a sketch assembling a few of them follows the list):

  • Seasonality
  • Number of workdays
  • Check runs in a month
  • Large claims
  • Inventory
  • Inpatient admits/days
  • Membership mix by product
  • Change in duration
  • Cost sharing components
  • Demographics
  • Place of service
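
To make this concrete, the sketch below assembles a few of these features into a monthly table. The column names and values are purely illustrative; the calendar features (workdays and a seasonality proxy) are derived, while the rest would come straight from claims and inventory extracts.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly reserving data; all names and values are illustrative.
monthly = pd.DataFrame({
    "incurred_month": pd.period_range("2023-01", periods=4, freq="M"),
    "paid_pmpm_to_date": [95.2, 88.7, 41.3, 12.4],
    "check_runs": [4, 5, 4, 4],
    "open_inventory": [1_800, 2_400, 5_100, 9_700],
    "large_claims_over_100k": [2, 1, 3, 0],
})

# Engineered calendar features from the list above.
month_start = monthly["incurred_month"].dt.start_time.to_numpy().astype("datetime64[D]")
next_month_start = (monthly["incurred_month"] + 1).dt.start_time.to_numpy().astype("datetime64[D]")
monthly["workdays"] = np.busday_count(month_start, next_month_start)   # number of workdays
monthly["month_of_year"] = monthly["incurred_month"].dt.month          # seasonality proxy
print(monthly)
```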

Modeling and Evaluating

Once the data has been prepared, the user will apply various ML models to the dataset. In general, the data is split into two sets: the training set and the testing set.

The training set is the data used to train and cross-validate the model and comprises historical data (in the case of reserving, historical completed data). The testing data, on the other hand, includes only the data from which you wish to derive predictions (for example, the current month's reserve data).

To evaluate the model, a portion of the training set is withheld in order to cross-validate the results. The models identified as performing well on the holdout set are then applied to the testing data to create the predictions.

[Figure: splitting the data into training, cross-validation, and testing sets]

 

There are many different machine learning models, each of which has its own strengths and weaknesses. Thus there is no one model that works best on all problems.
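
As a rough sketch of this evaluation step (scikit-learn, with synthetic data standing in for an engineered reserving table; the model choices are illustrative, not a recommendation), a few candidate models are cross-validated on the training data and the best performer is then refit and applied to the testing data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: the first 48 rows act as completed historical months
# (training set) and the last 12 as the months we still need to estimate (testing set).
X, y = make_regression(n_samples=60, n_features=8, noise=10.0, random_state=0)
X_train, y_train, X_test = X[:48], y[:48], X[48:]

candidates = {
    "linear": LinearRegression(),
    "boosted_trees": GradientBoostingRegressor(random_state=0),
}

# Score each candidate by cross-validation on the training data only.
cv_scores = {
    name: cross_val_score(model, X_train, y_train,
                          scoring="neg_mean_absolute_error", cv=5).mean()
    for name, model in candidates.items()
}
print(cv_scores)

# Refit the best performer on all training data and predict the open months.
best = max(cv_scores, key=cv_scores.get)
predictions = candidates[best].fit(X_train, y_train).predict(X_test)
```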

Results

For our research, we used supervised learning techniques classified as regression. We ran various ML models and determined which were the most appropriate for the problem based on cross-validation techniques. We then used an ensemble method to blend the various model outputs into an overall prediction. An example of this type of technique can be found in our prior article (Cap, Coulter, & McCoy, 2015).
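
One simple way to blend model outputs, sketched below with scikit-learn's stacking regressor on synthetic data, is to let a final linear layer learn how much weight each base model's cross-validated prediction deserves; this illustrates the general ensemble idea rather than our exact implementation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an engineered reserving feature table.
X, y = make_regression(n_samples=80, n_features=10, noise=15.0, random_state=1)

# The final Ridge layer learns the blend across the base regressors.
ensemble = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("forest", RandomForestRegressor(n_estimators=200, random_state=1)),
        ("boosted", GradientBoostingRegressor(random_state=1)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)

# Cross-validated error of the blended model.
print(cross_val_score(ensemble, X, y, scoring="neg_mean_absolute_error", cv=5).mean())
```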

 

We then compared these results against typical triangle-based methods by testing the percent range of UCL error over 24 different reserving months. Overall, we found that ML added significant value in reserve setting, and we highly encourage reserving teams to explore this process for themselves.

[Figure: percent range of UCL error over 24 reserving months, ML vs. triangle-based methods]

 

Conclusion

Predictive analytics is not new to actuaries. Methods like these are fairly common on the casualty side and have recently become more popular within healthcare for predicting fraud, readmissions, and other outcomes. However, these efforts within healthcare are often led by data science teams, who continue to fill a larger analytics role within the health space. It is only a matter of time before these techniques become standard in reserving. The question is who will fill this role: will actuaries stay at the helm, or will we transfer some of our functions to data science teams?

 

We hope that the process outlined above will provide some guidance, and at least prepare the actuary for their first step in this space.

Appendix

[Appendix figure]

 

 


Simple and Effective Reserve Practices

Introduction

Actuarial judgment is pervasive in our work. In many cases, judgment is a necessary element of our modeling and analysis. Over the past four decades, behavioral research has shown that simple linear models can do much better than a human practitioner in many cases (Kahneman & Tversky, 2011; Wacek, 2007).

We present a couple of simple but effective reserving techniques that an actuary can add to his or her current reserving practices to produce significant reductions in both reserve bias and reserve variance. Aggregating reserve estimates using only actuarial judgment can result in high-variance and biased results, which can have consequences in many other areas of your company.

According to the Washington State Office of the Insurance Commissioner's data, the range of reserve error reported on financial statements for the largest insurance entities for the years 2008-2014 was -10% to 40% (Company Annual Statements, n.d.). More importantly, the standard deviation of these errors is 11%. This data supports the possibility of the bias that actuaries generally believe to exist. Biases in reserve estimates include over-compensation (when you've reserved low one year, you over-compensate the next year by reserving far too high), keeping too much weight on prior estimates when new information is available, and more. It also indicates that the reserving techniques being employed are not very precise.

With an 11% margin and an 11% swing, companies can easily see reserve estimates exceeding the final paid claims by up to 40%. This leaves capital in the prior year that could be used to benefit this year. It could impact the bottom line, distort the company's profitability over time, adversely affect ratings in the following year, trigger regulatory action, or impact pricing and forecasting models. Under-reserving can have similar effects. In addition to pricing and forecasting impacts, accruals may be set aside assuming an MLR or other rebates are due, causing inappropriate payments on performance bonuses, bringing additional scrutiny to your department, and deteriorating your credibility as the reserving actuary.

The results below are based on a simulation study with 8,000 simulations of claims run-out. The simulations took into account a seasonality component, a benefit change component, and a large claim component. Each of these components was developed with some randomness in each simulation. These simulations show a 5% reduction in the variance of the reserve estimates. Unless the estimators are completely correlated, these techniques should produce a reduction in variance and a more consistent estimate of the mean. With reduced variance and more accurate predictions, the margins needed could be reduced, resulting in a better estimate of each year's results.

The remainder of this article outlines the proposed techniques, followed by a high-level summary of the simulated data used to illustrate the results. Note that although we illustrate the results by way of simulation, these techniques have been used in real practice and have shown a significant impact.

Weighting Techniques

The idea is simple – take the various predictions you are already making and weight them in a way that minimizes variance and increases accuracy. This paper will discuss two weighting techniques you can use. However, there are many different ways to calculate the weights. Every reserving actuary is inherently doing this weighting in some fashion, whether it be via a mental algorithm or a more formalized approach.  We advocate using a formalized approach that is testable and avoids potential human biases.  In addition, the proposed formalized approach will tend to discredit reserving methods that perform poorly, focusing on those methods that are more reliable and consistent.  If nothing else, this will give you a better baseline in which to apply judgment.

The following is an example illustrating the outcome from a weighting technique over multiple reserve methods by lag month.

[Figure: example weights by reserve method and lag month]

In this example, we used the weighting technique to combine the Seasonality, Paid Per Member Per Month (PMPM), Development, Inventory, and Trend methods. As you can see, each lag differs in the weights applied to each method. In Lag 0, the seasonality method had the highest weight, indicating that it was the "best" model for that lag. However, the seasonality method alone is not the best approach. Rather, the weighting shown above minimizes the variance of the estimate, so we would use that weighting for our predictions of Lag 0 claims.

We recommend ongoing monitoring and measurement of any approach used to ensure the intended outcomes and expectations are being met. One of the pitfalls of this more data-driven weighting approach is over-fitting. This is a common pitfall in any estimation or prediction procedure.

Technique 1:  Inverse Variance

Inverse Variance weights each of the reserve methods in inverse proportion to its historical error variance when compared to actuals. Therefore, lower weights are applied to those methods that have historically produced a larger variation of errors.

This approach is straightforward and simple to implement without having to add any additional features to one’s existing reserve model.  It also avoids any complex calculations, making it easy to explain to others.  On the other hand, this type of approach ignores the correlations between the reserve methods being used and their distance from the target, which could be used to help lower the variance even further. This is why we offer two approaches.

Example:

Suppose you have two methods for reserving, A and B. Each of these methods has a historical monthly reserve error associated with it (error variances of 14.44 and 88.94, respectively, per the table below). Based on the inverse variance technique, the proposed future weights when developing a projection would be 86% A and 14% B. This type of back-test establishes that A is the better predictor; however, a mix of the two methods is still preferred. The technique provides a systematic approach to choosing a good mix, and possibly a better starting point, prior to applying judgment in your reserve picks going forward.

Historical Experience         Method A   Method B   Actuals
Month 1                         150.00     155.00    151.10
Month 2                         160.00     145.00    155.20
Month 3                         170.00     180.00    172.30
Variance of Monthly Errors       14.44      88.94
Inverse Variance                  0.07       0.01
Proposed Future Weights           0.86       0.14
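
The table's weights can be reproduced in a few lines; here is a minimal numpy sketch using the figures above:

```python
import numpy as np

# Historical estimates and actuals from the table above.
method_a = np.array([150.0, 160.0, 170.0])
method_b = np.array([155.0, 145.0, 180.0])
actuals = np.array([151.1, 155.2, 172.3])

# Sample variance of each method's historical errors.
var_a = np.var(method_a - actuals, ddof=1)   # ~14.44
var_b = np.var(method_b - actuals, ddof=1)   # ~88.94

# Weight each method in inverse proportion to its error variance.
inverse_variances = np.array([1.0 / var_a, 1.0 / var_b])
weights = inverse_variances / inverse_variances.sum()
print(weights)   # ~[0.86, 0.14]
```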

 

After applying the inverse variance technique to our simulated claims database, using two of the more common reserving methods, we captured the unpaid claim liability estimates for each incurred month. These estimates were then compared to the actual known liability, and their range of error is illustrated below. As shown, the Inverse Variance approach reduces the overall range of error compared to each reserve method independently. However, you can also see that the technique doesn't improve accuracy significantly.

[Figure: range of error, Inverse Variance vs. individual reserve methods]

Technique 2:  Linear Regression

The linear regression approach should produce more accurate weightings than the inverse variance approach, but it is far more computationally intensive.  To ensure accuracy, the linear regression technique minimizes the sum of squared prediction errors for all points, penalizing larger errors disproportionately.  On the other hand, the inverse variance focuses on reducing the dispersion of the estimates instead of the size of the error.  In other words, the inverse variance method tends to enhance precision of the estimate, but not necessarily the accuracy.

Example:

Suppose you have two methods used for reserving, A and B. Each of these methods produced a historical estimate for each month. If we define X as a 3 x 2 matrix (with A's estimates in column 1 and B's in column 2) and Y as the vector of actuals, we can use the normal equation to solve for the proposed weights W (assuming X^T X is invertible), where T denotes the transpose of the matrix and -1 the inverse:

W = (X^T X)^-1 (X^T Y)

Applying this to the table below, the proposed future weights for these methods would be 71% A and 29% B (for this particular Lag).

Historical Experience         Method A   Method B   Actuals
Month 1                         150.00     155.00    151.10
Month 2                         160.00     145.00    155.20
Month 3                         170.00     180.00    172.30
Proposed Future Weights           0.71       0.29
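
A short numpy sketch using the table's figures reproduces these weights via the normal equation (on larger problems, np.linalg.lstsq is the numerically safer way to solve the same least-squares problem):

```python
import numpy as np

# X holds the two methods' historical estimates; y holds the actuals.
X = np.array([[150.0, 155.0],
              [160.0, 145.0],
              [170.0, 180.0]])
y = np.array([151.1, 155.2, 172.3])

# Normal equation: W = (X^T X)^-1 (X^T y).
weights = np.linalg.solve(X.T @ X, X.T @ y)
print(weights)   # ~[0.71, 0.29]
```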

 

This type of backtest establishes that A is the better predictor; however, a mix of the two methods is still preferable. The technique provides a systematic approach to choosing a good mix, and possibly a better starting point, prior to applying judgment in your reserve picks going forward.

A similar illustration, using linear regression against our simulated claims database, can be found below. As discussed above, accuracy is what sets linear regression apart from the inverse variance approach. Unlike the previous results, the results here tend to center on zero.

[Figure: range of error, linear regression weighting]

Although we provided an example where only two predictors are used, you can include more. Typically, an actuary has many methods at their disposal, such as the development method, the paid PMPM method, loss ratio methods, trend-based methods, seasonality-based methods, etc. You can also integrate other variables into the analysis, such as the size of the current claims inventory. Whatever methods are ultimately chosen, we encourage you to pick methods that are diverse and not well correlated with one another. We also encourage the methods to be consistent and stable over time. At the same time, you should be careful not to overfit your data.

Summary

In the examples outlined above, we presented two high-level techniques to weight existing reserve estimates.  We showed how these techniques can improve your already defined reserving process with little extra work. In addition to the improvement to your estimates, there are two other benefits: the techniques will help the reserving actuary more precisely quantify where and when each reserving method works, and linear regression allows the actuary to integrate stochastic techniques in the calculation of reserve margin. However, there are limitations, and you should be aware of these and use judgment where necessary.

Predictive analytics is the practice of extracting information from existing data to determine patterns and predict future outcomes and trends (Predictive analytics, n.d.). If you don't use a weighting algorithm to combine your reserve estimates, you probably have a pretty good sense of which of your models performs best for each lag month. But the question is: by how much? A weighting algorithm trained on real data can give you more precision around which models work better and when.

Predictive analytics is the new catch phrase, but not long ago stochastic analysis was a hot topic. Reserving is certainly a place where more stochastic models can prove beneficial. A Society of Actuaries sponsored report gives a definition of the margin for IBNR. In math, it is written as:

Probability(Estimate+Margin>95%)>85%

The report also gives the reader a couple of ideas on how to obtain this estimate (Chadick, Campbell, & Knox-Seith, 2009). It also points the reader to another Society of Actuaries published report, Statistical Methods for Health Actuaries IBNR Estimates: An Introduction, which outlines some more sophisticated ways to statistically approximate your IBNR (Gamage, Linfield, Ostaszewski, & Siegel, 2007). Using Technique 2 is a great first step in integrating stochastic methods into your already defined reserving system.

The idea of combining two or more estimates for better prediction or lower variance is used in many other contexts: it is called meta-analysis in statistics and ensemble methods in data science, while in finance the capital asset pricing model (CAPM) uses an optimal weighting structure. In any case, these approaches work and can help reduce the biases that exist in your reserving process.

Appendix

Data and Simulations

Although these techniques have been shown to be successful in practice, the results included in this paper were developed using data from our simulated claims database to avoid the use of actual data. The ultimate incurred claims were developed by lag month and include adjustments for changes in claim processing patterns, the number of weekly paid claims in a month, benefit design, workday factors, random large claim shocks, seasonality, leveraging, and other factors (including random noise within each component and overall).
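
For readers who want to experiment, a toy version of this kind of run-out simulation is sketched below. The components and parameters are illustrative stand-ins, not the simulation used for the results in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ultimates(n_sims=8_000, base_claims=20_000_000.0):
    """Toy claims run-out: seasonality, benefit change, and large claim shocks,
    each with its own random noise, plus overall noise at the end."""
    month = rng.integers(1, 13, size=n_sims)
    seasonality = 1.0 + 0.05 * np.sin(2.0 * np.pi * month / 12.0) + rng.normal(0.0, 0.01, n_sims)
    benefit_change = 1.0 + rng.normal(-0.02, 0.01, n_sims)             # e.g., leaner benefits
    large_claim_shock = rng.poisson(2.0, n_sims) * rng.gamma(2.0, 125_000.0, n_sims)
    ultimates = base_claims * seasonality * benefit_change + large_claim_shock
    return ultimates * (1.0 + rng.normal(0.0, 0.02, n_sims))           # overall noise

ultimates = simulate_ultimates()
print(f"mean: {ultimates.mean():,.0f}  std dev: {ultimates.std():,.0f}")
```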

Consistent with actual experience, our simulated examples have shown improved performance when compared to using a single method for reserving.  Although we are not able to simulate judgment, we have seen actual improvement when comparing to our final picks (adjusting for margin and implicit conservatism), but we will leave it to the reader to test their own historical performance and whether these techniques add value (or just a better baseline from which to build their estimates).

In the end, we believe that, if employed correctly with various reliable and stable methods, these techniques (particularly regression) can help reduce both the bias and the variance of the estimates.

Below are the results obtained from applying these techniques to our claims database.  Roughly 8,000 simulations were generated estimating the ultimate claim liability for a given month.

[Figure: simulation results by reserving technique]

VAR95% represents the point below which 95% of the errors (in absolute terms) fall.

[Figure: VAR95% comparison by technique]

Excel illustration: Example Techniques