Actuarial and Data Analytics

Providing information and advice on real problems


Reserve(ML) – Advanced Claim Reserving Techniques

Automate and Improve Your Reserve Estimates With Our BETA Version of Reserve(ML)

Improve your reserve estimates using advanced techniques that go beyond traditional triangle analysis.  Reserve(ML) is a cloud-based claim reserving platform tailored for health actuaries.  To give you a glimpse of what it can do and how much it can improve your estimates, I have released a stripped-down version for the public.  A more robust version will be available later this year (once we gather all the feedback from our testers).

Prior to releasing the full version to a select few, we want your input on what you find valuable.  As such, we released a high-level version to illustrate how it works.  We are looking for feedback, such as which charts, graphs, and data sets you would be interested in seeing for your data.

Why Reserve(ML)? 

Actuaries often use various triangle-based methods, such as the Development method and the Paid Per Member Per Month (Paid PMPM) method, to set reserves. These methods in principle attempt to perform pattern recognition on the limited information contained within the triangles.  Even though these methods continue to serve actuaries well, information is being left out that could enhance the overall reserve estimate.

To make up for the lack of information used to estimate the reserves, an actuary relies heavily on his or her judgment. Although judgment is invaluable, biases and other elements can come into play, leading to large variances and the need for higher reserve margins.




Claim Reserving in R

Introduction to Reserving in R:

In our last two blogs we discussed how to combine reserve estimates, as well as how to start integrating Machine Learning (ML) into your process.  Hopefully the reader found these interesting and useful.  If you have any comments, questions, or are just looking for advice, please feel free to contact us.

With that, to provide a more solid foundation and give the reader real tools to deploy (instead of just talk), I plan on releasing the code used in our prior articles over the coming weeks and months.

For the first release, we will start with the basics (triangle-based methods).  Posts that follow will go over the steps to set up data for ML models, ways to run data through the ML models, and end with an Excel package that will allow every actuary to use ML in Excel (without having to work in R).  This will serve as the carrot at the end of the stick!

Since most actuaries are comfortable in Excel, the idea was to create an interface that would allow them to perform all their reserving functions (including some basic ML models) without having to learn how to code in R, Python, or other languages.

However, prior to letting the cat out of the bag, I would like to go over the code piece by piece so the user has a better understanding of the model.  I have no intention of providing a black box.  I want the user to at least grasp some of the basic concepts prior to using the file.

Starting with the basics:

As noted above, we will start by tackling the basics (a traditional triangle-based method).  For this specific example we will be covering the Paid Per Member Per Month (Paid PMPM) method.  Although this can be done efficiently in Excel, this will give the reader an understanding of the logic used in R.

Let’s begin – suppose we have a data frame called “data” that contains the three basic features (see below):

Incurred Month   Processed Month   Paid Claim PMPM
2010-01-01       2010-01-01        xx.xx
2010-01-01       2010-02-01        xx.xx
2010-01-01       2010-03-01        xx.xx
2010-01-01       2010-04-01        xx.xx
2010-01-01       2010-05-01        xx.xx
2010-01-01       2010-06-01        xx.xx
As a primer, we need to create two functions called “add.months” and “add.months.v”.  These will allow us to filter the data as we loop through each incurred and paid month.
add.months = function(date, n) seq(date, by = paste(n, "months"), length = 2)[2]
add.months.v = function(date, n) as.Date(sapply(date, add.months, n), origin = "1970-01-01")
Next we will define the runout and lookback periods.  Runout reflects the number of months we wish to include (in this case we will assume 24 months of runout is sufficient to be complete).  Lookback, on the other hand, is the number of recently observed lag PMPMs (based on the time period within the row) that we will use as our estimate.
runout = 24
lookback = 6

We will then create a loop that cycles through all rows within the data.

for (i in 1:nrow(data))
And finally, we have the portion that is being looped through.  Here data$PdPMPM is an added column to our initial data into which we will drop our results (and i is the row within the data).  In the code, data$Incmo, data$Pdmo, and data$Lag hold the incurred month, paid (processed) month, and the lag between the two.  Note that the trailing + at the end of the first sum keeps R reading the two sums as a single expression.
{
  data$PdPMPM[i] = sum(data[data$Pdmo >= add.months.v(data$Pdmo[i], -(lookback - 1)) &
                            data$Pdmo <= data$Pdmo[i] &
                            data$Lag > data$Lag[i] &
                            data$Lag <= runout, ]$Paid) / lookback +
                   sum(data[data$Incmo == data$Incmo[i] &
                            data$Lag <= min(data$Lag[i], runout), ]$Paid)
}
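For readers outside R, here is a minimal Python sketch of the same completion logic.  The column names (incmo, pdmo, lag, paid) are our own stand-ins for the incurred month, processed month, lag, and paid claim PMPM, and the dummy values are placeholders, not real data:

```python
from datetime import date

runout, lookback = 24, 6

def months_between(d1, d2):
    """Whole months from d1 to d2 (both dated the first of the month)."""
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

def completed_pmpm(row, data):
    # Tail estimate: average of paid PMPMs observed at later lags during
    # the most recent `lookback` paid months, capped at the runout horizon.
    tail = sum(r["paid"] for r in data
               if 0 <= months_between(r["pdmo"], row["pdmo"]) <= lookback - 1
               and row["lag"] < r["lag"] <= runout) / lookback
    # Paid to date for this row's incurred month.
    to_date = sum(r["paid"] for r in data
                  if r["incmo"] == row["incmo"]
                  and r["lag"] <= min(row["lag"], runout))
    return tail + to_date

# Dummy rows for one incurred month (placeholder values)
data = [
    {"incmo": date(2010, 1, 1), "pdmo": date(2010, 1, 1), "lag": 0, "paid": 10.0},
    {"incmo": date(2010, 1, 1), "pdmo": date(2010, 2, 1), "lag": 1, "paid": 5.0},
]
estimates = [completed_pmpm(r, data) for r in data]
```

The two filters mirror the R subsetting one-to-one: the first sum picks up later-lag PMPMs from the recent paid-month window, and the second accumulates paid claims to date for the row's incurred month.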
In short, we are cycling through all rows and trying to complete each row based on its incurred and paid month.
I have attached an Excel sheet with dummy data illustrating and comparing the calculation in Excel and R.
Below is the link to the Excel file!





Simple and Effective Reserve Practices


Actuarial judgment is pervasive in our work. In many cases, judgment is a necessary element of our modeling and analysis. Over the past four decades, behavioral research has shown that simple linear models can do much better than a human practitioner in many cases (Kahneman & Tversky, 2011; Wacek, 2007).

We present a couple of simple but effective reserving techniques that an actuary can add to his or her current reserving practices to produce significant reductions in both reserve bias and reserve variance. Aggregating reserve estimates using only actuarial judgment can produce high-variance, biased results, which can have consequences in many other areas of your company.

According to the Washington State Office of the Insurance Commissioner’s data, the range of reserve error reported on financial statements for the largest insurance entities for the years 2008 – 2014 was -10% to 40% (Company Annual Statements, n.d.). More importantly, the standard deviation of these errors is 11%. This data supports the possibility of the biases that actuaries generally believe to exist. Biases in reserve estimates include over-compensation (when you’ve reserved low one year, you over-compensate the next year by reserving far too high), keeping too much weight on the prior estimates when new information is available, and more. It also indicates that the reserving techniques being employed are not very precise. With an 11% margin and an 11% swing, companies can easily see reserve estimates exceeding the final paid claims by up to 40%. This leaves capital in the prior year that could be used to benefit this year. This could impact the bottom line, distort the company’s profitability over time, adversely affect ratings in the following year, trigger regulatory action, or impact pricing and forecasting models. Under-reserving can have similar effects. In addition to pricing and forecasting impacts, accruals may be set aside assuming an MLR or other rebates are due, causing inappropriate payments on performance bonuses, bringing additional scrutiny to your department, and deteriorating your credibility as the reserving actuary.

The results below are based on a simulation study with 8,000 simulations of claims run-out. The simulations took into account a seasonality component, a benefit change component, and a large claim component, each developed with some randomness in each simulation. These simulations show a 5% reduction in the variance of the reserve estimates. Unless the estimators are completely correlated, these techniques should produce a reduction in variance and a more consistent estimate of the mean. With reduced variance and more accurate predictions, the margins needed could be reduced, resulting in a better estimate of each year’s results.

The remainder of this article will outline the proposed techniques, followed by a high-level summary of the simulated data used to illustrate the results.  Note that although we illustrated the results by way of simulation, these techniques have been used in real practice and have shown a significant impact.

Weighting Techniques

The idea is simple – take the various predictions you are already making and weight them in a way that minimizes variance and increases accuracy. This paper will discuss two weighting techniques you can use. However, there are many different ways to calculate the weights. Every reserving actuary is inherently doing this weighting in some fashion, whether it be via a mental algorithm or a more formalized approach.  We advocate using a formalized approach that is testable and avoids potential human biases.  In addition, the proposed formalized approach will tend to discredit reserving methods that perform poorly, focusing on those methods that are more reliable and consistent.  If nothing else, this will give you a better baseline from which to apply judgment.

The following is an example illustrating the outcome from a weighting technique over multiple reserve methods by lag month.


In this example, we used the weighting technique to combine the Seasonality, Paid Per Member Per Month (PMPM), Development, Inventory, and Trend methods. As you can see each lag differs in the weights applied to each method. In Lag 0, the seasonality method had the highest weight, indicating that it was the “best” model for that lag. However, the seasonality method alone is not the best method. Rather, the weighting given in the above panel minimizes the variance of the estimate, so we would use that weighting for our predictions of Lag 0 claims.

We recommend ongoing monitoring and measurement of any approach used to ensure the intended outcomes and expectations are being met. One of the pitfalls of this more data-driven weighting approach is over-fitting. This is a common pitfall in any estimation or prediction procedure.

Technique 1:  Inverse Variance

Inverse Variance weights each of the reserve methods in proportion to the inverse of its error variance when comparing to actuals.  Therefore, lower weights are applied to those methods that have historically produced a larger variation of errors.

This approach is straightforward and simple to implement without having to add any additional features to one’s existing reserve model.  It also avoids any complex calculations, making it easy to explain to others.  On the other hand, this type of approach ignores the correlations between the reserve methods being used and their distance from the target, which could be used to help lower the variance even further. This is why we offer two approaches.


Suppose you have two methods for reserving, A and B.  Each of these methods has a historical monthly reserve error associated with it (error variances of 14.44 and 88.94 respectively, per the table below).  Based on the inverse variance technique, the proposed future weights when developing a projection would be 86% A and 14% B.  This type of back-test establishes that A is the better predictor; however, a mix of the two methods is still preferred.  This technique provides a systematic approach to choosing a good mix and possibly a better starting point prior to applying judgment in your reserve picks going forward.

Historical Experience        Method A   Method B   Actuals
Month 1                        150.00     155.00    151.10
Month 2                        160.00     145.00    155.20
Month 3                        170.00     180.00    172.30
Variance of Monthly Errors      14.44      88.94
Inverse Variance                 0.07       0.01
Proposed Future Weights          0.86       0.14
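As a sketch, the arithmetic in the table above can be reproduced in a few lines of Python (the figures come from the example; the variable names are ours):

```python
import statistics

# Historical estimates and actuals from the example above
method_a = [150.00, 160.00, 170.00]
method_b = [155.00, 145.00, 180.00]
actuals  = [151.10, 155.20, 172.30]

def error_variance(estimates, actuals):
    # Sample variance of the monthly reserve errors (estimate - actual)
    errors = [e - a for e, a in zip(estimates, actuals)]
    return statistics.variance(errors)

var_a = error_variance(method_a, actuals)   # ~14.44
var_b = error_variance(method_b, actuals)   # ~88.94

# Weight each method by the inverse of its error variance
inv_a, inv_b = 1 / var_a, 1 / var_b
weight_a = inv_a / (inv_a + inv_b)          # ~0.86
weight_b = inv_b / (inv_a + inv_b)          # ~0.14
```

Note that statistics.variance is the sample variance (n - 1 denominator), which is what produces the 14.44 and 88.94 figures in the table.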


After applying the inverse variance approach to our simulated claims database, using two of the more common reserving methods, we captured the unpaid claim liability estimates for each incurred month.  These estimates were then compared to the actual known liability, and the range of error is illustrated below.  As seen below, the Inverse Variance approach reduces the overall range of error when compared to each reserve method independently. However, you can also see that the technique doesn’t improve accuracy significantly.


Technique 2:  Linear Regression

The linear regression approach should produce more accurate weightings than the inverse variance approach, but it is far more computationally intensive.  To ensure accuracy, the linear regression technique minimizes the sum of squared prediction errors for all points, penalizing larger errors disproportionately.  On the other hand, the inverse variance focuses on reducing the dispersion of the estimates instead of the size of the error.  In other words, the inverse variance method tends to enhance precision of the estimate, but not necessarily the accuracy.


Suppose you have two methods used for reserving, A and B.  Each of these methods produced a historical estimate for each month.  If we define A and B as X (a 3 x 2 matrix with A being column 1 and B column 2) and Y as the actuals, we could use the normal equation, w = (XᵀX)⁻¹XᵀY, to solve for the proposed weights (assuming XᵀX is invertible), where ᵀ denotes the transpose of the matrix and ⁻¹ the inverse.

Applying this to the table below, the proposed future weights for these methods would be 71% A and 29% B (for this particular Lag).

Historical Experience      Method A   Method B   Actuals
Month 1                      150.00     155.00    151.10
Month 2                      160.00     145.00    155.20
Month 3                      170.00     180.00    172.30
Proposed Future Weights        0.71       0.29
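The normal-equation weights for the table above can be verified with a short sketch.  Since X'X is only 2 x 2 here, we solve the system X'X w = X'Y directly with Cramer's rule, so no matrix library is needed (figures from the example; variable names are ours):

```python
# Method A and Method B estimates (columns of X) and the actuals (Y)
X = [[150.00, 155.00],
     [160.00, 145.00],
     [170.00, 180.00]]
Y = [151.10, 155.20, 172.30]

# Build the 2 x 2 matrix X'X and the vector X'Y
a11 = sum(row[0] * row[0] for row in X)
a12 = sum(row[0] * row[1] for row in X)
a22 = sum(row[1] * row[1] for row in X)
b1 = sum(row[0] * y for row, y in zip(X, Y))
b2 = sum(row[1] * y for row, y in zip(X, Y))

# Solve X'X w = X'Y via Cramer's rule
det = a11 * a22 - a12 * a12
w_a = (b1 * a22 - a12 * b2) / det   # ~0.71
w_b = (a11 * b2 - a12 * b1) / det   # ~0.29
```

With more than a handful of methods you would use a numerical linear algebra routine rather than hand-rolled Cramer's rule, but the logic is the same.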


This type of back-test establishes that A is the better predictor; however, a mix of the two methods is still preferable.  This technique provides a systematic approach to choosing a good mix and possibly a better starting point prior to applying judgment in your reserve picks going forward.

A similar illustration using linear regression against our simulated claims database can be found below.  As discussed above, accuracy is what sets linear regression apart from the inverse variance approach.  Unlike the previous results, the results here tend to center themselves on zero.


Although we provided an example where only two predictors are used, you can include more. Typically, an actuary has many methods at his or her disposal: the development method, the paid PMPM method, loss ratio methods, trend-based methods, seasonality-based methods, etc. You can also integrate other variables into the analysis, such as the size of the current claims inventory.  For whatever methods are ultimately chosen, we encourage you to pick methods that are diverse and not highly correlated with one another.  We also encourage the methods to be consistent and stable over time. At the same time, you should be careful not to overfit your data.


In the examples outlined above, we presented two high-level techniques to weight existing reserve estimates.  We showed how these techniques can improve your already defined reserving process with little extra work. In addition to the improvement to your estimates, there are two other benefits: the techniques will help the reserving actuary more precisely quantify where and when each reserving method works, and linear regression allows the actuary to integrate stochastic techniques in the calculation of reserve margin. However, there are limitations, and you should be aware of these and use judgment where necessary.

Predictive analytics is the practice of extracting information from existing data to determine patterns and predict future outcomes and trends (Predictive analytics, n.d.). If you don’t use a weighting algorithm to combine your reserve estimates, you probably have a pretty good sense of which of your models performs the best for each lag month. But, the question is by how much. A weighting algorithm trained on real data can give you more precision around which models work better and when.

Predictive analytics is the new catchphrase, but not long ago stochastic analysis was the hot topic. Reserving is certainly a place where more stochastic models can prove beneficial. A Society of Actuaries sponsored report gives a definition of what margin is for IBNR. In math, it is written as:


The report also gives the reader a couple of ideas on how to obtain this estimate (Chadick, Campbell, & Knox-Seith, 2009). It also points you to another Society of Actuaries published report, Statistical Methods for Health Actuaries IBNR Estimates: An Introduction, which outlines some more sophisticated ways to statistically approximate your IBNR (Gamage, Linfield, Ostaszewski, & Siegel, 2007). Using Technique 2 is a great first step in integrating stochastics into your already defined reserving system.

The idea of combining two or more estimates for better prediction or lower variance is used in many other contexts: it’s called meta-analysis in statistics and ensemble methods in data science, while in finance the capital asset pricing model (CAPM) uses an optimal weighting structure.  In any case, these approaches work and can help reduce the biases that exist in your reserving process.


Data and Simulations

Although these techniques have been shown to be successful in practice, the results included in this paper were developed using our simulated claim database to avoid the use of actual company data.  The ultimate incurred claims were developed by lag month and include adjustments for changes in claim processing patterns, number of weekly paid claims in a month, benefit design, workday factors, random large claim shocks, seasonality, leveraging, and other factors (which include random noise within each component and overall).

Consistent with actual experience, our simulated examples have shown improved performance when compared to using a single method for reserving.  Although we are not able to simulate judgment, we have seen actual improvement when comparing against our final picks (adjusting for margin and implicit conservatism), but we will leave it to the reader to test their own historical performance and decide whether these techniques add value (or at least a better baseline from which to build their estimates).

In the end, we believe that, if employed correctly with various reliable and stable methods, these techniques (particularly regression) can help reduce both the bias and the variance in the estimates.

Below are the results obtained from applying these techniques to our claims database.  Roughly 8,000 simulations were generated estimating the ultimate claim liability for a given month.


VAR95% represents the point below which 95% of the errors (in absolute terms) fall.


Actual Excel Illustration Below

Example Techniques