Predictive modelling – a view from New York
By Jason Bowman, Founder & CEO, CumPane Solutions LLC
On 19 January 2019, the New York State Department of Financial Services (DFS) issued a circular letter [1] concerning the use of external consumer data and information sources for life insurance underwriting. This followed a prior notice sent to insurers that the Department was investigating the use of such data for potentially unfair or discriminatory practices.
In the US, the deployment of predictive models for life insurance has grown rapidly over the past five years, providing carriers with an opportunity to break out of the preferred underwriting ‘strait-jacket’ that has required anyone applying for life insurance over relatively low levels (around $100,000) to undergo a medical exam with blood and urine testing. Whilst preferred underwriting has enabled the industry to offer some of the most affordable life insurance in the world, it has also created an expensive and time-consuming underwriting process that deters some consumers from purchasing cover.
Until relatively recently, life insurers would only offer products without exam and lab testing at their peril. Known as ‘simplified issue’ products, these had a reputation for attracting customers who would not qualify for preferred (or even standard) rate classes, leading to increased mortality and volatility. This caused companies to increase the price of such products, which created a vicious circle, making them appealing only to less healthy lives. Over the past decade, some improvement was achieved through the use of database checks, such as prescription records and driving and insurance history. However, a significant gap remained between simplified issue and fully underwritten products.
Enter predictive models as a revolutionary solution to the problem. These were developed using a range of data sources that could be accessed in real time. When matched with mortality records (usually the Social Security Death Master File), a model could be developed to predict the likely mortality for a consumer at the point of application. Initially, most models were custom-built by carriers using existing policyholder records combined with publicly available external data. Whilst these models give the carrier a high degree of control, they are expensive to develop and take many years to prove their accuracy. A big step forward came as companies with access to large quantities of data began to develop predictive solutions that could be deployed by carriers relatively easily. This has led to the emergence of ‘accelerated underwriting’, in which carriers can offer products at a fully underwritten preferred rate but without the invasive testing, provided the applicant has a good score from the model.
In theory this is all good news for consumers and the industry, but there are concerns that this approach could change the basic principles of risk assessment and lead to unfair or seemingly arbitrary underwriting decisions. In its circular the DFS raises two main issues with the use of data for underwriting:
- The models may be discriminatory against certain protected groups.
- There can be a lack of transparency in providing the customer with a reason for the outcome.
On the discrimination side, Insurance Law Article 26 prohibits the use of race, colour, creed, national origin and other factors for underwriting. This is well known and would certainly be taken into account in the development of predictive models. However, the uncertainty comes where other data that might be used in the model is closely correlated with one of these fields. For example, the use of zip code in a model is likely to correlate with certain national origin groups. Detecting these biases when developing the model can become highly complex [2]. Nevertheless, companies need to look carefully at the source data and extensively test the models to try to eliminate biases wherever possible. The DFS highlights its right to audit and examine these models, and its attention will be focused closely on potentially discriminatory biases.
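To make the proxy problem concrete, here is a minimal sketch of one simple screening check: measuring how strongly a candidate model input (such as zip code) is associated with a protected attribute, using Cramér's V. It assumes the carrier holds an audit dataset in which the protected attribute is recorded purely for testing; the function name, the sample records and the idea of flagging high values are illustrative assumptions, not anything prescribed by the DFS circular. A pairwise check like this is far weaker than a full proxy analysis, since combinations of individually innocuous fields can also act as a proxy.

```python
from collections import Counter
from math import sqrt


def cramers_v(feature_values, protected_values):
    """Cramér's V between two categorical columns (0 = no association, 1 = perfect)."""
    n = len(feature_values)
    if n == 0 or n != len(protected_values):
        raise ValueError("inputs must be non-empty and the same length")

    feature_counts = Counter(feature_values)
    protected_counts = Counter(protected_values)
    joint_counts = Counter(zip(feature_values, protected_values))

    k = min(len(feature_counts), len(protected_counts))
    if k < 2:
        return 0.0  # a constant column cannot act as a proxy

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for f, nf in feature_counts.items():
        for p, npr in protected_counts.items():
            expected = nf * npr / n
            observed = joint_counts.get((f, p), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical audit records: zip code alongside a protected-group label.
zip_codes = ["10001", "10001", "10002", "10002"]
groups = ["A", "A", "B", "B"]
association = cramers_v(zip_codes, groups)
```

In this toy data every zip code maps to exactly one group, so the association is at its maximum and the field would warrant closer review. What value should trigger a review is a judgment for the carrier's actuarial and compliance teams.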
Regarding transparency, carriers that have deployed such models will already be aware of the challenges in explaining the outcome to applicants. The initial models tended to be pure ‘black boxes’ whose results could not be explained in any way, mostly because of the complexity of the models but also because vendors wanted to protect their proprietary algorithms. More recent versions have added some level of explanation, including reason codes that list the main factors that went into a specific decision for an applicant. Some carriers have gone a step further, using rules to alter the outcome of the model based on the reason codes and applying a ‘sanity check’ to ensure the results can be justified; for example, if the top reason code is that the applicant did not have an email address, this might be discounted in some way. This highlights the disparity between traditional and algorithmic underwriting and the challenges companies are facing in trying to straddle the two approaches. Should you use the outcome of the model blindly or somehow look to align it with the traditional approach?
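The ‘sanity check’ idea can be sketched in a few lines. This is an illustration only: the reason-code name, the score scale (higher meaning lower predicted mortality), the 0.80 cut-off and the routing labels are all hypothetical, and referring the case to an underwriter is just one possible reading of ‘discounted in some way’.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical reason codes the carrier has decided it could not justify to an applicant.
UNJUSTIFIABLE_CODES = {"NO_EMAIL_ADDRESS"}

# Assumed cut-off; in this sketch a higher score means lower predicted mortality.
ACCELERATE_THRESHOLD = 0.80


@dataclass
class ModelResult:
    score: float
    reason_codes: List[str]  # ordered, most influential factor first


def route_application(result: ModelResult) -> str:
    """Apply a rule-based sanity check before acting on the model score."""
    top_code = result.reason_codes[0] if result.reason_codes else None

    # If the leading reason cannot be defended, do not act on the score alone.
    if top_code in UNJUSTIFIABLE_CODES:
        return "refer_to_underwriter"

    if result.score >= ACCELERATE_THRESHOLD:
        return "accelerated"
    return "full_underwriting"


decision = route_application(ModelResult(0.35, ["NO_EMAIL_ADDRESS", "RX_HISTORY"]))
```

Here a poor score driven mainly by a missing email address is sent to a human underwriter rather than being declined for accelerated underwriting on the model's say-so. The trade-off is the one the article describes: each rule added makes outcomes easier to justify, but moves the process back towards traditional underwriting.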
How the DFS moves ahead with implementation of these guidelines will be closely watched across the industry. Whether other states follow with similar action is also an area of keen interest. New York certainly has the most extensive regulations, which have already led some companies to decide not to write business there. Now it is even more likely that carriers rolling out new accelerated programs will adopt a ‘wait and see’ approach and hold back from filing in New York until the stance is tested.
The momentum behind machine learning and algorithmic models is certain to continue, and similar debates will be played out across many industries. For this reason, the DFS circular is unlikely to stop the deployment of these models. Nevertheless, it will bring a new level of scrutiny from carriers’ compliance teams as companies move forward with these solutions.
[2] Datta et al. Proxy Discrimination in Data-Driven Systems. https://arxiv.org/pdf/1707.08120.pdf
Cumpane Solutions: https://www.cumpanesolutions.com/