
In our first article we looked at some of the basic terminology used around the umbrella term artificial intelligence (AI). In our second we take a deeper dive into an area which we think has huge potential in underwriting and claims. This is the world of optical character recognition (OCR), natural language processing (NLP) and text mining. As we have said previously, the amount of data and indeed data sources available to underwriters is expanding rapidly. To make sense of this information, underwriters and claims technicians need appropriate techniques to use the data: sifting through what there is, establishing what is useful and then assimilating it alongside traditional data.

So let’s again start with some basic definitions.

OCR, which uses machine learning (ML), is the automatic conversion of images of printed or handwritten text, whether from a scanned document, a photograph or a PDF, into machine-readable text.

NLP, again part of the ML continuum, is defined as the automated manipulation of natural language in speech or text. Or, put more simply, it enables computers to read text, hear speech and interpret it. When you ask Alexa to play a song, software processes the speech, retrieves that song from your library and then plays it for you.

In NLP, the key to understanding the significance of a word or phrase is to place it in context. For example, the underwriter’s interest in the mention of asthma on a physician’s report is altered by whether the reference is to ‘asthma in childhood’ or to ‘severe asthma attacks’. Consider too ‘major heart attack’ versus ‘heart attack excluded’.
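To make the point about context concrete, here is a minimal Python sketch that labels the phrases above according to the words surrounding the condition. The cue lists and the `classify_mention` function are our own illustrative assumptions; production NLP systems would typically rely on trained models rather than hand-written keyword rules.

```python
# Hypothetical cue lists, for illustration only; not a clinical vocabulary.
NEGATION_CUES = ("excluded", "no history of", "denies", "ruled out")
HISTORICAL_CUES = ("in childhood", "as a child", "resolved")
SEVERITY_CUES = ("severe", "major", "acute")


def classify_mention(phrase: str, condition: str) -> str:
    """Label how a condition is mentioned in a short phrase."""
    text = phrase.lower()
    if condition.lower() not in text:
        return "not mentioned"
    # Negation is checked first: 'heart attack excluded' is not a heart attack.
    if any(cue in text for cue in NEGATION_CUES):
        return "negated"
    if any(cue in text for cue in HISTORICAL_CUES):
        return "historical"
    if any(cue in text for cue in SEVERITY_CUES):
        return "severe"
    return "mentioned"


for phrase, condition in [
    ("asthma in childhood", "asthma"),
    ("severe asthma attacks", "asthma"),
    ("major heart attack", "heart attack"),
    ("heart attack excluded", "heart attack"),
]:
    print(f"{phrase!r} -> {classify_mention(phrase, condition)}")
```

The same word ('asthma', 'heart attack') ends up with a different label depending on its neighbours, which is exactly the distinction an underwriter would draw.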

Another term often used in conjunction with NLP is ‘text mining’ or ‘text data mining’, which uses automated processes to categorise groups of text and so derive high-quality data from them.

All of this has a number of uses in underwriting.

In our first article we mentioned the explosion of new and expanded data sources that are becoming available to the underwriter. Some of the newer kids on the block include motor vehicle records, prescription histories, credit and other financial data, and electronic health records (EHRs) from personal physicians and hospitals. How does the underwriter avoid drowning in a sea of data? It’s important to remove the background noise and focus on what is useful.

The answer, of course, is technology such as OCR and NLP. At the moment NLP and OCR are converging and are starting to be used in combination.

The most obvious application in the underwriting world is the automatic extraction of data from unstructured sources (such as the application form, laboratory test results and physician statement information) in an efficient way and feeding it into the underwriting decision-making process.
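As a rough illustration of this kind of extraction, the sketch below pulls laboratory values out of free text that has already been through OCR. The line layout it expects (`<test name>: <value> <unit>`), the `LAB_LINE` pattern and the `extract_lab_results` function are assumptions made for this example; real laboratory reports vary widely in layout and would need far more robust handling.

```python
import re

# Assumed layout "<test name>: <value> <unit>"; real reports differ.
LAB_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9 ]*?)\s*:\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)?\s*$"
)


def extract_lab_results(report: str) -> list:
    """Turn matching lines of free text into structured records."""
    results = []
    for line in report.splitlines():
        match = LAB_LINE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            results.append(
                {
                    "name": match["name"],
                    "value": float(match["value"]),
                    "unit": match["unit"] or "",
                }
            )
    return results


report = """Patient attended fasting.
HbA1c: 6.1 %
Total cholesterol: 5.4 mmol/L
Comment: repeat in 6 months"""

for record in extract_lab_results(report):
    print(record)
```

Only the two lines carrying a numeric result become structured records; the narrative lines are left for other techniques, such as NLP, to interpret.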

Digital health data contains both structured and unstructured elements, and the unstructured elements often hold useful information. The structured data may well contain a code that records a diagnosis, but the unstructured data may be crucial to understanding its significance, for example its severity; as we explained before, context is vital to understanding. Relying on codes alone can be a real challenge, for example in mental illness, where a diagnosis code by itself may not convey much.

Trawling through related unstructured data may be time-consuming, but it may well yield information that is not covered in the structured data. There are now a number of tools that use state-of-the-art NLP models to help extract, understand and contextualise information from unstructured documents. At a very high level, this process might look something like this:

  1. Read and extract relevant terms.
  2. Normalise those terms, removing redundancies.
  3. Put the terms into some sort of context.
  4. Identify what is significant and what is not.
  5. Organise the data to make it easy (or easier) for a human (or a clever machine) to process it and understand the impact in terms of risk.

Again, maybe in a slightly simplistic way, this describes the case for putting OCR and NLP to work in order to summarise an applicant’s health information and place it into some sort of underwriters’ dashboard.
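The five steps listed above can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not a description of any real product: the `SYNONYMS` vocabulary, the `NEGATION_CUES`, the `SIGNIFICANT` set and every function name are hypothetical, and the keyword rules stand in for the NLP models a real tool would use.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary mapping surface forms to one normalised term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
    "asthma": "asthma",
}
NEGATION_CUES = ("no history of", "excluded", "denies")
# Assumed for illustration only; this is not underwriting guidance.
SIGNIFICANT = {"myocardial infarction", "hypertension"}


@dataclass(frozen=True)
class Finding:
    term: str
    negated: bool
    sentence: str


def extract_terms(sentence):
    """Step 1: read the sentence and extract known surface forms."""
    lowered = sentence.lower()
    return [s for s in SYNONYMS
            if re.search(r"\b" + re.escape(s) + r"\b", lowered)]


def normalise(surface_forms):
    """Step 2: map to normalised terms, dropping duplicates in order."""
    return list(dict.fromkeys(SYNONYMS[s] for s in surface_forms))


def add_context(term, sentence):
    """Step 3: attach context (here, sentence-level negation only)."""
    lowered = sentence.lower()
    negated = any(cue in lowered for cue in NEGATION_CUES)
    return Finding(term, negated, sentence.strip())


def is_significant(finding):
    """Step 4: decide whether the finding matters for this toy rule set."""
    return finding.term in SIGNIFICANT and not finding.negated


def summarise(document):
    """Step 5: organise the findings for a human (or machine) reviewer."""
    summary = {"significant": [], "other": []}
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        for term in normalise(extract_terms(sentence)):
            finding = add_context(term, sentence)
            key = "significant" if is_significant(finding) else "other"
            summary[key].append(finding)
    return summary


notes = ("Patient had a heart attack (MI) in 2015. "
         "No history of hypertension. Asthma in childhood.")
for group, findings in summarise(notes).items():
    print(group, [(f.term, f.negated) for f in findings])
```

Note that ‘heart attack’ and ‘MI’ collapse into a single normalised term, and that the negated mention of hypertension is kept but not flagged. Treating negation at the level of the whole sentence is a deliberate simplification; it would mislabel a sentence that mixes a negated and a confirmed condition.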

In our opinion the road to success in the use of AI is to create a series of building blocks which can be used to support either automated underwriting systems or the work of the human underwriter. We can’t see AI completely replacing the experienced underwriter any time soon but the machine can certainly a) help the human make better and faster decisions, and b) increase the power of automated underwriting, pushing up straight-through processing rates.

The trick with AI is not to try to solve all the problems in one go – to ‘boil the ocean’ as it were – but rather to build incremental success through steps that lead towards the ultimate goal. One example might be the automated processing of EHRs by an underwriting engine. Do you have the system ‘ingest’ the content of your underwriting manual with the goal of eliminating human intervention, or do you work in steps to summarise physicians’ statements, eliminating boring and lengthy processes that are prone to error, and create something of real value in the underwriting process? Arguably this could be one of a series of building blocks to the ‘holy grail’ of automating the underwriting of the EHR.

AI can process text, but that is not the same as understanding what it means. So an AI system could digest an online encyclopaedia and win a quiz show,¹ but digesting a manual and then being able to understand the underwriting process and make risk decisions is another matter. Computers are great at processing repetitive tasks much more quickly and accurately than human beings. Replicating the decision-making process that is second nature to an experienced underwriter is a whole different ball game.

So, the computer, which doesn’t get bored, is quite happy wading through a 300-page case file. Together, OCR and NLP can sift through such files and serve up to the underwriter what they judge to be important, thus saving the human time. The computer, good as it is, remains for the moment a tool for coping with the vast amounts of data being thrown at the human underwriter. For now, the expert human insight remains exactly that.

  1. In 2011 IBM’s ‘Watson’ question-answering computing system won the American quiz show ‘Jeopardy!’ against the reigning (human) champions.