Clearing the Crystal Ball of Car Insurance with TensorFlow and AWS SageMaker

30/01/20, by Gavin Hamill

AWS SageMaker bridges usability gap in TensorFlow

Machine Learning (ML) is a contemporary technique which can give a competitive advantage to a wide range of organisations. This article describes the setup at one of Mayara’s clients, who use an ML model to generate quotations for car insurance.

Codeoscopic are working with Spain’s largest insurance brokers and insurance companies to help them modernise their approach to historical quotation data; they reached out to Mayara to take advantage of our previous experience of working with both TensorFlow and AWS SageMaker. The resulting functionality is integrated into their tool, Versus.


Insights Inside

Versus users can easily estimate the value of their client portfolio in the market, either one case at a time or as a full batch.

Using this tool, an insurance company can see which price the competition would offer a specific client, and make a decision accordingly.

When a customer requests a quotation for car insurance, the broker will call the API endpoints of multiple underwriters with details of the vehicle and driver to be insured. As each underwriter assigns different levels of risk to factors such as engine performance, age of driver and the car’s overnight location, this results in a range of quotations being returned for the same input data.
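To make the idea of a range of quotations concrete, here is a minimal sketch that summarises the prices returned for one request. The underwriter names and prices are invented for illustration and do not come from the broker's data.

```python
from statistics import median

# Hypothetical quotations (in euros) returned by four underwriters
# for one and the same vehicle and driver.
quotes = {
    "underwriter_a": 412.50,
    "underwriter_b": 389.00,
    "underwriter_c": 455.75,
    "underwriter_d": 398.20,
}


def summarise_quotes(quotes):
    """Return the lowest, highest and median price, plus the spread between them."""
    prices = sorted(quotes.values())
    return {
        "lowest": prices[0],
        "highest": prices[-1],
        "median": median(prices),
        "spread": round(prices[-1] - prices[0], 2),
    }


summary = summarise_quotes(quotes)
print(summary)
```

Collected over several years, records like these form the historical data that the rest of this article works with.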

The broker had been analysing this historical data by traditional means, using spreadsheets that were error-prone and clumsy. To deliver better insights, the broker recognised that a great wealth of information could be extracted from the data by applying modern methods.

Enter TensorFlow

The study of Machine Learning has been active in academic settings for many decades, but due to the large computing requirement it was not until the 2010s that ‘Deep Learning’ using neural networks came within reach of regular businesses and end users.

Google originally wrote the TensorFlow library to solve in-house problems and released it to the world in late 2015. Early adoption was difficult due to the steep learning curve of new concepts, many of which were completely alien to business people and their IT staff.

The incentive for businesses to experiment with Machine Learning was to gain competitive advantage in whichever markets they operated; if you can understand the needs of your customers better, you will surely be in a better position to serve them more effectively.

Going Deep

The source data consists of CSV files each with over 60 fields, and the approach we are using is a form of regression analysis using a deep neural network (DNN). A greatly simplified way of thinking about this is that the DNN becomes a ‘black box’ where you ask it questions and it will provide answers according to the model that it’s using. That model - the fabric of the DNN itself - must be generated before it can be used to make predictions.

An empty DNN must be told what it’s supposed to do. In our case we are using supervised learning, a technique that requires a set of training data in which each item carries an output label; here, the output label is the price returned by one underwriter for one car insurance quotation. In this way, the DNN builds a model that maps a given set of input data to a price. This kind of work is highly suitable for a DNN and would be unthinkable for humans to undertake.
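As a rough illustration of what a labelled training item looks like, the sketch below separates one CSV row into its input features and its price label. The column names (including `price`) and the sample rows are assumptions made for the example; the real files have over 60 fields.

```python
import csv
import io

# Hypothetical sample; the column names and values are illustrative only.
SAMPLE_CSV = """year,month,company_id,price
2018,08,1,412.50
2018,09,2,389.00
"""


def split_features_and_label(row, label_column="price"):
    """Separate one CSV row into its input features and its numeric label."""
    features = {key: value for key, value in row.items() if key != label_column}
    label = float(row[label_column])
    return features, label


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
examples = [split_features_and_label(row) for row in reader]

for features, label in examples:
    print(features, "->", label)
```

During training the network only ever sees the features as input; the label is what its output is compared against.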

A simplified view of a neural network with a single hidden layer
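The arithmetic behind such a network can be traced by hand. The following sketch runs one forward pass through a single hidden layer in plain Python; the weights, biases and the choice of ReLU activation are made-up assumptions for illustration, not values from the trained model.

```python
def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> hidden layer (ReLU) -> single linear output."""
    hidden = [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Made-up numbers: two inputs, three hidden units, one output.
inputs = [1.0, 2.0]
hidden_weights = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]]
hidden_biases = [0.0, -1.0, 0.5]
output_weights = [2.0, 1.5, -1.0]
output_bias = 10.0

prediction = forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
print(prediction)  # 12.5
```

Training is the process of adjusting those weights and biases until the output lands close to the labelled price; a deep network simply stacks several such hidden layers.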

For the broker, the input data set consists of over 25 million rows of quotations recorded over several years in their existing system. Once we have established that we can successfully create a model on a local laptop using a tiny subset of the full data set, we are ready to go big. Indeed, the local model is created with the same code that is used for the cloud deployment described below.
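One simple way to cut a tiny subset for a laptop experiment is to copy the header and the first few rows of a CSV file. This is only a sketch under that assumption: the article does not say how the subset was chosen, and taking the first rows is not a random sample.

```python
import csv
import io
from itertools import islice


def take_subset(source, destination, max_rows):
    """Copy the header and at most max_rows data rows from source to destination."""
    reader = csv.reader(source)
    writer = csv.writer(destination, lineterminator="\n")
    writer.writerow(next(reader))  # header line
    copied = 0
    for row in islice(reader, max_rows):
        writer.writerow(row)
        copied += 1
    return copied


# In-memory stand-ins for the full file and the small file.
full = io.StringIO("year,price\n2015,400\n2016,410\n2017,420\n2018,430\n")
small = io.StringIO()
copied = take_subset(full, small, max_rows=2)
print(copied, small.getvalue().splitlines())
```

With real files the two `StringIO` objects would be replaced by file handles opened with `newline=''`, as the `csv` module recommends.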

SageMaker for speed and ease

AWS SageMaker is inherently compatible with TensorFlow and understands the model format being used. We can easily invoke SageMaker to run a training job using this code:

import boto3
import os
import time
import csv
import logging
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker import get_execution_role
from sagemaker.session import Session
# S3 bucket for saving code and model artifacts is sagemaker_bucket
sagemaker_bucket = 'source-bucket'

# Location to save your custom code in tar.gz format.
custom_code_upload_location = 's3://{}/customcode/tensorflow_insurance_intelligence'.format(sagemaker_bucket)

# Location where results of model training are saved.
model_artifacts_location = 's3://{}/artifacts/'.format(sagemaker_bucket)

#IAM execution role that gives SageMaker access to resources in your AWS account. This gets the role that the notebook instance uses.
#role = get_execution_role()
role = 'arn:aws:iam::123412341234:role/service-role/AmazonSageMaker-ExecutionRole'

#The train instance type, if set to 'local', then a local docker container will act as a Sagemaker instance
train_instance_type = 'ml.c5.2xlarge'

region = boto3.Session().region_name
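insurance_estimator = TensorFlow(entry_point='',  # script name is left blank in this listing
                            role=role,
                            train_instance_count=4,  # Must be 1 if train_instance_type == 'local'
                            train_instance_type=train_instance_type,
                            code_location=custom_code_upload_location,
                            output_path=model_artifacts_location,
                            # The key enclosing 'enabled' is missing from this listing; 'parameter_server' is an assumption
                            distributions={'parameter_server': {'enabled': False}},
                            tags=[{'Key': 'Project', 'Value': 'insurance_dnn_regressor'}, {'Key': 'TensorBoard', 'Value': 'dist'}],
                            hyperparameters={
                                'hidden_units': '500,250,125,70,35,15',
                                'batch_size': 250,
                                'epochs': 2,
                                'training_dir': '/opt/ml/input/data/training',
                                'local_model_dir': '/opt/ml/model'
                            })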
print ("insurance estimator loaded")

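# 'distribution' is not defined anywhere in this listing; SageMaker accepts 'FullyReplicated' or 'ShardedByS3Key' here
input_data = sagemaker.s3_input(s3_data='s3://{}/inputs/combined_small'.format(sagemaker_bucket), content_type='csv', distribution=distribution)

# Now we start the SageMaker training job via the 'fit' function
insurance_estimator.fit(input_data, logs=True)
# training_job_name = (the right-hand side is missing from this listing)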

When executed, this will run until model training is complete. From there we are ready to deploy the model to an endpoint and query it.
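The hyperparameters in the training code above have to be read by the training script itself. Assuming the entry point runs in SageMaker's script mode, where hyperparameters reach the script as command-line arguments, a minimal sketch of the receiving side could look like this. The function names and the default values are illustrative assumptions, not the client's actual script.

```python
import argparse


def parse_hidden_units(text):
    """Turn a string such as '500,250,125' into [500, 250, 125]."""
    return [int(part) for part in text.split(",")]


def parse_args(argv=None):
    """Read the hyperparameters passed to the training script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--hidden_units", type=parse_hidden_units, default=[500, 250])
    parser.add_argument("--batch_size", type=int, default=250)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--training_dir", default="/opt/ml/input/data/training")
    parser.add_argument("--local_model_dir", default="/opt/ml/model")
    # Ignore any extra arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args([
    "--hidden_units", "500,250,125,70,35,15",
    "--batch_size", "250",
    "--epochs", "2",
])
print(args.hidden_units, args.batch_size, args.epochs)
```

Passing the layer sizes as one comma-separated string keeps the hyperparameter dictionary flat, at the cost of a small parsing step inside the script.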

from sagemaker.tensorflow.serving import Model
# Change this to whichever model you want to use
model_path = 'tensorflow-training-2019-09-24-15-00-35-845'

insurance_estimator = Model(
                        model_data = 's3://{}/artifacts/{}/model.tar.gz'.format(sagemaker_bucket, model_path),
                        role = role)

endpoint_instance_type = 'ml.t2.medium'
insurance_predictor = insurance_estimator.deploy(initial_instance_count=1, instance_type=endpoint_instance_type)

import base64

data = {
    'year': '2018',
    'month': '08',
    'stage_id': 1,
    'company_id': 1,
    'product_modality_id': 8,
    'cover_3': 1,
    # [... and 60 other parameters...]
}

data_list = [data]
input_data = {'signature_name': 'predict', 'instances': data_list}
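The request dictionary above follows the row format of TensorFlow Serving's REST predict API: a signature name plus a list of instances. The sketch below builds the same structure and serialises it to JSON without needing a live endpoint; the helper name is hypothetical and only the six fields shown in the listing are included. With a deployed endpoint, the dictionary would presumably be sent with a call such as insurance_predictor.predict(input_data).

```python
import json


def build_predict_request(rows, signature_name="predict"):
    """Wrap one or more feature dictionaries in a row-format request body."""
    return {"signature_name": signature_name, "instances": list(rows)}


# Only the six fields shown in the listing; the real request carries about 60 more.
row = {
    "year": "2018",
    "month": "08",
    "stage_id": 1,
    "company_id": 1,
    "product_modality_id": 8,
    "cover_3": 1,
}

body = json.dumps(build_predict_request([row]))
print(body)
```

Because `instances` is a list, several quotations can be scored in a single request by passing more than one row.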


AWS SageMaker means we can freely use the same model training and deployment code for both local use on a laptop and for deployment in the cloud - it takes a lot of the undifferentiated heavy lifting away from dealing with ML and lets us focus on tuning parameters and delivering value for our client.
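In practice, the switch between laptop and cloud comes down to two settings in the training code: the instance type and the instance count. A minimal sketch of that switch follows; the helper name is hypothetical, and the values are taken from the training listing and its comments.

```python
def training_settings(run_locally, cloud_instance_type="ml.c5.2xlarge", cloud_instance_count=4):
    """Return the instance type and count for a local or a cloud training run."""
    if run_locally:
        # A local Docker container stands in for a SageMaker instance,
        # and only a single instance is allowed in that mode.
        return {"train_instance_type": "local", "train_instance_count": 1}
    return {
        "train_instance_type": cloud_instance_type,
        "train_instance_count": cloud_instance_count,
    }


print(training_settings(run_locally=True))
print(training_settings(run_locally=False))
```

Everything else in the estimator definition, including the entry point and the hyperparameters, stays the same in both modes.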

Get in touch

If what you’ve read has interested you and you would like to talk to someone about how Machine Learning could give your business the edge, please get in touch!
