So What Exactly is Big Data and How Will it Affect Insurance?

This article originally published on

Unless you’ve been living under a rock for the last few years, you’ve heard a LOT about Big Data. But if you’re like most insurance professionals, you didn’t go to school for Computer Science, and even though it sounds very cool, you really haven’t gotten your head around a simple question:



What the heck is big data? How will it affect insurance?

For the last several years, the world has been creating more data than it ever had in the past. Some call it the digital exhaust: everything we do leaves a digital trail and with a smart phone in every pocket, a laptop in every backpack and near universal access to giant clusters of computers in the cloud, the sheer amount of data we are able to collect on everyone and everything has grown exponentially. Data grew to such large quantities that it no longer fit in the memories computers use for processing it, so whole new tools had to be designed to handle it. We started creating and saving so much data that there was a qualitative change, and all of a sudden we became able to extract new insights and create new value due to the large scale of the amount of data that we can access. Things are now possible that simply could not have been done at a smaller scale.

One of the key changes is that we started recording everything digitally rather than in analog form (computers instead of paper). As recently as the year 2000, only a quarter of all of the world’s information was digital. By 2007, more than 93% of it was in digital format, which computerized tools can read and analyze much more easily. By 2013, more than 98% was digital.


Why is Big Data a Big Deal for Insurance?

At its very core, insurance has always been an information business. We don’t make widgets. We help people and businesses manage their risks and help pay for the losses when they happen, and all of this is based on information, not on arranging physical atoms in any way. It’s literally a pure information business.

For centuries, when faced with very large numbers of data points, society has depended on samples. This applies even more to the insurance industry. Think back to CPCU 500: our ENTIRE business is based on the law of large numbers and on making statistically valid predictions about risk. (If you haven’t done your CPCU, stop reading this article right here and go get started on it! Here’s why, here’s how.) Sampling, and the law of large numbers, were necessary because we lived in a world of limited information, an analog world where most things didn’t get recorded in an easy-to-analyze way. We are now in a different world, the digital era, and thanks to Big Data we are approaching a world in which we won’t need to use samples anymore; we’ll have ALL the data. This will have huge implications for our industry.



Historically, we had to work with samples because it was very difficult or impossible to collect all of the data, and because we didn’t have tools that could work with gigantic sets of data. Having ALL the data related to something, instead of a sample of it, allows us to see much more detail. For example, in the old analog world our actuaries figured out that 16 to 19 year old drivers were more likely to have an auto accident, and this became a key part of how we price auto insurance. In the new digital Big Data world, we might be able to analyze every second a young person has ever driven and set a personalized price for their very own level of risk! That rate will be much more accurate because it is based not on a sample of general data (the accidents had by insured 16 to 19 year olds) but on ALL the specific data (every second of driving this person has ever done).

By its very definition, actuarial science, which our entire business is built on, is “the discipline that applies mathematics and statistical methods to assess risk,” and one of the aims of statistics is to “confirm the richest findings using the smallest amount of data.” In other words, our entire business is built on making predictions using limited data. In a world of unlimited data, we will have to quickly become world class at analyzing and reacting to ALL the data, or we might be beaten at our own game by those who do.


The why doesn’t matter, only the what:

In the old world of small data, society spent a lot of resources trying to figure out the why behind things. Scientific and statistical studies started with a hypothesis, a prediction of how things worked, and then tested the available sample of data to see if that hypothesis was correct. If it wasn’t, researchers modified the hypothesis and tried again. Most data was collected for a specific purpose, and it was very difficult to use it for other purposes without collecting a new sample. “Today, with so much data around and more to come, hypotheses are no longer crucial for correlational analysis.”

Before Big Data, because of the more limited computing power we had, most analysis looked for simple linear relationships (this causes that). With the new tools of Big Data analysis and the faster computers available today, we can find more complicated non-linear relationships (a, b, c, d, e, f and g each predict x a little bit on their own, but together they predict x very accurately).
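To make the “a little bit alone, very accurately together” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the seven factors are random numbers, and x is simply their sum plus some noise (an additive toy model, so it only shows the weak-alone, strong-together part, not a genuinely non-linear relationship). The names `simulate`, `pearson` and `r_squared` are hypothetical helpers, not part of any real tool.

```python
import math
import random
import statistics


def simulate(n_rows=5000, n_factors=7, seed=0):
    """Build a toy dataset where x is the sum of several independent factors."""
    rng = random.Random(seed)
    factors = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_factors)]
    # x depends equally on every factor, plus a little unexplained noise.
    x = [sum(col[i] for col in factors) + rng.gauss(0, 0.5) for i in range(n_rows)]
    return factors, x


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(predictor, target):
    """Share of the variance in target explained by predictor."""
    return pearson(predictor, target) ** 2


factors, x = simulate()

# How well does each factor predict x on its own?
single_scores = [r_squared(col, x) for col in factors]

# How well do all seven factors predict x when combined?
combined = [sum(col[i] for col in factors) for i in range(len(x))]
combined_score = r_squared(combined, x)

print("best single factor:", round(max(single_scores), 2))
print("all factors together:", round(combined_score, 2))
```

In this toy setup each factor alone explains only a small slice of x, while the combination explains almost all of it. Real Big Data tools look for this kind of pattern across far more variables, without being told ahead of time which ones matter.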


It doesn’t matter that your system doesn’t know all the variables that go into a problem, only that it can predict the result. For example, Google has used Big Data to predict flu outbreaks faster than the CDC by letting the computer figure out which search terms correlate with flu outbreaks in a given area. It doesn’t matter whether those people know that what they’re searching about is the flu, just that they’re searching for it and that when those hundreds of identified search terms show up in one area, there’s a very good chance that area is experiencing a flu outbreak. In the new world of Big Data, why something happens doesn’t matter; what matters is that we can now find the hidden patterns and use them to detect or predict it. “Society will need to shed some of its obsession for causality in exchange for simple correlations.”

One example of an insurance company trying to use Big Data to improve its underwriting is Aviva, which studied the idea of using credit report and marketing data to underwrite some life insurance applicants instead of the traditional blood and urine lab analysis. The idea is to identify applicants with a higher risk of lifestyle diseases like high blood pressure, diabetes and even depression. “The method uses lifestyle data that includes hundreds of variables such as hobbies, the websites people visit, and the amount of television they watch, as well as estimates of their income.” The traditional lab tests cost $125 per person, while this new approach can be as cheap as $5. This is an example of a correlational relationship being valuable and more efficient than relying on a causal relationship to predict an outcome.


The more data we have, the less exact it needs to be:

In the old world of small data, statisticians and data analysts were trained to clean out outliers and try to get data that was as clean as possible. With Big Data we are looking at vastly more data, which means that we can get away with less exact data. “It’s a tradeoff, with less error from sampling, we can accept more measurement error.” The old tools (spreadsheets, relational databases, SQL, business intelligence tools, etc.) were created to work on exact data; the new tools are designed to work with large quantities of imperfect data. The need for perfect data was a side effect of the limited tools we used to manage small data.

Here’s a great example of why we can now get away with less exact data: “Suppose we need to measure the temperature in a vineyard. If we only have one temperature sensor for the whole plot of land, we must make sure it’s accurate and working at all times: no messiness allowed. In contrast, if we have sensors for every one of hundreds of vines, we can use cheaper, less sophisticated sensors (as long as they don’t introduce a systematic bias). Any particular reading may be incorrect, but the aggregate of many readings will provide a more comprehensive picture.”
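The vineyard example can be sketched as a quick simulation. This is only an illustration under made-up assumptions: a true temperature of 20 degrees, one accurate sensor with a small random error, and 400 cheap sensors with six times more random error. The `reading` and `mean_abs_error` helpers and all of the numbers are hypothetical.

```python
import random
import statistics

TRUE_TEMP = 20.0  # hypothetical true temperature, in degrees Celsius


def reading(rng, noise_sd, bias=0.0):
    """One sensor reading: the truth, plus any systematic bias, plus random noise."""
    return TRUE_TEMP + bias + rng.gauss(0, noise_sd)


def mean_abs_error(trials, n_sensors, noise_sd, bias=0.0, seed=1):
    """Average distance between the true temperature and the mean of n_sensors readings."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        readings = [reading(rng, noise_sd, bias) for _ in range(n_sensors)]
        errors.append(abs(statistics.fmean(readings) - TRUE_TEMP))
    return statistics.fmean(errors)


# One expensive, accurate sensor for the whole plot.
one_good = mean_abs_error(trials=500, n_sensors=1, noise_sd=0.5)

# Hundreds of cheap sensors, each much noisier, averaged together.
many_cheap = mean_abs_error(trials=500, n_sensors=400, noise_sd=3.0)

# The same cheap sensors, but all reading about 2 degrees too high.
many_biased = mean_abs_error(trials=500, n_sensors=400, noise_sd=3.0, bias=2.0)

print("one accurate sensor:", round(one_good, 2))
print("400 cheap sensors:", round(many_cheap, 2))
print("400 cheap, biased sensors:", round(many_biased, 2))
```

The random errors of the cheap sensors largely cancel out in the average, so the crowd of cheap sensors ends up closer to the truth than the single good one. The biased run shows why the quote adds the caveat about systematic bias: an error shared by every sensor does not average away, no matter how many sensors you add.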


Data is no longer stale after its original use:

One of the very limiting features of the old world of data is that once a dataset was built for a particular use, it was very difficult to use it for another, so you had to know what you were looking for before collecting the data. Because you were collecting a sample of data and inputting it into a very structured format for future analysis, getting the right pieces of information was of paramount importance. In the new world of Big Data, all data becomes a raw material to create value in new and creative ways, most of which were impossible in the old world. Because we are collecting data on everything, and our tools are more sophisticated in their ability to arrange and rearrange that data, we are more able to use the information in a variety of ways. Think about it: that telematics device on your car collects a TON of data. Think about the data your smartphone collects about your habits each day. Every time you search on Google, they’re recording not only what you search for but even the exact amount of time your mouse pointer spent at different parts of the screen. Soon, we’ll even be able to track your eyes through the webcam when you visit our website. There’s just a TON of data out there that we’ll now be able to analyze to learn about our customers.


Being free of sampling will allow us to know more:

“Sampling quickly stops being useful when you want to drill deeper, to take a close look at some intriguing subcategory of the data.” One of the key benefits of being able to collect ALL of the data about something is that we can dig further into the data and ask it fresh questions that we hadn’t even thought of when we started collecting it. In the old paradigm of sampling, one would collect only what was directly asked for. If you noticed a pattern in that sample but needed something you had not thought to ask for ahead of time to explain or verify it, you would need to re-sample and get additional data to confirm what you found.
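Here is a small sketch of why a sample runs out of steam when you drill into a subcategory. The numbers are invented for illustration: a book of 200,000 policies in which about 0.5% of policyholders belong to some niche group (the `niche` flag is a hypothetical stand-in for any narrow segment you might want to study), compared with a random sample of 1,000 policies.

```python
import random

rng = random.Random(42)

POPULATION_SIZE = 200_000
SAMPLE_SIZE = 1_000
NICHE_SHARE = 0.005  # hypothetical: about 0.5% of policies fall in the niche segment

# The full dataset: one record per policy.
population = [{"policy_id": i, "niche": rng.random() < NICHE_SHARE}
              for i in range(POPULATION_SIZE)]

# The old approach: draw a random sample and analyze only that.
sample = rng.sample(population, SAMPLE_SIZE)


def niche_rows(rows):
    """Keep only the records that belong to the niche segment."""
    return [row for row in rows if row["niche"]]


niche_in_population = len(niche_rows(population))
niche_in_sample = len(niche_rows(sample))

print("niche policies in the full data:", niche_in_population)
print("niche policies in the sample:", niche_in_sample)
```

With the full data there are roughly a thousand niche policies to study. The sample contains only a handful, which is far too few to say anything reliable about that group without going back and collecting more data.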


Data no longer needs to be structured:

Traditionally, data stored in spreadsheets and databases was structured, meaning that each field could hold only a very specific type of data. A phone number field, for example, could only hold a 10 digit number. The problem is that only around 5% of all digital data in the world is structured in a form that neatly fits into a spreadsheet or database. That means we had no easy way to analyze the other 95% of data! Pretty much all data had to be cleaned up before analysis, which made datasets smaller and analysis more expensive.

In the new world of Big Data, new tools such as Hadoop are able to analyze unstructured data in all shapes and sizes: 100% of data instead of just 5%. They can even analyze things like books, journals, metadata (data about data), audio, video and much more. Imagine being able to include every second of conversation digitally recorded from your call centers along with all your other data and analyze it all to find trends! This is one of the most powerful features of Big Data, and it is already being used in many call centers.



Messier data will help us insure messier things:

Big Data’s ability to help us analyze messy data could help us insure harder-to-insure things. For example, ZestFinance, a company founded by a former chief information officer at Google, built technology that helps lenders underwrite small, short-term loans to people who have bad credit scores. It turns out traditional credit scoring is based on only a few factors, while ZestFinance uses a huge number of variables, and it produces solid results: in 2012, their loan default rate was a third lower than the industry average. Big Data might allow us to better underwrite risks for which we don’t have very good data, such as people who can’t get a driver’s license or commercial risks that currently can only be insured in the surplus lines market. Imagine all of the services, microinsurance, and other innovations we’ll be able to develop.


From indemnification to risk prevention:

One of the techniques used with Big Data is Predictive Analytics, and pretty much every carrier is experimenting with it.

“The technique is being used to prevent big mechanical or structural failures: placing sensors on machinery, motors, or infrastructure like bridges makes it possible to monitor the data patterns they give off, such as heat, vibration, stress, and sound, and to detect changes that may indicate problems ahead.”

“The underlying concept is that when things break down, they generally don’t do so all at once, but gradually over time.” If we have sensor data and correlational analysis, we can probably figure out that something will break before it actually does. This can allow us to prevent claims from ever happening, moving insurance from a loss-paying service to a risk prevention partner.
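A very simple version of that idea can be sketched in a few lines of Python. This is not how any particular carrier or vendor does it; it is a toy under stated assumptions: a made-up vibration signal that sits around 1.0 while the machine is healthy, starts creeping upward at reading 200, and “fails” at reading 300. The `first_alert` function, its thresholds and the data are all hypothetical.

```python
import random
import statistics


def first_alert(readings, baseline_n=50, window=10, k=3.0):
    """Return the index of the first reading where the recent average drifts
    more than k standard deviations above the healthy baseline, or None."""
    baseline = readings[:baseline_n]
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)
    for end in range(baseline_n + window, len(readings) + 1):
        recent = statistics.fmean(readings[end - window:end])
        if recent > mean + k * sd:
            return end - 1  # index of the latest reading in the window
    return None


rng = random.Random(7)
DRIFT_START, FAILURE = 200, 300  # hypothetical timeline, in readings

# Simulated vibration level: steady at first, then slowly rising before failure.
readings = []
for t in range(FAILURE):
    drift = 0.05 * max(0, t - DRIFT_START)
    readings.append(1.0 + drift + rng.gauss(0, 0.1))

alert = first_alert(readings)
print("drift starts at reading", DRIFT_START)
print("first alert at reading", alert)
print("failure at reading", FAILURE)
```

The alert fires shortly after the drift begins and well before the simulated failure, which is the window in which a carrier could warn the insured and prevent the claim. Real predictive analytics combines many signals (heat, vibration, stress, sound) and far more sophisticated models, but the principle is the same.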


Acknowledgement: Unless otherwise stated, all quotes, numbers and explanations are adapted from Big Data: A Revolution That Will Transform How We Live, Work, and Think. Yes, you should read it! Yes, we get a small commission if you buy it using that link and it helps us run and improve InsNerds.
