World tour of data
"Data is the new oil," said the mathematician Clive Humby in 2006. Today, data is all around us. Finance, supply chain, marketing and even management are some of the sectors in which data is becoming more and more important. This general introduction shows how data has become part of our lives. In this article:
- Data infrastructures
- Data for analysis
- Data for prediction
- The sectors revolutionized by data
Data infrastructures
A data infrastructure is the way data is organized and collected. Around it, we create data dictionaries, that is, guides explaining how the data is collected, how it is processed and how to use it.
Data is stored in databases, relational or non-relational. To manage this data, we use database management systems (DBMS). Among the best known are relational DBMS (RDBMS) such as MySQL and PostgreSQL, and non-relational (NoSQL) systems such as MongoDB. With the advent of the cloud, other types of infrastructure have developed: data warehouses and data lakes.
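To make the idea of a relational database concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is used only because it needs no server; the table names, column names and rows are invented for illustration and do not come from any real system.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables (hypothetical schema): each order points to a customer.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

# Invented sample rows.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "Paris"), (2, "Bob", "Lyon")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0)],
)

# The join links the two tables through the customer_id key,
# then the amounts are summed per customer.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

print(totals)
```

The "relational" part is the link between the two tables: orders do not repeat the customer's name or city, they only refer to the customer's identifier.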
Data for analysis
Once we have collected this data with the right architecture, we can start analyzing it. We use the collected data to better explain past phenomena.
The tools of data analysis
The tools used to analyze data come in several levels of sophistication. We can start with descriptive statistics that many people know: the mean, median, quartiles and standard deviation are basic tools for summarizing data.
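As a small illustration, these basic aggregates can be computed with Python's standard `statistics` module. The sample values are invented, and the quartiles use the module's default method, which is one convention among several.

```python
import statistics

# Invented sample: eight measurements, one of them unusually large.
values = [12, 15, 11, 14, 13, 90, 12, 16]

mean = statistics.mean(values)
median = statistics.median(values)
# quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
# Sample standard deviation (divides by n - 1).
stdev = statistics.stdev(values)

print("mean:", mean)
print("median:", median)
print("quartiles:", q1, q2, q3)
print("standard deviation:", round(stdev, 2))
```

Note how far the mean sits from the median here: a single extreme value pulls the mean up, while the median barely moves.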
The next level is to use more advanced statistical tools. Here are some of them:
- Linear regression
- Logistic regression
- Principal component analysis
- Factor analysis
- Independent component analysis
The list is obviously longer than just these tools.
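To give an idea of what the first tool in this list does, here is a minimal sketch of simple linear regression (one explanatory variable) fitted by ordinary least squares. The function name `fit_line` and the data are hypothetical; in practice a statistics library would normally be used.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that follows y = 3x + 10 exactly, so the fit is easy to check.
xs = [30, 50, 70, 90]
ys = [100, 160, 220, 280]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the fitted line to estimate y for a new x.
estimate = slope * 60 + intercept
print(estimate)
```

Real data never falls exactly on a line; the method then returns the line that minimizes the sum of squared vertical distances to the points.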
The steps of data analysis
First, we collect data from one or more sources. This is where the data architecture matters: clearly defining the objective of the collection makes it possible to choose the right architecture.
Then, before it can be analyzed, the data must be cleaned and transformed. This is called data preparation, or data prep.
We can now analyze the data. Its quality is obviously crucial here. We use the tools described above to carry out these analyses. It is at this stage that we transform raw data into usable information and, from this information, generate insights that have value for companies.
The final step is reporting the results, which is done through data visualization.
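Here is a minimal sketch of what such a preparation step can look like. The field names, the rows and the cleaning rules (trim whitespace, normalize capitalization, drop rows with a missing amount, remove duplicates) are assumptions chosen for illustration; real projects define their own rules.

```python
# Invented raw rows, as they might arrive from a form or a CSV export.
raw_rows = [
    {"customer": " Alice ", "amount": "40.5"},
    {"customer": "bob", "amount": ""},         # missing amount
    {"customer": "Alice", "amount": "40.5"},   # duplicate once cleaned
    {"customer": "Carol", "amount": "12"},
]


def clean(rows):
    """Normalize names, convert amounts to numbers, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        amount_text = row["amount"].strip()
        if not amount_text:
            continue  # rule: a row without an amount is unusable
        record = (name, float(amount_text))
        if record in seen:
            continue  # rule: keep only the first copy of identical rows
        seen.add(record)
        cleaned.append({"customer": name, "amount": record[1]})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```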
Examples of the use of data analysis
Here are some use cases for data analysis. It can be used to detect anomalies, often called outliers. Data analysis techniques make it possible to spot statistically improbable values that are likely to be wrong. An anomaly can take various forms: a bad measurement, an error in the model, or a genuine exception that will distort the analyses, for example by skewing an average.
Another use is market research. To compare a sector or the companies within it, market studies rely on data analysis to understand that sector in depth.
The next level up from analysis is prediction: estimating events that have not yet occurred.
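One common way to flag outliers, among several, is the interquartile range (IQR) rule: any value further than 1.5 times the IQR from the quartiles is treated as suspicious. The sketch below uses that rule with invented measurements; the function name `iqr_outliers` and the threshold `k=1.5` are conventional choices, not something prescribed by this article.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented measurements; 90 looks like a bad reading.
measurements = [12, 15, 11, 14, 13, 90, 12, 16]

outliers = iqr_outliers(measurements)
kept = [v for v in measurements if v not in outliers]

print("outliers:", outliers)
# The average changes a lot once the suspicious value is set aside.
print("mean with outliers:", statistics.mean(measurements))
print("mean without outliers:", round(statistics.mean(kept), 2))
```

A flagged value is not necessarily an error. It may be a real exception, so it is worth inspecting before removing it.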
Data for prediction
Prediction is a branch of data analysis. It consists of using data to predict the future, or at least to estimate it as well as possible: working out which of all the possible futures is the most likely, based on the data we have collected. Predicting the future is what every company dreams of. Reality is of course far from perfect prediction, but being able to limit risk already has strong value for companies.
The different types of prediction
Some examples of prediction types are:
- Classification: an algorithm predicts a category based on historical data. For example, from the content of an email, its subject and its sender, it predicts whether the email is likely to be spam. The two categories are spam and non-spam.
- Regression: an algorithm predicts a numerical value from historical data. For example, what will the price of a barrel of oil be in 2 months? How much is a house worth, given its surface area, number of rooms, number of bedrooms and location?
- Clustering: the goal is to group similar data points. For example, given various measurements of a group of people, how do we form 3 groups to create 3 T-shirt sizes: S, M and L? Clustering algorithms help us do this.
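The T-shirt example can be sketched with k-means, one clustering algorithm among many. To keep it short, this version works on a single measurement (a hypothetical chest size in centimetres), uses invented values and starts from evenly spread initial centers so that the result is reproducible; real implementations handle several measurements and usually start from random centers.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: k values spread evenly across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Invented chest measurements in centimetres.
chest_cm = [86, 88, 90, 98, 100, 102, 110, 112, 114]

centers, groups = kmeans_1d(chest_cm, k=3)
sizes = dict(zip(["S", "M", "L"], groups))
print(centers)
print(sizes)
```

Unlike classification and regression, clustering is given no expected answer in advance: the algorithm only looks for groups of values that are close to each other.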
The sectors revolutionized by data
In insurance, data helps us understand which geographical areas are most at risk for certain claims, and therefore to assess the risk properly and set aside provisions for the day a claim occurs.
On the customer side, it helps identify which customers are most likely to leave for a competitor, and therefore to find the right way to retain them.
In cybersecurity, data can help detect the most serious vulnerabilities, those with the biggest consequences. It also makes it possible to identify which alerts present the most risk and should be handled first.
Have you heard of the smart grid? The smart grid makes it possible to optimize the flow of electricity in real time. With the advent of renewable energies, which make electricity production highly variable, the smart grid helps adjust electricity demand to supply. For example, the charging of electric car batteries can be deferred, and we can make sure that fridges and water heaters do not all run at the same time but rather when the electricity supply is greatest.
In finance, providing data in real time to enable informed decisions is the first use. Fraud detection and the creation of increasingly advanced risk models are two other uses of data in this sector.
There are many use cases in industry, but here is an important one: data allows for better inventory management. Buying too much inventory ties up cash and requires enough storage space, which is also a cost. Conversely, not having enough inventory can block the rest of the production chain.
More generally, data allows you to better understand your customers. Upstream, it helps identify which prospects are most likely to buy your products and services, so that you can develop acquisition strategies. Downstream, it helps identify the customers most likely to leave, so that you can define retention strategies.
Data is revolutionizing the world of healthcare. A first example, which concerns everyone, is Doctolib. By creating a platform and using data, Doctolib has built a system that optimizes bookings and helps find the slots that are still available. Everyone wins: doctors reduce the gaps in their schedules, and patients can see the available slots, find the nearest one that fits their schedule and be informed when a slot becomes available.
More advanced predictions help detect diseases that are complex to diagnose. Detecting cancer through image analysis (computer vision), for example, is a major advance if it reduces human error. A misdiagnosis has consequences in both directions: a person who does not have cancer but is diagnosed with it will undergo heavy treatment for nothing, while a person whose cancer is missed will not receive the treatment they need.
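The two kinds of diagnostic error can be counted separately, which is how detection models are usually evaluated. The sketch below uses entirely invented labels (1 for "has the disease", 0 for "does not") to show the idea; it is not based on any real medical data.

```python
# Invented ground truth and model output for eight patients.
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]

pairs = list(zip(actual, predicted))

# False positive: healthy patient flagged as ill (unnecessary treatment).
false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)
# False negative: ill patient reported as healthy (missed diagnosis).
false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)
# Correct predictions of either kind.
correct = sum(1 for a, p in pairs if a == p)

print("false positives:", false_positives)
print("false negatives:", false_negatives)
print("correct:", correct, "out of", len(pairs))
```

Counting the two errors separately matters because they do not have the same cost, so a single overall accuracy figure can hide the one that matters most.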
Autonomous vehicles are expected in the coming years, and many tests are promising. Data collected by sensors placed all over the car should make this possible, make transportation more pleasant and reduce the risk of accidents. Predicting breakdowns is another important application of data in transport and industry.
As you have seen, data is increasingly integrated at all levels of the company. It makes it possible to optimize production, improve customer satisfaction and enable major innovations. That is why the data professions have the wind in their sails!