Big data is an umbrella term for many technologies: search, NoSQL, distributed file systems, batch and real-time processing, and machine learning (data science). These technologies are developed and proven to varying degrees. After 10 years, is it real? The following are a few success stories of what big data has done.
- Nate Silver correctly predicted the outcome in 49 of the 50 states in the 2008 U.S. presidential election
- Moneyball (baseball drafting)
- Cancer detection from biopsy cells (big data found 12 tell-tale patterns while doctors knew about only nine). See http://go.ted.com/CseS
- Bristol-Myers Squibb reduced the time it takes to run clinical trial simulations by 98%
- Xerox used big data to reduce the attrition rate in its call centre by 20%.
- Kroger loyalty programs (growth in 45 consecutive quarters)
As these examples show, big data can indeed work. Could it work for you? Let’s explore this a bit.
The premise of big data goes as follows.
If you collect data about your business and feed it to a big data system, you will find useful insights that provide a competitive advantage (e.g. analysis of data sets can find new correlations to “spot business trends, prevent diseases, combat crime and so on” [Wikipedia]).
When we say big data will make a difference, the underlying assumption is that the way we and our organisations operate is inefficient.
This means big data is an optimization technique. Hence, you must know what is worth optimizing. If your boss asks you to make sure the organization is using big data, doing “Big Data Washing” is easy.
- Publish or collect whatever data you can with minimal effort
- Do a lot of simple aggregations
- Figure out which data combinations make the prettiest pictures
- Throw in some machine learning algorithms and predict something, but don’t compare the predictions against anything
- Create a cool dashboard and do a cool demo. Claim that you are just scratching the surface!!
However, adding value to your organization through big data is not that easy, because insights are not automatic. Insights are possible only if we have the right data, we look in the right place, such insights exist, and we actually find them.
Making a difference requires you to understand what is possible with big data, what its tools are, and what the pain points in your domain and organization are. The following picture shows some of the applications of big data within an organization.
The next step is understanding the tools in the “Big Data toolbox”. They come in many forms.
KPI (Key Performance Indicators): People used to take canaries into coal mines. Since those small birds are very sensitive to toxic gases such as carbon monoxide, if they passed out, it was time to run out of the mine. KPIs are canaries for your organization. They are numbers that give you an idea about the performance of something, e.g. GDP, per capita income, and HDI for a country, or company revenue, lifetime value of a customer, and revenue per square foot (in the retail industry) for a business. Chances are your organization or your domain has already defined them. The idea is to use big data to monitor the KPIs.
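To make the idea concrete, here is a minimal sketch of computing one of the KPIs mentioned above, revenue per square foot, from raw sales records. The store names, revenue figures, floor areas, and the `revenue_per_sqft` helper are all made up for illustration; a real system would read them from your own data sources.

```python
from collections import defaultdict

# Hypothetical sales records: (store_id, revenue) pairs. All figures are made up.
sales = [
    ("store_a", 1200.0),
    ("store_b", 800.0),
    ("store_a", 300.0),
    ("store_b", 700.0),
]

# Hypothetical floor area of each store, in square feet.
floor_area_sqft = {"store_a": 500.0, "store_b": 1000.0}


def revenue_per_sqft(sales, floor_area_sqft):
    """Sum revenue per store, then divide by that store's floor area."""
    revenue = defaultdict(float)
    for store_id, amount in sales:
        revenue[store_id] += amount
    return {store: revenue[store] / area for store, area in floor_area_sqft.items()}


kpis = revenue_per_sqft(sales, floor_area_sqft)
for store, value in sorted(kpis.items()):
    print(f"{store}: {value:.2f} revenue per sq ft")
```

The computation itself is a simple aggregation. The hard part in practice is agreeing on which number matters and collecting the data reliably.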
Dashboard: Think about a car dashboard. It gives you an idea about the overall system at a glance. It is boring when all is good, but it grabs attention when something is wrong. However, unlike car dashboards, big data dashboards let you drill down to find the root cause.
Alerts: Alerts are notifications (sent via email, SMS, pager, etc.). Their goal is to give you peace of mind by sparing you from checking all the time. They should be specific and infrequent, and have very few false positives.
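As a rough sketch of what "specific, infrequent, and few false positives" can mean in practice, the rule below fires only when a metric stays above a threshold for several readings in a row, and then stays quiet for a cooldown period. The `ThresholdAlert` class, its parameters, and the error-rate readings are hypothetical; this is one possible design, not a standard one.

```python
class ThresholdAlert:
    """Fires when a metric stays above a threshold for several readings in a row.

    Requiring consecutive breaches filters out one-off spikes (fewer false
    positives); the cooldown keeps alerts infrequent. Parameters are illustrative.
    """

    def __init__(self, threshold, min_consecutive=3, cooldown_seconds=600):
        self.threshold = threshold
        self.min_consecutive = min_consecutive
        self.cooldown_seconds = cooldown_seconds
        self._breaches = 0
        self._last_fired = None

    def observe(self, timestamp, value):
        """Record one reading; return an alert message, or None if no alert."""
        if value <= self.threshold:
            self._breaches = 0
            return None
        self._breaches += 1
        if self._breaches < self.min_consecutive:
            return None
        in_cooldown = (
            self._last_fired is not None
            and timestamp - self._last_fired < self.cooldown_seconds
        )
        if in_cooldown:
            return None
        self._last_fired = timestamp
        return (
            f"error rate {value:.1%} above {self.threshold:.1%} "
            f"for {self._breaches} consecutive readings"
        )


# Hypothetical error-rate readings, one per minute (timestamps in seconds).
readings = [(0, 0.01), (60, 0.09), (120, 0.02), (180, 0.08),
            (240, 0.07), (300, 0.09), (360, 0.10)]

alert = ThresholdAlert(threshold=0.05)
fired = []
for timestamp, value in readings:
    message = alert.observe(timestamp, value)
    if message is not None:
        fired.append((timestamp, message))
        print(timestamp, message)
```

With these readings, the isolated spike at 60 seconds is ignored, one alert fires at 300 seconds after three breaches in a row, and the reading at 360 seconds is suppressed by the cooldown.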
Sensors: Sensors collect data and make it available to the rest of the system. They are expensive and time-consuming to install.
Analytics: Analytics take decisions. They come in four forms: batch, real-time, interactive, and predictive.
- Batch analytics: Processes data that resides on disk. If you can wait (e.g. more than an hour) for the results to be available, this is what you use.
- Interactive analytics: Used by a human to issue ad-hoc queries and to understand a dataset. Think of it as having a conversation with the data.
- Real-time analytics: Used to detect something quickly, within a few milliseconds to a few seconds. Real-time analytics are very powerful at detecting conditions that develop over time (e.g. football analytics). Alerts are implemented through real-time analytics.
- Predictive analytics: Learns a solution from examples. For example, it is very hard to write a program to drive a car, because there are too many edge cases. We solve that kind of problem by giving the computer a lot of examples and asking it to figure out a program that solves the problem (which we call a model). Two common forms are predicting the next value (e.g. electricity load prediction) and predicting a category (e.g. is this email spam?).
Drill down: To make decisions, operators need to see the data in context and drill down into detail to understand the root cause. The typical model is to start from an alert or dashboard, show the data in context (other transactions around the same time, what the same user did before and after, etc.), and then let the user drill down. For example, see WSO2 Fraud Detection Solution Demo.
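To illustrate "learning a solution from examples" for the next-value case, the sketch below fits a straight line to a few hours of electricity load and compares its predictions with a naive baseline that simply repeats the last value seen. The load figures are invented, and a plain least-squares line is only a stand-in for whatever model a real project would use. The point is the comparison: a prediction is only worth something if it beats a simple alternative.

```python
# Hypothetical hourly electricity load readings (MW); the numbers are made up.
load = [100, 104, 108, 113, 116, 121, 124, 129, 132, 136]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_abs_error(predicted, actual):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Learn from the first 7 hours, then predict the remaining 3.
train, test = load[:7], load[7:]
hours = list(range(len(train)))
slope, intercept = fit_line(hours, train)
model_pred = [slope * hour + intercept for hour in range(len(train), len(load))]

# Naive baseline: assume the load stays at the last value seen in training.
baseline_pred = [train[-1]] * len(test)

model_mae = mean_abs_error(model_pred, test)
baseline_mae = mean_abs_error(baseline_pred, test)
print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

On this made-up, steadily rising series the fitted line has a much smaller error than the baseline. On real data the gap may be far smaller, or even reversed, which is exactly why the comparison is worth making.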
The process of deriving insight from the data, using the above tools, looks like the following.
Here, different roles work together to explore and understand the data, define KPIs, and create dashboards, alerts, and so on.
In this process, keeping the system running is a key challenge. This includes handling DevOps challenges, integrating data continuously, updating models, and getting feedback about the effectiveness of decisions (e.g. the accuracy of fraud detection). Hence, doing things in production is expensive.
On the other hand, “doing it once” is cheap. Hence, you must first try your scenarios in an ad-hoc manner (hire some expertise if you must) and make sure they can add value to the organization before setting up a system that does it every day.
Actionable Insights are the Key!!
Insights that you generate must be actionable. That means several things.
- The information you share is significant and warrants attention, and it is presented with its ramifications (e.g. more than two technical issues would lead a customer to churn)
- Decision makers can identify the context associated with the insight (e.g. operators can look through the history of customers who qualify)
- Decision makers can do something about the insight (e.g. they can work with customers to reassure them and fix the issues)
For each piece of information you show the user, think hard: “Why am I showing them this?”, “What can they do with this information?”, and “What other information can I show to help them understand the context?”
Where to Start?
Big Data projects can take many forms.
- Use an existing dataset: I already have a data set and a list of potential problems. I will use big data to solve some of the problems.
- Fix a known problem: Find a problem, collect data about it, analyse, visualize, build a model, and improve. Then build a dashboard to monitor it.
- Improve the overall process: Instrument processes (start with the most crucial parts), find KPIs, analyze and visualize the processes, and improve.
- Find correlations: Collect all available data, mine or visualize it, and find interesting correlations.
My recommendation is to start with the second option: fix a known problem in the organization. That is the least expensive, and it lets you demonstrate the value of big data right away.
Finally, the following are the key takeaways.
- Big data provides a way to optimize. However, blind application does not guarantee success.
- Learn the tools in the big data toolbox: KPIs, analytics (batch, real-time, interactive, predictive), visualizations, dashboards, alerts, and sensors.
- Start small. Try things out with data sets before investing in a system
- Find a high impact problem and make it work end to end