Scaling to Big Data: Organizational Considerations, and Why You Should be Thinking of Small Data
Preparing organizations to handle big data is a hot topic these days. But there are a few considerations to weigh before your organization goes there. First, ask yourself why: many organizations get caught up in the hype without fully thinking through the broader implications for the enterprise data architecture. Second, ask yourself who: does your organization have the right talent and organizational structure to successfully implement and manage that architecture?
The data scientist role, the ‘sexiest job of the 21st century’, is a poorly defined one that crams the statistician, developer, systems architect, and engineer into a single person. While one data scientist may have some of those skills, they are unlikely to excel at all of them. That is why looking below the surface of the almighty data scientist and building a team of specialists in key areas is a more scalable model for growing your big data capabilities. To be successful, consider an organization that includes a predictive modeler (with experience in the Big Data space), an architect (planning the integration into the broader data architecture), a Big Data engineer (someone who can set up and maintain the stack), and analysts with strong visualization abilities to tell the Big Data analytics story to the business.
While the Big Data movement is exciting, the value of Small Data is sometimes forgotten. What is small data? Here is our take on it: big data is essentially massive, fast-moving, unstructured data designed to capture high-frequency streams such as web logs, sensors, appliances, and smart meters. Predictive practitioners and data scientists often ask Big Data questions it was not designed to answer. Big Data only hints at the motivations behind customer choices; we get proxy statistical answers, which are good, but not always the best. In some cases it is faster and cheaper to collect new data that directly answers the question at hand, using the scientific method. For example, if you are trying to estimate the incremental ROI of a digital marketing offer, statistical modeling over weblog and ad-server big data can produce an estimate, but running a randomized A/B test that collects new data measuring the incremental ROI is more precise. Understanding how to collect data as efficiently as possible, extracting as much information as possible from as little volume as possible, is a science in and of itself: the design of experiments. Small Data is info-dense and small (sometimes 8 to 16 rows); Big Data is info-sparse and huge. By leveraging experimental design techniques, Small Data can answer questions that Big Data cannot. Even better, the two work well together: Big Data can predict the outcome, and Small Data can identify the best business action to take.
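The A/B-test approach above can be sketched in a few lines of Python. This is a minimal illustration on simulated data; the conversion rates, sample size, and unit economics are assumptions for the sake of the example, not figures from any real campaign.

```python
import math
import random
import statistics

random.seed(42)

# Assumed figures for illustration: the offer lifts conversion
# from a 5.0% baseline to 6.5% in the treatment group.
N = 10_000
control = [1 if random.random() < 0.050 else 0 for _ in range(N)]
treatment = [1 if random.random() < 0.065 else 0 for _ in range(N)]

p_c = statistics.mean(control)
p_t = statistics.mean(treatment)
lift = p_t - p_c  # incremental conversion rate attributable to the offer

# Two-proportion z-test: is the lift distinguishable from noise?
p_pool = (sum(control) + sum(treatment)) / (2 * N)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / N))
z = lift / se

# Incremental ROI under assumed unit economics.
margin_per_conversion = 20.00
cost_per_send = 0.10
incremental_roi = (lift * margin_per_conversion - cost_per_send) / cost_per_send

print(f"lift = {lift:.4f}, z = {z:.2f}, incremental ROI = {incremental_roi:.1f}x")
```

Because assignment to treatment and control is randomized, the measured lift is a direct causal estimate rather than a statistical proxy, which is exactly the precision advantage described above.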
Marriage of Predictive Analytics and Experimentation – Prescriptive Analytics
The marriage of these two capabilities opens the door to prescriptive modeling, which we define as understanding why something happens and prescribing the best action to drive the desired outcome. This matters because predictive analytics is often plagued by the “so what?” problem: so what if I know my customer is going to leave; what is the best possible thing we should do? Should we do anything at all? For example, traditional predictive analytics may tell you whether an online customer will convert if sent an email, based on clickstream history, demographics, and so on. Small Data analytics will tell you that sending an email with 50% off on Thursday evenings, with a specific subject line, is the best offer. This approach provides greater certainty, since the data is collected at random and compared against a control group. Combining the two techniques enables you to serve the best possible email to each individual, based on their Big Data profile and their predicted conversion under different combinations of email offers and designs.
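As a sketch of that combination, the snippet below scores a hypothetical customer against a menu of offers whose effects were measured in randomized tests, then prescribes the offer with the highest expected profit. All offer names, lifts, margins, and costs are illustrative assumptions, not results from any real program.

```python
# Hypothetical offer menu. "lift" is the multiplicative effect on
# conversion measured in a randomized test; "margin_mult" reflects
# the discount's hit to margin; "cost" is the cost per send.
OFFERS = {
    "50_pct_off_thu_evening": {"lift": 3.0, "margin_mult": 0.5, "cost": 0.50},
    "free_shipping":          {"lift": 1.3, "margin_mult": 0.9, "cost": 0.50},
    "no_offer":               {"lift": 1.0, "margin_mult": 1.0, "cost": 0.00},
}

def best_offer(p_convert: float, margin: float) -> str:
    """Prescribe the offer with the highest expected profit for a
    customer whose Big-Data model score (baseline conversion
    probability) is p_convert."""
    def expected_profit(name: str) -> float:
        o = OFFERS[name]
        p = min(p_convert * o["lift"], 1.0)  # probability cannot exceed 1
        return p * margin * o["margin_mult"] - o["cost"]
    return max(OFFERS, key=expected_profit)

# A low-propensity customer warrants the aggressive discount;
# a customer who will likely convert anyway gets no offer.
print(best_offer(0.02, margin=100.0))  # 50_pct_off_thu_evening
print(best_offer(0.90, margin=100.0))  # no_offer
```

The Big Data side supplies `p_convert` from the predictive model; the Small Data side supplies the `lift` figures from controlled experiments. Neither alone can answer “what should we do for this customer?”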
Stay tuned for further discussion of best practices in experimental design to kick-start your Small Data practice, as well as how to employ prescriptive techniques that substantially improve results in your advanced analytics practice.