Forrester’s John Rymer sums up his opinion succinctly: “Big Data: The worst category name ever.” The term certainly has problems, both in the name itself and in how people conceive of it. Big Data, as the hype would have it, is what I call “the elephant riding the bicycle.” I’ll give you the seven things you need to consider, but first, let’s look at the hype.
Big Data hype is everywhere. At the three conferences I attended in the past four weeks, Big Data was the single biggest topic of discussion with customers, new acquaintances, old friends, coworkers and industry analysts. It would be hard to overstate how much attention Big Data is getting.
But as with any hype cycle, there is a great deal of questioning about where and how Big Data delivers value. The early Internet prompted very similar conversations. In 1995, there was a growing feeling that something big was coming from the Web, but it was hard to separate the success from the sales pitch. People were spending money to create websites with little plan for why. A few learned detractors even published bold criticisms.
Despite the doubt¹ and the slow start, we found our cyberspace groove and launched wave after wave of new ways to buy, sell, read, listen, watch and converse. For billions of people, the Internet is simply an expectation.
Faster this time around
But the pace of change is now much faster, and this time around the pump has been primed: we’ve become very quick at jumping on new technology and far better at marketing software and services. What took years to crank up in the ’90s takes much less time today. In what seems like no time, a wide variety of vendors are selling brand-new products with Big Data labels, and more than a few old products are getting papered over with Big Data buzzwords. The market is confused, and perhaps a bit skeptical, for good reason.
Solutions are well ahead of success stories. There are far more companies selling Big Data solutions than companies providing details about how Big Data is creating value.
What about Hadoop?
Don’t conclude that Big Data lacks success stories; it doesn’t. Powerful Big Data analytics have already created enormous value for companies like Google, Facebook and Yahoo, which needed to monetize their vast amounts of member and search data. Apache Hadoop, the leading open source solution, was modeled on Google’s published designs for MapReduce and the Google File System and was developed largely at Yahoo. Facebook uses Hadoop.
No one doubts the value those companies gained, but they were in unique positions and employed scores of people to create and maintain their Big Data solutions. It doesn’t make sense for everyone to go through that effort or to throw money at the scarce data scientists needed to run such a system. Most organizations shouldn’t follow that model, for one very good reason that has nothing to do with Hadoop. It has to do with elephants and bicycles.
Rymer weighed in on Hadoop: “…too many people now seem to think that Hadoop is big data, when Hadoop is just one of the several big-data solutions available — and Hadoop isn’t good for many big-data scenarios.”
Elephants on bicycles
Unlike the Internet in 1995, Big Data has a higher barrier to entry. It requires a broader perspective to connect, understand, anticipate and act on the intelligence gleaned from data known for its high velocity, large volume and wide variety of sources and types.
Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle…it just doesn’t make business sense.
And there’s another challenge for those considering the Google, Facebook and Yahoo model: offline, batch ‘research projects’ don’t adequately support operational decision making. Things happen much faster than batch processing allows. More on that in a moment.
If the data ecosystem can’t support creating value from Big Data, value will be elusive: data will be ‘dirty’ and siloed, and responses will come too late. A logical path forward is emerging among forward-leaning organizations, analysts and a small set of software vendors. It is a balanced ecosystem with a deep foundation. It starts even before the data arrives, and it has its roots in how we talk about the challenge and where we want to create value.
As Rymer puts it, “We have yet to see a one-size-fits-all suite or solution for all of these scenarios.” It takes foundational strategy and technology to make Big Data ‘work’.
Here are our seven steps to avoid getting crushed by the elephant riding a bicycle:
1. Use case clarity
Those who haven’t figured out their use cases for Big Data are in danger of confusing the term with its purposes, a focal point of Rymer’s “‘crabby old guy’ rant”. This is the most likely reason that companies finding success don’t like to use the words “Big Data”. Instead, they refer to their use cases with terms like predictive analytics, digital customer experience management, compliance, sense and respond, and behavioral insights.
The purpose matters more than the hype, and the hype will eventually go away. Those without a purpose will find themselves in an Emperor’s New Clothes scenario, as either the emperor or his advisers.
2. Data enablement
You can’t have Big Data without data. While some companies simply crunch one-dimensional data, like Twitter feeds or customer shopping patterns, the typical enterprise needs to manage much more complex data sets coming from anywhere and everywhere. There is traditional data, with its absolute integrity defined by complex schemas, sitting at rest in databases and log files. There is also data that arrives in the operational moment, while business is happening and there’s no time to treat, store and recall it. This in-motion data is often unstructured and dynamic. It presents challenges for any organization that can’t grab it in real time and make use of it.
This is where in-memory data grids matter. What Rymer calls “elastic caching platforms” let data at rest and data in motion be married up in cache, where flexibility and speed matter and many operations and queries can take place at the speed of business.
If you’re skeptical, think of how much information matters only in a moment. We call this volatile data, and it ‘dies’ before it can be extracted, transformed, loaded into a database and then queried. Without an in-memory cache, volatile data has no value. Commodities and equities trading and healthcare are big users of volatile data, but as putting it to use gets easier, more and more organizations will demand this capability.
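As a rough illustration of volatile data meeting data at rest, the sketch below keeps in-motion values in an in-memory cache with a time-to-live (TTL) and joins them with reference data only while they are still fresh. The `VolatileCache` class, the `enrich` helper, the two-second TTL and the `ACME` record are all hypothetical; real elastic caching platforms spread this across many nodes with far richer eviction and consistency options.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class VolatileCache:
    """Toy single-process cache whose entries expire ttl_seconds after being written.

    Hypothetical illustration; real in-memory data grids add partitioning,
    replication and configurable eviction across a cluster.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can simulate time passing
        self._store: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Store the value together with the moment it stops being useful.
        self._store[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # The data has "died": discard it rather than act on stale information.
            del self._store[key]
            return None
        return value


# Data at rest: hypothetical reference data that would normally come from a database.
REFERENCE: Dict[str, Dict[str, str]] = {"ACME": {"sector": "industrials"}}


def enrich(cache: VolatileCache, symbol: str) -> Optional[Dict[str, Any]]:
    """Marry the latest in-motion price with at-rest reference data, if still fresh."""
    price = cache.get(symbol)
    if price is None:
        return None  # never seen, or already expired
    return {"symbol": symbol, "price": price, **REFERENCE.get(symbol, {})}


now = [0.0]  # simulated clock, in seconds
cache = VolatileCache(ttl_seconds=2.0, clock=lambda: now[0])
cache.put("ACME", 101.5)
print(enrich(cache, "ACME"))  # {'symbol': 'ACME', 'price': 101.5, 'sector': 'industrials'}
now[0] = 3.0
print(enrich(cache, "ACME"))  # None
```

The injectable clock is only there to make expiry easy to demonstrate; in production the cache would use wall or monotonic time directly.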
So far, we’ve only talked about typical data sources, but as the world adopts more sensors and live feeds, the amount, speed, types and volatility of data will increase rapidly. Elastic caching options will be a critical piece.
3. Infrastructure pipes
We’ve written three recent success stories that stand out from the noisy crowd talking Big Data. Mercy Healthcare, The Nielsen Company and FedEx Services all reap powerful benefits from Big Data solutions. But they all have something basic in common. They all made significant investments in a service-oriented architecture (SOA) with interoperable services that let them move large amounts of data quickly from outside in, inside out, and across any application, database or other data source. They do this through an enterprise service bus (ESB) that moves data automatically between disparate sources that publish and subscribe, rather than connecting each application or database separately.
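The publish/subscribe idea behind an ESB can be sketched in a few lines. With point-to-point integration, every producer may need its own connection to every consumer; with a bus, each system connects once and only needs to know topic names. The `MessageBus` class and the `shipment.created` topic below are hypothetical and leave out everything a real ESB adds, such as durable queues, message transformation, routing and security.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class MessageBus:
    """Toy in-process publish/subscribe bus (hypothetical; not any ESB product's API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A consumer registers interest in a topic, not in a specific producer.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        # Deliver to every subscriber; the publisher never knows who they are.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of deliveries, handy for monitoring


bus = MessageBus()
billing: List[Any] = []
analytics: List[Any] = []
bus.subscribe("shipment.created", billing.append)
bus.subscribe("shipment.created", analytics.append)
delivered = bus.publish("shipment.created", {"id": 42, "weight_kg": 3.2})
print(delivered)  # 2
```

Adding a new consumer here means one more `subscribe` call; the publishing system does not change, which is the decoupling the paragraph above attributes to the bus.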
If you think about it for a moment, the need to connect to data precedes the ability to do any form of analysis. Without it, you have the bicycle, unable to support the heavy elephant of Big Data.
4. User-friendly analytics
Organizations that have their infrastructure house in order are ready to crunch data and find meaningful insight. Insight shows up as patterns in the data that can be discovered by using complex algorithms. The best tools have visual interfaces and are useful to business people who don’t need to understand every technical nuance but can manipulate data to discover insights. Drag and drop is the new black.
Done right, visualization is the starting point for creating interactive dashboards that aggregate, present and allow manipulation of large data sets across disparate data sources.
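Underneath a dashboard, aggregating across disparate sources usually means normalizing each source to a common shape and then summarizing it. The minimal sketch below does that for two hypothetical sources, a CRM export of dicts and a web-shop feed of tuples; the field names, figures and the `totals_by_region` function are illustrative assumptions, and a visual tool would hide this step behind drag and drop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Two hypothetical sources with different shapes.
crm_rows = [{"region": "EU", "amount_eur": 120.0}, {"region": "US", "amount_eur": 80.0}]
web_rows = [("EU", 30.0), ("EU", 50.0)]


def totals_by_region(crm: Iterable[dict], web: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Normalize both sources to (region, amount) and sum the amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in crm:
        totals[row["region"]] += row["amount_eur"]
    for region, amount in web:
        totals[region] += amount
    return dict(totals)


print(totals_by_region(crm_rows, web_rows))  # {'EU': 200.0, 'US': 80.0}
```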
If a PhD is necessary to manage the front end of analytics, the system is unsustainable and won’t allow the typical enterprise to successfully mine data at the pace necessary for meaningful change.
5. Sense and respond
Once a pattern is understood, there needs to be a way to anticipate its occurrence to either maximize its benefit or take steps to prevent or mitigate the problem it presents. Automated event processing is the modern equivalent of what was studied and taught by U.S. Air Force Col. John Boyd, a fighter pilot who realized that decision-making occurs in recurring cycles. He called the process the OODA loop, for Observe, Orient, Decide and Act. He applied it to great effect, and it is still taught today in air combat schools.
What Boyd had to do based on training and awareness, we now support with computerized systems that can handle far more data, far faster. Systems trained to watch for events in combination can also apply logic to a discovered pattern to call for more analysis, look for follow-on events, or respond immediately in complex ways.
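A minimal sketch of watching for events in combination: the rule below raises an alert when one kind of event is followed by another kind from the same source within a time window. The event kinds, the 60-second window and the `CombinationRule` class are hypothetical, and the sketch assumes events arrive in timestamp order; complex event processing engines offer declarative rule languages, out-of-order handling and far higher throughput.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    kind: str    # hypothetical event type, e.g. "temp_spike"
    source: str  # the sensor or feed that produced the event
    ts: float    # timestamp in seconds; events are assumed to arrive in order


class CombinationRule:
    """Alert when `first` is followed by `then` from the same source within `window` seconds."""

    def __init__(self, first: str, then: str, window: float) -> None:
        self.first, self.then, self.window = first, then, window
        self._recent: Deque[Event] = deque()  # pending `first` events, oldest on the left

    def on_event(self, ev: Event) -> Optional[str]:
        # Drop `first` events that have fallen out of the time window.
        while self._recent and ev.ts - self._recent[0].ts > self.window:
            self._recent.popleft()
        if ev.kind == self.first:
            self._recent.append(ev)
            return None
        if ev.kind == self.then:
            for earlier in self._recent:
                if earlier.source == ev.source:
                    return f"ALERT {ev.source}: {self.first} then {self.then}"
        return None


rule = CombinationRule("temp_spike", "pressure_drop", window=60.0)
stream = [
    Event("temp_spike", "pump-7", 0.0),
    Event("pressure_drop", "pump-3", 10.0),   # different source: no alert
    Event("pressure_drop", "pump-7", 45.0),   # same source within 60 s: alert
    Event("pressure_drop", "pump-7", 120.0),  # the spike has expired: no alert
]
alerts: List[str] = [a for a in (rule.on_event(e) for e in stream) if a]
print(alerts)  # ['ALERT pump-7: temp_spike then pressure_drop']
```

Returning an alert string keeps the sketch small; in practice the rule might instead publish to a bus or start a business process, and its alerts could be stored for later analysis.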
And just like OODA, it feeds back into itself. Events streamed into an event processing engine, whether from data found across the ESB or ‘live’ from external feeds, may not be understood in the moment, but they are processed after the fact in the same analytical tools that discovered the data patterns in the first place. This creates a virtuous cycle of discovery-operation-improvement-operation and so on.
Making sense of the competitive landscape is truly a function of a Big Data solution, but event processing is the secret sauce. It reflects the highly proprietary choices each organization makes. It sets the stage for the highest value step of a Big Data solution.
6. Putting solutions in play
Whether preventing deadly illnesses, providing near-real-time global market research, or solving congestion in the logistics network, Big Data’s big value comes from the action that is taken. The most effective way to respond to opportunity and risk is to have control over both manual and automated processes. Business process management suites are the flexible and fast way to drive efficiency in execution and consistency in response.
Social media has a powerful role as well, applying the Big Data ‘Big Filter’ to put the right information in the right hands at the right moment. Social software is maturing into this role, but users need to catch up. Expect social tools to be the way to define work at the role level of any organization in the future.
Big Data challenges organizations to think across more functional silos than ever before, and responsive process management and collaboration will keep the Big Data wave from swamping the boat.
7. Go back and do it again
When the connect-understand-anticipate-act cycle has run its course, the next step is to analyze the outcomes, find new and/or better patterns and improve the system’s function and output. It is a virtuous cycle of change that the best organizations never stop running.
Not just our opinion
It would be hard to oversell the need for the infrastructure outlined above. A recent IDC report lists data integration (#3) as the biggest Big Data IT challenge, not the size or speed required by the system. That same report lists defining business requirements (#1) as the top business challenge, not skills or tools.
Rymer says, “Big data must include complex event processing platforms (#5), elastic caching platforms (#2), and the various not-only SQL (NoSQL) databases. We have yet to see a one-size-fits-all suite or solution for all of these scenarios.”
Likewise, a Gartner Big Data report by analyst Douglas Laney, released just last week, kicks off its analysis section with a warning that organizations need to ensure infrastructure adequacy (#3). Coming out of the late-2000s downturn, adequate infrastructure is by no means in place for current needs, much less for the significantly expanded requirements of Big Data solutions.
O’Reilly’s Strata site published an interview with an expert who stressed a new focus on user-friendly analysis (#4).
In an article from August 2012, Gartner’s W. Roy Schulte argues that organizations need to use complex-event processing (#5) to keep up with real-time Big Data.
In a report from last summer by Sanchit Gogia, Forrester warns that organizations overlook the key factors that support Big Data solutions like infrastructure (#3) and process (#6).
Forrester’s Clay Richardson uses the term “Big Process” when he observes that, “…even for organizations that might not be focused on big data quite yet, there is still the need to begin thinking from a big process perspective to better understand the relationships and impacts between operational data and business process performance.” (#6). Clay’s blog post is aptly titled “Big Data Ain’t Worth Diddly Without Big Process.”
The challenges aren’t a mystery at this point in the Big Data hype cycle. The solutions aren’t either if you take a look at the market analysis and the success stories we’ve mentioned. Big Data, beyond being a questionable term, has real value for those who understand the nuances of working with data that has volume, velocity, variety and volatility.
We’ve included a great deal of information here and we welcome your comments.
1. Detractors are always easy to find…here are but three of the more famous ones:
“Airplanes are interesting toys but of no military value.” (Maréchal Ferdinand Foch, World War I French general)
“I think there is a world market for maybe five computers.” (Thomas Watson, chairman of IBM, 1943)
“There is no reason anyone would want a computer in their home.” (Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977)