Big Data Primer: What’s All This Hype about “Big Data?”

Supply Chain Management (SCM) and IT department leaders are enamored with “Big Data,” and their discussions of it are unavoidable. Rightfully so: everyone wants to come up to speed on the new technologies required to collect, integrate, store and analyze data on a massive scale, and “Big Data” is all the buzz. Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. How big does a dataset need to be in order to be considered Big Data? No one seems to know. We just assume that as technology advances over time, the size of datasets that qualify as Big Data will also increase.

Big Data enthusiasts love to talk about “the three V’s.” If you don’t know about them yet, here’s a good primer:

  • Volume - Volume describes the amount of data generated by organizations or individuals: terabytes (TB) of records, transactions, tables and files. For example, a Boeing jet engine generates 10TB of operational data for every 30 minutes it runs. Hence a four-engine jumbo jet can create 640TB on one Atlantic crossing (which works out to a flight of about eight hours). Multiply that by 25,000 flights flown each day and you get the picture.
  • Velocity - Velocity describes the frequency at which data is generated, captured and shared. The speed at which data can be captured, processed and analyzed ranges from batch, to near real time, to real time, and naturally, the hype is all about real time. Forget about it. I could write another post on what “real time” actually means, but for now, let’s just accept that real time creates too much information. As Ray Wang, CEO of Constellation Research Group, put it: “Real-time is irrelevant because speed does not trump fidelity. Quantity does not trump quality.” Context is key. Ray talks about information being made available in “right time” and presented in ways that are sensitive to “roles, processes, location, time, and relationships.”
  • Variety - A proliferation of data from social, machine-to-machine and mobile sources adds new data types to traditional transactional data. These days the data may be structured, unstructured or semi-structured, and the new types include content, geo-spatial, hardware, location-based, log, process, RFID, search, streaming, social, text and web. In healthcare, where a myriad of new devices will be manufactured to collect health data, from remote monitoring equipment placed in patients’ homes to implants enabled to track your day-to-day actions, the numbers seem, well, unimaginable. So go ahead and do it. Imagine when all these data (medical records, genome sequences, public health data, self-monitoring data) become available for analysis on a macro scale. Imagine if/when all of these disparate datasets are combined; when open data platforms and open middleware platforms (connecting disparate sensing devices at one end, and a multitude of service apps at the other end) are put together.
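
For anyone who wants to check the Volume arithmetic above, here is a quick back-of-the-envelope sketch in Python. The 10TB-per-30-minutes, four-engine and 25,000-flights figures come from the example itself; the eight-hour crossing is my assumption, inferred from the 640TB total, and the function name is just an illustrative label. The daily figure is an upper bound at best, since it pretends every flight is a four-engine, eight-hour crossing.

```python
# Back-of-the-envelope check of the Volume example.
TB_PER_ENGINE_PER_HALF_HOUR = 10   # quoted in the example
ENGINES = 4                        # quoted in the example
FLIGHTS_PER_DAY = 25_000           # quoted in the example
CROSSING_HOURS = 8                 # assumption: implied by the 640TB total


def flight_data_tb(engines, hours, tb_per_half_hour=TB_PER_ENGINE_PER_HALF_HOUR):
    """Terabytes generated by one flight: engines x half-hours flown x TB per half-hour."""
    half_hours = hours * 2
    return engines * half_hours * tb_per_half_hour


per_flight_tb = flight_data_tb(ENGINES, CROSSING_HOURS)

# Crude upper bound: treats every daily flight as a four-engine, eight-hour crossing.
per_day_tb = per_flight_tb * FLIGHTS_PER_DAY
per_day_eb = per_day_tb / 1_000_000  # 1 exabyte = 1,000,000 terabytes (decimal units)

print(f"One crossing: {per_flight_tb} TB")
print(f"Daily upper bound: {per_day_tb:,} TB (about {per_day_eb:g} EB)")
```

Under those assumptions, one crossing comes to 640TB and the daily ceiling to roughly 16 exabytes, which is the "picture" the example is gesturing at.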

Keep imagining. It’s good for the brain and not hard on the wallet. In a recent Forbes article, Brad Peters argued that the Big Data optimists assume two things:

  • The right tools can be found to collate and analyze all of this disparate data in an efficient way;
  • There really is valuable information to be extracted from all of this raw data, or at least sufficiently valuable information to justify the cost of this endeavor.

Those are big assumptions.

Despite being a “glass half full” kinda guy, I do have a salesman’s sensibilities, and I intuitively know that the current market hype is creating a lot of customer confusion. Hardware, storage, integration and analytical software solution providers are separately banging on doors without a single successful example to reference. The talk is all about technology options, not business value or how any of this fits with the industry transformation (in healthcare and elsewhere) that is supposedly taking place.

As Ray Wang put it, “in the case of Big Data, the big question is: what is the question?” What questions do business leaders want answered? Healthcare executives are already inundated with reporting dashboards and charts. Clearly, retrospective analytic capability is not to be sneezed at, but it’s not supposed to be central to any conversation involving the promise of Big Data.

Sit back and enjoy the hype. Keep your eyes and ears alert for referenceable examples of Big Data solutions in action and start thinking about the questions you still can’t answer.

—Tom Finn
