Big data pushes the scalability frontier. But where does the frontier begin and end?
When people think about big data, they generally envision macroscopic scaling: bigness that spans the globe or, at the very least, some large data center. They associate big data platforms with server farms that sprawl across huge tracts of real estate, populated by ever-larger racks of processing, storage, memory, and interconnect nodes arranged in endless rows.
However, big data platforms must also push the nanoscopic frontier of scalability. It’s useful to think of a consolidated big data platform, architecturally, as a fractal structure. What that means is that big data must be self-similar on all platform scales—from macro to nano—and leverage the full parallel processing resources that are available at each level.
Big data’s self-similarity comes from one of its central architectural features: pervasive parallelization. This architectural principle should operate concurrently on three levels of scalability in the architecture of your big data platform: scale-up, scale-out, and scale-in.
Scale-in will steadily grow in importance within your scalability strategy. Miniaturization remains the juggernaut uber-trend, and subatomic density is its frontier. We all know that today’s handheld consumer gadgets have far more computing, memory, storage, and networking capacity than the state-of-the-art mainframes that IBM and others were selling back in the Beatles era. And we’re all starting to get our heads around quantum computing, atomic storage, synaptic computing, and other “scale-in” approaches that will keep pushing Moore’s Law forward for the foreseeable future.
For enterprises that are serious about pervasively scaling their consolidated big data architectures, a balanced strategy should incorporate all three approaches—scale-up, scale-out, and scale-in—depending on the workloads you’re trying to optimize.
You will need to scale big data elastically and concurrently toward both the infinite and the infinitesimal.
What do you think? Let me know in the comments.