Big data pushes the scalability frontier. But where does the frontier begin and end?
When people think about big data, they generally envision macroscopic scaling: in other words, bigness that spans the globe or, at the very least, some large data center. People generally associate big data platforms with server farms that sprawl across huge tracts of real estate, and which are populated by ever-larger racks of processing, storage, memory, and interconnect nodes arranged in endless rows.
However, big data platforms must also push the nanoscopic frontier of scalability. It’s useful to think of a consolidated big data platform, architecturally, as a fractal structure. What that means is that big data must be self-similar on all platform scales—from macro to nano—and leverage the full parallel processing resources that are available at each level.
Big data’s self-similarity comes from one of its central architectural features: pervasive parallelization. This architectural principle should operate concurrently at three levels of scalability in the architecture of your big data platform: scale-up, scale-out, and scale-in.
Scale-in will steadily increase its importance in your scalability strategy. Miniaturization remains the juggernaut uber-trend, and subatomic density is its frontier. We all know that today’s handheld consumer gadgets have far more computing, memory, storage, and networking capacity than the state-of-the-art mainframes that IBM and others were selling back in the Beatles era. And we’re all starting to get our heads around quantum computing, atomic storage, synaptic computing, and other “scale-in” approaches that will keep pushing Moore’s Law forward for the foreseeable future.
For enterprises that are serious about pervasively scaling their consolidated big data architectures, a balanced strategy should incorporate all three approaches—scale-up, scale-out, and scale-in—depending on the workloads you’re trying to optimize.
You will need to scale big data elastically and concurrently toward both the infinite and the infinitesimal.
What do you think? Let me know in the comments.