Big data pushes the scalability frontier. But where does the frontier begin and end?
When people think about big data, they generally envision macroscopic scaling: in other words, bigness that spans the globe or, at the very least, some large data center. People generally associate big data platforms with server farms sprawling across huge tracts of real estate, populated by ever-larger racks of processing, storage, memory, and interconnect nodes arranged in endless rows.
However, big data platforms must also push the nanoscopic frontier of scalability. It’s useful to think of a consolidated big data platform, architecturally, as a fractal structure. In other words, the platform must be self-similar at every scale—from macro to nano—and leverage the full parallel-processing resources available at each level.
Big data’s self-similarity comes from one of its central architectural features: pervasive parallelization. This architectural principle should operate concurrently at three levels of scalability in the architecture of your big data platform:

- Scale-out: adding more nodes to the cluster, so that workloads can be distributed horizontally across them.
- Scale-up: adding more processing, memory, storage, and interconnect capacity to each individual node.
- Scale-in: packing ever more computational density into each component through continued miniaturization.
Scale-in will steadily increase its importance in your scalability strategy. Miniaturization remains the juggernaut uber-trend, and subatomic density is its frontier. We all know that today’s handheld consumer gadgets have far more computing, memory, storage, and networking capacity than the state-of-the-art mainframes that IBM and others were selling back in the Beatles era. And we’re all starting to get our heads around quantum computing, atomic storage, synaptic computing, and other “scale-in” approaches that will keep pushing Moore’s Law forward for the foreseeable future.
For enterprises that are serious about pervasively scaling their consolidated big data architectures, a balanced strategy should incorporate all three approaches—scale-up, scale-out, and scale-in—depending on the workloads you’re trying to optimize.
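The payoff of a balanced strategy is that the three approaches compound: capacity gained at one level multiplies capacity gained at the others. A minimal sketch of this multiplicative relationship, using a hypothetical capacity function and purely illustrative numbers:

```python
def cluster_capacity(nodes, cores_per_node, ops_per_core):
    """Illustrative total parallel throughput across the three scaling dimensions.

    scale-out -> nodes          (more machines in the cluster)
    scale-up  -> cores_per_node (bigger, beefier individual machines)
    scale-in  -> ops_per_core   (more work per unit of silicon)
    """
    return nodes * cores_per_node * ops_per_core

# Doubling any single dimension doubles total capacity, so a balanced
# strategy is free to invest in whichever dimension best fits the workload.
baseline   = cluster_capacity(nodes=100, cores_per_node=16, ops_per_core=1_000)
scaled_out = cluster_capacity(nodes=200, cores_per_node=16, ops_per_core=1_000)
scaled_in  = cluster_capacity(nodes=100, cores_per_node=16, ops_per_core=2_000)

assert scaled_out == 2 * baseline
assert scaled_in == 2 * baseline
```

The model is deliberately simplistic—real workloads hit interconnect, coordination, and memory-bandwidth limits long before linear scaling—but it captures why no single dimension should carry the whole scalability burden.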
You will need to scale big data elastically and concurrently toward both the infinite and the infinitesimal.
What do you think? Let me know in the comments.