Evolving Supercomputing in the Era of Big Data
The word supercomputer is everywhere these days, but in general industry parlance it has never had a clear, concrete definition. To be honest, neither have mainframe, minicomputer, microcomputer, or personal computer, but that never stopped those terms from becoming ubiquitous descriptors for computing architectures. However, supercomputer, like the other terms for computing platforms, has connotations galore.
In the high-performance computing (HPC) sector generally, and in our culture at large, the concept of a supercomputer, per Wikipedia, has historically pivoted on a platform’s status “at the front line of contemporary processing capacity—particularly speed of calculation.”1 To loosely paraphrase this point, a supercomputer is the biggest, baddest, and most powerful integrated information-processing system that humankind has devised so far. And by biggest, baddest, and most powerful, people usually mean that the supercomputer is optimized primarily for the most demanding calculation workloads.
The concept of a supercomputer is so malleable that it has been applied to many types of distributed architectures that go beyond monolithic HPC systems in sprawling data centers. One example is the crowdsourced, or virtual, supercomputer running on the IBM® World Community Grid® site, an IBM-managed environment that harnesses surplus computing power donated by volunteers.
In a similar vein, someone recently asked me to speak about the IBM Watson™ “supercomputer.” I paused briefly, and then launched into my discussion. I don’t often hear the term supercomputer applied to Watson, but I agree that it generally fits because of Watson’s awesome HPC performance in DeepQA analytics. Indeed, we have independent corroboration that at 80 trillion floating-point operations per second (also known as 80 teraFLOPS), Watson would rank within the top 100 supercomputers in the world. However, IBM’s flagship supercomputer, in the classic sense, is undoubtedly Sequoia, which—built on the IBM Blue Gene®/Q architecture—is near the very top of all industry rankings of performance in this segment of the HPC market.
The evolution of supercomputing
These points caused me to wonder what supercomputing truly means in the era of big data analytics. Is the traditional speed-of-calculation concept still completely relevant in this new era? Are brute-force, number-crunching calculations the only thing we expect supercomputers to do? Are today’s supercomputers—of all stripes—being optimized for the most demanding big data analytics jobs as well? And do traditional supercomputing benchmarks—such as petaFLOPS (also known as quadrillions of FLOPS)—adequately capture the metrics that matter most in big data applications?
Just as important, how will supercomputing architectures need to evolve in the era of big data? Increasingly, organizations will be running their most extreme applications on massively parallel, big data cloud platforms. As that trend intensifies, big data clusters will need to be considered supercomputers. For starters, as we move into this new era, we need to ask ourselves whether the existing HPC benchmarks—petaFLOPS, gigateps, and so on—are sufficient to address all core big data workloads.
Fortunately, the supercomputing industry has already begun to address big data workloads with new performance benchmarks. Key among them is an alternative supercomputing benchmark called Graph 500, for which IBM is one of many organizations on the steering committee. Graph 500 is designed to measure supercomputer performance on the search and graph analysis functions at the heart of many leading-edge big data applications. Its core workload searches a large graph of connections among the points in a data set, and the metric denominates supercomputer big data performance in gigateps: billions of traversed edges per second.
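To make the traversed-edges-per-second idea concrete, here is a minimal single-machine sketch, not the official Graph 500 reference code. It assumes a uniformly random graph in place of the benchmark’s Kronecker generator, runs one breadth-first search from a root, and divides the number of input edges in the searched component by the search time, roughly following the benchmark’s TEPS convention. The function names `make_random_graph`, `bfs_parents`, and `measure_teps` are hypothetical.

```python
import random
import time
from collections import deque


def make_random_graph(num_vertices, num_edges, seed=0):
    """Build an undirected adjacency list from uniformly random edge tuples.

    Simplification (assumption): Graph 500 itself uses a Kronecker
    generator; uniform random edges stand in for it here.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(num_vertices)]
    edges = []
    for _ in range(num_edges):
        u, v = rng.randrange(num_vertices), rng.randrange(num_vertices)
        adj[u].append(v)
        adj[v].append(u)
        edges.append((u, v))
    return adj, edges


def bfs_parents(adj, root):
    """Breadth-first search; returns a parent map for every reached vertex."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return parent


def measure_teps(adj, edges, root):
    """Time one BFS and return (edges_in_component, seconds, TEPS)."""
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    # Count input edge tuples that lie inside the searched component.
    traversed = sum(1 for u, _ in edges if u in parent)
    return traversed, elapsed, traversed / elapsed


if __name__ == "__main__":
    adj, edges = make_random_graph(num_vertices=10_000, num_edges=80_000)
    n, secs, rate = measure_teps(adj, edges, root=0)
    print(f"{n} edges in {secs:.4f} s = {rate / 1e9:.6f} GTEPS")
```

A laptop running interpreted Python will report a tiny fraction of one gigatep; the point of the sketch is the shape of the metric, which rewards fast, irregular memory access far more than raw floating-point speed.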
A critical problem remains, however: measuring the big data performance envelope of a supercomputer, or of any other integrated system, requires assessing how well its architecture balances processors, memory, storage, I/O, and interconnect. Graph 500 doesn’t appear to benchmark that configuration balance.
The need for speed
In terms of the evolution of supercomputing architectures, clearly this benchmark and others will need to align with the inexorable industry push into exascale, all-in-memory big data architectures. Given the strong possibility that all-in-memory architectures will be pervasive by the end of this decade, now is the time to consider how they will grow to exascale while pushing against the speed limitations of RAM.
Yes, pondering the wonderland into which quantum computing may someday deliver us is fun. But absent a practical breakthrough in that leading-edge technology, how can our current von Neumann architectures evolve in spite of their inherent bottleneck? Wikipedia describes this bottleneck succinctly as follows:
|The shared bus between the program memory and data memory leads to the Von Neumann bottleneck, the limited throughput—data transfer rate—between the CPU and memory compared to the amount of memory. Because program memory and data memory cannot be accessed at the same time, throughput is much smaller than the rate at which the CPU can work. This seriously limits the effective processing speed when the CPU is required to perform minimal processing on large amounts of data. The CPU is continually forced to wait for needed data to be transferred to or from memory. Since CPU speed and memory size have increased much faster than the throughput between them, the bottleneck has become more of a problem—a problem whose severity increases with every newer generation of CPU.2|
A recent article discusses how this bottleneck might grow even more severe in the age of exascale computers. In the coming era, aggregate processing speed is likely to reach a quintillion FLOPS (one exaFLOPS), leaving the limited memory bandwidth of today’s architectures in the dust.
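One common way to quantify this bottleneck is the roofline model, a widely used performance model: a workload that performs I floating-point operations per byte moved from memory can run no faster than memory bandwidth × I, and no faster than the processor’s peak. The sketch below applies it to a hypothetical exascale machine; the 1 exaFLOPS peak, the 10 PB/s aggregate bandwidth, and the two arithmetic intensities are illustrative assumptions, not measurements of any real system.

```python
def roofline(peak_flops, bandwidth_bytes_per_s, flops_per_byte):
    """Attainable FLOPS under the roofline model.

    A kernel doing `flops_per_byte` operations per byte fetched from memory
    is capped by memory traffic (bandwidth * intensity) or by compute peak,
    whichever is lower.
    """
    memory_bound = bandwidth_bytes_per_s * flops_per_byte
    return min(peak_flops, memory_bound)


def ridge_point(peak_flops, bandwidth_bytes_per_s):
    """Arithmetic intensity (FLOPs per byte) needed to reach peak."""
    return peak_flops / bandwidth_bytes_per_s


# Hypothetical machine: 1 exaFLOPS peak, 10 PB/s aggregate memory bandwidth.
PEAK = 1e18
BANDWIDTH = 1e16

# Hypothetical workloads: a scan that touches each byte once with little math,
# and a compute-heavy kernel that reuses data many times from cache.
workloads = [("scan-like analytics", 0.25), ("compute-heavy kernel", 200.0)]

if __name__ == "__main__":
    for name, intensity in workloads:
        perf = roofline(PEAK, BANDWIDTH, intensity)
        print(f"{name}: {perf:.2e} FLOPS ({perf / PEAK:.2%} of peak)")
    print(f"ridge point: {ridge_point(PEAK, BANDWIDTH):.0f} FLOPs per byte")
```

Under these assumed numbers, the scan-like workload reaches only 0.25 percent of peak, which is exactly the “minimal processing on large amounts of data” case the quoted passage describes: the processor spends most of its time waiting on memory.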
The article also states that one important approach for expanding memory bandwidth in exascale architectures will be non-uniform memory access (NUMA). NUMA takes distributed caching to the next level by enabling each processor in a multiprocessor system to access its own local memory and, transparently but at higher latency, memory attached to other processors, buses, and networks. It also supports distributed architectures; for example, the IBM NUMA-Q® architecture shares memory pools efficiently to streamline parallel processing of data-intensive tasks such as big data analytics.
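Because remote accesses on a NUMA machine cost more than local ones, where data lives relative to the processor that touches it matters. The back-of-the-envelope model below illustrates this under loudly hypothetical assumptions: 4 nodes, an array split into one contiguous block per node, and made-up latencies of 100 ns local and 200 ns remote. The helpers `average_latency_ns`, `local_fraction`, and `block_home` are hypothetical and do not correspond to any real NUMA library.

```python
def average_latency_ns(local_fraction, local_ns=100.0, remote_ns=200.0):
    """Expected latency per access; default latencies are hypothetical."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def local_fraction(accesses, home_node):
    """Share of (cpu_node, address) accesses served by the CPU's own memory."""
    local = sum(1 for node, addr in accesses if home_node(addr) == node)
    return local / len(accesses)


NODES = 4
SIZE = 1_000_000  # array elements, split into one contiguous block per node


def block_home(addr):
    """Node whose local memory holds this element under block placement."""
    return addr * NODES // SIZE


sample = range(0, SIZE, 1000)
# NUMA-aware: each element is processed by a CPU on the node that holds it.
aware = [(block_home(a), a) for a in sample]
# NUMA-oblivious: work is dealt round-robin, ignoring where the data lives.
oblivious = [(i % NODES, a) for i, a in enumerate(sample)]

if __name__ == "__main__":
    for name, plan in [("aware", aware), ("oblivious", oblivious)]:
        frac = local_fraction(plan, block_home)
        print(f"{name}: {frac:.0%} local, ~{average_latency_ns(frac):.0f} ns/access")
```

In this toy setup the round-robin schedule ends up only 25 percent local, so its modeled average latency is 175 ns versus 100 ns for the placement-aware schedule. Real NUMA-aware runtimes pursue the same goal by pinning work near its data.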
The next frontier for distributed computing
Just as supercomputing benchmarks must keep pace with architectures in continual flux, so must big data development paradigms. These paradigms will need to evolve to support efficient parallelization of data-intensive processes across exascale memory grids. One approach is the Parallel Runtime Scheduling and Execution Controller (PaRSEC), which expresses programs as a directed acyclic graph of tasks whose labeled edges designate data dependencies.
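The core idea of a dataflow task graph can be sketched in a few lines. This toy Python scheduler does not use PaRSEC’s actual interfaces; it only illustrates the concept under stated assumptions: tasks are nodes, each edge is a (producer, consumer, label) triple whose label names the data that flows along it, and any task whose inputs have all arrived can run. The task names and the `schedule_waves` function are hypothetical.

```python
from collections import defaultdict


def schedule_waves(tasks, edges):
    """Split a task DAG into waves of tasks whose inputs are all available.

    edges holds (producer, consumer, label) triples; the label names the data
    that flows along the edge. Tasks within one wave do not depend on each
    other, so a runtime could execute them in parallel.
    """
    indegree = {t: 0 for t in tasks}
    consumers = defaultdict(list)
    inputs = defaultdict(list)  # consumer -> labels of the data it waits for
    for src, dst, label in edges:
        indegree[dst] += 1
        consumers[src].append(dst)
        inputs[dst].append(label)

    waves, scheduled = [], 0
    ready = sorted(t for t in tasks if indegree[t] == 0)
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        released = []
        for t in ready:
            for dst in consumers[t]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    released.append(dst)
        ready = sorted(released)
    if scheduled != len(tasks):
        raise ValueError("dependency cycle: tasks do not form a DAG")
    return waves, dict(inputs)


tasks = ["read_0", "read_1", "filter_0", "filter_1", "aggregate"]
edges = [
    ("read_0", "filter_0", "block_0"),
    ("read_1", "filter_1", "block_1"),
    ("filter_0", "aggregate", "rows_0"),
    ("filter_1", "aggregate", "rows_1"),
]

if __name__ == "__main__":
    waves, inputs = schedule_waves(tasks, edges)
    for step, wave in enumerate(waves):
        for task in wave:
            print(step, task, "needs", inputs.get(task, []))
```

Grouping ready tasks into waves is a simplification; a real dataflow runtime would typically launch each task the moment its last input arrives rather than waiting for a whole wave to finish.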
Considering how everything is changing all around us, I’m not 100 percent sure that the concept of supercomputing is meaningful anymore. Big data’s aggressive push into all aspects of distributed computing has propelled massively parallel, cloud-oriented HPC architectures to the industry forefront.
Supercomputer feels like an unnecessary term these days. After all, super-duper big data analytic performance is something we’re all starting to take for granted.
If you have any questions or thoughts about this topic, please share them in the comments.