You know your big data initiative is maturing when you have a platform, tooling, and operations that you can bet the business on.
But what exactly is maturity where information technologies are concerned? Fundamentally, it’s a measure of whether an IT investment is fully production-ready and fit for enterprise prime-time deployment. Assessing whether your big data initiative is mature involves proving—at the very least—that it has the robustness and bona fides, discussed below, that businesses demand of their core infrastructure.
Is there an existing maturity yardstick against which you can gauge the production-readiness of your big data platform? Yes. Look to your established enterprise data warehousing (EDW) environment, such as one built on IBM® PureData™ System for Analytics or another massively parallel platform. Most likely, this environment is a field-proven operation that meets the criteria for production-readiness through a rich combination of capabilities, tools, and best practices. These elements include:
Prepare for a cold slap of reality: the management capabilities that you’ve taken for granted on your EDW are probably not all available in some of the newer big data platforms you’ve deployed. Just as sobering, the management tooling you have almost certainly doesn’t span the full hodgepodge of big data platforms you’ve deployed or are considering. If you’ve deployed tactical big data platforms (Hadoop, NoSQL, in-memory databases, graph databases, cloud databases, and so on) for distinct applications, you often have to grapple with a separate, siloed set of management features for each.
You should concern yourself with whether your chosen big data platforms—individually and as an end-to-end infrastructure—meet the production-readiness criteria.
To be sure, Hadoop and other emerging big data platforms are rapidly climbing the maturity curve. As anybody watching these markets can see, their ecosystems are producing the requisite capabilities, tools, and best practices by adapting much of what was pioneered for EDWs. But the maturation of these other big data approaches will take several years to come to fruition.
Maturation of best practices in the newer big data niches will come as some approaches succeed in the marketplace and become integral to standard operating procedures of users everywhere. If the industry can accelerate this maturation through standardization, users will be able to standardize their own practices that much faster. By the middle of this decade, we’re likely to see a significant, widely recognized body of best practices emerge in Hadoop and in-memory databases, at minimum—reflecting the pace at which users are investing in these approaches and bringing them in line with established EDW management practices.
Look for big data best practices to crystallize first in high availability, database security, data governance, and cluster management that spans multitier topologies including Hadoop, EDW, and in-memory nodes. As they do, it will become evident that maturity is coming to the entire big data space and to each niche covered by those best practices.
For the next several years, much of the new product development in the big data arena will be geared toward playing catch-up with the EDW marketplace. The new big data platform niches are developing the tools for robust availability, reliability, security, governance, management, disaster recovery, optimization, and other enterprise-grade features we take for granted with EDW. In addition, all of the traditional EDW middleware and application ecosystem offerings—including data integration, data quality, virtualization, business intelligence, and predictive analytics—are being retooled or rethought entirely with these new big data platforms in mind.
How production-ready are your big data investments? What are you doing to make them ready? What tools and capabilities would you like to see IBM and other solution providers offer to help you make them ready? Let me know in the comments.