The idea behind Fit for Purpose Architectures is deceptively simple yet extremely powerful, and it will greatly affect your IT infrastructure over the next 10+ years.
Going forward, enterprise IT will have many more options than have historically been available to tightly pair a given compute request with an execution capability that is optimized for that task. The advantages of this tighter pairing have become too compelling to ignore, and we are starting to see breakaway performance from firms that have mastered it.
Many of these breakaways have been driven by big data use cases that spurred development of new technologies to solve unique problems. In some cases, there is an emphasis on sheer performance. In others, the technology is focused on data type manipulation. In every case, however, users need to be able to take the highly specific and optimized execution environment for a given compute task and relate it back to the rest of the existing enterprise architecture.
Fit for Purpose Architectures do not replace today’s largely relational-oriented systems but instead augment them, interoperate with them, and allow them to do what they do best. This pairing of traditional and new approaches has enabled novel solutions, such as real-time ad placement in the digital media space. We are now seeing never-before-possible solutions move into the traditional enterprise.
So why now? Over the past twenty years, enterprise architects have seen significant changes in how applications are delivered, integrated, and scaled. They have not, however, seen major shifts in where most enterprise data is stored, or in the types of analytics performed on that data.
The fields of structured data management and unstructured information management have matured, but matured separately. Persistence has largely meant relational data stores or a separate Enterprise Content Management (ECM) repository. The current model requires separating structured and unstructured data and treating them as separate endeavors. Unstructured data goes in the systems that have largely been designed to handle documents. Structured data goes in databases.
Most of the innovations of the past ten years have focused on ubiquity of access and developing web-based architectures, rather than on the base storage and compute paradigms. Analytics only happens once the data is stored—not while the data is being generated, and typically after a time-intensive ETL process. Our core paradigms of where information gets stored, how it’s queried, and how it’s managed throughout its lifecycle have remained relatively unchanged.
What we have seen in the last four years, however, is the emergence of technologies that are revolutionary rather than evolutionary in their approach to solving problems. The new components in a Fit for Purpose Architecture reject some of our long-held notions about atomic handling of transactions, immediate consistency, and the need to strictly structure the data in return for excellence in handling a specific compute problem.
By now, the stories of companies like Google, Facebook, LinkedIn, eBay, and Yahoo are well known. These firms dealt with data challenges whose scale and scope were previously unheard of, so they had no choice but to create and adopt revolutionary technologies.
Some of these revolutionary technologies include Hadoop (for ultra-large-scale, data-agnostic storage and jobs), Cassandra (for extreme write performance), Neo4j (for graph analytics), and InfoSphere Streams (for ultra-high performance and efficiency on in-motion data). All of these technologies provide new capabilities by making trade-offs that conventional database or ECM systems do not. In doing so, they provide levels of flexibility, scalability, and sheer performance that have not been available previously.
It is important to note, of course, that these revolutionary technologies do not render existing solutions any less critical. Relational databases will remain the most broadly deployed technology because the two use cases where they excel are extremely important for a broad set of application and data handling needs: high-performance complex queries on structured data, and high transaction rates with strong transaction consistency guarantees on structured data.
The long-term ramifications of the NoSQL or NewSQL movements that these organizations initiated and incubated are just now being felt in the enterprise. In the future, conventional enterprises will follow suit by tightly matching the underlying compute problem to the best platform for handling it. Relational databases will continue to be extremely important and will surely remain the most common choice, but they will no longer be the presumed default. The best way to solve a particular problem, including considerations such as future uncertainty over information sources and scale requirements, will drive selection of the technology. We are already seeing some of these technologies, mainly Apache Hadoop, being broadly deployed for sandbox or experimentation purposes. The next shift will see Hadoop and the other technologies move into production-oriented use cases.
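As a rough illustration of matching a compute problem to a platform, the following Python sketch maps a few workload traits to the technologies named earlier in this article. The Workload fields, the choose_platform function, and the order of the checks are all hypothetical and chosen only for illustration; a real selection process would also weigh cost, skills, scale uncertainty, and integration with existing systems.

```python
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    RELATIONAL = "relational database"
    HADOOP = "Hadoop"
    CASSANDRA = "Cassandra"
    NEO4J = "Neo4j"
    STREAMS = "InfoSphere Streams"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a compute request (illustrative fields only)."""
    data_in_motion: bool = False            # analyze data while it is generated
    graph_traversal: bool = False           # relationship-centric analytics
    write_heavy: bool = False               # extreme write throughput needed
    schema_known: bool = True               # data structure is known up front
    needs_strong_consistency: bool = False  # strict transactional guarantees


def choose_platform(w: Workload) -> Platform:
    """Pick a platform by workload traits. The rule order is an assumption."""
    # Strong transactional guarantees on structured data are a relational strength.
    if w.needs_strong_consistency and w.schema_known:
        return Platform.RELATIONAL
    if w.data_in_motion:
        return Platform.STREAMS
    if w.graph_traversal:
        return Platform.NEO4J
    if w.write_heavy:
        return Platform.CASSANDRA
    # Data whose structure is not known in advance suits data-agnostic storage.
    if not w.schema_known:
        return Platform.HADOOP
    # Structured data with no special trait: complex queries on structured
    # data are the other relational strength, so this is a reasoned choice.
    return Platform.RELATIONAL
```

Under these assumptions, choose_platform(Workload(write_heavy=True)) returns Platform.CASSANDRA, while a workload that needs strong consistency on structured data still lands on the relational database.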
We’ll cover this shifting paradigm in much more detail in future articles, with a specific eye on how big data technologies can be deployed in a Fit for Purpose Architecture model in conventional enterprises.
In the meantime, let me know what you think in the comments. Do you see this trend gaining traction in your organization? How is it affecting your technology selection processes?