Information Strategy

Introducing Fit for Purpose Architectures

A subtle shift is underway in IT. What will it mean for your organization?

The idea behind Fit for Purpose Architectures is a deceptively simple (yet extremely powerful) one—and it will greatly impact your IT infrastructures over the next 10+ years.

Going forward, enterprise IT will have far more options than have historically been available for tightly pairing a given compute request with an execution capability optimized for that task. The advantages of this tighter pairing have become compelling to the point where they cannot be ignored, and we are starting to see breakaway performance from firms that have mastered this pairing.

Many of these breakaways have been driven by big data use cases that spurred development of new technologies to solve unique problems. In some cases, there is an emphasis on sheer performance. In others, the technology is focused on data type manipulation. In every case, however, users need to be able to take the highly specific and optimized execution environment for a given compute task and relate it back to the rest of the existing enterprise architecture.

Fit for Purpose Architectures do not replace today’s largely relational-oriented systems but instead augment them, interoperate with them, and allow them to do what they do best. This pairing of traditional and new approaches has enabled novel solutions, such as real-time ad placement in the digital media space. We are now seeing never-before-possible solutions move into the traditional enterprise.

So why now? Over the past twenty years, enterprise architects have seen significant changes in how applications are delivered, integrated, and scaled. They have not, however, seen major shifts in where most enterprise data is stored or in the types of analytics performed on it.

The fields of structured data management and unstructured information management have matured, but they have matured separately. Persistence has largely meant relational data stores or a separate Enterprise Content Management (ECM) repository. The current model requires splitting structured and unstructured data and treating them as separate endeavors: unstructured data goes into systems designed largely to handle documents, and structured data goes into databases.

Most of the innovations of the past ten years have focused on ubiquity of access and developing web-based architectures, rather than on the base storage and compute paradigms. Analytics only happens once the data is stored—not while the data is being generated, and typically after a time-intensive ETL process. Our core paradigms of where information gets stored, how it’s queried, and how it’s managed throughout its lifecycle have remained relatively unchanged.
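
To make that distinction concrete, here is a minimal, purely illustrative Python sketch. The synthetic events, window size, and threshold are invented for the example; it is only meant to contrast the store-then-analyze pattern with acting on data while it is still in motion.

```python
# Illustrative sketch only: contrasting "land the data, run ETL, then query"
# with analyzing data while it is in motion. The synthetic events, window
# size, and threshold below are invented for the example.

from collections import deque
from statistics import mean

def batch_style(events):
    """Store-then-analyze: nothing is learned until all the data has landed."""
    stored = list(events)                      # persist everything first
    return mean(e["value"] for e in stored)    # analyze only after the fact

def streaming_style(events, window=100, threshold=0.9):
    """In-motion analysis: react to each event as it arrives."""
    recent = deque(maxlen=window)
    for e in events:
        recent.append(e["value"])
        if mean(recent) > threshold:
            yield e                            # act immediately, before storage

events = [{"value": v / 100} for v in range(150)]
print(batch_style(iter(events)))                       # 0.745
print(sum(1 for _ in streaming_style(iter(events))))   # 10 alerts near the end
```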

What we have seen in the last four years, however, is the emergence of technologies that are revolutionary rather than evolutionary in their approach to solving problems. The new components in a Fit for Purpose Architecture trade away some of our long-held requirements for atomic transaction handling, immediate consistency, and strictly structured data in return for excellence at a specific compute problem.

By now, the stories of companies like Google, Facebook, LinkedIn, eBay, and Yahoo are well known. These firms dealt with data challenges whose scale and scope were previously unheard of, so they had no choice but to create and adopt revolutionary technologies.

Some of these revolutionary technologies include Hadoop (for ultra-large-scale, data-agnostic storage and batch jobs), Cassandra (for extreme write performance), Neo4j (for graph analytics), and InfoSphere Streams (for ultra-high performance and efficiency on in-motion data). All of these technologies provide new capabilities by making trade-offs that conventional database or ECM systems do not. In doing so, they deliver levels of flexibility, scalability, and sheer performance that have not been available previously.
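
To give a flavor of what "fit for purpose" selection might look like, here is a deliberately simplified Python sketch. The workload attributes and the mapping to platforms are illustrative assumptions for this example, not a prescriptive decision framework.

```python
# Illustrative sketch only: a naive "fit for purpose" selector that pairs a
# workload profile with a candidate platform mentioned above. The profile
# fields and the mapping are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Workload:
    data_in_motion: bool = False      # must results arrive before the data lands?
    graph_traversal: bool = False     # is the core question about relationships?
    extreme_write_rate: bool = False  # very high write volume, relaxed consistency OK
    schema_known_up_front: bool = True
    strong_transactions: bool = False

def candidate_platform(w: Workload) -> str:
    """Return a candidate execution environment for the workload."""
    if w.data_in_motion:
        return "stream processing (e.g., InfoSphere Streams)"
    if w.graph_traversal:
        return "graph store (e.g., Neo4j)"
    if w.extreme_write_rate and not w.strong_transactions:
        return "wide-column store (e.g., Cassandra)"
    if not w.schema_known_up_front:
        return "Hadoop (schema-on-read, data-agnostic storage)"
    return "relational database (still the default for this profile)"

print(candidate_platform(Workload(graph_traversal=True)))
print(candidate_platform(Workload(strong_transactions=True)))
```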

It is important to note, of course, that these revolutionary technologies do not render existing solutions any less critical. Relational databases will remain the most broadly deployed technology because the two use cases where they excel (high-performance complex queries on structured data, and high transaction rates with strong consistency guarantees on structured data) are extremely important for a broad set of application and data-handling needs.
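
As a reminder of what those two strengths look like in practice, here is a small sketch that uses SQLite (from the Python standard library, standing in for an enterprise relational engine; the schema and data are invented) to show a multi-statement transaction and a declarative join query.

```python
# Minimal sketch of the two relational strengths called out above, using
# SQLite (Python standard library) as a stand-in for an enterprise RDBMS.
# The schema and data are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts  (id INTEGER PRIMARY KEY, owner TEXT, balance REAL);
    CREATE TABLE transfers (id INTEGER PRIMARY KEY, src INTEGER, dst INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 'alice', 100.0), (2, 'bob', 50.0);
""")

# Strength 1: transactional consistency. All three statements commit together
# or not at all (the connection context manager wraps them in one transaction).
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.execute("INSERT INTO transfers (src, dst, amount) VALUES (1, 2, 30)")

# Strength 2: complex, declarative queries over structured data.
row = conn.execute("""
    SELECT a.owner, a.balance, COUNT(t.id) AS incoming
    FROM accounts a LEFT JOIN transfers t ON t.dst = a.id
    GROUP BY a.id ORDER BY a.balance DESC
""").fetchone()
print(row)  # ('bob', 80.0, 1)
```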

The long-term ramifications of the NoSQL and NewSQL movements that these organizations initiated and incubated are only now being felt in the enterprise. In the future, conventional enterprises will follow suit by tightly matching the underlying compute problem to the best platform for handling it. Relational databases will continue to be extremely important and will remain the most common choice, but they will no longer be the presumed default. The best way to solve a particular problem, including considerations such as uncertainty about future information sources and scale requirements, will drive technology selection. We are already seeing some of these technologies, chiefly Apache Hadoop, being broadly deployed for sandbox or experimentation purposes. The next shift will see Hadoop and the other technologies move into production-oriented use cases.

We’ll cover this shifting paradigm in much more detail in future articles, with a specific eye on how big data technologies can be deployed in a Fit for Purpose Architecture model in conventional enterprises.

In the meantime, let me know what you think in the comments. Do you see this trend gaining traction in your organization? How is it affecting your technology selection processes?

Tom Deutsch

Tom Deutsch (Twitter: @thomasdeutsch) is chief technology officer (CTO) for the IBM Industry Solutions Group and focuses on data science as a service. Tom played a formative role in the transition of Apache Hadoop–based technology from IBM Research to the IBM Software Group, and he continues to be involved with IBM Research's big data activities and the transition from research to commercial products. In addition, he created the IBM® InfoSphere® BigInsights™ Hadoop–based software, and he has spent several years helping customers with Hadoop, InfoSphere BigInsights, and InfoSphere Streams technologies by identifying architecture fit, developing business strategies, and managing early-stage projects across more than 200 engagements. Tom came to IBM through the FileNet acquisition, where he had responsibility for FileNet's flagship content management product and spearheaded FileNet product initiatives with other IBM software segments, including the Lotus and InfoSphere segments. Tom has also worked in the Information Management CTO's office and with a team focused on emerging technology, where he helped customers adopt innovative IBM enterprise mash-ups and cloud-based offerings. With more than 20 years of experience in the industry, and as a veteran of two startups, Tom is an expert on the technical, strategic, and business information management issues facing the enterprise today. Most of his work has been on emerging technologies and business challenges, and he brings a strong focus on the cross-functional work required to make early-stage projects succeed. Tom has coauthored a book on big data and multiple thought-leadership papers. He earned a bachelor's degree from Fordham University in New York and an MBA degree from the University of Maryland University College.