Behind the Buzz About NoSQL
Your company just introduced a new web application, and it’s changing your life. As an application developer, you’re expected to respond to any reported user problem by trying to fix it the same day. That means you’re suddenly modifying the application and uploading new versions to the web three or four times a day, not once or twice a month. And that’s why you’re starting to consider NoSQL.
Open source NoSQL data management systems can seem ideal for making rapid-fire application changes. Unlike traditional SQL relational databases, NoSQL data stores don’t require you to get approval for data model changes, ask DBAs to build a new column, or work with IT to ensure your changes won’t interfere with other applications running on the same database.
Normally, turning multiple changes around daily would require sign-offs from other parts of the organization that you just don’t have time to get. But with NoSQL, you don’t need them.
Looking to NoSQL to help boost agility
Agility is a prime motivator for many companies exploring NoSQL. Organizations are under constant pressure to deploy critical business applications that ensure competitive advantage and maximize revenue. As developers work to create or modify applications, this work often requires making changes to related databases.
NoSQL describes a category of open source database management systems that differ from the classic SQL relational database management system (RDBMS). Instead of using traditional tables, rows, and columns, a NoSQL system may, for example, store and query groups of documents with loosely defined fields. “That means developers can avoid the formality of the relational approach that requires safeguards, checks, and balances when changes are made,” says Curt Cotner, vice president and chief technology officer for database servers at IBM. “With NoSQL, a developer can just make a change and go.”
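The sketch below makes “loosely defined fields” concrete with a document collection in plain Python. The `customers` list and `find` helper are hypothetical illustrations, not the API of any particular NoSQL product. Each document is a dictionary, and documents in the same collection don't have to share fields.

```python
from typing import Any

# Hypothetical in-memory "collection": each document is a dict, and
# documents in one collection need not share the same fields.
Document = dict[str, Any]

customers: list[Document] = [
    {"name": "Gary", "city": "Gilroy"},
    # A new field appears on one document; no schema change is required.
    {"name": "Joe Smith", "city": "Gilroy", "loyalty_tier": "gold"},
    # This document has no "city" field at all.
    {"name": "Ana", "email": "ana@example.com"},
]


def find(collection: list[Document], **criteria: Any) -> list[Document]:
    """Return documents whose fields equal every criterion.

    A document that lacks a requested field simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


gilroy_names = [doc["name"] for doc in find(customers, city="Gilroy")]
```

Adding `loyalty_tier` needed no ALTER TABLE and no DBA. The flip side is that nothing stops another document from spelling the field differently, which is exactly the kind of check a relational schema enforces.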
Using traditional IT systems and SQL relational database technology, developers may not be able to change database-related applications without involving other segments of the IT team. “At the same time, SQL data stores such as IBM DB2 offer significant advantages that many organizations want to retain,” explains Cotner. “These organizations need a way to leverage the benefits of DB2 while satisfying the need for increased agility.”
The Many Flavors of NoSQL
Today’s open source NoSQL data stores fall into several different categories:
- Document store—Similar to a content management system, designed to store documents and index the documents for quick access
- Graph store—Designed for storing arbitrarily complex collections of data and expressing the relationships between data elements using triples, which are three-part statements, such as “Bob is 35”
- Key value store—Enables populating a database using keys, such as “Joe Smith,” and associated values such as his address
- Tabular store—Similar to a relational system, designed to hold data in a spreadsheet-like format where entries can be searched and retrieved
- XML store—Designed specifically for XML content and queried with an XML query language, such as XQuery
- Object store—Designed specifically to store and retrieve objects based on their associated metadata
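The key value model in the list above is the simplest of these shapes. Here is a rough sketch in plain Python, with illustrative data and hypothetical helper names. The store is built around lookups by key and treats each value as an opaque string.

```python
from typing import Optional

# Hypothetical key value store: keys are indexed, values are opaque strings.
kv_store: dict[str, str] = {
    "Joe Smith": "1234 South Street, Gilroy",
    "Ana Lopez": "88 Oak Avenue, Morgan Hill",
}


def get_address(name: str) -> Optional[str]:
    """Direct lookup by key: the operation a key value store is built for."""
    return kv_store.get(name)


def residents_of(city: str) -> list[str]:
    """Query by something inside the value.

    The store can't see inside values, so this scans every entry, and the
    application parses each address itself (city assumed to follow the last comma).
    """
    return [
        name
        for name, address in kv_store.items()
        if address.rsplit(", ", 1)[-1] == city
    ]
```

Lookups by key stay fast as the store grows. Any other question becomes a full scan, which is one reason document and tabular stores add indexes over fields.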
The NoSQL landscape is in flux. More than 100 open source databases are available, some of them just three to four years old with relatively small followings contributing to their development.
Recognizing the advantages of traditional SQL databases
NoSQL architectures often provide weak consistency guarantees, such as eventual consistency, compared to the full atomicity, consistency, isolation, and durability (ACID) characteristics of SQL databases. “The assurance provided by ACID is essential in many types of database-driven online transactions,” says Cotner. “Think about transferring funds between bank accounts. You need certainty across multiple changes to debit one account and credit another.”
These are critical considerations. ACID characteristics of SQL help guarantee that all parts of Cotner’s hypothetical financial transaction are carried out accurately, even if the system is interrupted by power loss or other failure. Atomicity guarantees that a transaction is all or nothing, so an incomplete transaction can’t happen. The consistency property ensures that every transaction leaves the database in a valid state, one that satisfies all of its defined rules and constraints. Isolation refers to the requirement that no transaction should be able to interfere with another transaction. And durability means that committed transactions will not be lost if the database crashes.
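The funds transfer can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for an ACID relational database. It is not DB2 or a DB2 API, and the table and function names are illustrative. The credit is applied before the debit on purpose. When the debit violates the `CHECK` constraint, the whole transaction rolls back, so the credit disappears too.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("checking", 100), ("savings", 0)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src as one atomic transaction.

    Returns False, leaving both balances untouched, if the debit would
    overdraw src.
    """
    try:
        # sqlite3 opens a transaction implicitly before the first UPDATE.
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = open_bank()
first = transfer(conn, "checking", "savings", 60)   # succeeds
second = transfer(conn, "checking", "savings", 60)  # would overdraw checking
```

After the failed second transfer, savings still holds 60, not 120. The credit that ran first was undone along with the rejected debit.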
In contrast, the majority of NoSQL data stores offer what is called eventual consistency. Data store updates eventually propagate throughout the infrastructure and application. Yet at any point in time a user may find updates have not completely propagated and do not yet appear in the user interface. “Instant updates aren’t a big deal for some types of transactions, such as online customers updating their addresses,” says Cotner. “Since people don’t move every day, eventual consistency is sufficient.”
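Eventual consistency can be modeled with a toy two-replica store in Python. This is a hypothetical illustration, not how any specific NoSQL product replicates. Writes land on a primary copy immediately and reach the replica only when `replicate()` runs, standing in for network and processing delay.

```python
from collections import deque
from typing import Optional


class EventuallyConsistentStore:
    """Toy store with one primary copy and one lagging replica."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        # Writes not yet applied to the replica, oldest first.
        self._pending: deque = deque()

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self._pending.append((key, value))

    def read_primary(self, key: str) -> Optional[str]:
        return self.primary.get(key)

    def read_replica(self, key: str) -> Optional[str]:
        # May return stale or missing data until replicate() has run.
        return self.replica.get(key)

    def replicate(self) -> None:
        """Apply every pending write to the replica, in order."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("Joe Smith", "1234 South Street, Gilroy")
before = store.read_replica("Joe Smith")  # update not visible here yet
store.replicate()
after = store.read_replica("Joe Smith")
```

For an address change, a reader who briefly sees the old value loses little, which is Cotner's point about when eventual consistency is sufficient.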
But in some cases eventual consistency is just not good enough. “If a customer withdrew $20,000 from an account and then quickly withdrew another $20,000, it would be important to know that $40,000 was being withdrawn,” says Cotner. “You wouldn’t want to misinterpret the second $20,000 as a duplicate entry error. But without ACID built in, you might.”
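The article does not say how a store without ACID guarantees would mistake the second withdrawal for a duplicate. One plausible mechanism, sketched here as a hypothetical, is a replication or retry layer that removes duplicates by comparing record contents. Two genuine $20,000 withdrawals look identical by content. Only a unique transaction ID (the `txn_id` field is an assumption of this sketch) can tell a real second withdrawal from a redelivered copy of the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Withdrawal:
    txn_id: str  # hypothetical unique ID assigned when the request is made
    account: str
    amount: int


def total_dedup_by_content(events: list[Withdrawal]) -> int:
    """Treat records with the same account and amount as duplicates."""
    distinct = {(e.account, e.amount) for e in events}
    return sum(amount for _account, amount in distinct)


def total_dedup_by_id(events: list[Withdrawal]) -> int:
    """Treat only records with the same transaction ID as duplicates."""
    distinct = {e.txn_id: e for e in events}
    return sum(e.amount for e in distinct.values())


events = [
    Withdrawal("t1", "acct-42", 20_000),
    Withdrawal("t2", "acct-42", 20_000),
    Withdrawal("t2", "acct-42", 20_000),  # t2 redelivered, e.g. by a retry
]
```

Deduplicating by content reports $20,000 withdrawn. Deduplicating by ID correctly reports $40,000 while still discarding the redelivered copy of `t2`.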
Mature database management systems like DB2 also offer advantages like high availability and data compression that the newer NoSQL systems have not had time to develop. High availability is essential for guaranteeing uptime of business-critical applications, while compression greatly reduces the amount of memory needed for data storage. Finally, DB2 is designed and developed in the labs to deliver industry-leading performance and has a strong record in database performance benchmarks. Database performance is a key driver of overall system performance.
Having your cake and eating it too: DB2 with NoSQL APIs
Organizations that use DB2 can have the best of both worlds. To provide the agility of NoSQL while retaining the ACID, high availability and data compression benefits of DB2, IBM is enabling DB2 users to accommodate a new programming model using the graph store paradigm with data triples.
As of April 2012, IBM is providing this new programming model through a new application programming interface that supports multiple calls and a NoSQL software solution stack that ships with DB2. This NoSQL technology is included free of charge with DB2 on distributed platforms and the DB2 Connect product. The API works with any release of DB2. “While IBM is developing other types of NoSQL data stores for future use with DB2, we’ve developed and released the graph store technology first,” says Cotner. “It’s one of the most flexible and useful of the NoSQL approaches.”
In traditional relational databases, each table defines its own set of columns, as many as the schema calls for, and every row in the table follows that layout. By contrast, everything in the graph store has just three columns, called a triple, representing data as a noun, a verb, and a noun (often called the subject, predicate, and object). For example, “Gary lives in Gilroy” is a triple. As the developer, you could then insert another triple with the home address: “1234 South Street is in Gilroy.” Working with this data, you could then make an additional entry: “1234 South Street has bedrooms, 4.”
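The triples above fit in a very small in-memory sketch. `TripleStore` and `match` are hypothetical names for illustration. They are not the DB2 NoSQL graph API, whose calls the article doesn't describe. Each fact is a (subject, predicate, object) tuple, and a query is a pattern in which `None` matches anything.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy graph store: every fact is a (subject, predicate, object) triple."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every triple that fits the pattern; None is a wildcard."""
        for s, p, o in self._triples:
            if (
                (subject is None or s == subject)
                and (predicate is None or p == predicate)
                and (obj is None or o == obj)
            ):
                yield (s, p, o)


store = TripleStore()
store.add("Gary", "lives in", "Gilroy")
store.add("1234 South Street", "is in", "Gilroy")
# A brand-new kind of fact needs no new column, just another triple.
store.add("1234 South Street", "has bedrooms", "4")

in_gilroy = sorted(s for s, _, _ in store.match(obj="Gilroy"))
```

This version scans every triple on each query. A production graph store would keep indexes on each of the three positions instead.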
The data store could go on to include materials such as flooring, linked to another entry with the brand name of the wood flooring, and a link to the manufacturer describing the materials that make up the flooring. “As long as some nouns match up, one party can link any of its triples to any of another party’s triples,” says Cotner. “For instance, a graph store can easily show the relationship between a retailer’s version of a customer profile and a supplier’s version of the same profile. That way, both businesses get a fuller picture and can provide better service.”
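Cotner's point about matching nouns can be shown with two sets of triples. The retailer and supplier data below, including the manufacturer name, are made up for illustration. Because every fact has the same three-part shape, combining the two parties' data is a set union, and the shared nouns are where the graphs connect.

```python
Triple = tuple[str, str, str]

# Illustrative data only; "Acme Mills" is a hypothetical manufacturer.
retailer: set[Triple] = {
    ("Gary", "lives in", "Gilroy"),
    ("Gary", "bought", "oak flooring"),
}
supplier: set[Triple] = {
    ("Gary", "registered warranty for", "oak flooring"),
    ("oak flooring", "made by", "Acme Mills"),
}


def nouns(graph: set[Triple]) -> set[str]:
    """Every subject and object in a graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def profile(graph: set[Triple], subject: str) -> list[tuple[str, str]]:
    """All (predicate, object) pairs recorded about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


shared = nouns(retailer) & nouns(supplier)
combined = retailer | supplier
```

The retailer alone knows two facts about Gary. The combined graph knows three, and it can also follow “oak flooring” to its manufacturer, with no schema reconciliation between the two parties.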
Providing additional NoSQL and NewSQL capabilities
DB2 also offers a second type of NoSQL-like database: the XML data store. These data stores make it easy to manage today’s growing volumes of web-based data, and some open source NoSQL projects are based on XML as the underlying data type. DB2 users can manage an XML data store using existing DB2 capabilities.
The DB2 pureXML feature offers sophisticated capabilities to store, process, and manage XML data in its native hierarchical format. Additionally, DB2 users can use query languages such as XPath and XQuery to process XML documents. The XML data model provides schema flexibility that can be a big advantage in use cases where there is ongoing change in the information being managed.
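As a small illustration of querying hierarchical XML with path expressions, the sketch below uses Python's standard `xml.etree.ElementTree` module rather than DB2 pureXML. ElementTree implements only a limited subset of XPath and none of XQuery, so treat this as a picture of the idea, not of DB2's query capabilities. The order document is invented for the example. Only one item carries the optional `<giftWrap/>` element, and no schema change was needed to allow it.

```python
import xml.etree.ElementTree as ET

# Illustrative order document; one item has an optional element the other lacks.
ORDER_XML = """
<order id="A-17">
  <customer>Gary</customer>
  <items>
    <item sku="FLOOR-OAK" qty="40"><price>6.50</price></item>
    <item sku="NAILS" qty="2"><price>4.25</price><giftWrap/></item>
  </items>
</order>
"""

root = ET.fromstring(ORDER_XML.strip())

# Path expression: all item elements under items.
skus = [item.get("sku") for item in root.findall("./items/item")]

# Attribute predicate, then a step down to a child element.
oak_price = root.findtext("./items/item[@sku='FLOOR-OAK']/price")

# Child-existence predicate: items that have a giftWrap child.
wrapped = [item.get("sku") for item in root.findall(".//item[giftWrap]")]

# Arithmetic over the hierarchy: sum of qty * price across all items.
total = sum(
    int(item.get("qty")) * float(item.findtext("price"))
    for item in root.iter("item")
)
```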
Some NoSQL and NewSQL approaches also focus on horizontal scalability. If more capacity is needed, an organization can spread the data and workload across more machines, making the server infrastructure as wide as necessary. “That scalability is also available when using NoSQL APIs with DB2,” says Cotner. “In fact, we’ve successfully tested DB2 with over 128 machines without encountering scalability problems.”
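Spreading data across more machines is often done by hash partitioning. The sketch below shows that general technique in Python. The article does not say how DB2 distributes data, and the function names are hypothetical. It uses `hashlib` instead of the built-in `hash()` because Python randomizes string hashes per process, and every node must agree on where a key lives.

```python
import hashlib


def machine_for(key: str, num_machines: int) -> int:
    """Map a key to a machine number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def partition(keys: list[str], num_machines: int) -> dict[int, list[str]]:
    """Group keys by the machine that owns them."""
    shards: dict[int, list[str]] = {i: [] for i in range(num_machines)}
    for key in keys:
        shards[machine_for(key, num_machines)].append(key)
    return shards


keys = [f"customer-{n}" for n in range(1000)]
shards = partition(keys, 4)
```

Plain modulo hashing moves most keys to new machines when the machine count changes. Schemes such as consistent hashing are commonly used to limit that reshuffling.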
Lastly, the data compression capabilities built into IBM DB2 support the NoSQL and NewSQL approaches that run the database entirely in main memory. These approaches are designed to boost performance by reducing the stress on critical resources like buffer pools. Data compression makes such an approach more practical by reducing the amount of memory required for data storage, leaving more memory available for other processing tasks and ultimately helping to reduce server hardware costs.
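Dictionary encoding shows why compression stretches memory. In this common technique, each distinct value is stored once and every occurrence becomes a small code. This is a generic Python sketch with invented data. It is not DB2's compression algorithm, which the article doesn't describe.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], bytes]:
    """Store each distinct value once; replace occurrences with 1-byte codes.

    Assumes fewer than 256 distinct values (bytes() raises ValueError otherwise).
    """
    codes_by_value: dict[str, int] = {}
    codes = []
    for value in values:
        # setdefault assigns the next unused code the first time a value appears.
        codes.append(codes_by_value.setdefault(value, len(codes_by_value)))
    return list(codes_by_value), bytes(codes)


def dictionary_decode(symbols: list[str], codes: bytes) -> list[str]:
    return [symbols[code] for code in codes]


cities = ["Gilroy", "San Jose", "Morgan Hill"] * 1000
symbols, codes = dictionary_encode(cities)

raw_size = sum(len(c.encode("utf-8")) for c in cities)
encoded_size = sum(len(s.encode("utf-8")) for s in symbols) + len(codes)
```

The sizes count only string bytes and codes, ignoring Python object overhead. On this highly repetitive column, 25,000 bytes shrink to 3,025, roughly 8 to 1, and decoding restores the original list exactly.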
IBM’s Own Experience: Combining DB2 with a NoSQL Graph Store
The IBM Rational software group, which is focused on development environments, needs to manage an especially wide variety of data from different work teams. This makes the group a good candidate for setting up a graph store using NoSQL triples.
For comparison purposes, the graph store was first implemented using an open source NoSQL solution, but the IBM Rational team soon encountered performance and availability problems with the open source solution.
- The open source NoSQL solution used an asynchronous indexing mechanism.
- As new triples were added into the data store, the asynchronous indexing mechanism slowed significantly while processing them, and it often stopped completely, locking up the entire database.
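A toy Python model helps clarify what “asynchronous indexing” means. It is hypothetical and does not reproduce the open source product or its lock-up. With synchronous indexing, each insert updates the index before returning. With asynchronous indexing, inserts return immediately and a background indexer works through a backlog. If triples arrive faster than the indexer's per-tick budget, the backlog only grows, and lookups miss recently added triples.

```python
from collections import deque

Triple = tuple[str, str, str]


class IndexedTripleStore:
    """Toy store with a subject index maintained synchronously or asynchronously."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.by_subject: dict[str, list[Triple]] = {}
        self._backlog: deque = deque()  # triples waiting to be indexed

    def add(self, triple: Triple) -> None:
        if self.synchronous:
            self._index(triple)  # index is current as soon as add() returns
        else:
            self._backlog.append(triple)  # indexed later by index_tick()

    def _index(self, triple: Triple) -> None:
        self.by_subject.setdefault(triple[0], []).append(triple)

    def index_tick(self, budget: int) -> None:
        """One background pass: index at most `budget` queued triples."""
        for _ in range(min(budget, len(self._backlog))):
            self._index(self._backlog.popleft())

    def lookup(self, subject: str) -> list[Triple]:
        return self.by_subject.get(subject, [])

    @property
    def backlog(self) -> int:
        return len(self._backlog)


def simulate(store: IndexedTripleStore, ticks: int, inserts_per_tick: int, budget: int) -> None:
    """Insert faster than the indexer's budget and let it run once per tick."""
    for t in range(ticks):
        for i in range(inserts_per_tick):
            store.add((f"node-{t}-{i}", "linked to", "hub"))
        store.index_tick(budget)


sync_store = IndexedTripleStore(synchronous=True)
async_store = IndexedTripleStore(synchronous=False)
for s in (sync_store, async_store):
    simulate(s, ticks=10, inserts_per_tick=100, budget=60)
```

After ten ticks the asynchronous store has 400 unindexed triples and cannot find the newest ones, while the synchronous store pays a little more per insert but answers every lookup correctly.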
The graph store was then implemented using the DB2 API and NoSQL solution stack.
- The IBM Rational team saw a substantial performance improvement after switching to DB2.
- DB2 was four times faster than the open source solution.
Since DB2 is a mature technology, IBM developers and contributors have had ample time to eliminate database problems such as indexing lock-up that may still occur in some newer solutions.
Extending the capabilities of your DB2 environment
The NoSQL technology inside DB2 opens up new opportunities for your organization to seize the advantages of NoSQL. Organizations gain the flexibility and agility benefits of NoSQL deployments while retaining the ACID properties, availability, and performance advantages of the DB2 relational database technology. DB2 also brings with it enterprise-class attributes like compression that allow organizations to minimize disk space—attributes that open source projects don’t provide.
Organizations that use DB2 have a clear path when it comes to NoSQL. “You don’t have to lose the quality assurance of SQL or make extensive NewSQL architectural changes to gain NoSQL flexibility,” says Cotner. “Instead, you can use our NoSQL API to extend DB2 and implement the same paradigm that many of the NoSQL proponents are promoting—including horizontal scalability not available in traditional Oracle databases.”