Big Data and Warehousing

Why Data Matters

How volume, variety, and velocity are making big data a business-level concern

In business, data has evolved from the domain of accountants and IT to an ever-widening audience: users within the business, and the customers and vendors outside it. As citizens of an increasingly digital world, we are surrounded by data, and it connects us to people: friends, associates, and even people we know only tangentially or not at all.

Today, we continue to generate data consciously, from every item we purchase, to medical records from a visit to the doctor, to Internet searches and interactions with social networks. All of this data is captured and recorded with our implicit or explicit knowledge.

However, data is also generated unconsciously, without our knowledge, as we move through the world and live our increasingly digital lives. Cellular phones generate location data; our cars generate location and diagnostic data; in our own homes, smart meters record when we heat and cool our homes or do our laundry. Sensors such as the GPS in automobiles and cellular phones, cellular towers, and smart electric meters generate data with no direct action from us beyond living our day-to-day lives.

This sensor and machine data often drives automated processes, setting off a chain reaction of processes that create even more data. For instance, smart meters allow utilities to better understand usage patterns and predict demand, preventing both unnecessary power generation and power shortages, while location data helps cellular carriers manage your phone calls.

Today, data moves and grows at the speed of machines, producing the phenomenon known as big data. This data is often characterized by the three Vs: volume, velocity, and variety. Big data presents new opportunities and challenges for business. Fortunately, the economics of data have changed, giving rise to new models to manage and process data, allowing companies to pursue new strategies to enhance their existing data models and processes.

The volume of data pushes companies to pursue strategies such as massively parallel processing to speed computation, and archiving to manage storage resources efficiently. The velocity of data encourages processing and filtering data in streams, reducing "noise" in the data stream and enabling real-time analysis. Increasingly, companies are discovering that the best way to expand their business is not to look only at their own customer data, but to look outside their own walls, leveraging social data and public data sets to increase the variety of data they integrate and process, and uncovering new insights in their own data as a result.

Time is quickly becoming an important dimension of data. In the past, data moved slowly, often updated on monthly, weekly, daily, or hourly batch cycles. As data accelerates and the interval between updates shrinks, it becomes a stream of continuous updates. In the past, businesses could manage data at rest. In today's "always on, always connected" world of data driven by mobile devices and their demanding users, successful businesses must handle data in real time (streams), in near time (operational analytics), and at rest.

Compliance with regulations that require detailed data history and auditing has brought temporal semantics to the forefront. When data changes frequently, "WHEN" (the time dimension) becomes a vital part of ensuring data accuracy, and so IBM database software such as DB2 has embraced temporal semantics, removing that burden from application code.
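The idea behind temporal semantics can be sketched in plain Python. This is an illustrative model only, not DB2's actual API: each row carries a validity period, and an "as of" query returns the version that was valid at a given point in time, so application code no longer has to manage history tables by hand.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative example of temporal ("as of") semantics. All names here
# (PolicyRow, as_of, the sample data) are hypothetical; a temporal database
# performs this versioned lookup inside the engine via SQL.
@dataclass
class PolicyRow:
    policy_id: str
    premium: float
    valid_from: date   # inclusive start of the validity period
    valid_to: date     # exclusive end of the validity period

def as_of(rows, policy_id, when):
    """Return the version of a row that was valid on the given date."""
    for row in rows:
        if row.policy_id == policy_id and row.valid_from <= when < row.valid_to:
            return row
    return None  # no version of this row was valid at that time

# Two versions of the same policy: the premium changed on 2011-07-01.
history = [
    PolicyRow("P-1", 100.0, date(2011, 1, 1), date(2011, 7, 1)),
    PolicyRow("P-1", 120.0, date(2011, 7, 1), date(9999, 12, 31)),
]

print(as_of(history, "P-1", date(2011, 3, 15)).premium)  # 100.0
print(as_of(history, "P-1", date(2011, 8, 1)).premium)   # 120.0
```

The point of pushing this into the database engine is that auditing queries ("what did we believe on March 15?") become declarative rather than hand-rolled application logic.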

The time dimension is a vital part of understanding when data is valid and when it is no longer useful. Some data has value only within a narrow window of time, and so it must be analyzed and acted upon in-stream, using IBM products such as IBM InfoSphere™ Streams. If data is not acted upon within that window, the data is no longer valid, and it can be discarded or archived.
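The "narrow window of time" idea can be made concrete with a minimal sketch of a sliding time window: events are kept only while they fall inside the window, and anything older is discarded (or handed off to an archive). This is a toy illustration of the concept, not how a stream-processing engine such as InfoSphere Streams is actually implemented; all names are hypothetical.

```python
from collections import deque

class TimeWindow:
    """Keep only events whose timestamp falls within the last `span` seconds.

    A minimal sketch of in-stream, time-bounded processing: once an event
    ages out of the window it is no longer acted upon and is dropped here
    (a real system might archive it instead).
    """

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.events = deque()  # (timestamp, value), in arrival order

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] < now - self.span:
            self.events.popleft()

    def current(self):
        return [value for _, value in self.events]

w = TimeWindow(span_seconds=60)
w.add(0, "a")
w.add(30, "b")
w.add(90, "c")   # "a" (timestamp 0) is now outside the 60-second window
print(w.current())  # ['b', 'c']
```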

The time dimension is also critical to data governance and compliance. Different classes of data, such as data used to make a business decision in a public company, or medical data, or transaction data have different archiving and retention policies, some driven by government regulations. IBM provides a broad range of archiving and governance products across all types of data to help companies comply with growing regulations and growing data.

Data, and the process of managing, refining, parsing, interpreting, analyzing, and governing it into trusted information, will increasingly become a collaborative effort in the best organizations. IT will need to take on the role of explorer and steward, connecting data to an ever-widening audience of business users and identifying new sources of data that can provide new insights. Working with peers in parts of the business you might not have ventured into before will become necessary in companies that learn to turn trusted information into a competitive asset.


Reed Meseck

Reed Meseck is Senior Competitive Specialist for IBM's Information Management Division. In this role, Mr. Meseck leverages his more than 25 years of technology and executive management experience to add business value from a number of different perspectives to IBM's portfolio of software products, and to advance IBM's competitive position in the marketplace. His technical roles have spanned software engineer, architect, consultant, and researcher, working on numerous software and hardware products including operating systems, firmware, compilers, in-circuit emulators, performance monitors, and data management systems. He is an accomplished author and lecturer, speaking at industry conferences such as ACM SIGMOD and the Software Development Conference as well as numerous compensated seminars, and has authored numerous papers. He has also participated in standards organizations and was a member of the REXX ANSI committee.


  • http://goo.gl/wH3qG Doug Laney

    IBM continues to reference the “3Vs” of big data without the professional courtesy of citing Gartner’s original research from 2001 first defining them: http://goo.gl/wH3qG