Industrial processes fascinate me, not just in their automation, speed, and efficiency, but also in their ability to give precise shape to an otherwise unruly raw material. Big data is next to useless as a decision-support tool if it’s not systematically hammered into forms that are consumable by end users. In IBM Data magazine the week of February 23, 2015, three new articles from established contributors discuss key aspects of industrialized data curation, ingest, and refinement.
Developers write the core execution logic that powers applications, and data is among the artifacts they integrate into their creations. Paula Wiles Sigmon discusses how developers can leverage reusable cloud-based services to ensure that only refined data is incorporated into their applications. Cloud-based refinement services that automatically deliver quality data into applications can save developers considerable work while ensuring this critical step isn’t overlooked. “Because data refinement may not be on the developer’s own priority list,” says Sigmon, “all this discovery and preparation needs to be achievable without weighing down the developer with lots of additional work or slowing down the development process.”
Ingestion and other refinement processes must be structured, scalable, and robust to be considered industrial grade. Edd Dumbill describes the essential features of scalable ingestion processes, which ensure that only high-quality data gets delivered downstream, regardless of where it originated, what its latency or structure might be, or how much of it there is. This approach involves precision engineering of many repeatable processes. According to Dumbill, “The variety of ingestion processes is as diverse as all the potential data sources and needs to account for a broad spectrum of data formats, schemas, volumes, and update frequencies that are required.”
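To make the idea concrete, here is a minimal sketch of such a repeatable ingestion step — my own illustration, not Dumbill’s implementation. It assumes records arrive in one of two formats (JSON or CSV), normalizes them into a single schema, and quietly rejects malformed rows before anything travels downstream:

```python
import csv
import io
import json

def ingest(raw, fmt):
    """Normalize raw input of a given format into a list of
    {"id": int, "value": float} records, skipping malformed rows."""
    if fmt == "json":
        rows = json.loads(raw)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(raw)))
    else:
        raise ValueError("unsupported format: " + fmt)

    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "value": float(row["value"])})
        except (KeyError, TypeError, ValueError):
            continue  # a real pipeline would quarantine or log the bad row
    return clean

# A JSON batch with one malformed record, and a CSV batch with one bad value
print(ingest('[{"id": 1, "value": "3.5"}, {"id": "x"}]', "json"))
print(ingest("id,value\n2,7.25\n3,oops\n", "csv"))
```

The point of the sketch is the shape of the process, not the formats: each new source plugs into the same normalize-validate-deliver sequence, which is what makes the process repeatable at industrial scale.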
Extracting information from numerical data can be as simple as arithmetic or, if the numbers are embedded and contextualized within text, considerably trickier. Jacques Roy goes in-depth on how automated, repeatable text-analytics algorithms in the ingest process can extract information from numerical data. “Analyzing text doesn’t mean looking only for keywords,” Roy says. “Numbers can also be the information analysts need to glean from data.”
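As a toy illustration of that idea — my own sketch, not Roy’s algorithm — a single regular expression in Python can pull numbers out of surrounding text along with the word that follows them, giving an analyst enough context to interpret each figure:

```python
import re

# Match a number (with optional thousands separators and decimals)
# followed by the next word, which usually carries its unit or meaning.
PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s+([A-Za-z%]+)")

def numbers_with_context(text):
    """Return (number, following-word) pairs found in free text."""
    return PATTERN.findall(text)

report = "Revenue grew 12 percent to 4,800 units in Q3."
print(numbers_with_context(report))  # → [('12', 'percent'), ('4,800', 'units')]
```

Production text analytics goes far beyond this, of course, but even this sketch shows why numbers embedded in prose are trickier than a column of figures: the meaning lives in the surrounding words, not in the digits themselves.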
Thanks for reading and engaging. And please check out our latest NewsBytes and upcoming events for opportunities to educate yourself on the power of data.
Editor in Chief, IBM Data magazine