Big data analytics has emerged as a significant tool for business, government, and society to leverage the oceans of available data by using very powerful, relatively cost-effective analytics technology. Faced with a challenging economic climate and an accelerating pace of change, many organizations hope to use advanced analytics tools and data to surf these relentless waves of change. A rising number of organizations have therefore already begun to formulate their big data strategy and build their capabilities.
Achieving the full potential of big data analytics requires not just data, tools, and infrastructure, but also the quantitative skills to traverse the huge mountains of data. One key challenge organizations face is recruiting people with the skills to navigate all this data. The skills shortage in big data analytics is significant and is predicted to escalate. Some estimate the shortage to be in the hundreds of thousands in the US alone.1 In particular, many organizations are unable to fill the data scientist role that they have deemed so critical for big data analytics.
Data scientists have contributed immensely to the development of the big data analytics phenomenon. However, as organizations increasingly embark on this journey, they need an approach that is more strategic, cost-effective, and sustainable than the current one. The approach outlined in this three-part series recognizes that a successful initiative depends on more than a single talent. It requires many roles (business and technical, internal and external), all working collaboratively within a common vision, culture, and architecture. In short, it requires an analytics ecosystem.
The data scientist role has personified the big data analytics phenomenon and captured our imagination. Jeff Hammerbacher and D.J. Patil, who at that time were at Facebook and LinkedIn, respectively, coined the data scientist title in 2008.2 The role can also be seen as an extension of the Wall Street quantitative analysts—or “quants”—of the 1990s: highly intelligent, curious, mathematical individuals applying methods from eclectic fields to new problems.
Data scientists arose out of necessity in early data-driven companies such as Google, Yahoo, and Amazon. Back then there were no analytics tools and no big data platforms, but there certainly were large and rapidly growing mountains of data that could—thanks to Moore’s Law—now be fortuitously analyzed. Early data scientists fashioned their own tools, developed their own algorithms, and even conducted academic-style research.
As is typical of nascent technologies that are at the peak of inflated expectations, the role has been described using many superlatives that make it sound spectacular. For example, John W. Foreman, chief data scientist at MailChimp.com, said data science “can call presidential races” and “reveal more about your shopping habits than you’d dare tell your mother.”3
Although no one doubts the enormous value of data science and the need to train more data scientists, the role should be critically examined to enable the democratization of big data analytics by making its practices more cost-effective and sustainable.
In a recent interview in ZDNet magazine, Andrew Nusca queried a prominent analytics vendor’s chief executive officer (CEO): “Data scientists! They’re in demand. They’re rare. They’re expensive. Business leaders think they need them, even if they’re not sure what they do. What gives?”4 That data scientists are very hard to find and expensive is not the only problem. The unbalanced, almost exclusive focus on the role has diverted attention from some key aspects required to establish successful and sustainable big data analytics capabilities.
Some organizations have confused the skills with the individual. While the combination of mathematical, statistical, and coding skills is vital for big data analytics, these skills can be acquired and developed across a team5 and not just within a single individual. The wait to find the ideal candidate has caused significant, possibly unnecessary, delays in starting the big data analytics journey in some organizations.
In some cases the wait and disappointment may cause organizations to postpone the pursuit entirely. Further, some organizations that thought themselves lucky to hire a data scientist discovered they needed much more than one individual to realize and scale the benefits of big data analytics. The following issues arise from the unbalanced, exclusive focus on the role of the data scientist:
Parts 2 and 3 of this series address the elements that can make big data analytics initiatives successful and delve into the details of core, extended, and external analytics ecosystems. In the meantime, please share any thoughts or questions in the comments.
1 “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” Insights & Publications report, McKinsey & Company, May 2011.
2 “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas H. Davenport and D.J. Patil, Harvard Business Review, October 2012.
3 Data Smart: Using Data to Transform Information into Insight, by John W. Foreman, Wiley, November 2013.
4 “Do We Really Need Data Scientists?” by Andrew Nusca for Between the Lines, ZDNet, February 2013.
5 “Are You Recruiting a Data Scientist, or Unicorn?” by Jeff Bertolucci, InformationWeek, November 2013.
6 “Deriving Innovation from a Data-Driven Mind-set: Part 1,” by Ahmed Fattah, IBM Data magazine, January 2014.
7 “Hazards of Prophecy: The Failure of Imagination,” in the collection Profiles of the Future: An Enquiry into the Limits of the Possible, by Arthur C. Clarke, 1962 (rev. 1973).
8 “Made in IBM Labs: New Data Discovery and Visualization Capabilities Help Business Users Uncover Hidden Patterns via the Cloud,” IBM press release, November 2013.