
From the Editor


Thinking Like a Data Scientist

The best data scientists are critical thinkers par excellence. Acquiring skills in quantitative analysis and programming is not enough, and even a strong background in some domain specialty doesn’t guarantee a data scientist’s effectiveness. Data scientists also need the ability to drill conceptually through the heart of whatever problem they’re analyzing. Thinking like a data scientist is no mean feat. It may require thinking outside the box, and even beyond the scope of the current data set and constructed statistical model, in the quest for the golden insight.

In IBM Data magazine the week of January 26, 2015, we have three excellent new articles focusing on different types of thinking that are the key to productivity as a data scientist. People’s words and actions frequently disconnect from each other, and behavioral analytics is often a necessary counterweight to linguistic analysis when data scientists seek the heart of intention and propensity. First-time contributor Andy Thurai shows how Match.com’s data scientists tackled a thorny analytics problem by rethinking what data sources are most relevant to customer behaviors. “Specifically,” Thurai writes, “Match.com figured out that building models based on what people say—their wants—is not enough. In other words, it determined that users’ actions and their actual needs are totally different from their wants. Ultimately, Match.com was able to predict user behavior based on users’ actions on its Match.com website with an enhanced success rate.”

Telling factual stories with data analytics is where the best data scientists shine. Rich Hughes provides a comprehensive profile of what it takes to be an effective data scientist. Much of what it takes hinges not only on crafting compelling data visualizations, but also on using them to explain findings to colleagues, customers, and other stakeholders. Thinking skills also include the thoughtfulness of data scientists who are sensitive to the full social context of their efforts. Hughes cites a data scientist’s astute observation: “‘People make a mistake by forgetting that data science is a team sport.… [T]here’s not one single data scientist that does it all on [his or her] own.’”

Data scientists need to think like detectives. The data discovery process often involves sleuthing for clues regarding what data may be available and which of it is most relevant to the problem at hand. Edd Dumbill takes us inside the data discovery value chain, spelling out the diverse criteria that working data scientists use to sift through data sets of any volume, variety, and velocity. Just as important, he says, data scientists must grin and bear the tedium that such searches often entail. “In the data science world,” Dumbill writes, “obtaining and cleaning data is often recognized as always being 80 percent of the work of any analysis, because of database silos within organizations and a complex and opaque ecosystem of external data sources.”

Thanks for reading and engaging. And please check out our latest NewsBytes and upcoming events for opportunities to educate yourself on the power of data.


James Kobielus (@jameskobielus)
Editor in Chief, IBM Data magazine

 



Discover IBM Forums

IBM Big Data & Analytics Hub
Big Data
James Kobielus
Big Data Evangelist at IBM

Big Data and Analytics
Cortnie Abercrombie
Big Data and Analytics Category Team, Emerging Roles and Markets Leader at IBM

Big Data Strategy
Natasha Bishop
Big Data Strategy Lead at IBM

Data Science
Kirk Borne
Data Scientist and Professor of Astrophysics and Computational Science at George Mason University

Information Management
Beth Smith
General Manager, Information Management at IBM

InfoSphere Streams
Kimberly Madia
InfoSphere Streams Worldwide Marketing Manager at IBM

InfoSphere
David Corrigan
InfoSphere Director of Marketing at IBM

InfoSphere Portfolio
Paula Wiles Sigmon
InfoSphere Portfolio Program Director at IBM

Information Integration and Governance
Jeff Scheepers
Information Integration and Governance Marketing at IBM

InfoSphere Portfolio
Leslie Wiggins
InfoSphere Strategy and Marketing at IBM

developerWorks
Information Management and More
Susan Visser
Content and Social Curator at IBM

Other Community Blogs

The IBM DB2 Blog
News and Information from the IBM DB2 Team
The DB2 Guys

The World of DB2 for z/OS
DB2 11 for z/OS
Surekha Parekh

Toolbox.com
An Expert’s Guide to DB2 Technology
Chris Eaton

SoftwareTradecraft.com
Coding, Business Development, and Careers
Sam Lightstone

Contributors

Andy Thurai: Program Director, Application Programming Interfaces, Internet of Things, and Connected Cloud at IBM
Rich Hughes: Marketing Program Manager for Data Warehousing at IBM
Edd Dumbill: Vice President, Strategy, Silicon Valley Data Science
Steve Adler: Information Strategist at IBM
Trey Anderson III: Product Manager, InfoSphere Master Data Management
Tony Andrews: Instructor, Themis, specializing in SQL and IBM Champion
Kevin Arhelger: Software Engineer at IBM
Howard Baldwin: Freelance Business and Technology Writer, Silicon Valley
Thomas Beebe: Senior Database Consultant at Advanced DataTools
David Birmingham: Senior Principal at Brightlight Consulting
Sam Bisbee: Chief Experience Officer (CXO) at Cloudant, an IBM Company
Natasha Bishop: Big Data Strategy Lead at IBM
Sean Blake: Certified Storage Consultant at IBM
Joe Borges: Senior Technical Specialist at BMO Financial Group
Kevin Brown: Chief Architect for Informix at IBM
Thuan Bui: InfoSphere Optim Technical Enablement, IBM Silicon Valley Lab
Robert Catterall: DB2 Specialist at IBM
Michael Cavaretta: Technical Leader, Predictive Analytics and Data Science, Research and Advanced Engineering at Ford Motor Company
Louis Cherian: Social Media Manager for IBM Data magazine
Mandy Chessell: IBM Distinguished Engineer, Master Inventor, Fellow of the Royal Academy of Engineering
Jeannie Cramer: Editorial Director, IBM Data magazine
Tom Deutsch: Program Director, Big Data at IBM
Trisha Dutta: Product Manager, InfoSphere, at IBM
Brian Eckhardt: CEO, InfoMagnetics Technologies Corporation
Thomas Eunice: Software Architect at IBM
Ahmed Fattah: Client Technical Adviser, Financial Services and Public Sectors at IBM Australia
Al Finkelstein: Software Architect and Developer
Andrew Foo: Senior IT Architect, Smarter Planet Solutions Team at IBM
Ion Freeman: Data Scientist at Bank of America Merrill Lynch
Robert Friske: Worldwide Product Manager at IBM
Sachin Ghai: Delivery Manager at IBM in Gurgaon, India
Dan Gibson: DB2 Lab Advocate at IBM
Cuneyt Goksu: zIM Specialist; Member, zChampions Team, Big Data, and Analytics
Rajesh Govindan: Manager, Information Management, SAP, and OEM, IBM Software Group, at IBM
Peter Hagelund: Executive IT Specialist and Chief Architect, InfoSphere Optim Data Lifecycle Management, at IBM
Fern Halper: Advanced Analytics and Cloud Computing Consultant at TDWI
Stewart Hanna: Worldwide Sales Leader, Big Data, in the IBM Software Group
Bob Hayes: Chief Customer Officer at TCELab and President, Business Over Broadway
Marc Hebert: Chief Operating Officer at Estuate
Nancy Hensley: Director, Strategy and Marketing, Database Software and Systems at IBM
Fred Ho: Program Director at IBM
Praveenkumar Hosangadi: Product Marketing Manager and Part of the InfoSphere Team at IBM
Jeff Huth: Product Manager, Information Management, and Context Computing at IBM
W. H. Inmon: Data Warehousing Expert
R. V. Joshi: Research Staff Member at IBM’s T. J. Watson Research Center
Art S. Kagel: Principal Database Consultant, Advanced Data Tools Corporation
Mary Kasal: Senior Healthcare Professional
Lester Knutsen: President, Advanced DataTools Corporation
James Kobielus: Big Data Evangelist at IBM
Mady Korada: Enterprise Architect
Elizabeth Koumpan: IBM Senior Certified Application Architect and Open Group Distinguished IT Architect
Mark Krafick: Senior DB2 Database Administrator and IBM Champion
Krish Krishnan: Founder and CEO, Sixth Sense Advisors, Inc.
Karthik Kumar: Software Engineer at Intel
Richard R. Lee: Executive Consultant, Senior Thought Leaders and Managing Partner at IMEC
Christian Lenke: Lead Technical Sales Professional at IBM
Sam Lightstone: Senior Technical Staff Member, Next-Generation Data Analytics, at IBM
Jay Limburn: IBM Senior Technical Staff Member and Senior Inventor
Stuart Litel: President, IIUG; CTO, Kazer Technologies; and IBM Gold Consultant
Guy M. Lohman: Manager, Disruptive Information Management Architectures, IBM Research
Kimberly Madia: Worldwide Data Security Strategist at IBM
Ethel Mahoney: Worldwide Business Leader for Offerings and Marketing, Information Management
Biswajit Maji: Technology Architect at InfoSys Limited
Jim Martin: US representative, Fundi Software
Sourav Mazumder: Big Data Architect, IBM Software Group at IBM
Vincent McBurney: Practice Lead, Information Management at Certus Solutions
Bruce McGaughy: CTO and Senior Vice President of Engineering at ProPlus Design Solutions, Inc.
Mike Miller: Chief Scientist and Cofounder of Cloudant
Cristian Molaro: Independent DB2 Specialist and IBM Gold Consultant
Swati Moran: Worldwide Channel Sales and Business Development Manager at IBM
David Mould: Senior Predictive Analytics Scientist
Nate Murphy: President, Nate Murphy and Associates
Graeme Noseworthy: Marketer, IBM Watson Foundations
Terrence O’Donnell: Managing Editor, IBM Data magazine
Lakshmi Palaniappan: Senior Software Engineer, IBM Silicon Valley Lab
Forrest Palmer: Business Analytics Business Unit at IBM
Chad J. Peruba: Business Unit Executive, Performance Management Strategy, at IBM
Frank Petersen: Systems Programmer
Stephanie Pettinos: Product Marketing, IBM Informix, IBM Software Group, Information Management
Alex Philp: Founder and President of GCS
Dan Potter: Vice President, Product Marketing, Datawatch Corporation
Jack Probst: Senior Consultant at Pink Elephant
Tom Rieger: Manager, Information Management Research Team at IBM
Matthew D. Riemer: Big Data Stampede Team Member, IBM Software Group, Information Management
Dusty Rivers: Senior Systems Engineer and Principal Technical Architect at GT Software, Inc.
Adam Ronthal: Technical Marketing, Big Data, Cloud, and Appliances
Robert Routzahn: Marketing Manager, InfoSphere Information Integration and Big Data at IBM
Jacques Roy: Worldwide Technical Sales, InfoSphere Streams at IBM
Suresh Sane: Senior IT Manager, Database and PeopleSoft Administration at Bi-Lo Holdings
Aartika Sardana: Product and Channel Marketing, InfoSphere IIG Portfolio
Arvind Sathi: Worldwide Big Data Architect at IBM
Neena Sathi: Executive IT Architect at IBM
Berni Schiefer: Distinguished Engineer at the IBM Toronto Lab
Jennifer Shin: Founder and Principal at 8 Path Solutions LLC
Paula Wiles Sigmon: Program Director for InfoSphere Portfolio Marketing
Gord Sissons: Product Marketing Manager InfoSphere BigInsights at IBM
Olaf Stephan: Lead Services Specialist, Information Management, at IBM
Kurt Struyf: DB2 Consultant at Suadasoft
Julian Stuhler: Principal Consultant at Triton Consulting
Manjunath B. Subramanian: IBM InfoSphere Master Data Management product manager at IBM
Scott Sumner-Moore: IBM Senior Certified Application Architect and Open Group Distinguished IT Architect
Hemant Suri: Product Manager, PureData System for Analytics, powered by Netezza at IBM
Richard Talbot: Director, Big Data, Analytics, and Cloud Infrastructure at IBM
Mathews Thomas: Lead Architect, Communications Sector, IBM Global Industry Solution Center, at IBM
Brian Vile: Program Director, Product Marketing, InfoSphere, at IBM
Boris Vishnevsky: Executive Senior Certified IT Architect at IBM
Janki Vora: Senior IT Specialist and Data Scientist at IBM
Mike Walker: Senior Database Consultant at Advanced DataTools
Anju Willard: Worldwide Product Marketing Manager, InfoSphere MDM
Isaac Yassin: IBM Gold Consultant, Information Management at IBM
Chris Young: Managing Technology Writer at TDA Group
David Zaharchuk: Global Industry Research Lead, Institute for Business Value at IBM
Paul Zikopoulos: Vice President, Information Management Technical Sales, Big Data, and Competitive Database, at IBM