We have noticed a lot of careless talk in the marketplace about the relationship between the trend toward big data as a new reality for IT and the emergence of cognitive computing as an alternative approach to traditional analytics. We are offering a perspective on the two trends in simple terms, differentiating between them and defining their relationship to each other, in the hope of cutting through the current atmosphere of media hype and confusion. Our goal is to bring clarity to the core issues, which we believe will help move the conversation about cognitive computing onto a crisper and more intelligent foundation.
In the introduction to this series of blog posts, we described four distinct but overlapping levels on which the terms big data and cognitive computing are currently operating. We need to get better at understanding and differentiating these meanings. The four levels are:
1) The mission or purpose of big data vs. that of cognitive computing
2) The foundation technologies of each
3) The functional description of what these trends and their technologies actually do for people
4) The symbolic level, where our public conversation has already transformed these terms into labels for various business strategies, worldviews, and hype campaigns.
Part 1 of this series addressed the mission or purpose of the two trends. It concluded that the purpose of big data is to remodel the data center, the database, and the data warehouse to accommodate today’s transformed digital environment. By contrast, the purpose of cognitive computing is to leverage a broad suite of evolving discovery, analysis, human interaction, and solution development technologies to offer a new kind of digital assistance or augmentation that operates in near-human terms.
In this Part 2 of the series, we take up the matter of technologies. We need to recognize that each trend, big data and cognitive computing, rests on its own technology foundation, and we propose that the two foundations are related but also fundamentally different. So we have a “ground truth” based on distinct developments and innovations in the technology environment. For example, there is little dispute that big data is a phenomenon of the spread of digital technology across consumer, commercial, government, and scientific life (and almost any other sphere of life you care to add). At the same time, cognitive computing has been associated with applying computing machines to such challenges as bringing “human-like” insights to Jeopardy game-playing, making personal digital assistants intelligent, accelerating human genome analysis, and improving medical outcomes through diagnosis and treatment recommendations. All of these examples are based on the cognitive applications’ ability to process “beyond-human” quantities of disparate data while analyzing and presenting suggestive, non-trivial, timely solutions.
Prefiguring the emergence of the big data trend, we all recognize that the very nature of data has changed rapidly and is now irretrievably big. The causes include the spread of the internet, global access to inexpensive content “publishing” both personal and professional, the adoption of online video and other rich media, the explosion of mobile devices, the overnight rise of social and user-generated media, the proliferation of log files tracking all of this activity on a packet-by-packet basis, and on and on.
So big is this big data that in increasing numbers of applications, traditional means of creating, capturing, organizing, and storing it threaten to break or become meaningless under the onslaught. As a result, innovative technologists are devising new approaches to try to keep pace. Google, for example, faced the problem of how to manage the exponential growth of its web search indexes and came up with the idea of MapReduce, an early approach to harnessing commodity hardware clusters to transform the efficiency of content processing and index creation. At roughly the same time that Google was focusing on these distributed processing innovations, technologists at Yahoo!, seeking to supplement the reigning SQL database storage paradigms for performance and scalability reasons, developed non-SQL storage models designed to replace traditional DBMS parallel processing with distributed processing. The Hadoop Distributed File System they developed is now the most prominent model, supplemented by an ecosystem of multiple “non-SQL” software packages that extend the capabilities and connectivity of the Hadoop storage core.
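To make the MapReduce idea concrete, here is a minimal sketch of index creation expressed as a map step and a reduce step. It runs in a single Python process and only imitates the programming model. It is not Google's implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the two sample documents are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Reduce step: collapse all postings for one word into a sorted, unique list."""
    return word, sorted(set(doc_ids))


def build_index(documents):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample documents, for illustration only.
docs = {"d1": "big data is big", "d2": "cognitive computing uses data"}
index = build_index(docs)
print(index["data"])
```

Because each call to the map step looks at one document and each call to the reduce step looks at one word, both steps can in principle be spread across many machines, which is what makes the model a fit for commodity clusters.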
I review this bit of recent technology history to point out that the term “big data” is not simply a reference to the quantity of bytes we now generate, although of course intuitively it is that as well. But more importantly it also references a set of software resources, assets, and practices that have now built up a legacy of over a decade of development and are supporting many of the most critical compute applications on the planet.
So what can we say about cognitive computing’s technology foundation? The first observation is that it does not have to be involved with big data at all. While IBM’s Jeopardy-playing Watson ingested an impressive quantity of encyclopedias, history books, magazines, political broadsides, and previous Jeopardy questions and answers, this hardly constituted a big data application on a scale familiar to Google, the intelligence community, telecommunications carriers, etc. Watson was much more dependent on “big memory,” as it utilized recent innovations in “in-memory” processing approaches to discover, synthesize, and statistically analyze possible responses to those arcane Jeopardy questions in real time. The Watson Jeopardy application is much more usefully understood as a data science triumph than as a big data feat.
The definition of cognitive computing presented on this site puts its use as a human problem solver at the forefront:
Cognitive computing makes a new class of problems computable. It addresses complex situations that are characterized by ambiguity and uncertainty; in other words it handles human kinds of problems. In these dynamic, information-rich, and shifting situations, data tends to change frequently, and it is often conflicting. The goals of users evolve as they learn more and redefine their objectives. To respond to the fluid nature of users’ understanding of their problems, the cognitive computing system offers a synthesis not just of information sources but of influences, contexts, and insights.
The technology foundation of cognitive computing is not fundamentally about programming, processing, or storage paradigms, or about data flows and stream handling, but rather about the broad-ranging data analysis technologies addressing discovery, disambiguation, contextual understanding, inference, recommendation, probabilistic reasoning, and human/machine communications. So instead of MapReduce, Hadoop, NoSQL, Pig, Hive, Spark, Sqoop, and other big data tools and technologies, cognitive computing relies on technologies such as voice recognition, text-to-speech, language recognition, natural language processing in its many forms, machine learning in its many forms, neural networks, Bayesian statistics and inferencing, support vector machines, many kinds of statistical analysis, and voting algorithms, along with a heavy dependence on human interaction and visualization design. We can layer cognitive computing on a big data foundation, if it is available, in order to understand, infer, or reason about the evidence the data contains.
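As a small illustration of what reasoning about evidence can mean in practice, the sketch below applies Bayes' rule to re-rank a handful of candidate answers after one piece of evidence arrives. It is a toy example of the Bayesian inferencing mentioned above, not a description of how any particular cognitive system works. The function name `bayes_update`, the candidate labels, and all of the probabilities are assumptions invented for the example.

```python
def bayes_update(priors, likelihoods):
    """Return the posterior P(h | e) for each hypothesis h.

    priors maps each hypothesis to P(h); likelihoods maps it to P(e | h).
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical candidate answers with made-up prior beliefs.
priors = {"answer_a": 0.5, "answer_b": 0.3, "answer_c": 0.2}
# Made-up likelihoods: how probable the observed evidence is under each candidate.
likelihoods = {"answer_a": 0.1, "answer_b": 0.6, "answer_c": 0.3}

posterior = bayes_update(priors, likelihoods)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Here the candidate that started in second place overtakes the initial favorite because the evidence fits it much better, which is the kind of probabilistic re-ranking the paragraph above has in mind.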
While I hope I have made an argument for the distinctness of big data and cognitive computing at both the mission and the technology levels, I want to close by pointing out the valuable relationship between the two trends. The most important symbiosis between the two is that the availability of big data-scale quantities of data is tremendously helpful for the kinds of machine learning algorithms and methodologies on which cognitive computing depends for the accuracy and contextual appropriateness of its answers or solution recommendations. The flip side of this increased analytic power for cognitive applications is, of course, the new kinds of analytic value these applications offer those who are trying to make some kind of sense out of the petabytes, exabytes, or zettabytes of data that are collecting in their enterprise big data “lakes,” black holes, or other kinds of repositories.
Big data and cognitive computing will continue to be interrelated and will continue to be spoken about together as if they were all of a piece. In fact, they are not, and in subsequent posts I will look at other distinctions as well as inter-relationships between the trends and the terms at the functional and symbolic levels.