Computers can crunch mind-boggling arrays of data. They can even win quiz shows. But are there more powerful applications of this analytical power yet to come? Claudia Pearce ’89 M.S., ’94 Ph.D., computer science, is the Senior Computer Science Authority at the National Security Agency (NSA). The winner of UMBC’s Alumna of the Year Award in Engineering and Information Technology in 2014, Pearce is diligently seeking the answer to that question.
Watson is IBM’s Deep Question Answering system. You might recall that when Watson was put to the test against human contestants on the television quiz show Jeopardy!, the system bested its competitors at providing the questions to answers whose associated questions were already known. (And won $1,000,000.)
But like any game, Jeopardy! has its rules – and its limits. Along with my colleagues and others in the field who study big data and predictive analytics, I’ve been wondering whether the techniques implemented in Watson could be used as a powerful knowledge discovery tool to find the questions to answers whose associated questions are unknown.
Subspecialties in the fields of computer science and statistics such as knowledge discovery, machine learning, data mining, and information retrieval are commonly applied in medicine and in the natural and physical sciences – and increasingly in the social sciences, advertising, and cybersecurity, too. (When applied to a particular field, the work is often called “computational biology” or “computational advertising.”)
And as the scope of computational practices has increased, the resources needed to perform them have shrunk tremendously. Ten years ago, massive computation was found primarily in physics, astronomy, and biology, where petabytes of data were collected and analyzed using massive high-performance computing systems. The advent of cloud computing technologies – and their increasing public availability – now allows institutions, companies, and users to rent time for large-scale computation without the enormous costs of creating and maintaining supercomputers.
Additionally, programming and data storage paradigms have evolved to make use of the inherent parallelism in many domain applications. This trend has created new applications for computer science that provide individuals and organizations access to a plethora of online information in real time.
Real-time data sources spur not only social media but also online commerce, video streaming, and geolocation. Wireless technologies and smartphones put that information in the palm of our hands.
The power and speed of these technologies have aided the machine learning and data mining techniques at the heart of analytics, from retrieval of simple facts to trends and predictions. Advertising applications, for instance, analyze your click stream and cookies so that ads tailored to your interests appear as you browse in real time.
Yet the process of developing and maintaining analytics has its costs. First, there is labor. It usually takes teams of people to identify and solve problems in various domains. Analysts (who are usually experts in their subject) develop a collection of research questions in their discipline. They are teamed with statisticians, computer scientists, and others to develop and write programs that put the data in a usable form and create machine learning applications, tools, and algorithms. Together, the data and the programs form analytics designed to answer a question in a given line of inquiry.
Continue reading here: https://umbcmagazine.wordpress.com/umbc-magazine-spring-2015/beyond-watson/