Understanding ‘Watson’ and the Age of Analytics

Law Technology News

February 22, 2011

By Nick Brestoff, M.S., J.D.


By now, I’m pretty sure you’ve heard the news that IBM’s computer, Watson, bested two of the greatest human champions of the game show “Jeopardy!” Even though Watson blundered on one question on the first day, and mishandled the Final Jeopardy question on the second day, it has already demonstrated that it can successfully compete with the two most successful players the show has ever seen. Any of us would lose to either of them.


The question is: What’s under Watson’s hood? IBM isn’t saying much about the technical side of things, except that the company built the computer “on commercially available POWER7 systems [which] ensures the acceleration of businesses adopting workload optimized systems in industries where knowledge acquisition and analytics are important.” Some are calling it a “super search engine.” But Watson’s success is a fine reason for me to tell you more about what IBM has said about Watson, and to explain the concept of “concept search.”


Let’s walk through the “Jeopardy!” process. At the top of the show, the categories are revealed and verbalized by host Alex Trebek. The game starts when a contestant chooses a category, and a clue is revealed. For those of us at home, that’s when the clue is shown on screen. We read the clue while we listen to Mr. Trebek read it to us, which takes about three seconds.


According to Dr. Eric Brown, an IBM research manager who works with Watson (whose core technology is called “DeepQA,” for “deep question-answering”), Watson is not listening to Mr. Trebek when he’s reading the clue. So speech recognition is not involved. Instead, Watson is taking a snapshot of the clue when it’s revealed and processing words. Now doesn’t that sound familiar? [You can read the Kurzweil website interview with Dr. Brown, from which the quotes in this article are taken, here: http://www.kurzweilai.net/how-watson-works-a-conversation-with-eric-brown-ibm-research-manager.]


So Watson starts by taking in the full text as an ASCII file. Then, during the three seconds when Trebek is reading the clue and the contestants can’t ring in, Watson is processing those words. And it’s a tricky bag of words at that. The clues include irony, puns, words with double meanings, metaphors, and so on.


Before it answers, Watson must rank its potential answers, decide whether it’s confident enough in the answer it’s going to give, and then ring in. (The buzzer is disabled while Trebek is reading. The contestants can ring in only when he finishes.)
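
The rank-then-decide step can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the Candidate type, the decide_to_buzz function, and the 0.5 threshold are hypothetical, and IBM has not published how Watson actually makes this decision.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """A possible answer plus the system's confidence in it (0.0 to 1.0)."""
    answer: str
    confidence: float


def decide_to_buzz(candidates: List[Candidate],
                   threshold: float = 0.5) -> Optional[Candidate]:
    """Return the top-ranked candidate if it clears the threshold, else None.

    The threshold value is a made-up illustration, not a figure from IBM.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None


confident = decide_to_buzz([Candidate("A", 0.82), Candidate("B", 0.11)])
unsure = decide_to_buzz([Candidate("A", 0.31), Candidate("B", 0.29)])
```

In the first call the top answer clears the threshold, so the sketch would ring in with it; in the second, no answer is strong enough and the sketch stays silent.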


First, let’s look at the hardware. According to Brown, IBM’s Blue Gene (which works great for highly parallel processing problems like protein folding or weather forecasting) was not up to the task, because what’s going on inside Watson is different. While Blue Gene uses 850 MHz power processors, Watson uses a bank of Power 750 servers, each of which has 32 cores, and each one of them is a 3.55 GHz processor. Each server has up to 256 GB of RAM, and Watson has 90 of them, which means that there are 2,880 cores (90 servers × 32 cores).


Brown used the phrase “single-threaded analytics” to describe what Watson is doing, saying the analytics are not internally parallelized, but are running as sequential processes. What are they? Again, IBM is not saying, but Brown said that Watson is running several copies of a particular analytic or collection of analytics. So several different searches are being run, he says, each of which might generate 50 results, and each search can produce several candidate answers, so that Watson might have between 300 and 500 possible answers for the clue.


Here’s what happens next: “Now, all of these candidate answers can be processed independently and in parallel, so now they fan out to answer-scoring analytics that are distributed across this cluster, and these answer-scoring analytics score the answers. Then, we run additional searches for the answers to gather more evidence, and then run deep analytics on each piece of evidence, so each candidate answer might go and generate 20 pieces of evidence to support that answer.


“Now, all of this evidence can be analyzed independently and in parallel, so that fans out again. Now you have evidence being deeply analyzed on the cluster, and then all of these analytics produce scores that ultimately get merged together, using a machine-learning framework to weight the scores and produce a final ranked order for the candidate answers, as well as a final confidence in them.”
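
The fan-out that Brown describes (candidates scored in parallel, evidence gathered for each, scores merged by weight into a ranked list) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Watson’s design: the EVIDENCE table, the word-overlap scorer, and the hand-set WEIGHTS are all hypothetical stand-ins, and a thread pool on one machine stands in for the cluster. In the real system the weights are learned by a machine-learning framework; here they are simply invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each candidate answer maps to its evidence passages.
EVIDENCE = {
    "Paris": ["Paris is the capital of France",
              "The Seine flows through Paris"],
    "Lyon": ["Lyon is a city in France"],
}
CLUE_TERMS = {"capital", "france", "seine"}

# Hand-set weights for illustration only; a real system would learn these.
WEIGHTS = {"best": 0.6, "mean": 0.4}


def score_evidence(passage):
    """Score one passage by the fraction of clue terms it contains."""
    words = set(passage.lower().split())
    return len(words & CLUE_TERMS) / len(CLUE_TERMS)


def score_candidate(candidate):
    """Gather evidence for one candidate and reduce it to feature scores."""
    scores = [score_evidence(p) for p in EVIDENCE.get(candidate, [])]
    mean = sum(scores) / len(scores) if scores else 0.0
    return candidate, {"best": max(scores, default=0.0), "mean": mean}


def rank(candidates):
    """Score candidates in parallel, merge by weight, sort best first."""
    with ThreadPoolExecutor() as pool:
        scored = list(pool.map(score_candidate, candidates))
    merged = [(name, sum(WEIGHTS[k] * v for k, v in feats.items()))
              for name, feats in scored]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


ranked = rank(["Lyon", "Paris"])
```

Each candidate is scored independently, which is what makes the work easy to spread across many processors; only the final merge and sort need to see all of the results together.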


The word “clustering” reminds me of the sort of concept searching that some of us do in the EDD world in order to search for potentially relevant evidence. There are several concept search methodologies, and you have probably heard of some of them. To give you an idea of how these methodologies connect back to our EDD world, here’s an (incomplete) list of some well-known vendors in the e-discovery space, and the concept methodologies they feature:

  • Content Analyst (Latent Semantic Indexing (LSI));
  • Relativity (by kCura), iConect, and Eclipse by IPRO (LSI);
  • Axcelerate by Recommind (Probabilistic LSI).

To learn more, there are Wikipedia entries for LSI and Probabilistic LSI. (Disclosure: I work with Content Analyst.)


Why do I surmise that Watson/DeepQA is using concept search? Because concept search is a way of finding patterns in unstructured data sets, such as the ones we face in the legal profession; they consist of documents, e-mails, spreadsheets, and so on. The big idea of concept search is to find documents (as output) that are responsive to a query (using a bag of words as input, including key word lists), based on co-occurrences. As output, we want documents that have our input words in them, but also the documents that do not contain any of those words, but which are, nevertheless, potentially related (they cluster together) and so are potentially relevant. Like Watson, we are looking for patterns.
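
To see how co-occurrence can surface a document that shares no words with the query, consider this minimal Python sketch. It uses plain co-occurrence query expansion, a much simpler technique than LSI or Probabilistic LSI, and it is not what any of the vendors above ship. The function names, the sample documents, and the 0.5 weight for related terms are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often each pair of terms appears in the same document."""
    co = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def concept_search(query, docs, related_weight=0.5):
    """Rank documents by query terms plus terms that co-occur with them."""
    co = build_cooccurrence(docs)
    weights = {term: 1.0 for term in query.lower().split()}
    for term in list(weights):
        for neighbor in co.get(term, {}):
            # Related terms count for less than literal query terms.
            weights.setdefault(neighbor, related_weight)
    results = []
    for index, doc in enumerate(docs):
        score = sum(weights.get(t, 0.0) for t in set(doc.lower().split()))
        if score > 0:
            results.append((index, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = [
    "contract breach damages",   # contains the query word
    "breach lawsuit damages",    # no query word, but related terms
    "lawsuit settlement",
    "picnic lunch friday",
]
hits = concept_search("contract", docs)
```

Here the second document never mentions “contract,” yet it is returned because “breach” and “damages” appear alongside “contract” elsewhere in the collection. The picnic memo is not returned at all.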


And this brings me back to Watson and the good Dr. Brown, to show you that Watson is relevant to our community: According to Dr. Brown, “Watson does not take an approach of trying to curate the underlying data or build databases or structured resources, especially in manual fashion, but rather, it relies on unstructured data: documents, things like encyclopedias, web pages [n.b., Watson is not connected to the Internet when it’s playing], dictionaries, unstructured content. . . . Similarly, when we get a question as input, we don’t try and map it into some known ontology or taxonomy, or into a structured query language. Rather, we use natural language processing techniques to try and understand what the question is looking for . . . .” (Italics added.)


So, what’s the upshot here? I say it’s this: We’ve been in the Information Age for some time now, and, with Watson, we’ve taken a baby step toward maturity. Welcome to the Age of Analytics.


# # #


Reprinted with permission from the February 22, 2011 issue of Law Technology News © 2011 ALM Media Properties, LLC. Further duplication without permission is prohibited. All rights reserved.