Introduction to IBM Watson

Watson is a question-answering computer system capable of answering questions posed in natural language. It was developed in IBM’s DeepQA project by a research team led by principal investigator David Ferrucci, and it was named after IBM’s first CEO, industrialist Thomas J. Watson.

IBM built Watson as a question answering (QA) computing system to apply advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning technologies to the field of open-domain question answering.

The key difference between QA technology and document search lies in what each one takes in and what it returns. Document search takes a keyword query and returns a list of documents, ranked in order of relevance to the query (often based on popularity and page ranking). QA technology takes a question expressed in natural language, seeks to understand it in much greater detail, and returns a precise answer to the question.
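To make the contrast concrete, here is a minimal Python sketch. It is not how Watson works: the three-sentence corpus, the function names `document_search` and `answer_question`, and the naive "leading capitalized words" extraction rule are all assumptions invented for illustration. The point is only the shape of the output: one function returns a ranked list of documents, the other returns a single answer string.

```python
import re

# Hypothetical three-document corpus, invented for illustration only.
DOCS = {
    "doc1": "Thomas J. Watson was the first CEO of IBM.",
    "doc2": "IBM built Watson to answer questions posed in natural language.",
    "doc3": "David Ferrucci led the DeepQA research team at IBM.",
}


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_search(query):
    """Document search: return document ids ranked by keyword overlap."""
    wanted = set(tokens(query))
    scores = {
        doc_id: sum(1 for tok in tokens(text) if tok in wanted)
        for doc_id, text in DOCS.items()
    }
    # Highest score first; ties are broken by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, score in ranked if score > 0]


def answer_question(question):
    """Toy QA: return one precise answer instead of a document list.

    Assumption: only "who" questions are handled, and the answer is taken
    to be the leading run of capitalized words in the best-matching document.
    """
    ranked = document_search(question)
    if not ranked or not question.lower().startswith("who"):
        return None
    best_text = DOCS[ranked[0]]
    match = re.match(r"((?:[A-Z][\w.]*\s?)+)", best_text)
    return match.group(1).strip() if match else None


print(document_search("Watson IBM"))
print(answer_question("Who led the DeepQA research team?"))
```

The first call prints a ranked list of document ids and leaves the reading to the user, while the second prints a name. A real QA system replaces the one-line extraction rule with deep language analysis, which is where most of the work lies.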

When Watson was created, IBM stated that “more than 100 different techniques are used to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses.”
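The last two stages in that quote, scoring evidence and then merging and ranking hypotheses, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not IBM's implementation: the scorer names `passage_match` and `type_match`, their weights, the candidate answers and the evidence values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical scorer weights, invented for this sketch.
WEIGHTS = {"passage_match": 0.6, "type_match": 0.4}

# Hypothetical candidate answers for "Who was IBM's first CEO?",
# each paired with made-up evidence scores between 0 and 1.
CANDIDATES = [
    ("Thomas J. Watson", {"passage_match": 0.9, "type_match": 1.0}),
    ("thomas j watson", {"passage_match": 0.5, "type_match": 1.0}),
    ("David Ferrucci", {"passage_match": 0.2, "type_match": 1.0}),
    ("Armonk", {"passage_match": 0.7, "type_match": 0.0}),
]


def normalize(answer):
    """Map surface variants of an answer onto one canonical string."""
    return " ".join(answer.lower().replace(".", "").split())


def score(evidence):
    """Combine the individual evidence scores into one weighted score."""
    return sum(WEIGHTS[name] * value for name, value in evidence.items())


def merge_and_rank(candidates):
    """Merge equivalent hypotheses, keep the best score of each, and rank."""
    merged = defaultdict(list)
    for answer, evidence in candidates:
        merged[normalize(answer)].append(score(evidence))
    best = [(max(scores), answer) for answer, scores in merged.items()]
    best.sort(reverse=True)
    return [(answer, round(value, 2)) for value, answer in best]


for answer, value in merge_and_rank(CANDIDATES):
    print(answer, value)
```

The two spellings of the same name collapse into one hypothesis, and the place name drops to the bottom because it fails the hypothetical answer-type check even though its passage score is high. Keeping the maximum score per merged hypothesis is one simple choice; summing or averaging the scores would be equally defensible in a sketch like this.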

In recent years, Watson’s capabilities have been extended, and the way Watson works has changed to take advantage of new deployment models (Watson on IBM Cloud), evolved machine learning capabilities, and optimised hardware available to developers and researchers. It is no longer purely a question answering (QA) computing system designed from Q&A pairs; it can now ‘see’, ‘hear’, ‘read’, ‘talk’, ‘taste’, ‘interpret’, ‘learn’ and ‘recommend’.

Software:

Watson uses IBM’s DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop framework to provide distributed computing.

Hardware:

The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM’s DeepQA technology, which it uses to generate hypotheses, gather massive evidence, and analyze data. Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.
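The thread count follows directly from the figures quoted above, as this short Python check shows. The average RAM per server is only a back-of-the-envelope figure: it assumes the 16 terabytes are spread evenly across the servers and that 1 TB is 1,024 GB, neither of which the text states.

```python
# Figures quoted in the text above.
SERVERS = 90
CORES_PER_SERVER = 8
THREADS_PER_CORE = 4
TOTAL_RAM_TB = 16


def total_threads(servers, cores_per_server, threads_per_core):
    """Hardware threads across the whole cluster."""
    return servers * cores_per_server * threads_per_core


def average_ram_per_server_gb(total_ram_tb, servers, gb_per_tb=1024):
    """Average RAM per server in GB.

    Assumption: RAM is evenly distributed and 1 TB = 1,024 GB.
    """
    return total_ram_tb * gb_per_tb / servers


threads = total_threads(SERVERS, CORES_PER_SERVER, THREADS_PER_CORE)
print(threads)  # 2880
print(round(average_ram_per_server_gb(TOTAL_RAM_TB, SERVERS)))  # 182
```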

According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM’s master inventor and senior consultant, Tony Pearson, estimated Watson’s hardware cost at about three million dollars. Its Linpack performance stands at 80 teraFLOPS, which is about half as fast as the cut-off line for the Top 500 Supercomputers list. According to Rennie, all content was stored in Watson’s RAM for the Jeopardy! game because data stored on hard drives would be too slow to be competitive with human Jeopardy! champions.

Data:

The sources of information for Watson include encyclopedias, dictionaries, thesauri, news wire articles and literary works. Watson also used databases, taxonomies and ontologies, specifically DBpedia, WordNet and YAGO. In total, the IBM team provided Watson with millions of documents that it could use to build its knowledge.

The above is a brief overview of IBM Watson. Watch this space for more updates on the latest trends in technology.
