OSCAR helps researchers by understanding chemistry
By Simon Hettrick, OMII-UK.
Chemistry – like any other language – lacks uniformity. New words are invented, old words fade out of use, styles of writing change and some writers suffer from less-than-perfect grammar. What’s more, there is no single way of referring to a chemical: one man’s salt is another man’s sodium chloride (and another’s NaCl). To search for a specific word in a chemistry text, a researcher must take into account every permutation of that word and every possible mistake in representing it. This is highly inefficient, and with more sources of chemistry information becoming available every day, it’s not getting any easier to find relevant information.
OSCAR, software developed by Peter Murray-Rust’s group at the University of Cambridge, tackles this problem. ‘OSCAR’s primary purpose is to recognise concepts in text that have a precise meaning,’ says Murray-Rust. ‘OSCAR recognises chemical names, adjectives and processes, and is able to link them into their meaning using an ontology.’ This frees the researcher from having to find every permutation of a specific word, because OSCAR automatically links the word with its alternatives. It also enriches the text by providing further information about the terms it identifies, such as chemical properties and molecular structure.
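To illustrate the idea of linking different ways of writing a chemical to a single concept, here is a minimal Python sketch. It is not OSCAR’s code or API: the tiny dictionary `ONTOLOGY`, its concept identifiers and the function `find_concept` are hypothetical, and a real chemical ontology holds far more terms and relationships. The sketch only shows how a search for one concept can return every form the text uses for it.

```python
import re

# Hypothetical mini-ontology: each known surface form maps to one
# canonical concept identifier. Entries are illustrative only.
ONTOLOGY = {
    "sodium chloride": "CONCEPT:sodium_chloride",
    "nacl": "CONCEPT:sodium_chloride",
    "salt": "CONCEPT:sodium_chloride",
    "water": "CONCEPT:water",
    "h2o": "CONCEPT:water",
}


def find_concept(text: str, concept_id: str) -> list[str]:
    """Return, in text order, every surface form in `text` linked to `concept_id`."""
    forms = [form for form, concept in ONTOLOGY.items() if concept == concept_id]
    # Put longer forms first so the alternation prefers the longest match.
    forms.sort(key=len, reverse=True)
    # \b restricts matches to whole words, so "salt" does not match "salts".
    pattern = r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b"
    return re.findall(pattern, text, flags=re.IGNORECASE)


print(find_concept("Dissolve the salt in water; NaCl is sodium chloride.",
                   "CONCEPT:sodium_chloride"))
# ['salt', 'NaCl', 'sodium chloride']
```

Because every synonym points at the same identifier, a search for sodium chloride also finds ‘salt’ and ‘NaCl’. A plain lookup like this cannot decide whether an ambiguous word refers to chemistry at all; that requires the kind of language processing OSCAR performs.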
The key to OSCAR’s ability is natural-language processing. OSCAR was trained on a carefully chosen corpus of documents in which researchers had marked up the words related to chemistry, and from this it learned to recognise the contexts in which such words are used. OSCAR has also been programmed to look for clues that signal a link to chemistry, such as the prefix ‘methyl’. The result is software that can process a text and determine whether the author used the word ‘cat’ to refer to chloramphenicol acetyltransferase (CAT) or to something else. It does the job well: in a test of precision and recall, OSCAR scored 83%. Humans score higher, at 90%, but they work many millions of times more slowly.
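The article reports a single score for a test of ‘precision and recall’ without saying how the two were combined. Precision is the fraction of terms flagged by the software that really are chemical; recall is the fraction of real chemical terms that the software finds. A common way to summarise both in one number is the F1 score, their harmonic mean; whether the 83% figure is an F1 score is an assumption here. The sketch below computes all three for hypothetical annotations, each a (token position, word) pair, including one false positive where ‘cat’ was wrongly tagged as chemistry.

```python
def evaluate(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted annotations against a gold standard."""
    true_positives = len(predicted & gold)  # annotations present in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical gold standard: (token position, word) pairs that human
# annotators marked as chemistry.
gold = {(0, "methanol"), (5, "NaCl"), (9, "benzene"), (14, "acetone"), (22, "toluene")}
# Hypothetical software output: three correct hits, one false positive
# ('cat' meaning the animal) and two gold terms missed.
predicted = {(0, "methanol"), (5, "NaCl"), (9, "benzene"), (17, "cat")}

precision, recall, f1 = evaluate(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# precision=0.75 recall=0.60 F1=0.67
```

Set intersection only counts exact matches; real evaluations of chemical named-entity recognition also have to decide how to score partially overlapping terms.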
Three major European organisations have spotted the potential of OSCAR. The Royal Society of Chemistry are using OSCAR to make searches of their online journal papers more accurate. Christoph Steinbeck is investigating a similar system for the European Bioinformatics Institute that he says will ‘provide the community with the tools for large-scale harvesting of chemical data hidden in the past 100 years of printed literature’. The European Patent Office are investigating OSCAR-assisted searches, which could give a far higher degree of certainty that all of the documents relevant to a patent application have been found.
OSCAR is released as open-source software. ‘There are many benefits to open source… anyone can take the software and do roughly what they like with it, they can extend it… and we can expect contributions back from [our users],’ says Murray-Rust. Open source also gives OSCAR an edge over its competitors: packages that perform a similar function are closed source and usually must be purchased before they can be used, whereas OSCAR can be downloaded, installed and tested for free.
OSCAR’s open-source background made it a natural candidate for a collaboration with OMII-UK, which was funded by EPSRC and JISC. This year, OMII-UK developers used their software-engineering expertise to improve OSCAR’s code and structure: they improved the software’s performance, modularised the code to make future debugging and upgrading more straightforward, and developed and implemented an automated test regime.
Finding relevant information is one of the foundations of good research, but it is time-consuming and can be extremely frustrating. OSCAR’s success stems from meeting a real need within the chemistry community: by making chemistry texts easier to search, it should allow researchers to spend less time looking for information and more time doing their research.





© The University of Southampton on behalf of OMII-UK. All Rights Reserved.