OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers. The modules OPSIN (a name-to-structure converter) and ChemTok (a tokeniser for chemical text) are also available as standalone libraries.


The latest version of OSCAR3 can be downloaded from SourceForge.

System requirements

OSCAR3 requires Java, and is designed to work with Java 1.5 and higher.

OSCAR3 should run on Linux, most Macs, and Windows 98/ME/2000/XP/2003/Vista/7.

Further information


OSCAR3 is developed by the Murray-Rust Research Group at the University of Cambridge.


Peter Murray-Rust describes OSCAR

In this video, Peter Murray-Rust (head of the group that developed OSCAR) explains how OSCAR works. He is joined by Daniel Lowe, one of OSCAR's developers. The video was recorded at the UK e-Science All Hands Meeting 2009.

Overview of OSCAR server

OSCAR overview video

An overview of OSCAR3, showing the software being used to search PubMed abstracts, process them, and browse the results.

UIMA demos

UIMA demo with CAS visual debugger tool

This demo uses OSCAR3 to generate annotations for the UIMA (Unstructured Information Management Architecture) framework, and uses the CAS visual debugger tool from the UIMA SDK.

UIMA demo with document analyser tool

This demo also uses OSCAR3 to generate annotations for the UIMA framework, but uses the document analyser tool from the UIMA SDK instead.

What does it do?

OSCAR3 is a tool for shallow, chemistry-specific parsing of chemical documents. It identifies (or attempts to identify):

  • Chemical names: singular nouns, plurals, verbs, etc., as well as formulae, acronyms, and some enzyme and reaction names.
  • Ontology terms: anything that can be recognised by string matching can be recognised by OSCAR.
  • Chemical data: spectra, melting/boiling points, yields, etc. in experimental sections.

In addition, where possible, detected chemical names are annotated with structures, obtained either by lookup or by name-to-structure parsing with OPSIN, and with identifiers from the chemical ontology ChEBI.
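The claim that ontology terms can be found "by string matching" can be illustrated with a small dictionary matcher. The sketch below is not OSCAR3's code or API: the class StringMatchAnnotator, its lexicon format (term mapped to an identifier) and the ONT:n identifiers are hypothetical. It assumes case-insensitive, whole-word, longest-match-first matching, and it is written for a modern JDK (8 or later) rather than the Java 1.5 that OSCAR3 itself targets.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of dictionary-based (string-matching) annotation.
 * This is NOT OSCAR3's API; all names and identifiers are illustrative.
 */
public class StringMatchAnnotator {

    /** A matched span [start, end) of the input text and the identifier it maps to. */
    public static final class Match {
        public final int start;
        public final int end;
        public final String id;

        Match(int start, int end, String id) {
            this.start = start;
            this.end = end;
            this.id = id;
        }

        @Override
        public String toString() {
            return start + "-" + end + ":" + id;
        }
    }

    // Lexicon entries (lower-cased term -> identifier), longest term first.
    private final List<Map.Entry<String, String>> terms = new ArrayList<>();

    public StringMatchAnnotator(Map<String, String> lexicon) {
        for (Map.Entry<String, String> e : lexicon.entrySet()) {
            String term = e.getKey().toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) { // an empty term would match everywhere
                terms.add(new AbstractMap.SimpleEntry<>(term, e.getValue()));
            }
        }
        // Trying longer terms first makes "ethyl acetate" win over "acetate".
        terms.sort((a, b) -> Integer.compare(b.getKey().length(), a.getKey().length()));
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Scans left to right, reporting non-overlapping whole-word matches. */
    public List<Match> annotate(String text) {
        // Assumes lower-casing preserves string length (true for ASCII text).
        String lower = text.toLowerCase(Locale.ROOT);
        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < lower.length()) {
            Match found = null;
            if (i == 0 || !isWordChar(lower.charAt(i - 1))) { // start of a word
                for (Map.Entry<String, String> t : terms) {
                    int end = i + t.getKey().length();
                    if (lower.startsWith(t.getKey(), i)
                            && (end == lower.length() || !isWordChar(lower.charAt(end)))) {
                        found = new Match(i, end, t.getValue());
                        break;
                    }
                }
            }
            if (found != null) {
                matches.add(found);
                i = found.end; // skip past the match so matches never overlap
            } else {
                i++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("ethyl acetate", "ONT:1"); // hypothetical identifiers
        lexicon.put("acetate", "ONT:2");
        lexicon.put("solvent", "ONT:3");
        StringMatchAnnotator annotator = new StringMatchAnnotator(lexicon);
        String text = "The product was recrystallised from Ethyl Acetate, a common solvent.";
        System.out.println(annotator.annotate(text)); // [36-49:ONT:1, 60-67:ONT:3]
    }
}
```

Running main reports two whole-word matches; "acetate" is not reported on its own because the longer "ethyl acetate" is tried first and the scan then skips past it.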

OSCAR3 also includes the Oscar Server, a Jetty-powered set of servlets. These provide the following services:

  • Parsing of text/HTML by OSCAR.
  • Text/InChI/SMILES/SMILES substructure/SMILES similarity search of papers, combined with keyword and ontology-based search, using Lucene and the CDK (Chemistry Development Kit).
  • Lists of all names found, or of all names that co-occur with one or more search terms.
  • Online management of a chemical/stopword lexicon.
  • Manual editing of SciXML fragments containing named entities, for creating gold standards and training data.
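One way to picture the co-occurrence listing is as counting, over a collection of papers, the other names found in the same papers as the search terms. The sketch below is not the Oscar Server's implementation: the class CoOccurrence and method coOccurring are hypothetical, co-occurrence is assumed to be at the level of whole papers, a paper is assumed to count only if it mentions every search term, and each paper is assumed to have already been reduced to the set of names OSCAR found in it. It needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of a co-occurrence listing; NOT the Oscar Server's code. */
public class CoOccurrence {

    /**
     * For each name, counts how many papers mention it together with
     * every one of the query terms. Each paper is given as the set of
     * names already found in it.
     */
    public static Map<String, Integer> coOccurring(List<Set<String>> papers, Set<String> query) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (Set<String> names : papers) {
            if (!names.containsAll(query)) {
                continue; // paper does not mention all search terms
            }
            for (String name : names) {
                if (!query.contains(name)) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    private static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<Set<String>> papers = Arrays.asList(
                setOf("benzene", "toluene"),
                setOf("benzene", "phenol", "toluene"),
                setOf("toluene", "xylene"));
        System.out.println(coOccurring(papers, setOf("benzene"))); // {phenol=1, toluene=2}
    }
}
```

Requiring every search term (rather than any of them) is a design choice of this sketch; swapping containsAll for a check on any single term would give the looser reading.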

List of attachments

Attachment name       Size        Version  Date modified      Author
Oscar3-web-demo.htm   0.6 kB      1        04-Nov-2009 10:27  SimonHettrick
Oscar3-web-demo.swf   9369.3 kB   1        04-Nov-2009 10:27  SimonHettrick
UIMA-demo-1.htm       0.7 kB      1        04-Nov-2009 11:13  SimonHettrick
UIMA-demo-1.swf       7469.5 kB   1        04-Nov-2009 11:13  SimonHettrick
UIMA-demo-2.htm       0.7 kB      1        04-Nov-2009 11:13  SimonHettrick
UIMA-demo-2.swf       13839.8 kB  1        04-Nov-2009 11:13  SimonHettrick
This page (revision 17) was last changed on 13-Apr-2010 16:44 by SimonHettrick.

© The University of Southampton on behalf of OMII-UK. All Rights Reserved.