OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers. The modules OPSIN (a name-to-structure converter) and ChemTok (a tokeniser for chemical text) are also available as standalone libraries.
The latest version of OSCAR3 can be downloaded from SourceForge.
OSCAR3 requires Java, and is designed to work with Java 1.5 and higher.
OSCAR3 should run on Linux machines, most Macs, and Windows 98/ME/XP/2K/2003/Vista/7.
- Documentation: OSCAR3 is provided with a readme file to get the software running. Once running, HTML documentation is available through OSCAR3's web interface.
- Licence: OSCAR3 is distributed under the Artistic License v1.0, with all clauses applying except clause 8.
- Source is available from SourceForge
- Known issues
- OSCAR3 Announce mailing list
- OSCAR3 Developers mailing list
- OSCAR3 support page
OSCAR3 is developed by the Murray-Rust Research Group at the University of Cambridge.
Peter Murray-Rust describes OSCAR
In this video, Peter Murray-Rust (head of the group that developed OSCAR) discusses OSCAR's operation. He is joined by Daniel Lowe, one of OSCAR's developers. The video was recorded at the UK e-Science All Hands Meeting 2009.
Overview of OSCAR server
An overview of OSCAR3 showing the software being used to search PubMed abstracts, process them and browse the results.
This demo uses OSCAR3 to generate annotations to be used in the UIMA (Unstructured Information Management Architecture) framework, and uses the CAS visual debugger tool from the UIMA SDK.
This demo also uses OSCAR3 to generate annotations to be used in the UIMA framework. However, this demo uses the document analyser tool from the UIMA SDK.
What does it do?
OSCAR3 is a tool for shallow, chemistry-specific parsing of chemical documents. It identifies (or attempts to identify):
- Chemical names: singular nouns, plurals, verbs, etc., as well as formulae and acronyms, and some enzyme and reaction names.
- Ontology terms: if you can do it by string-matching, you can get OSCAR to do it.
- Chemical data: Spectra, melting/boiling point, yield etc. in experimental sections.
In addition, where possible, the chemical names that are detected are annotated with structures, either via lookup or name-to-structure parsing ("OPSIN"), and with identifiers from the chemical ontology ChEBI.
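To make the string-matching idea behind ontology term detection concrete, here is a minimal sketch in Java. It is not OSCAR3's code or API: the class `OntologyTermMatcherSketch`, the `Match` type, the `findTerms` method and the `EXAMPLE:1` identifier are all hypothetical names invented for illustration. It assumes a small lexicon that maps term strings to identifiers and reports case-insensitive, whole-word hits with their character offsets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not part of OSCAR3.
class OntologyTermMatcherSketch {

    // One hit: the identifier from the lexicon plus character offsets in the text.
    static final class Match {
        final String id;
        final int start; // inclusive
        final int end;   // exclusive

        Match(String id, int start, int end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }
    }

    // Finds every case-insensitive, whole-word occurrence of each lexicon term.
    // The lexicon maps a term string to an (assumed) ontology identifier.
    static List<Match> findTerms(String text, Map<String, String> lexicon) {
        List<Match> matches = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : lexicon.entrySet()) {
            String term = entry.getKey().toLowerCase(Locale.ROOT);
            if (term.isEmpty()) {
                continue;
            }
            int from = 0;
            int idx;
            while ((idx = lower.indexOf(term, from)) >= 0) {
                int end = idx + term.length();
                // Reject hits embedded in a longer word, e.g. "reoxidation".
                if (isBoundary(lower, idx - 1) && isBoundary(lower, end)) {
                    matches.add(new Match(entry.getValue(), idx, end));
                }
                from = idx + 1;
            }
        }
        return matches;
    }

    // True if position i is outside the string or is not a letter or digit.
    static boolean isBoundary(String s, int i) {
        return i < 0 || i >= s.length() || !Character.isLetterOrDigit(s.charAt(i));
    }

    public static void main(String[] args) {
        Map<String, String> lexicon = new LinkedHashMap<>();
        lexicon.put("oxidation", "EXAMPLE:1"); // placeholder identifier, not a real ontology ID
        for (Match m : findTerms("Oxidation of the alcohol gave the ketone.", lexicon)) {
            System.out.println(m.id + " at " + m.start + "-" + m.end);
        }
    }
}
```

A linear scan per term is enough to show the principle; a real lexicon with many thousands of entries would more plausibly use a trie or a similar index.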
OSCAR3 also includes the Oscar Server, a Jetty-powered set of servlets. These provide the following services:
- Parsing of text/HTML by OSCAR.
- Text/InChI/SMILES/SMILES substructure/SMILES similarity search of papers, coupled with keyword and ontology-based search, using Lucene and the CDK.
- List of all names found / all names that co-occur with a search term or terms.
- Online management of a chemical/stopword lexicon.
- Manual editing of SciXML fragments containing named entities, for creating gold standards and training data.
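As an illustration of the "names that co-occur with a search term" service listed above, the following Java sketch counts, over a set of documents, how often each recognised name appears in the same document as a query name. It is a plain in-memory example under stated assumptions, not how the Oscar Server implements the feature: the class `CoOccurrenceSketch`, the method `coOccurringNames` and the input shape (one set of already-recognised names per document) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; this is not part of OSCAR3.
class CoOccurrenceSketch {

    // For every document that contains the query name, counts each other
    // name found in that document. Keys are returned in alphabetical order.
    static Map<String, Integer> coOccurringNames(List<Set<String>> namesPerDocument, String query) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> names : namesPerDocument) {
            if (!names.contains(query)) {
                continue; // the query name does not occur in this document
            }
            for (String name : names) {
                if (!name.equals(query)) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input: the names already recognised in each of three documents.
        List<Set<String>> docs = Arrays.asList(
            new HashSet<>(Arrays.asList("ethanol", "water")),
            new HashSet<>(Arrays.asList("ethanol", "acetone", "water")),
            new HashSet<>(Arrays.asList("benzene", "water")));
        System.out.println(coOccurringNames(docs, "ethanol"));
    }
}
```

Counting per document rather than per sentence is a design choice made here for brevity; a finer-grained window would only change what is passed in as each set of names.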