Cardiology will pump the deep web
by Simon Hettrick, OMII-UK
The Deep Web is the mass of data, making up about three quarters of the Web, that is inaccessible to most users. This hidden data is a critical resource to consider as we embark on the creation of the Semantic Web. The SADI (Semantic Automated Discovery and Integration) project is working to expose Deep Web data, of any kind, as if it were a Semantic Web resource. One early success has been the CardioSHARE project, which has exposed data for use by cardiovascular health researchers.
One can imagine the connections between every conceivable input on the Web and the outputs of every conceivable Web Service as a virtual graph. This graph could be used to discover information by tracing the desired output from a Web Service back to the necessary input. However, the virtual graph changes constantly as the underlying analytical tools and data resources change. Rather than attempting to keep up with these changes, and falling prey to the well-known problems suffered by data warehouses, SADI dynamically queries the virtual graph as if it existed, but without instantiating it permanently. It does this by constructing only those segments of the virtual graph needed to answer a given question at a given time, discovering and invoking the appropriate Web Services as it goes.
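The idea of building only the needed segment of the graph can be sketched in a few lines of Python. This is a toy illustration, not SADI's implementation: the `Service`, `Registry` and `answer` names, the in-memory registry, and identifiers such as `gene:G1` are all assumptions made up for the example, and the `invoke` callables stand in for remote Web Service calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Service:
    """Hypothetical description of one Web Service in a registry."""
    name: str
    input_type: str                     # type of resource the service accepts
    predicate: str                      # relationship its output attaches
    invoke: Callable[[str], List[str]]  # stand-in for a remote call


class Registry:
    """Toy in-memory registry; a real one would be queried over the network."""

    def __init__(self, services: List[Service]) -> None:
        self._services = list(services)

    def discover(self, input_type: str, predicate: str) -> List[Service]:
        # Find services that accept this input and produce this relationship.
        return [s for s in self._services
                if s.input_type == input_type and s.predicate == predicate]


def answer(registry: Registry, start: str, start_type: str,
           path: List[Tuple[str, str]], graph: Set[Triple]) -> List[str]:
    """Follow `path` from `start`, adding only the triples the query needs."""
    frontier, current_type = {start}, start_type
    for predicate, output_type in path:
        next_frontier: Set[str] = set()
        for subject in sorted(frontier):
            for service in registry.discover(current_type, predicate):
                for value in service.invoke(subject):
                    graph.add((subject, predicate, value))
                    next_frontier.add(value)
        frontier, current_type = next_frontier, output_type
    return sorted(frontier)


calls: List[str] = []  # records which services were actually invoked


def fake_service(name: str, table: Dict[str, List[str]]) -> Callable[[str], List[str]]:
    """Build a local stand-in for a remote service, backed by invented data."""
    def invoke(subject: str) -> List[str]:
        calls.append(name)
        return table.get(subject, [])
    return invoke


registry = Registry([
    Service("geneToProtein", "Gene", "encodes",
            fake_service("geneToProtein", {"gene:G1": ["protein:P1"]})),
    Service("proteinToPathway", "Protein", "participatesIn",
            fake_service("proteinToPathway",
                         {"protein:P1": ["pathway:W1", "pathway:W2"]})),
    Service("drugToTarget", "Drug", "targets",
            fake_service("drugToTarget", {})),  # irrelevant to the query
])

graph: Set[Triple] = set()
pathways = answer(registry, "gene:G1", "Gene",
                  [("encodes", "Protein"), ("participatesIn", "Pathway")],
                  graph)
print(pathways)  # ['pathway:W1', 'pathway:W2']
```

Only the two services relevant to the question are invoked, and `graph` ends up holding just the three triples that the query touched. The third service is never called, which is the point of not instantiating the whole graph.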
SADI is a Web Service framework that uses Semantic Web technologies (RDF/OWL) to discover and invoke Web Services. In this way, the output from a Web Service can be dynamically exposed on the Semantic Web when it is needed. SADI provides a prototype, standards-based query interface that explores these dynamically exposed Semantic Web resources, making them appear to be traditional Semantic Web data stores.
CardioSHARE (Cardiovascular Semantic Health and Research Environment) is based at the iCAPTURE Centre for Cardiovascular and Pulmonary Research in Vancouver. It is an application of SADI aimed at health researchers, which allows complex queries to be reduced to straightforward named references to commonly understood subjects, data types, biological relationships, or biological properties. CardioSHARE makes it easy to compose complex queries by hiding the complexity of the query, and of any analytical steps, from the user.
For example, a doctor may find from his studies of heart disease that a certain type of patient is more responsive to the drug Warfarin. He publishes this information in a manuscript, and simultaneously puts an OWL definition of the same type of patient on the Web as MyName:WarfarinHyperResponders. Another doctor, on seeing the publication, could then immediately query her database for patients of type MyName:WarfarinHyperResponders. This would allow her to check whether any of the patients in her pending drug trials fall into this category, so that she could accommodate this new knowledge and amend her drug-trial protocol.
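The scenario above can be mimicked with a very small Python sketch. It is only an analogy: a real definition would be written in OWL and evaluated by a reasoner, whereas here a plain Python predicate stands in for it. The `publish` and `patients_of_type` helpers, the `criterion_a` and `criterion_b` fields, and the patient records are all invented placeholders with no clinical meaning.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]
Definition = Callable[[Patient], bool]

# Stand-in for "the Web": shared definitions, looked up by name.
published: Dict[str, Definition] = {}


def publish(name: str, definition: Definition) -> None:
    """Make a named patient-type definition available to other researchers."""
    published[name] = definition


def patients_of_type(name: str, patients: List[Patient]) -> List[str]:
    """Return the ids of local patients that satisfy a published definition."""
    definition = published[name]
    return [str(p["id"]) for p in patients if definition(p)]


# First researcher: publishes the definition alongside the manuscript.
# The two criteria are placeholders, not real clinical markers.
publish("MyName:WarfarinHyperResponders",
        lambda p: bool(p["criterion_a"]) and bool(p["criterion_b"]))

# Second researcher: queries her own trial data by name only.
trial_patients: List[Patient] = [
    {"id": "patient-1", "criterion_a": True, "criterion_b": True},
    {"id": "patient-2", "criterion_a": True, "criterion_b": False},
    {"id": "patient-3", "criterion_a": False, "criterion_b": True},
]
matches = patients_of_type("MyName:WarfarinHyperResponders", trial_patients)
print(matches)  # ['patient-1']
```

The second researcher never needs to know how the category is defined. She refers to it by name, and the complexity of the definition stays with the expert who published it.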
The structured approach to sharing expert knowledge made possible by SADI will change the way knowledge discovery is achieved. Simplifying complex query and analysis tasks encourages more frequent and deeper exploration of existing data. This will, no doubt, result in a large number of fortuitous discoveries as knowledge that is buried deep in the Web is exposed.





© The University of Southampton on behalf of OMII-UK. All Rights Reserved.