The cancer Biomedical Informatics Grid chooses Taverna
by Ravi Madduri, University of Chicago
Thousands of scientists spend their lives researching cancer, its causes and potential treatments. Yet researchers currently have no easy way to share their results. Without adequate sharing, effort is wasted and the full benefit of the research cannot be gained. A group in the US aims to solve this problem through the cancer Biomedical Informatics Grid (caBIG™), which is using OMII-UK’s Taverna software.
To aid the sharing of results, the US National Cancer Institute sponsored the caBIG™ project. The service-based grid infrastructure for caBIG™ is called caGrid, and is built on the Globus Toolkit 4.0. It allows a wide variety of cancer-research data sources and analytical capabilities to be exposed as services. Currently, about 50 cancer centres and 30 other organisations are working collaboratively on caBIG™, and more than 68 services are provided in caGrid. Within this complex infrastructure, a number of challenges arise that Taverna helps to address: discovering services, composing them into workflows, executing those workflows, and storing and sharing the results.
Discovery, the ability to find what is available in the community, is a common problem in any bioinformatics research community, and caBIG™ is no exception. A centralised index service is provided by caBIG™. It indexes service metadata, which allows users to discover the services that fit their needs. Taverna provides an extensible service-discovery mechanism called Scavenger that we customised to provide users with the ability to discover caBIG™ services from the Taverna user interface.
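As an illustration only, metadata-based discovery can be sketched in a few lines of Python. This is not the caBIG™ index service or the Scavenger API; the `ServiceMetadata` record, the `discover` function and the example entries are all hypothetical and simply show how a search over indexed metadata narrows a list of services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical metadata record held by an index (not the caGrid schema)."""
    name: str
    hosting_centre: str
    operations: tuple


def discover(index, keyword):
    """Return the services whose name or operation names mention `keyword`."""
    keyword = keyword.lower()
    return [
        service
        for service in index
        if keyword in service.name.lower()
        or any(keyword in operation.lower() for operation in service.operations)
    ]


# Invented example entries; real entries would come from the index service.
index = [
    ServiceMetadata("MicroarrayData", "Centre A", ("query", "getExperiment")),
    ServiceMetadata("ImageArchive", "Centre B", ("query", "retrieveImage")),
    ServiceMetadata("ClusterAnalysis", "Centre C", ("clusterExperiment",)),
]

matches = [service.name for service in discover(index, "experiment")]
```

Here `matches` holds the names of the two invented services that expose an operation mentioning "experiment".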
Composition is the process of describing an experiment by linking individual services into a workflow. This involves adding data and control dependencies between services. It may also involve data transformations if the output of one service is not in the format required for another service’s input. Unlike business processes, which consist of complex control logic, scientific workflows are more focused on parallel data flow. In a data flow, tasks represent data processing and links represent data transport. Tasks without data interdependencies can be executed in parallel. Taverna’s modelling style and the ability to drag and drop service operations are well suited to the needs of the caGrid community.
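The data-flow idea can be illustrated with a minimal Python sketch. It is not Taverna code: the task names and the `parallel_stages` helper are hypothetical. Tasks are grouped into stages so that every task in a stage already has all of its inputs, which means the tasks within one stage have no data interdependencies and could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_stages(dependencies):
    """Group tasks into stages of mutually independent tasks.

    `dependencies` maps each task to the set of tasks whose output it consumes.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Every task returned here has all of its inputs available.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical workflow: two independent queries feed a merge step,
# and the merged data is then analysed.
workflow = {
    "query_genes": set(),
    "query_samples": set(),
    "merge": {"query_genes", "query_samples"},
    "analyse": {"merge"},
}

stages = parallel_stages(workflow)
```

In this example the two query tasks land in the same stage because neither consumes the other's output, while `merge` and `analyse` must wait for their inputs.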
Execution occurs when the workflow engine invokes the services in the order defined by the workflow. An engine for scientific workflows should be aware of the data and computation resources that it can leverage. Taverna provides a functional model, called implicit iteration, to ease parallel execution. This capability is useful when the cardinality of inputs cannot be estimated at build-time, which often happens in caGrid workflows. We extended the processor framework to create a caGrid processor that allows execution of WS-RF compliant caBIG™ services. We also added support for the resource pattern in caBIG™ services as part of the processor logic, and created a caBIG™ Workflow service around the Taverna Execution Engine. Users can now submit workflows to a workflow service running on a dedicated host, which means they can execute their workflows without dedicating their computer to Taverna.
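The idea behind implicit iteration can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Taverna's implementation: the `invoke` and `depth` helpers and the `to_upper` service are hypothetical, and only the single-input case is covered. When a service that expects a single item is handed a list, it is called once per element, and the independent calls are handed to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def depth(value):
    """Nesting depth of a value: 0 for a single item, 1 for a flat list, and so on."""
    if isinstance(value, list):
        return 1 + max((depth(item) for item in value), default=0)
    return 0


def invoke(service, value, expected_depth=0):
    """Call `service` on `value`, iterating implicitly when `value` is
    nested more deeply than the service expects."""
    if depth(value) > expected_depth:
        # The calls are independent, so they can be dispatched concurrently;
        # pool.map returns results in the order of the inputs.
        with ThreadPoolExecutor() as pool:
            return list(
                pool.map(lambda item: invoke(service, item, expected_depth), value)
            )
    return service(value)


def to_upper(identifier):
    """Hypothetical service that expects a single string."""
    return identifier.upper()


single = invoke(to_upper, "brca1")
many = invoke(to_upper, ["brca1", "tp53"])
nested = invoke(to_upper, [["brca1"], ["tp53", "egfr"]])
```

The workflow author writes `to_upper` for one item only; the number of items, and how deeply they are nested, is resolved when the workflow runs.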
Intermediate results generated by component services, as well as the final results, are of great value and deserve to be stored for later analysis. Taverna gives each intermediate and final data item a unique identifier and stores it. The extension framework allows users to add customised functions such as data visualisation, tracking and querying.
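As a rough illustration of identifying and storing data items, consider the Python sketch below. It does not reflect Taverna's actual storage layer or identifier scheme: the `ResultStore` class is hypothetical, and random UUIDs are used purely as a stand-in for unique identifiers.

```python
import uuid


class ResultStore:
    """Hypothetical in-memory store for intermediate and final data items."""

    def __init__(self):
        self._items = {}

    def put(self, value, produced_by):
        """Store `value`, record which processor produced it, and return its identifier."""
        identifier = str(uuid.uuid4())  # stand-in for a real identifier scheme
        self._items[identifier] = {"value": value, "produced_by": produced_by}
        return identifier

    def get(self, identifier):
        """Retrieve a stored value by its identifier."""
        return self._items[identifier]["value"]

    def produced_by(self, processor):
        """Return all values recorded for one processor, a simple tracking query."""
        return [
            item["value"]
            for item in self._items.values()
            if item["produced_by"] == processor
        ]


store = ResultStore()
intermediate_id = store.put(["gene_a", "gene_b"], produced_by="query_genes")
final_id = store.put({"clusters": 2}, produced_by="analyse")
```

Because every item keeps its identifier and the name of the processor that produced it, later tracking and querying functions can be layered on top of the store.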
Scientific workflows can benefit greatly from community experience. Scientists can share data, services, workflows, and the knowledge obtained in doing experiments. Taverna’s sister project, myExperiment, is a Web 2.0 community for the sharing of workflows. Our future work will add the functionality to upload Taverna workflows to the caBIG™ portal. We are also working on adding the ability to invoke caBIG™ services that are secured using the standard caBIG™ security and identity-management mechanisms.
Clearly, Taverna is well suited to the caGrid infrastructure. With its drag-and-drop service operations and XML Splitters, one can quickly connect services within a workflow without having to write complex XML/XPath code. It takes a good deal less time to construct a workflow with Taverna than with competing workflow editors. Taverna makes cancer researchers’ jobs easier and increases their productivity, so they can spend more time on their research and less time worrying about the technology they rely on.





© The University of Southampton on behalf of OMII-UK. All Rights Reserved.