Extending Taverna to Integrate MatLab
Google Summer of Code 2008 ideas
Primary mentor Katy Wolstencroft (katherine.wolstencroft at manchester.ac.uk)
Secondary mentor Stian Soiland-Reyes (soiland at gmail.com)
Background
The Taverna workbench (developed in the myGrid project) provides a framework for designing and executing workflows using distributed data resources and analysis tools. As well as the ability to access distributed resources, Taverna allows the automation of experiments. The workflow itself defines how and when during an experiment a service should be invoked and the workflow can iterate over multiple data objects, enabling repetitive tasks to proceed without the scientist's intervention. By combining services together in Taverna workflows, scientists can automatically access and analyse large amounts of data from a large number of distributed resources from their own desktops. Accessing these resources at their source means that individual scientists do not require local supercomputing power and do not have the overhead associated with the maintenance of data resources.
To date, Taverna can access over 3500 services from the life science domain, and this number is continually growing. These services are a mixture of Web services, local Java services, R statistical analysis scripts, BioMoby services and many more. A growing number of scientists, however, have requested the ability to combine Taverna workflows with Matlab scripts. Matlab is a library of
Project Goals
Create a Matlab processor plugin for Taverna that will:
- Allow scientists to add Matlab scripts to their Taverna workflows
- Provide a GUI to the Matlab coding environment, so that scientists can build new scripts or modify scripts already in a workflow
- Provide a mechanism for username and password access to both local and remote Matlab installations
The architecture of Taverna is extensible and there are already well-defined procedures for creating new processors. The challenge in this project is to build a processor that scientists will find easy to use and that will be a natural extension of their current working methods with Matlab.
Project Requirements
To do this project the student must be able to program in Java. Familiarity with GUI programming in Java and/or with Matlab would also be an advantage.
Notes
First of all we need to decide whether we should be communicating with a Matlab installation, or whether we should use an open source alternative such as GNU Octave.
The advantage of going with Octave would be that since it's open source we're free to link to it, all the APIs should be accessible, etc. The disadvantage would be that the scripts probably would not be able to use any of the fancy optional packs/libraries that are available for Matlab, for instance various statistical tools. But another advantage would be that any Taverna user would be able to use the "Matlab" plugin without having a Matlab licence, hence making workflows containing Matlab scripts shareable. The licence of GNU Octave is GPL, so such a plugin would also have to be GPL. However, the plugin itself could be optional for Taverna, and would only "upgrade" Taverna from LGPL to GPL if installed.
On the other hand, if we are going to link to a real Matlab installation (which in itself requires a valid Matlab licence), then we need to look at which APIs for invoking Matlab scripts are available and legal to use. Since Taverna is LGPL it would be legal from our side to have a plugin that links with closed-source Matlab, and that plugin could (should!) also be LGPL, but we have not yet researched what kind of licensing would be required on the Matlab side.
As for the Matlab scripts themselves, we can assume that the workflow designer owns them or is at least permitted to use them, so we shouldn't need to worry about licensing issues there.
Then we would like to see some consideration of, and rough research into, what would be a viable way to communicate with Matlab or Octave from Taverna. Note that Taverna is written in Java. For instance, we already have support for running R scripts from Taverna, implemented using the existing product Rserve, which is a kind of TCP/IP interface for sending an R script to an R server for execution.
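To make the "script over TCP/IP" idea concrete, here is a minimal sketch of what a client for such a server might look like. It is not Rserve's protocol and not an existing Matlab or Octave API: the class name ScriptClient, the length-prefixed framing and the host and port are all assumptions made for illustration, and a matching server would still have to be written.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical client for a script execution server (assumed protocol, not Rserve's). */
public final class ScriptClient {

    /** Sends one script as a frame and returns the server's reply frame as text. */
    public static String execute(String host, int port, String script) {
        try (Socket socket = new Socket(host, port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());
            writeFrame(out, script);
            return readFrame(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A frame is a 4-byte big-endian length followed by that many UTF-8 bytes. */
    public static void writeFrame(DataOutputStream out, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    /** Reads one frame written by writeFrame. */
    public static String readFrame(DataInputStream in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

A processor could then call something like ScriptClient.execute("localhost", 6400, script), where the port number is made up. Length-prefixed frames are chosen here only because scripts may contain newlines, which rules out a simple line-based exchange.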
If Matlab or Octave has a public C interface, it could be wrapped with JNI and used directly from the plugin, again provided that the licence permits this.
If no Matlab "server" or open API exists for executing a Matlab script, one way to solve this would be to develop our own "Matlab execution server" in Matlab itself, using some kind of eval() to execute the script. This could potentially be a big task in itself (which we could narrow the project to), because there are issues such as how to serialise an array of arrays of floating point numbers when sending them over the wire.
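As a rough illustration of the serialisation issue, the sketch below shows one possible binary encoding of an array of arrays of doubles on the Java side. The class name MatrixWire and the layout (row count, then each row's length and values, all big-endian) are assumptions for this example only; the Matlab or Octave side would need a matching reader, which is not shown.

```java
import java.nio.ByteBuffer;

/** Hypothetical wire format for a (possibly ragged) array of arrays of doubles. */
public final class MatrixWire {

    /** Layout: int row count, then for each row an int length followed by its doubles. */
    public static byte[] encode(double[][] matrix) {
        int size = Integer.BYTES;
        for (double[] row : matrix) {
            size += Integer.BYTES + row.length * Double.BYTES;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size); // big-endian by default
        buffer.putInt(matrix.length);
        for (double[] row : matrix) {
            buffer.putInt(row.length);
            for (double value : row) {
                buffer.putDouble(value);
            }
        }
        return buffer.array();
    }

    /** Reverses encode(), rebuilding rows of whatever lengths were written. */
    public static double[][] decode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        double[][] matrix = new double[buffer.getInt()][];
        for (int r = 0; r < matrix.length; r++) {
            matrix[r] = new double[buffer.getInt()];
            for (int c = 0; c < matrix[r].length; c++) {
                matrix[r][c] = buffer.getDouble();
            }
        }
        return matrix;
    }
}
```

A binary layout avoids the rounding that can occur when floating point values are printed as text, at the cost of having to agree on byte order with the other end. A text format would be easier to debug, so this is only one of several reasonable choices.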
Another hack that comes to mind would be to simply run Matlab from the command line, which we assume is possible. (Hopefully the candidate will know Matlab better than we do!)
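The command-line approach could be sketched in Java roughly as follows. The class ScriptRunner is hypothetical, and the example assumes an octave executable on the PATH that accepts --quiet and --eval; an equivalent Matlab invocation (perhaps something along the lines of matlab -nodisplay -r) is untested here and would need to be checked against the installed version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper that runs a script through an external interpreter process. */
public final class ScriptRunner {

    /** Builds an Octave command line; assumes "octave" is on the PATH. */
    public static List<String> octaveCommand(String script) {
        return List.of("octave", "--quiet", "--eval", script);
    }

    /** Runs the command, returning its combined stdout and stderr as text. */
    public static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        process.getOutputStream().close(); // the child gets no input on stdin
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Exit code " + exitCode + ": " + output);
        }
        return output;
    }
}
```

A processor could call ScriptRunner.run(ScriptRunner.octaveCommand("disp(1 + 1)")) and parse the returned text. Starting a new interpreter for every invocation is likely to be slow, and results only come back as printed text, which is why this is described as a hack rather than a design.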
We are not expecting the student to write another implementation of the Matlab language itself, as that has already been done by, say, the GNU Octave people, and it would be too big a task.
So in total, what I'm asking of the student now is a somewhat technical plan for how they could implement this. We don't need any Gantt charts or UML diagrams, just some kind of list of "stories" and some rough sketches of how they see the final solution working.
For instance, the student should already have installed and played with Taverna and got an idea about how our workflow system works and how the plugin could fit in. Read about the Beanshell processor and R processor in the Taverna manual.
Feel free to add your views and comments on this to your application.
Contact us
If you want to contact us for a relaxed chat, you can use GTalk/Jabber to reach stian@soiland-reyes.com, or you can join #taverna on IRC at irc.freenode.net.





© The University of Southampton on behalf of OMII-UK. All Rights Reserved.