Workflows for Williams-Beuren Syndrome analysis using Taverna
Williams-Beuren Syndrome (WBS) is a rare, sporadically occurring microdeletion disorder caused by a 1.5 Mb deletion in chromosome band 7q11.23. It is a multisystem genetic disorder characterised by a complex phenotype of physical and behavioural attributes.
The region most commonly deleted in WBS is approximately 1.5 Mb, and its loss typically removes 24 genes. This region is flanked by 320-500 kb of highly repetitive sequence, whose repetitive and complex nature makes it difficult to sequence and to map. Consequently, this region contains gaps in the genomic sequence and could contain genes, pseudogenes or regulatory elements that contribute to WBS. In order to fully understand the pathology of WBS and to determine genotype to phenotype correlations, a complete and comprehensive map of the WBS region is required.
The aim of the project is to close the genomic gaps in the WBS region and characterise any genes or regulatory elements that are discovered. myGrid workflows were used to automate the time-consuming and repetitive series of analyses required to achieve this objective.
The sequencing effort in the human genome is a continuous process. Sequencing over gapped regions is ongoing. As new sequence is produced, it can be compared to the known sequence surrounding the gaps to determine any overlap. If there are sequences with overlap, these can be investigated further to characterise genes and extend the mapped region.
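As a rough illustration of the overlap check described above, the following Python sketch finds the longest exact suffix-prefix overlap between a known sequence bordering a gap and a newly released sequence, then extends the known sequence if an overlap exists. It is a toy under stated assumptions: the function names and example sequences are hypothetical, and the real analysis relies on BLAST similarity searches, which tolerate mismatches and gaps, rather than on exact string matching.

```python
def suffix_prefix_overlap(known: str, new: str, min_len: int = 4) -> int:
    """Length of the longest suffix of `known` that exactly equals a prefix of `new`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    Exact matching only: a hypothetical stand-in for a BLAST comparison.
    """
    min_len = max(min_len, 1)  # a zero-length overlap is not meaningful
    longest_possible = min(len(known), len(new))
    # Try the longest candidate first so the first hit is the best one.
    for length in range(longest_possible, min_len - 1, -1):
        if known[-length:] == new[:length]:
            return length
    return 0


def extend_into_gap(known: str, new: str, min_len: int = 4) -> str:
    """Append the non-overlapping tail of `new` to `known`, if they overlap."""
    overlap = suffix_prefix_overlap(known, new, min_len)
    return known + new[overlap:] if overlap else known


# Hypothetical example sequences, far shorter than real genomic clones.
flank = "ACGTTGCA"
clone = "TGCAGGTC"
extended = extend_into_gap(flank, clone)
```

Here `flank` and `clone` share the four bases `TGCA`, so the clone extends the known sequence by its remaining four bases. A clone with no overlap leaves the known sequence unchanged, which corresponds to a workflow cycle that finds nothing new.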
This type of analysis involves the use of multiple services at multiple sites: for example, BLAST for similarity searches, GenBank to retrieve new sequence data and RepeatMasker to mask repetitive DNA sequence regions. For gene characterisation, gene-finding tools such as GenScan need to be used, followed by functional motif identification tools, such as signalP and pscan, once potential genes have been translated into amino acid sequences.
The WBS analyses described require intensive input from the bioinformatician. Results from one analysis must be cut-and-pasted into the input for the next. Reformatting is often required between analyses, which makes the process time-consuming, and the mundane, repetitive nature of the exercise makes it prone to human error.
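The translation step mentioned above, in which a predicted coding sequence is turned into an amino acid sequence before motif identification, can be sketched in Python as follows. This is a minimal illustration and not the service used in the WBS workflows: it assumes the standard genetic code, reads only the forward frame starting at the first base, and uses hypothetical names (`CODON_TABLE`, `translate`).

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Translation stops at the first stop codon. Codons containing
    ambiguous bases (for example N) are reported as 'X'. Trailing bases
    that do not form a complete codon are ignored.
    """
    sequence = coding_sequence.upper()
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE.get(sequence[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical predicted coding sequence: ATG GCC TGG TAA
peptide = translate("ATGGCCTGGTAA")
```

The resulting peptide string is the kind of input that motif identification tools expect. A fuller version would also handle the reverse strand and alternative reading frames.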
Automating the WBS analyses using myGrid workflows reduces these problems. Scheduling of workflow services to run in series means that the bioinformatician is free to do other research, perhaps running other workflows, whilst the experiment is running.
The careful capture of provenance information during the experiment invocation and the ability to capture results and semantic details of experiments in the myGrid Information Model and KAVE (Knowledge Annotation and Verification of Experiments) also provide great advantages in data handling.
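To make the idea of running services in series while capturing provenance more concrete, here is a small Python sketch. It does not use the Taverna or myGrid APIs, and it does not reproduce the myGrid Information Model. The step functions, record fields and names such as `run_in_series` are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
from datetime import datetime, timezone


def digest(text: str) -> str:
    """Short fingerprint of a piece of data, used to link provenance records."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def run_in_series(steps, data):
    """Run each (name, function) step on the previous step's output.

    Returns the final result and a list of provenance records noting
    which step ran, when it started, and fingerprints of its input
    and output.
    """
    provenance = []
    for name, function in steps:
        started = datetime.now(timezone.utc).isoformat()
        result = function(data)
        provenance.append({
            "step": name,
            "started": started,
            "input": digest(data),
            "output": digest(result),
        })
        data = result  # no manual cut-and-paste between analyses
    return data, provenance


# Hypothetical stand-ins for remote services.
def mask_lowercase(sequence: str) -> str:
    """Replace lowercase bases with N, standing in for a masking step."""
    return "".join("N" if base.islower() else base for base in sequence)


def to_fasta(sequence: str) -> str:
    """Reformat a raw sequence as a FASTA record for the next tool."""
    return ">candidate\n" + sequence + "\n"


final, records = run_in_series(
    [("mask", mask_lowercase), ("format", to_fasta)],
    "ACGTacgtACGT",
)
```

Because each record stores a fingerprint of its input and output, the output of one step can be matched to the input of the next, which is the kind of traceability that manual cut-and-paste loses.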
Performing a single WBS analysis manually can take anywhere between 1 and 2 weeks. Performing the same analysis using myGrid can reduce this time to a matter of hours.
Figure 1 shows the results of 4 workflow cycles (approximately 10 hours). The gapped region in this case contained a complement of known genes. All were identified correctly and their relative map positions in the region could be determined, refining the knowledge of the WBS region.

Figure 1
R. Stevens, H.J. Tipney, C. Wroe, T. Oinn, M. Senger, P. Lord, C.A. Goble, A. Brass and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid. In Proceedings of 12th International Conference on Intelligent Systems in Molecular Biology, 31st Jul-4th Aug 2004, Glasgow, UK. Published in Bioinformatics Vol. 20 Suppl. 1, 2004, i303-i310.
Motivation: In silico experiments necessitate the virtual organization of people, data, tools and machines. The scientific process also necessitates an awareness of the experience base, both of personal data as well as the wider context of work. The management of all these data and the co-ordination of resources to manage such virtual organizations and the data surrounding them needs significant computational infrastructure support.
Results: In this paper, we show that myGrid, middleware for the Semantic Grid, enables biologists to perform and manage in silico experiments, then explore and exploit the results of their experiments. We demonstrate myGrid in the context of a series of bioinformatics experiments focused on a 1.5 Mb region on chromosome 7 which is deleted in Williams-Beuren syndrome (WBS). Due to the highly repetitive nature of sequence flanking/in the WBS critical region (WBSCR), sequencing of the region is incomplete leaving documented gaps in the released sequence. myGrid was used in a series of experiments to find newly sequenced human genomic DNA clones that extended into these 'gap' regions in order to produce a complete and accurate map of the WBSCR. Once placed in this region, these DNA sequences were analysed with a battery of prediction tools in order to locate putative genes and regulatory elements possibly implicated in the disorder. Finally, any genes discovered were submitted to a range of standard bioinformatics tools for their characterization. We report how myGrid has been used to create workflows for these in silico experiments, run those workflows regularly and notify the biologist when new DNA and genes are discovered. The myGrid services collect and co-ordinate data inputs and outputs for the experiment, as well as much provenance information about the performance of experiments on WBS.
Further details on research using myGrid can be found here
© The University of Southampton on behalf of OMII-UK. All Rights Reserved.