Bioinformatics, e-Science and the Grid: a symbiotic relationship
As the first in a series of articles focusing on different aspects of e-Research, we asked myGrid’s Katy Wolstencroft for a précis on Bioinformatics.
In the last decade, the field of bioinformatics has moved from being a specialist discipline on the fringes of the biological sciences to an integral part of laboratory research. This change was driven largely by technological advances that enabled whole genomes to be sequenced for the first time. As the community rushed towards sequencing the entire human genome, the genomes of smaller organisms, such as yeast and C. elegans, were completed. Biologists were rapidly swamped by vast amounts of useful data, and they looked to bioinformaticians to find new ways of analysing and managing it. An important feature of this new data was that it was freely available over the internet, but it was not produced or stored in any one central location, since many laboratories around the world were involved in its production.
To date, there are over 750 whole-genome sequences available, allowing scientists to compare related organisms and make inferences about one organism from what is known about another (comparative genomics). This was just the beginning. Genome sequencing was the foundation for the development of other types of ‘omics’: a collection of high-throughput and often computationally intensive techniques for analysing the expression levels of genes (transcriptomics), proteins (proteomics) or metabolites (metabolomics) under particular experimental conditions.
These developments led to even more data, and to new requirements for computational power and distributed-computing technologies. This is why the bioinformatics community became an early adopter of e-Research and grid technologies. Laboratory experiments are expensive in both time and resources, and individual laboratories often cannot invest in the compute infrastructure required for large-scale analyses of the data they produce. The promise of grid infrastructure and distributed access to remote supercomputing centres means that bioinformaticians can ‘tap into’ these resources as and when they are required, making high-throughput experimentation accessible to a much wider group of scientists. Web services and workflows also play an important role in combining data produced locally with other data in the public domain.
Where next for bioinformatics?
The data deluge does not seem to be subsiding. Technological advances in the laboratory are continuing at breakneck speed. For example, the newly emerging ‘Next Generation’ sequencing technologies are expected to revolutionise the rate at which raw data can be produced over the next year.
There are also ever more sophisticated and ambitious projects in the pipeline to make use of the vast and growing wealth of in silico biological data. For instance, the 1000 Genomes Project consortium is producing a deep catalogue of human genetic variation by analysing the genomes of 1000 different people from diverse ethnic backgrounds. The emergence of the new field of ‘systems biology’ is also having a large impact on bioinformatics. Systems biology combines many scientific disciplines, including bioinformatics, chemoinformatics, biology, chemistry and mathematical modelling. This results in yet more in silico biological information of even greater diversity. These projects tend to be run by large, distributed consortia working towards the grand challenge of building in silico models of cells, and even of whole organisms. Perhaps we are on the way towards the virtual, digital human.
Katy Wolstencroft.





© The University of Southampton on behalf of OMII-UK. All Rights Reserved.