Porting the SPRINT framework, which enables parallel statistical analysis of post-genomic high-throughput biological data, to Windows
Google Summer of Code 2010 ideas
Primary Mentor: Muriel Mewisson
Secondary Mentor: Terry Sloan
Project: http://www.r-sprint.org
Background
The analysis of genetic data requires large amounts of processing power and memory. The last few years have seen the widespread introduction of high-throughput and highly parallel experiments in biological research. Microarray-based techniques are a prominent example, allowing simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. These studies generate an unprecedented amount of data and test the limits of existing bioinformatics computing infrastructure. SPRINT (www.r-sprint.org) is a collaborative project between EPCC (www.epcc.ed.ac.uk) and the Division of Pathway Medicine (DPM) (http://www.pathwaymedicine.ed.ac.uk/) which aims to give the microarray community easy access to High Performance Computing (HPC) for the efficient analysis of post-genomic microarray data.
A popular tool among biostatisticians for analysing microarray data is the free statistical software package R/Bioconductor. However, R is inherently sequential and cannot be used easily or efficiently on HPC platforms without substantial modifications to the R code. SPRINT provides a Simple Parallel R INTerface to HPC, allowing biological researchers to reap the benefits of HPC while hiding the complexity of programming for HPC.
Project Goals
- Porting the Simple Parallel R INTerface (SPRINT) framework, currently implemented for High Performance Computing (HPC) on Unix, to Windows
- Comparing the performance of the Windows implementation with that of the Unix implementation
Project Description
SPRINT has two main components: an intelligent HPC harness and a library of parallelized R functions. The purpose of this project is to port the current implementation of the SPRINT harness, which only runs on Unix platforms, to Windows platforms. The SPRINT HPC harness is programmed in C using MPI, both of which are available for Windows platforms. The development work is therefore to be carried out in C and MPI. The resulting implementation will be benchmarked and its performance compared with that of the existing Unix implementation running on the ECDF cluster and HECToR, the UK national supercomputing service. The student will work closely with the SPRINT team at EPCC and DPM.
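As an illustration of the kind of platform-specific code a port and its benchmarking typically involve, the sketch below wraps a wall-clock timer behind a single function that compiles on both Windows and Unix. It is not taken from the SPRINT sources: the names wall_seconds, sum_of_squares and time_workload are hypothetical, and the workload is only a stand-in for a real parallel R function.

```c
/* Minimal sketch of a portable wall-clock timer for benchmarking.
 * Assumption: none of these names come from SPRINT; they are illustrative. */
#ifndef _WIN32
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime on POSIX systems */
#endif

#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Windows: seconds read from the high-resolution performance counter. */
double wall_seconds(void)
{
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&now);
    return (double)now.QuadPart / (double)freq.QuadPart;
}
#else
#include <time.h>

/* Unix: seconds read from the POSIX monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
#endif

/* Hypothetical stand-in workload: sum of i*i for i = 1..n. */
double sum_of_squares(long n)
{
    double total = 0.0;
    for (long i = 1; i <= n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}

/* Runs the workload once, stores its result in *result,
 * and returns the elapsed wall-clock time in seconds. */
double time_workload(long n, double *result)
{
    double start = wall_seconds();
    *result = sum_of_squares(n);
    return wall_seconds() - start;
}
```

Calling time_workload with the same problem size on each platform gives timings that can be compared directly. Within an MPI program, the standard MPI_Wtime function would serve the same purpose without any platform-specific branches.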
Project Requirements
Essential: C, parallel programming, MPI, Windows programming
Advantageous but not essential: R, statistical programming
Benefit to the Student
The student will:
- contribute to an ongoing research programme developing open source software;
- develop software for HPC clusters;
- gain an understanding of statistical methods used in the analysis of post-genomic high-throughput biological data;
- use general statistical tools such as R and Bioconductor.




