[Ct-grid] Next BiG Grid Meeting: Monday 7 June at 13.30h in room H3.31

4 Jun 2010


The next BiG Grid meeting will be held on Monday 7 June
from 13.30-14.30h in room H3.31.
Agenda:
- Operations summary
- Middleware
- Applications
- AOB
- Presentation by Jan Bot:
*Logic Networks on the Grid: handling 15 million jobs* (abstract below)
This meeting can be joined by teleconference[1] in the BiG Grid room.
The appointed volunteer to take minutes is Ronald.
See also the minutes of the previous meetings[2] and the schedule for
taking minutes[3] for upcoming meetings.
1. http://www.nikhef.nl/pub/projects/grid/gridwiki/index.php/SurfVideoConf
2. http://www.nikhef.nl/pub/departments/ct/grid/2010/
3. http://www.nikhef.nl/pub/departments/ct/grid/2010/minutes.html
Enjoy the weekend,
Dennis van Dok
*Logic Networks on the Grid: handling 15 million jobs*
Jan Bot, Jeroen de Ridder, Marcel Reinders
In the Delft Bioinformatics Lab, logic networks, i.e. combinations of a
few Boolean logic gates, are being employed to model gene regulation in
cancer. To this end, a dataset containing cancerous mutations in
combination with gene activity measurements is used. A good model is
defined as the combination of mutations that best predicts the gene
activity. To find these models, a smart optimization strategy is
employed that evaluates a large collection of candidate network
topologies as well as a large number of input combinations.
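As a rough illustration of what "combinations of a few Boolean logic gates" can mean, the sketch below exhaustively scores every two-input gate over every pair of mutated genes. The gate set (AND, OR, XOR), the scoring function (absolute difference in mean activity between samples where the network outputs true and where it outputs false) and the names `best_network` and `score` are hypothetical assumptions; the abstract does not describe the authors' score or their optimization strategy, which searches much larger topologies.

```python
from itertools import combinations
from statistics import mean

# Assumed, deliberately tiny set of two-input gates.
GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}


def score(pred, activity):
    """Hypothetical score: |mean activity where pred is True - mean where False|."""
    on = [x for p, x in zip(pred, activity) if p]
    off = [x for p, x in zip(pred, activity) if not p]
    if not on or not off:
        return 0.0  # a constant network output explains nothing
    return abs(mean(on) - mean(off))


def best_network(mutations, activity):
    """Try every gate on every pair of mutated genes; keep the best-scoring one.

    mutations: dict gene -> list of bools (mutated or not, one per sample)
    activity:  list of floats, activity of the target gene per sample
    Returns (score, (gate_name, gene1, gene2)).
    """
    best = (0.0, None)
    for g1, g2 in combinations(sorted(mutations), 2):
        for name, gate in GATES.items():
            pred = [gate(a, b) for a, b in zip(mutations[g1], mutations[g2])]
            s = score(pred, activity)
            if s > best[0]:
                best = (s, (name, g1, g2))
    return best
```

For example, with `mutations = {"A": [True, True, False, False], "B": [True, False, True, False], "C": [False, False, False, True]}` and `activity = [5.0, 1.0, 1.0, 1.0]`, `best_network(mutations, activity)` returns `(4.0, ('AND', 'A', 'B'))`: only the sample with both A and B mutated shows high activity.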
To evaluate the significance of these networks it is vital to determine
the likelihood that these models will be found by chance. To this end
the same optimization strategy has to be applied to a permuted version
of the dataset. This results in a large number (15 million) of small
jobs, each of which generates a small amount of output data. Due to the
large variance in running time (anywhere between two seconds and several
hours) and the associated risk of exceeding the maximum wall-clock time,
collecting all outputs simultaneously may be inefficient and can result
in the loss of data from successful jobs. Moreover, the grid
middleware cannot cope with this number of small files. Therefore, we
have implemented a solution based on ToPoS (for the inputs) and an
XML-RPC server with a database back-end to store the outputs. In a
real-life application, this solution has proven able to handle all
life science grid nodes running at once, as well as the 600
additional nodes provided by our desktop cluster.
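The division of labour described here (inputs pulled from a token pool, each output pushed to a central XML-RPC server as soon as it is ready) can be sketched with Python's standard `xmlrpc` and `sqlite3` modules. This is a hypothetical single-machine illustration, not the authors' implementation: a local `Queue` stands in for ToPoS and does not model its interface, the `store` method, the table layout and the `time_budget` parameter are assumptions, and threads stand in for grid worker nodes.

```python
import sqlite3
import threading
import time
from queue import Empty, Queue
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ResultStore:
    """Hypothetical result collector: one database row per finished token."""

    def __init__(self):
        # The server thread writes and the main thread reads,
        # so allow cross-thread use and guard it with a lock.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute("CREATE TABLE results (token TEXT PRIMARY KEY, output TEXT)")
        self.lock = threading.Lock()

    def store(self, token, output):
        with self.lock:
            # INSERT OR REPLACE makes re-running a token (e.g. after a timeout) harmless.
            self.db.execute("INSERT OR REPLACE INTO results VALUES (?, ?)", (token, output))
            self.db.commit()
        return True  # XML-RPC cannot marshal None by default

    def count(self):
        with self.lock:
            return self.db.execute("SELECT COUNT(*) FROM results").fetchone()[0]


def worker(tokens, url, time_budget=3600.0):
    """Pull one token at a time and upload its result immediately.

    Stops taking new tokens once time_budget seconds have passed, a crude
    stand-in for staying under a grid job's wall-clock limit.
    """
    start = time.monotonic()
    proxy = ServerProxy(url)
    while time.monotonic() - start < time_budget:
        try:
            token = tokens.get_nowait()
        except Empty:
            return  # pool exhausted
        output = "result for " + token  # placeholder for the real computation
        proxy.store(token, output)


def run_demo(n_tokens=20, n_workers=4):
    store = ResultStore()
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(store.store, "store")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]

    tokens = Queue()  # local stand-in for a ToPoS token pool, NOT its real API
    for i in range(n_tokens):
        tokens.put("job-%04d" % i)

    threads = [threading.Thread(target=worker, args=(tokens, url)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    server.shutdown()
    server.server_close()
    return store.count()


if __name__ == "__main__":
    print(run_demo(), "results stored")
```

Uploading each result as soon as it exists means a job killed by the wall-clock limit loses at most the token it was working on, and the outputs end up as rows in one database rather than as millions of small files handled by the middleware.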
Since many more bioinformatics applications are expected to face this
or similar problems, our solutions should prove useful in other
permutation and cross-validation procedures carried out on the Grid.
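The permutation procedures mentioned here come down to an empirical p-value: re-run the same search on many shuffled copies of the data and count how often chance does at least as well as the real data. A minimal sketch, assuming a generic `statistic` callable as a stand-in for the full (and far more expensive) optimization, and the common (hits + 1) / (n_perm + 1) estimator; `permutation_p_value` is a hypothetical name. Because each permutation is independent, one way to distribute such a procedure is to run each permutation (or each batch, with its own seed) as a separate grid job.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_value(
    statistic: Callable[[List[float]], float],
    activity: Sequence[float],
    n_perm: int,
    seed: int = 0,
) -> float:
    """Empirical p-value of statistic(activity) under random relabelling.

    `statistic` stands in for the full model search: it receives a
    (possibly shuffled) copy of the activity values, while the mutation
    data it compares against stays fixed inside the callable.
    """
    rng = random.Random(seed)
    observed = statistic(list(activity))
    shuffled = list(activity)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between mutations and activity
        if statistic(shuffled) >= observed:
            hits += 1
    # Counting the observed data as one extra permutation keeps p > 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    mutated = [True, True, False, False, False, False]
    activity = [4.1, 3.9, 1.2, 0.8, 1.1, 0.9]

    def mean_diff(act):
        on = [a for m, a in zip(mutated, act) if m]
        off = [a for m, a in zip(mutated, act) if not m]
        return sum(on) / len(on) - sum(off) / len(off)

    print(permutation_p_value(mean_diff, activity, n_perm=10_000))
```

With only six samples the smallest attainable p-value is limited by the number of distinct relabellings, which is one reason real analyses need large datasets and, in turn, very many permutation jobs.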