The next BiG Grid meeting will be held on Monday 7 June from 13.30-14.30h in room H3.31.
Agenda:
- Operations summary
- Middleware
- Applications
- AOB
- Presentation by Jan Bot:
*Logic Networks on the Grid: handling 15 million jobs* (abstract below)
This meeting can also be joined by teleconference[1] in the Big Grid room.
The appointed volunteer to take minutes is Ronald.
See also the minutes of the previous meetings[2] and the schedule for taking minutes[3] for upcoming meetings.
1. http://www.nikhef.nl/pub/projects/grid/gridwiki/index.php/SurfVideoConf
2. http://www.nikhef.nl/pub/departments/ct/grid/2010/
3. http://www.nikhef.nl/pub/departments/ct/grid/2010/minutes.html
Enjoy the weekend,
Dennis van Dok
*Logic Networks on the Grid: handling 15 million jobs*
Jan Bot, Jeroen de Ridder, Marcel Reinders
In the Delft Bioinformatics Lab, logic networks, i.e. combinations of a few Boolean logic gates, are employed to model gene regulation in cancer. To this end, a dataset containing cancer-associated mutations together with gene activity measurements is used. A good model is defined as the combination of mutations that best predicts the gene activity. To find these models, an optimization strategy is employed that evaluates a large collection of candidate network topologies as well as a large number of input combinations.
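To make the idea of a logic network concrete, here is a minimal sketch, assuming binary (True/False) mutation and gene-activity profiles per sample, two-input networks only, a hypothetical gate set (AND, OR, XOR, AND NOT) and plain agreement counting as the score. The abstract does not describe the actual gates, topologies, scoring or optimization strategy, so the exhaustive search below is only a stand-in and not the authors' method; `best_two_input_network` and the toy data are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical gate set: the abstract does not say which gates are used.
GATES: Dict[str, Callable[[bool, bool], bool]] = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "AND_NOT": lambda a, b: a and not b,
}


def agreement(pred: Sequence[bool], target: Sequence[bool]) -> int:
    """Count the samples where the network output equals the measured activity."""
    return sum(p == t for p, t in zip(pred, target))


def best_two_input_network(
    mutations: Dict[str, List[bool]], activity: List[bool]
) -> Tuple[int, str, str, str]:
    """Exhaustively score every gate on every ordered pair of mutations.

    Returns (score, gate, first_input, second_input); on ties the first
    network found is kept. Ordered pairs matter because AND_NOT is asymmetric.
    """
    best: Tuple[int, str, str, str] = (-1, "", "", "")
    for first, second in permutations(mutations, 2):
        for name, gate in GATES.items():
            pred = [gate(x, y) for x, y in zip(mutations[first], mutations[second])]
            score = agreement(pred, activity)
            if score > best[0]:
                best = (score, name, first, second)
    return best


# Toy data (hypothetical): 4 samples, activity is exactly m1 AND m2.
EXAMPLE_MUTATIONS = {
    "m1": [True, True, False, False],
    "m2": [True, False, True, False],
    "m3": [False, False, True, True],
}
EXAMPLE_ACTIVITY = [True, False, False, False]

print(best_two_input_network(EXAMPLE_MUTATIONS, EXAMPLE_ACTIVITY))  # (4, 'AND', 'm1', 'm2')
```

Even this toy search grows quickly with the number of mutations, gates and inputs per gate, which gives a feel for why one optimization run is already a substantial amount of work.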
To evaluate the significance of these networks, it is vital to determine the likelihood that such models would be found by chance. To this end, the same optimization strategy has to be applied to a permuted version of the dataset. This results in a large number (15 million) of small jobs, each of which generates a small amount of output data. Because running times vary widely (anywhere from two seconds to several hours), and because of the associated risk of exceeding the maximum wall-clock time, collecting all outputs in one go may be inefficient and may even cause the results of successful jobs to be lost. Moreover, the grid middleware cannot cope with this number of small files. Therefore, we have implemented a solution based on ToPoS (for the inputs) and an XML-RPC server with a database back-end to store the outputs. In a real-life application, this solution has proven able to handle all life science grid nodes running at once, as well as the 600 additional nodes provided by our desktop cluster.
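The output-collection side of this approach can be sketched with Python's standard `xmlrpc` and `sqlite3` modules: each job pushes its small result to one XML-RPC server, which stores it as a row in a database table instead of as a separate file. This is an illustrative sketch only. ToPoS is replaced by an in-process `queue.Queue` standing in for the token pool (its real interface is not modelled), and `ResultStore`, `submit`, `worker`, `demo` and the table layout are hypothetical names, not the actual Delft implementation.

```python
import queue
import sqlite3
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class ResultStore:
    """Keeps every small job output as one row in a single SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        # The connection is used from the server thread and the main thread,
        # so same-thread checking is disabled and access is guarded by a lock.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS results (job_id TEXT PRIMARY KEY, output TEXT)"
        )

    def submit(self, job_id: str, output: str) -> bool:
        # INSERT OR REPLACE keeps a resubmitted job from creating a duplicate row;
        # using the connection as a context manager commits the transaction.
        with self._lock, self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)", (job_id, output)
            )
        return True

    def count(self) -> int:
        with self._lock:
            return self._db.execute("SELECT COUNT(*) FROM results").fetchone()[0]


def start_server(store: ResultStore) -> SimpleXMLRPCServer:
    # Port 0 lets the OS pick a free port; the real one is in server_address.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(store.submit, "submit")
    server.register_function(store.count, "count")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def worker(url: str, tokens: "queue.Queue[str]") -> None:
    # Pull model: each worker takes job tokens until the pool is empty.
    # The queue is a hypothetical stand-in for ToPoS.
    proxy = xmlrpc.client.ServerProxy(url)
    while True:
        try:
            job_id = tokens.get_nowait()
        except queue.Empty:
            return
        output = f"score-for-{job_id}"  # placeholder for the real computation
        proxy.submit(job_id, output)


def demo(n_jobs: int = 100, n_workers: int = 4) -> int:
    store = ResultStore()
    server = start_server(store)
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    tokens: "queue.Queue[str]" = queue.Queue()
    for i in range(n_jobs):
        tokens.put(f"job-{i:08d}")
    threads = [threading.Thread(target=worker, args=(url, tokens)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    server.server_close()
    return store.count()


print(demo())  # 100
```

Keying the table on the job ID makes `submit` idempotent, so a job that is rerun after hitting the wall-clock limit, or that reports twice, does not create duplicate rows; and because each result is written as soon as its job finishes, a later failure cannot take earlier results with it.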
Since many more bioinformatics applications are expected to suffer from this or similar problems, our solution should prove useful in many other permutation or cross-validation procedures carried out on the Grid.
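The permutation procedure mentioned in the abstract can be summarised as an empirical p-value: the same optimization is rerun on datasets with shuffled gene-activity labels, and one counts how often the shuffled data score at least as well as the real data. On the grid, each of these reruns is presumably split into many small jobs, which is how the job count becomes so large. In this sketch `best_single_mutation_score` is a hypothetical stand-in for the real optimization, and the +1 correction in the p-value is a common convention rather than something the abstract specifies.

```python
import random
from typing import Callable, Dict, List

Mutations = Dict[str, List[bool]]
ScoreFn = Callable[[Mutations, List[bool]], int]


def best_single_mutation_score(mutations: Mutations, activity: List[bool]) -> int:
    """Hypothetical stand-in for the real optimisation: the best agreement
    between any single mutation profile and the gene-activity profile."""
    return max(
        sum(m == a for m, a in zip(profile, activity)) for profile in mutations.values()
    )


def permutation_p_value(
    mutations: Mutations,
    activity: List[bool],
    score_fn: ScoreFn,
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value of the observed score under label permutation."""
    rng = random.Random(seed)
    observed = score_fn(mutations, activity)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = activity[:]  # copy, so the caller's list is left untouched
        rng.shuffle(shuffled)   # breaks any real mutation/activity association
        if score_fn(mutations, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 in numerator and denominator keeps the estimate away from zero.
    return (at_least_as_good + 1) / (n_perm + 1)


# Toy data (hypothetical): m1 matches the activity perfectly, m2 does not.
ACTIVITY = [i % 2 == 0 for i in range(20)]
MUTATIONS = {"m1": list(ACTIVITY), "m2": [i < 10 for i in range(20)]}

print(permutation_p_value(MUTATIONS, ACTIVITY, best_single_mutation_score, n_perm=200))
```

Every permutation is independent of the others, which is what makes this kind of procedure a natural fit for many small, loosely coupled grid jobs.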
Hi,
The minutes from 7 June (http://www.nikhef.nl/pub/departments/ct/grid/2010/20100607.html) are the same as the ones from 19 April. (Maybe exactly the same things were said; I don't know, I wasn't there.)
Also "vled" should be "vlet" unless there is a software distribution called "vled".
Cheers,
Piter.
On 06/04/2010 10:51 AM, Dennis van Dok wrote:
ct-grid mailing list
ct-grid@nikhef.nl
https://mailman.nikhef.nl/mailman/listinfo/ct-grid