Hi,

Has anyone ever used DMTCP? If so, does it work?
DMTCP is a distributed checkpointing system that can checkpoint not only sequential programs, but also threaded programs (with pthreads), families of processes (made with fork), and distributed processes across machines (like MPI). You can find a reference for it here:
Would it be worthwhile to investigate DMTCP on the grid and see how well it works by wrapping it around the job execution boundaries?

Thanks,
Sander
don't know this one ...
Cheers,
Sven

(In reply to Sander Klous's message of Thursday 08 January 2009 19:39:17)
_______________________________________________
Ct-grid mailing list
Ct-grid@nikhef.nl
https://mailman.nikhef.nl/cgi-bin/listinfo/ct-grid
I don't know it, and it is quite new, but it looks nice. For a detailed intro paper by the authors, including results (so it does seem to work), see: http://arxiv.org/abs/cs/0701037
Cheers,
Mischa
(In reply to Sven Gabriel's message of Thu, Jan 08, 2009 at 09:02:17PM +0100)
Hi,
We're planning a presentation via the teleconferencing system with Brian Bockelman about Hadoop FS and how CMS workflows can run on Hadoop FS in production.
Brian can't make it to Europe, but offered to share his experiences via the telecon setup at Nikhef.
More info on Hadoop: http://hadoop.apache.org/core/
What I'd like to know is when would be the best time to schedule this presentation.
Note: Due to the time difference we'll probably be planning this at the end of the day.
If you are interested, please indicate your preferred time: http://www.doodle.com/s3hi59qg366mxxb4
Oscar