Hi Sander,
I'm very interested in the details of Hadoop, the opportunities it offers, and people's experiences with it, but I don't know much about any of these things yet.
Hadoop: http://hadoop.apache.org/core/
Hadoop FAQ
1. What is Hadoop?
Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce. For some details, see HadoopMapReduce.
Oscar
Sander Klous wrote:
Hi,
What is Hadoop exactly? Reading through the specs, it doesn't look like a competitor for dCache. So, can anybody comment on the remark below: "HDFS is much better than dCache"?
Thanks,
Sander
Plan:
- Gain experience with the Hadoop File System (HDFS) via a large installation on the cluster in B240.
- Assuming it really works according to our expectations, investigate how best to "pseudo-daemoncore-ify" the HDFS service so that it runs under the condor_master, responds to Condor administrative commands such as on, off, restart, and reconfig, and ideally behaves similarly with respect to debug logs and config settings. Note: challenges here include that the HDFS service is implemented in Java and may rely on ssh for network authentication.
- Enable the Condor file transfer service to stage files in/out of HDFS, in addition to the shadow.
That's really interesting.
As one data point: I was talking to Brian Bockelman about HDFS. He has deployed HDFS at his OSG site in Nebraska and thinks that it is great, much better than dCache.
An interesting question, once we get far enough along with this work: can the interface to HDFS be made pluggable? That is, can Condor call out to an external file transfer service? I don't know if it makes sense, but we might be able to plug in other services one day.
Ct-grid mailing list
Ct-grid@nikhef.nl
https://mailman.nikhef.nl/cgi-bin/listinfo/ct-grid