Hi Brian,
The performance and scalability of HDFS is very interesting.
A few questions:
- How do you mount HDFS on a system? Is FUSE the only option for a regular mount? As I understand the information on the Hadoop website, applications would either have to use the Java virtual FS bindings to the Hadoop FS or go through FUSE.
- If FUSE is the option, what would the performance penalty be?
- For grid-enabling HDFS, do you refer to the Java version of the GridFTPd or the C implementation?
thanks,
Oscar
Brian Bockelman wrote:
Hey Jeff,
I'd be happy to come and talk about Hadoop and quantify my comments. It really is an amazing, well-engineered file system. We currently have a 30TB install (built entirely from scratch space on worker-node disks). Some of the highlights:
- Resource consumption: If I push it as hard as I can (several hundred
CMSSW processes reading the same file via POSIX I/O), it uses maybe 1/3 of a CPU. I can't make its memory consumption show up in graphs.
- Performance: Doing the same test with *one file* and 2 replicas, I can
hit 80Gbps. Yahoo claims that their biggest clusters get up to 2.2Tbps. I haven't been able to make the namenode break a sweat.
- Manageability: Straightforward. The namenode is journaled, and can
checkpoint. There are only two types of nodes (namenode and datanode); both are straightforward to manage. Out of the box, there are no knobs to tweak, no optimizations to perform to get decent performance. It just works. If you lose a data node, it starts re-replicating files within about 10 seconds. [As a side note, data recovery is actually pretty straightforward. We lost the disk which holds the namenode's info this weekend, and restoring from an automatically-created checkpoint was easy.]
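For anyone who wants to see what that recovery amounts to, here is a minimal sketch on a stock Hadoop install. The command names come from the HDFS admin documentation; the exact paths and startup options vary by version, and the checkpoint location (`fs.checkpoint.dir`) is site-specific:

```shell
# Restore the namespace from the last automatically-created checkpoint.
# The -importCheckpoint startup option reads the checkpoint image from
# fs.checkpoint.dir into an empty dfs.name.dir.
bin/hadoop namenode -importCheckpoint

# Sanity-check the restored namespace and the datanode reports.
bin/hadoop dfsadmin -report
bin/hadoop fsck /
```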
If you want to grid-enable it, you reuse the Globus GridFTP server (plus bindings) and the BeStMan SRM server. I've clocked it at 8.2Gbps for WAN traffic.
For us, the best potential upside is the manageability, although it's also true that, performance-wise, it does things dCache can't. If it proves to require far less admin time, we are hoping to expand use beyond the current test bed.
Brian
On Oct 18, 2008, at 1:50 AM, Jeff Templon wrote:
Yo,
I'm talking to Brian Bockelman about other things; maybe if I can convince him to come, he can tell us about his experiences with HDFS.
Brian, now you have another reason to come ... give us a short technical seminar about HDFS? It could just be you with a piece of chalk and a blackboard, as informal as you like?
JT
Oscar Koeroo wrote:
Hi Sander,
I'm very interested in the details, opportunities and experiences with Hadoop, but I don't have a clue about any of these things.
Hadoop: http://hadoop.apache.org/core/
Hadoop FAQ:
- What is Hadoop?
Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce. For some details, see HadoopMapReduce.
Oscar
Sander Klous wrote:
Hi,
What is Hadoop exactly? Reading through the specs, it doesn't look like a competitor for dCache. So, can anybody comment on the remark below: "HDFS is much better than dCache"?
Thanks, Sander
Plan:
- Gain experience w/ the Hadoop File System via a large
installation on the cluster in B240.
- Assuming it really works according to our expectations,
investigate how to best "pseudo-daemoncore-ify" the HDFS service so that it runs under the condor_master, responds to Condor administrative commands such as on, off, restart, reconfig, etc., and hopefully has similar behavior for a debug log and config settings. Note: some challenges here include the fact that the HDFS service is implemented in Java, perhaps relies on ssh for network authentication, etc.
- Enable the Condor file transfer service to stage files in/out
of HDFS, in addition to just the shadow.
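The staging step in that last bullet boils down to the stock Hadoop command-line client; a sketch of what the transfer service would invoke (all paths here are illustrative, not from any actual Condor integration):

```shell
# Stage a job's input into HDFS before the job starts...
bin/hadoop fs -put /scratch/job123/input.dat /user/condor/job123/input.dat

# ...and pull the job's output back out afterwards.
bin/hadoop fs -get /user/condor/job123/output.dat /scratch/job123/output.dat

# Verify what ended up in the job's HDFS directory.
bin/hadoop fs -ls /user/condor/job123
```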
That's really interesting.
As one datapoint: I was talking to Brian Bockelman about HDFS. He has deployed HDFS at his OSG site in Nebraska and thinks that it is great, much better than dCache.
An interesting question, once we get far enough along with this work: can the interface to HDFS be made pluggable? That is, can Condor call out to a file transfer service? I don't know if it makes sense, but we might be able to plug in other services one day.
Ct-grid mailing list Ct-grid@nikhef.nl https://mailman.nikhef.nl/cgi-bin/listinfo/ct-grid
On Oct 20, 2008, at 3:11 AM, Oscar Koeroo wrote:
Hi Brian,
The performance and scalability of HDFS is very interesting.
A few questions:
- How do you mount HDFS on a system? Is FUSE the only option for a
regular mount? As I understand the information on the Hadoop website, applications would either have to use the Java virtual FS bindings to the Hadoop FS or go through FUSE.
The only available option is FUSE.
The FUSE bindings are written on top of libhdfs, which is a C library which uses JNI to invoke its operations.
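For the curious, mounting via those FUSE bindings looks roughly like this. The fuse-dfs module lives in the Hadoop contrib tree; the wrapper script name and the namenode URI below are from a typical build and will differ per site and version:

```shell
# Build output lives under src/contrib/fuse-dfs in the Hadoop tree.
# The wrapper script sets up CLASSPATH and LD_LIBRARY_PATH so that
# libhdfs can start its embedded JVM via JNI.
./fuse_dfs_wrapper.sh dfs://namenode.example.org:9000 /mnt/hdfs

# After that, ordinary POSIX I/O works against the mount point.
ls /mnt/hdfs
cat /mnt/hdfs/user/cms/somefile.root > /dev/null
```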
- If FUSE is the option, what would the performance penalty be?
Negligible for what we do. I don't have any applications which push very hard on the metadata server (the CMS apps' use of it is trivial), however.
There is a bit of a performance penalty in terms of maximum throughput - you can probably only hit 40-60MB/s per single stream. (CMS software, again, only sucks up about 10MB/s.)
- For Grid-enabling HDFS do you refer to the Java version of the
GridFTPd or the C implementation?
C implementation.
The Globus GridFTP server is modular (using DSI, the Data Storage Interface); we use libhdfs to pass the data directly from the GridFTP server to HDFS, no FUSE involved.
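Put together, the grid-enabled path looks something like the sketch below. The `-dsi` flag is standard Globus GridFTP server usage; the DSI module name and hostnames are illustrative assumptions, not the exact Nebraska configuration:

```shell
# Server side: start the Globus GridFTP server with an HDFS DSI module
# loaded in place of the default POSIX backend.
globus-gridftp-server -dsi hdfs -p 2811

# Client side: an ordinary GridFTP transfer. Data flows straight from
# HDFS through libhdfs into the GridFTP data channel - no FUSE mount.
globus-url-copy \
    gsiftp://gridftp.example.org:2811/user/cms/file.root \
    file:///tmp/file.root
```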
Brian