Drenth, Eduard wrote:
Could you please confirm the correctness of the drawing I've made for the grid?
The picture is largely OK, but it gets some of the details wrong:
- There is no user interaction from WMS to CE, or from SE to tape.
- There can be direct interaction between a UI and the CE.
- The interface between WMS and CE is either CondorG (for LCG-CE) or ICE (for CREAMCE).
- The CE does not talk to the WNs directly; it goes through another box, the batch system head node (a.k.a. LRMS or Local Resource Management System). The CE reaches the LRMS through either the globus-jobmanager interface (for LCG-CE) or BLAH (for CREAMCE), and both can talk to different kinds of batch systems such as Torque/Maui, Condor, Sun Grid Engine, LSF, or others.
- SRM is not a box of its own, it is an interface offered by the SE.
- There can be direct interfacing between UI and SE.
- The LFN, SURL and TURL are all used in conjunction, for the same data but in different contexts:
* An LFN is a symbolic name, which the LFC can resolve to one or more SURLs (for different SEs).
* The SURL identifies a file uniquely at a single SE.
* For data retrieval, ask the SE to give you a TURL for a specific SURL; the TURL can, in principle, be used only once and for a limited time.
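To make the LFN/SURL/TURL chain concrete, here is a minimal Python sketch. It is a toy model only: the catalog contents, the host names, the `ToySE` class and the `token` suffix on the TURL are all made up for illustration and do not reflect the real LFC or SRM client APIs.

```python
import secrets
import time

# Toy stand-in for the LFC: one LFN maps to one or more SURLs
# (replicas of the same data on different SEs). All names are hypothetical.
CATALOG = {
    "lfn:/grid/demo/data.root": [
        "srm://se1.example.org/demo/data.root",
        "srm://se2.example.org/demo/data.root",
    ],
}


def resolve_lfn(lfn):
    """Return all SURLs registered for an LFN (toy catalog lookup)."""
    return list(CATALOG[lfn])


class ToySE:
    """Hypothetical SE that hands out single-use, time-limited TURLs."""

    def __init__(self, host, lifetime=60.0):
        self.host = host
        self.lifetime = lifetime  # seconds a TURL stays valid
        self._issued = {}         # turl -> expiry time

    def request_turl(self, surl, now=None):
        """Issue a fresh TURL for a SURL that lives on this SE."""
        prefix = f"srm://{self.host}"
        if not surl.startswith(prefix + "/"):
            raise ValueError("SURL does not belong to this SE")
        now = time.time() if now is None else now
        path = surl[len(prefix):]
        # The token only makes each TURL unique in this toy model.
        turl = f"gsiftp://{self.host}{path}?token={secrets.token_hex(4)}"
        self._issued[turl] = now + self.lifetime
        return turl

    def use_turl(self, turl, now=None):
        """Return True once per TURL, and only before it expires."""
        now = time.time() if now is None else now
        expiry = self._issued.pop(turl, None)  # pop makes it single-use
        return expiry is not None and now <= expiry
```

Typical use would be `surls = resolve_lfn("lfn:/grid/demo/data.root")`, then pick the SURL for the SE you want and call `request_turl` on that SE just before the transfer, since the TURL is not meant to be stored.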
Does it effectively visualize the working of the grid (storage)? Do you have any suggestions?
Another kind of diagram (e.g. a sequence diagram) may shed more light on it.
Have you seen the grid tutorial handouts? They can be found at http://www.nikhef.nl/~dennisvd/.
I have some more questions.
1 Which identifiers are allowed in JDL for storage references (lfn, guid, surl, turl)?
I'm not sure about guid and surl; that would be a fun test. As I mentioned, a TURL is a temporary thing, but plain gsiftp URLs should work.
2 Is nagios present at all systems that make up the grid?
No, this is a site-local decision. Any well-managed site should have something like Nagios, though.
3 Can someone describe the process of scheduling and load balancing for the WMS and the CE, or point me to documentation?
See http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/index.shtml
The basics are not hard: the WMS matches the requirements in the JDL against the set of CEs it knows about; the matching CEs are then ranked, with the lowest 'estimated response time' first.
I am mainly interested in how and when these processes use information on the availability of the resources needed by a job.
AFAIK, all info comes from the information system. I have not heard of sites that publish live downtime from their monitoring systems to the information system, but sites will make sure that they update the (non)availability when downtime is planned ahead (e.g. by publishing zero available worker nodes).
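The match-then-rank idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the `CE` fields, the host names and the `needs` requirement are invented here, and the real WMS evaluates JDL expressions against attributes published in the information system rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class CE:
    # Hypothetical snapshot of what a CE publishes in the information system.
    name: str
    free_slots: int
    max_wallclock_min: int
    estimated_response_time: int  # seconds


def match_and_rank(ces, requirement):
    """Keep the CEs that satisfy the requirement, lowest ERT first."""
    matched = [ce for ce in ces if requirement(ce)]
    return sorted(matched, key=lambda ce: ce.estimated_response_time)


ces = [
    CE("ce-a.example.org", 12, 2880, 600),
    CE("ce-b.example.org", 0, 2880, 30),   # publishes zero free slots
    CE("ce-c.example.org", 40, 720, 120),  # wallclock limit too short
    CE("ce-d.example.org", 5, 4320, 90),
]


def needs(ce):
    """Example job requirement: a free slot and at least 24h of wallclock."""
    return ce.free_slots > 0 and ce.max_wallclock_min >= 1440


ranked = match_and_rank(ces, needs)
print([ce.name for ce in ranked])
# prints ['ce-d.example.org', 'ce-a.example.org']
```

Note that ce-b has the best estimated response time but never reaches the ranking step, which is how publishing zero available worker nodes keeps jobs away from a site during planned downtime, at least for jobs whose requirements look at that value.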
HTH,
Dennis