Hi,
posted to the wrong list :-)
Begin forwarded message:
From: Hurng-Chun Lee <Hurng-Chun.Lee@cern.ch>
Date: November 6, 2008 11:46:34 GMT+01:00
To: silvia@science.uva.nl
Cc: tier1-ams@sara.nl
Subject: [Tier1-ams] data deletion - experience from ATLAS
Reply-To: tier1-ams@sara.nl
Hi Silvia,
The following points summarize what I used to do (and what ATLAS is doing) to clean up the grid storage at SARA/NIKHEF. I don't know whether this will be useful to you, so take it just for your information.
=== Firstly, ATLAS has its own data management model based on the concept of a "dataset". A dataset is a bundle of grid files with the same physics properties (e.g. files generated by the same production run or derived from the same physics stream). To maintain the link between ATLAS datasets and grid files, ATLAS has its own data management system called "DDM". DDM consists of a service, a metadata catalogue and a set of client tools. Users use the client tools to send requests to the service for data management operations (metadata lookup, data migration, deletion, etc.), and the service works together with the metadata catalogue to execute the operations on behalf of the users.
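To make the dataset idea concrete, here is a minimal sketch in Python. It is only a toy model, not the DDM API: the class ToyCatalogue, its methods and the dataset, site and file names are all made up for illustration.

```python
class ToyCatalogue:
    """Hypothetical stand-in for a dataset catalogue (NOT the real DDM API)."""

    def __init__(self):
        # dataset name -> {site name -> set of file names replicated there}
        self._replicas = {}

    def register(self, dataset, site, files):
        """Record that the given files of a dataset are stored at a site."""
        per_site = self._replicas.setdefault(dataset, {})
        per_site.setdefault(site, set()).update(files)

    def files_at(self, site):
        """Return every file the catalogue believes is stored at a site."""
        found = set()
        for per_site in self._replicas.values():
            found |= per_site.get(site, set())
        return found


# Illustrative names only.
catalogue = ToyCatalogue()
catalogue.register("run1234.physics_stream", "SARA", {"f1.root", "f2.root"})
catalogue.register("run1234.physics_stream", "NIKHEF", {"f1.root"})
catalogue.register("run5678.physics_stream", "SARA", {"f3.root"})
```

The point of the sketch is that a per-site file list can be derived from the catalogue alone, which is what makes the central view of the data possible.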
As "DDM" is the official data management tool of ATLAS, all users (both data production and data analysis users) are asked to use it for all kinds of data management activities (production, replication, deletion, etc.). This way, all the ATLAS data on the grid can be traced by the central operation team. Through the metadata stored in the DDM catalogue, the ATLAS central operation knows how much data ATLAS has produced around the world.
Secondly, ATLAS fully adopts SRMv2 space tokens to manage the storage areas at a site. Each type of data is stored in a particular storage area identified by a space token (from the token id, one can tell what kind of ATLAS data should be stored there). The SRMv2 service also provides a "uniform" management interface for heterogeneous storage systems at sites, as it is supposed to. Through the SRMv2 interface, the ATLAS central operation can list what is actually sitting at a site and knows how much space ATLAS has used.
The first point shows the data from the ATLAS point of view, while the second gives you the data physically stored at a site. By comparing the two, one can run consistency checks and perform deletions accordingly. Of course, there is another layer to check, namely the mapping from SRMv2 to the local storage name space; but for ATLAS this is a site issue (or, let's say, a storage technology issue) that should be handled by the sites (or storage technicians).
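The comparison of the two views boils down to set operations. The sketch below is only an illustration in Python: the function name compare_views, the category labels and the example paths are made up, and it assumes both lists have already been brought to the same naming form.

```python
def compare_views(catalogue_files, storage_files):
    """Compare what the catalogue lists with what the storage holds.

    Both arguments are iterables of file names in the same naming form.
    """
    catalogue_files = set(catalogue_files)
    storage_files = set(storage_files)
    return {
        # known to the catalogue and present on storage
        "consistent": catalogue_files & storage_files,
        # on storage but unknown to the catalogue: deletion candidates
        "dark": storage_files - catalogue_files,
        # in the catalogue but not on storage: lost or never written
        "missing": catalogue_files - storage_files,
    }


# Illustrative paths only.
result = compare_views(
    ["/atlas/a.root", "/atlas/b.root", "/atlas/c.root"],
    ["/atlas/b.root", "/atlas/c.root", "/atlas/x.root"],
)
```

Here result["dark"] contains /atlas/x.root and result["missing"] contains /atlas/a.root. Only the first set is a candidate for deletion on the storage; the second points to catalogue entries that need attention.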
Unfortunately, if ATLAS users do data management operations without using "DDM", their output is not considered ATLAS data (as it is not registered in the DDM catalogue). In this case, it is considered dark data polluting the ATLAS storage area. By policy, this kind of data is removed without any notification to the users (as it is also technically difficult to identify the owner of dark data).
I once did a consistency check with the NIKHEF and SARA site admins to remove files not recognized by ATLAS (i.e. files not present in the ATLAS DDM). First, I dumped a list of files (i.e. SURLs) that ATLAS thinks are at NIKHEF/SARA based on the DDM catalogue. Second, I asked the site admins to dump a list of files (i.e. the name space) from the corresponding storage area with local tools. We then compared the two lists to get the final deletion list (we also double-checked the list to ensure that important data would not be deleted by accident) and cleaned up the LFC and the storage according to it. We did this manually with the site admins because at that time we were cleaning an old storage area that had no SRMv2 interface for querying what was located at the site; it was also a massive deletion, for which local tools perform better.
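The procedure above can be sketched as follows in Python. This is not the script we used (we did it by hand); it is only an illustration under assumptions: the catalogue dump has one SURL per line, the site dump has one name space path per line, and the "double check" is modelled as a list of protected path prefixes. The function names, hosts and paths are made up.

```python
def surl_to_path(surl):
    """Reduce a SURL to its storage path so it can be matched to the site dump."""
    if "?SFN=" in surl:
        # long form: srm://host:port/srm/managerv2?SFN=/path
        return surl.split("?SFN=", 1)[1]
    # short form: srm://host[:port]/path
    rest = surl.split("://", 1)[1]
    return "/" + rest.split("/", 1)[1]


def build_deletion_list(catalogue_surls, namespace_paths, protected_prefixes):
    """Return (to_delete, held_back) for files on storage unknown to the catalogue."""
    known = {surl_to_path(s.strip()) for s in catalogue_surls if s.strip()}
    on_storage = {p.strip() for p in namespace_paths if p.strip()}
    candidates = sorted(on_storage - known)
    protected = tuple(protected_prefixes)
    # Assumption: anything under a protected prefix is never deleted automatically.
    to_delete = [p for p in candidates if not p.startswith(protected)]
    held_back = [p for p in candidates if p.startswith(protected)]
    return to_delete, held_back


# Illustrative dumps only; in practice these would be lines read from two files.
catalogue_dump = [
    "srm://srm.example.org:8443/srm/managerv2?SFN=/pnfs/example.org/atlas/a.root\n",
    "srm://srm.example.org/pnfs/example.org/atlas/b.root\n",
]
storage_dump = [
    "/pnfs/example.org/atlas/a.root\n",
    "/pnfs/example.org/atlas/b.root\n",
    "/pnfs/example.org/atlas/old/c.root\n",
    "/pnfs/example.org/atlas/keep/d.root\n",
]
to_delete, held_back = build_deletion_list(
    catalogue_dump, storage_dump, ["/pnfs/example.org/atlas/keep/"]
)
```

With these dumps, to_delete holds only the file under old/, and the file under keep/ ends up in held_back for a human to review. Keeping the held-back list separate rather than dropping it silently makes the manual double check easier.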
I also have the impression that ATLAS must already have some scripts to automate the procedure, as the consistency check is run periodically by the central operation. If you are interested, I can try to ask the experts about this. ===
Hurng
_______________________________________________
tier1-ams mailing list
tier1-ams@horus.sara.nl
https://horus.sara.nl/mailman/listinfo/tier1-ams