Dear grid users,
The resource broker service in the NIKHEF grid facility was recently disrupted by a full disk: the disk that holds the sandbox directories. (The sandbox directory of a job contains the input files before submission and the output files after the job has run.) As a consequence, it was not possible to submit jobs via the NIKHEF grid facility.
After investigating the problem, we found two causes:
1) Users sometimes store very large files (>> 10 MB) in their sandbox directories.
2) The contents of sandbox directories are not retrieved after the job ends.
What can you do to prevent this problem?
1) If your jobs produce output files larger than about 1 MB, do not leave those files in the sandbox directory, but copy them to a storage element. Please refer to http://www.dutchgrid.nl/Org/Nikhef/tutorial.pdf for an introduction to using storage elements. Smaller files can be left in the sandbox.
2) Fetch the output of your jobs after the jobs have finished.
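To help with the first point, here is a minimal Python sketch that lists the files in a directory tree that exceed the 1 MB guideline, so you can see which of your job's output files should go to a storage element instead of the sandbox. The names find_large_files and SIZE_LIMIT_BYTES are made up for this illustration and are not part of any grid tool. The script only reports sizes; the copy to a storage element itself is done with the tools described in the tutorial.

```python
import os
import sys

# Roughly 1 MB, the guideline mentioned above (an assumed exact value).
SIZE_LIMIT_BYTES = 1_000_000


def find_large_files(directory, limit=SIZE_LIMIT_BYTES):
    """Return sorted (path, size) pairs for files under directory larger than limit."""
    large = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Skip anything that is not a regular file (e.g. broken symlinks).
            if os.path.isfile(path):
                size = os.path.getsize(path)
                if size > limit:
                    large.append((path, size))
    return sorted(large)


if __name__ == "__main__":
    # Directory to scan: first command-line argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_large_files(target):
        print(f"{size / 1e6:8.1f} MB  {path}")
```

You could run this in the working directory of a test job before deciding which files to list in the output sandbox.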
What have we done to prevent this problem in the future?
1) We have enabled quotas on the disk containing the sandboxes. Every user has a quota that is sufficiently large to run hundreds of grid jobs and store small output files in the sandbox directories. Any output that does not fit in the quota will be lost.
2) We will enable cleanup scripts that automatically remove the contents of sandbox directories after 28 days. It will not be possible to retrieve the output after this period of 28 days.
Best regards,
Ronald