hi all,
the other action point from last monday's grid-overleg: why are downtime notifications sent only a day before they actually occur?
The short answer is: this is exactly as specified. If you read
https://edms.cern.ch/file/829986/0.1/EGEE-downtime-notification-procedure.pd...
which in turn refers to
https://edms.cern.ch/file/829986/0.1/EGEE-intervention-procedures.pdf
then most downtime notifications fall into category C:
- Send broadcast 1 day in advance
- Broadcast targets: all ROCs, affected VOs, all production sites in the related ROC
so the actual broadcast happens only 1 day in advance !!
Of course, this makes the downtime notification procedure kinda useless for us. As far as I can see there are 2 options:
1) create an email list infrastructure-announce@biggrid.nl to which all BigGrid downtime notifications must be sent (on penalty of death-by-tickling or something like that)
2) the GOCDB, where all site downtimes are entered, luckily has a programming interface where you can retrieve all downtimes for a particular site after a particular date: we could thus create a cron job that periodically checks the GOCDB and broadcasts any messages, as appropriate. For example, if I do

    curl --cert $X509_USER_PROXY --key $X509_USER_PROXY --CApath $X509_CERT_DIR -k \
        "https://goc.gridops.org/gocdbpi/public/?method=get_downtime&topentity=SA..."

I get

    <?xml version="1.0"?>
    <ROOT>
      <DOWNTIME ID="45205458" CLASSIFICATION="SCHEDULED">
        <HOSTNAME>srm.grid.sara.nl</HOSTNAME>
        <SEVERITY>OUTAGE</SEVERITY>
        <DESCRIPTION> The tape backend will be down for the whole day due to maintenance. srm.grid.sara.nl will be up but data on tape cannot be accessed. </DESCRIPTION>
        <START_DATE>1251705600</START_DATE>
        <END_DATE>1251748800</END_DATE>
        <FORMATED_START_DATE>2009-08-31 08:00</FORMATED_START_DATE>
        <FORMATED_END_DATE>2009-08-31 20:00</FORMATED_END_DATE>
      </DOWNTIME>
      <DOWNTIME ID="43305437" CLASSIFICATION="SCHEDULED">
        <HOSTNAME>ce.gina.sara.nl</HOSTNAME>
        <SEVERITY>OUTAGE</SEVERITY>
        <DESCRIPTION> maintenance to broken switch hardware </DESCRIPTION>
        <START_DATE>1251190800</START_DATE>
        <END_DATE>1251219600</END_DATE>
        <FORMATED_START_DATE>2009-08-25 09:00</FORMATED_START_DATE>
        <FORMATED_END_DATE>2009-08-25 17:00</FORMATED_END_DATE>
      </DOWNTIME>
    </ROOT>

so this API _does_ work... now we need a little XML scripting magic to turn it into a broadcast email - any takers? The upside of this approach is that we have a single entry point for downtimes, i.e. less chance of errors.
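A minimal sketch of that XML scripting step in Python, using only the standard library. It assumes the reply keeps the layout shown above and that START_DATE/END_DATE are Unix timestamps in UTC (an assumption, though it reproduces the FORMATED_* values in the sample). The function names and the sender address are made up for illustration; the recipient is the list proposed in option 1. Fetching the XML and actually sending the mail (e.g. with smtplib) are left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.message import EmailMessage

# Trimmed copy of the GOCDB reply quoted in this message.
SAMPLE_XML = """<?xml version="1.0"?>
<ROOT>
  <DOWNTIME ID="45205458" CLASSIFICATION="SCHEDULED">
    <HOSTNAME>srm.grid.sara.nl</HOSTNAME>
    <SEVERITY>OUTAGE</SEVERITY>
    <DESCRIPTION> The tape backend will be down for the whole day due to
      maintenance. </DESCRIPTION>
    <START_DATE>1251705600</START_DATE>
    <END_DATE>1251748800</END_DATE>
  </DOWNTIME>
  <DOWNTIME ID="43305437" CLASSIFICATION="SCHEDULED">
    <HOSTNAME>ce.gina.sara.nl</HOSTNAME>
    <SEVERITY>OUTAGE</SEVERITY>
    <DESCRIPTION> maintenance to broken switch hardware </DESCRIPTION>
    <START_DATE>1251190800</START_DATE>
    <END_DATE>1251219600</END_DATE>
  </DOWNTIME>
</ROOT>"""


def _utc(node, tag):
    """Read a Unix timestamp element; assumes the value is in UTC."""
    return datetime.fromtimestamp(int(node.findtext(tag)), tz=timezone.utc)


def parse_downtimes(xml_text):
    """Turn a get_downtime reply into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": node.get("ID"),
            "classification": node.get("CLASSIFICATION"),
            "hostname": node.findtext("HOSTNAME", default="").strip(),
            "severity": node.findtext("SEVERITY", default="").strip(),
            # Collapse runs of whitespace in the free-text description.
            "description": " ".join(node.findtext("DESCRIPTION", default="").split()),
            "start": _utc(node, "START_DATE"),
            "end": _utc(node, "END_DATE"),
        }
        for node in root.findall("DOWNTIME")
    ]


def build_announcement(downtime, sender, recipient):
    """Build (but do not send) one announcement mail for one downtime."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = (
        "[downtime] {hostname} {severity} from {start:%Y-%m-%d %H:%M} UTC"
    ).format(**downtime)
    msg.set_content(
        (
            "Host:           {hostname}\n"
            "Severity:       {severity} ({classification})\n"
            "Start (UTC):    {start:%Y-%m-%d %H:%M}\n"
            "End (UTC):      {end:%Y-%m-%d %H:%M}\n"
            "GOCDB ID:       {id}\n\n"
            "{description}\n"
        ).format(**downtime)
    )
    return msg


if __name__ == "__main__":
    for downtime in parse_downtimes(SAMPLE_XML):
        # Hypothetical sender; recipient is the list proposed in option 1.
        print(build_announcement(downtime, "grid-ops@example.org",
                                 "infrastructure-announce@biggrid.nl"))
        print("-" * 60)
```

A cron job built on this would also have to remember which downtime IDs it has already announced, so that each downtime is not mailed again on every run.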
share and enjoy,
JJK
Hi Jan,
Within my group, we may take on this task, but we first need to discuss it.
Some of the questions are:
- Is the downtime registration interface part of the API?
- Do we need an interface where users can specify their notification preferences?
- Does the GOCDB data model support the flexibility required by the users?
- Which programming/scripting language should be used?
- etc.
Regards,
Ammar,
_______________________________________________
Ct-grid mailing list
Ct-grid@nikhef.nl
https://mailman.nikhef.nl/cgi-bin/listinfo/ct-grid
Hi,
It seems there is another option, https://savannah.cern.ch/support/?106465, but it does not appear to be implemented yet.
That ticket mentions the possibility of configuring certain types of resources as core services from a user/VO perspective. This could, for example, be done for all storage elements.
As stated there, this can be done using the VO ID card in the CIC ops portal (a user-personalized notification perspective).
It could be done in the GOCDB as well, but that is something one might not want to do, since all VOs would then be affected (a service delivery perspective).
Regards,
tom
Tom Visser
Phone: +31617411603
Mail: tom.visser@sara.nl
SARA Computing & Networking Services
High Performance Computing and Visualization
http://www.sara.nl
-----Original Message-----
From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Ammar Benabdelkader
Sent: Friday, August 21, 2009 4:08 PM
To: Jan Just Keijser
Cc: ct-grid@nikhef.nl
Subject: Re: [Ct-grid] action point: Look at CIC portal concerning downtime notifications.
Hi Jan,
Within my group, we may take on this task, but we first need to discuss it.
Some of the questions are:
- Is the downtime registration interface part of the API?
- Do we need an interface where users can specify their notification preferences?
- Does the GOCDB data model support the flexibility required by the users?
- Which programming/scripting language should be used?
- etc.
Regards,
Ammar,
Have you guys seen this: https://cic.gridops.org/index.php?section=vo&page=SDnotification_v2
There you can subscribe to downtime notifications for the sites of your liking. You can also choose between an RSS feed and email.
Indeed the programmatic interface (as Gilles Mathieu puts it) of the GOCDB works fine too.
It is annoying that a site admin no longer has any control over when downtime notifications are sent. In WLCG there is an agreement that full-day downtimes are announced at least a week in advance. The removal of this feature makes it difficult for a site admin to fulfill this agreement. The LHC VOs use the programmatic interface to be informed about downtimes. I can ask in the weekly operations meeting whether that feature can be put back in. My preference would be that a downtime notification is sent twice: the submitter should be allowed to select one point in time to send the notification, and it should also be sent 24 hours in advance.
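To make that preference concrete, here is a small Python sketch of the two-notification schedule: one send time picked by the submitter plus a fixed reminder 24 hours before the start, with a check against the one-week WLCG agreement. All names here are hypothetical; this is not an existing GOCDB or CIC portal feature, and deciding which downtimes count as "full day" is left to the caller.

```python
from datetime import datetime, timedelta, timezone

FINAL_REMINDER = timedelta(hours=24)   # fixed reminder before the start
WLCG_MIN_NOTICE = timedelta(days=7)    # agreed notice for full-day downtimes


def notification_times(start, chosen_time):
    """Return the sorted send times for one downtime.

    `chosen_time` is the moment selected by the submitter; the second
    notification always goes out 24 hours before `start`. Times at or
    after the start are dropped, and identical times are merged.
    """
    reminder = start - FINAL_REMINDER
    return sorted(t for t in {chosen_time, reminder} if t < start)


def meets_wlcg_notice(first_notice, start):
    """True if the first announcement is at least a week before the start."""
    return start - first_notice >= WLCG_MIN_NOTICE


if __name__ == "__main__":
    start = datetime(2009, 8, 31, 8, 0, tzinfo=timezone.utc)
    chosen = datetime(2009, 8, 21, 9, 0, tzinfo=timezone.utc)
    times = notification_times(start, chosen)
    for when in times:
        print("send at", when.isoformat())
    print("meets one-week agreement:", meets_wlcg_notice(times[0], start))
```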
Ron
On Fri, 2009-08-21 at 14:44 +0200, Jan Just Keijser wrote:
hi all,
the other action point from last monday's grid-overleg: why are downtime notifications sent only a day before they actually occur?