Dear all,
I cannot resist either :-). I very much agree with Floris's analysis. Basically, running a VM should be handled the same way as running a job: what is the difference from other jobs anyway? The OS is just another application run by the user, and this application should have the same permissions as any other job. Then no additional policies are needed.
Of course you can go a step further and allow users to set up their own managed nodes and clusters, albeit in a virtual environment. That more or less delegates the classical administrative responsibilities of sites to other authorities. You can decide to do that, but then the hosting organization can no longer take any responsibility for misuse or anything else. It is the same as saying that part of the resources is reserved for, and managed by, another organization.
Regards,
Jules
-----Original Message-----
From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Floris Sluiter
Sent: 23 April 2010 15:35
To: ct-grid@nikhef.nl
Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
Dear all,
I could not resist sharing my thoughts on this as well...
What I understand is that a certain "trusted" VM producer can supply a virtual machine image to the community. After an endorser "endorses" its use, any member of the VO can boot this image on any site that trusts the VO and the endorsing procedure. The VM is then booted by the user, but it is booted with the same "system" rights as a grid node or a VO box.
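The trust chain described above boils down to: a site boots an image only if some endorser has vouched for exactly that image. A minimal sketch of such a check, assuming a hypothetical endorsement list keyed by image checksum (the real policy would distribute a signed list; all names here are illustrative):

```python
import hashlib

# Hypothetical endorsement list: image checksum -> endorser identity.
# In a real deployment this would be a signed list distributed by the VO,
# not a hard-coded dict.
ENDORSED_IMAGES = {
    "sha256:" + hashlib.sha256(b"trusted-image-contents").hexdigest():
        "endorser@vo.example",
}

def may_boot(image_bytes: bytes) -> bool:
    """Return True only if the image checksum appears on the endorsed list."""
    digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    return digest in ENDORSED_IMAGES
```

Note that this only establishes *provenance* of the image; as the next message argues, it says nothing about what rights the running instance should get or how it is monitored.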
The catch is: the VM now runs with far more rights than an ordinary job. It runs with system rights, and the policy has no mechanism in place to monitor it. What's more, there is no liability clause for the endorser (and to what extent are they liable for damages: $1M? $10M?). I would strongly advise any site against implementing such a policy! It is very unsafe to allow end users to gain system rights on your trusted network.
In our HPC Cloud we certainly do allow users to become root within their own VMs inside their own VLAN. However, from a system point of view the whole virtual cluster runs with ONLY user credentials, not system credentials. So it is perfectly OK for them to set their interfaces in promiscuous mode, to do all kinds of LDAP calls, or even to try to hack every IP address in their own range, etc. However, if users want access to the outside world, we monitor the traffic and the VM very strictly. The more rights they want, the more rules they have to submit to. And the most they can get is a "public" IP in the DMZ, where the security settings allow only what a specific user requested and needs.
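The containment described above ("free inside your VLAN, filtered towards the outside") could be expressed, for example, as a handful of firewall rules. A sketch only, assuming Linux/iptables; the bridge name and addresses are hypothetical (203.0.113.10 is a documentation address), not our actual configuration:

```shell
# vmbr1 carries the user's private VLAN; traffic staying inside it is
# unrestricted, so promiscuous mode, LDAP probes, etc. are harmless there.
iptables -A FORWARD -i vmbr1 -o vmbr1 -j ACCEPT

# Everything leaving the VLAN is dropped by default...
iptables -P FORWARD DROP

# ...except flows the user explicitly requested, e.g. HTTPS from the one
# "public" VM placed in the DMZ.
iptables -A FORWARD -s 203.0.113.10 -p tcp --dport 443 -j ACCEPT
```

The point of the design is that the default is DROP: every additional right is an explicit, auditable rule, mirroring "the more rights they want, the more rules they have to submit to."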
What would possibly be acceptable for the Grid is a VM that runs with user rights inside its own VLAN, with the same security permission settings and ACLs as any other Grid job. Interestingly enough, the group of Davide Salomoni implemented just that; you can find his OGF presentation here: http://www.ogf.org/OGF28/materials/1994/salomoni_ogf28_100316.pdf End users can submit Grid jobs or their own VM. On a worker node there is a bait job that monitors the load. If there is a Grid job in the queue, a virtual grid node is started on the worker node and it accepts the job. If there is a user-submitted VM in the queue, that gets booted. (Slide 96:) It is in production with currently 1400 on-demand virtual machines, O(10) supported virtual images, serving 20 different user communities; on average, more than 20,000 jobs are executed each day through WNoDeS. The plan is to have 4000 virtual machines by April 2010 and progressively integrate all Tier-1 resources. It is fully compatible with the existing Grid infrastructure.
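The bait-job dispatch described above can be sketched as a simple branch on the next queue entry. This is an illustration of the idea, not the actual WNoDeS code; the entry format and return strings are invented for the example:

```python
# Illustrative sketch of the bait-job idea: the worker node inspects the
# next queue entry and boots the matching kind of VM.

def dispatch(queue_entry: dict) -> str:
    """Decide what the worker node boots for the next queue entry.

    A plain Grid job gets a site-managed virtual grid node that then
    accepts the job; a user-submitted VM image is booted directly
    (inside its own VLAN, with user rights only).
    """
    if queue_entry["kind"] == "grid":
        return "boot virtual-gridnode, then accept the job"
    if queue_entry["kind"] == "user-vm":
        return "boot the user's endorsed image: " + queue_entry["image"]
    raise ValueError("unknown queue entry kind: " + str(queue_entry["kind"]))
```

Either way, from the site's perspective the virtualized workload stays inside the existing batch system's accounting and ACLs, which is exactly the property argued for above.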
I think we should seriously reconsider what rights we are willing to give to users...
Kindest regards,
Floris
-----Original Message-----
From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Oscar Koeroo
Sent: Thursday 22 April 2010 17:55
To: ct-grid@nikhef.nl
Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
On 22/4/10 1:52 PM, Sander Klous wrote:
>> in my opinion, one of the main advantages of VMs is that it allows you to make assumptions about many things being exactly the same on all sites. Why else would you go to the trouble? So this:
>>
>> On 22 Apr 2010, at 13:30, Sander Klous wrote:
>>> So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
>>
>> is to me turning off one of the main advantages of VMs, and for a weak reason.
>
> Okay, I will raise this point in the afternoon. As an alternative we could simply not ban anything in the policy related to obtaining a workload. This means that some of the images will try to connect to the batch system, and we have to make sure these images are not run or won't affect the infrastructure at Nikhef. Of course this also means that these images will fail when they are started at Nikhef. If they want, other sites can do the same for images that try to get their work from pilot job frameworks.
>
> I'm not sure how successful this intervention will be. In previous discussions multiple sites did not like the idea of VMs prescribing the way they want to obtain their workload.
Just putting the policy in the background for a minute: if I take the ALICE VO as an example use case, it doesn't make sense to connect back to the batch system once the image has been launched. A service within the VM could launch the AliEn('s have landed) pilot job framework and crunch the data from there. I called this pilot-job mode the cloud approach a while back, i.e. executed on an infrastructure like Claudia, i.e. close your eyes and launch a VM on some hardware (yes, I'm skipping details intentionally).
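The cloud approach above inverts the control flow: instead of the site pushing a job into the VM, a pilot inside the VM pulls work from the VO's own task queue until nothing is left. A minimal sketch of that pull loop, with invented names (this is not AliEn's actual API):

```python
# Sketch of a pilot-job pull loop: the VM boots, a service inside it
# starts the pilot, and the pilot fetches work units from the VO's
# task queue until the queue is empty. The queue is modelled as a
# plain list for illustration.

def run_pilot(task_queue: list) -> list:
    """Pull and 'process' tasks until the VO queue is empty."""
    done = []
    while task_queue:
        task = task_queue.pop(0)  # fetch the next work unit from the VO queue
        done.append(f"processed {task}")  # stand-in for the real payload
    return done
```

The site never needs to know what the workload is; conversely, this is exactly why the thread worries about monitoring and about what rights such a VM should run with.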
If you used this in a batch-system-integrated way, it would be launched by the batch system (the INFN VM approach). That would not really mean a pbs_mom connecting to the site's Torque service from within the VM. If the latter were the case, it would act as a class 1 VM worker node in the batch system, which I doubt you would ever really want, IMHO, as it is made off-site, has VO-specific stuff added to it, and could potentially mix with your regular cluster nodes.
I would change Point 7.4 "Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
to:
7.4a "How a running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
7.4b "The methods by which a VM can be contextualized must adhere to site-local policies."
_______________________________________________
ct-grid mailing list
ct-grid@nikhef.nl
https://mailman.nikhef.nl/mailman/listinfo/ct-grid