Dear all,
I cannot resist either :-). I very much agree with Floris's analysis. Basically, running a VM should be handled the same way as running a job: what is the difference from other jobs anyway? The OS is just another application run by the user, and this application should have the same permissions as any other job. Then no additional policies are needed.
Of course you can go a step further and allow users to set up their own managed nodes and clusters, albeit in a virtual environment. That more or less delegates the classical administrative responsibilities of sites to other authorities. You can decide to do that, but then the hosting organization can no longer take any responsibility for misuse or anything else. It is the same as saying that part of the resources is reserved for, and managed by, another organization.
Regards,
Jules
-----Original Message-----
From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Floris Sluiter
Sent: 23 April 2010 15:35
To: ct-grid@nikhef.nl
Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
Dear all,
I could not resist sharing my thoughts on this as well...
What I understand is that a certain "trusted" VM producer can supply a virtual machine image to the community. After an endorser "endorses" its use, any member of the VO can boot this image on any site that trusts the VO and the endorsing procedure. The VM is then booted by the user, but it is booted with the same "system" rights as a grid node or a VO box.
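The trust chain described above boils down to: a site boots an image only if some endorser has vouched for exactly that image. A minimal sketch of such a check, assuming a hypothetical endorsement list keyed by image checksum (the real policy would distribute a signed list; all names here are illustrative):

```python
import hashlib

# Hypothetical endorsement list: image checksum -> endorser identity.
# In a real deployment this would be a signed list distributed by the VO,
# not a hard-coded dict.
ENDORSED_IMAGES = {
    "sha256:" + hashlib.sha256(b"trusted-image-contents").hexdigest():
        "endorser@vo.example",
}

def may_boot(image_bytes: bytes) -> bool:
    """Return True only if the image checksum appears on the endorsed list."""
    digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    return digest in ENDORSED_IMAGES
```

Note that this only establishes *provenance* of the image; as the next message argues, it says nothing about what rights the running instance should get or how it is monitored.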
The catch is: the VM now runs with far more rights than an ordinary job. It runs with system rights, and the policy has no mechanism in place to monitor it. What's more, there is no liability clause for the endorser (and to what extent are they liable for damages: $1M? $10M?). I would strongly advise any site against implementing such a policy! It is very unsafe to allow end users to gain system rights on your trusted network.
In our HPC Cloud we certainly do allow users to become root within their own VMs inside their own VLAN. However, from a system point of view the whole virtual cluster runs with ONLY user credentials, not system credentials. So it is perfectly OK for them to set their interfaces in promiscuous mode, to do all kinds of LDAP calls, or even to try to hack every IP address in their own range, etc. However, if users want access to the outside world, we monitor the traffic and the VM very strictly. The more rights they want, the more rules they have to submit to. And the most they can get is a "public" IP in the DMZ, where the security settings allow only what a specific user requested and needs.
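The containment described above ("free inside your VLAN, filtered towards the outside") could be expressed, for example, as a handful of firewall rules. A sketch only, assuming Linux/iptables; the bridge name and addresses are hypothetical (203.0.113.10 is a documentation address), not our actual configuration:

```shell
# vmbr1 carries the user's private VLAN; traffic staying inside it is
# unrestricted, so promiscuous mode, LDAP probes, etc. are harmless there.
iptables -A FORWARD -i vmbr1 -o vmbr1 -j ACCEPT

# Everything leaving the VLAN is dropped by default...
iptables -P FORWARD DROP

# ...except flows the user explicitly requested, e.g. HTTPS from the one
# "public" VM placed in the DMZ.
iptables -A FORWARD -s 203.0.113.10 -p tcp --dport 443 -j ACCEPT
```

The point of the design is that the default is DROP: every additional right is an explicit, auditable rule, mirroring "the more rights they want, the more rules they have to submit to."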
What would possibly be acceptable for the Grid is a VM that runs with user rights inside its own VLAN, with the same security permission settings and ACLs as any other Grid job. Interestingly enough, the group of Davide Salomoni implemented just that; you can find his OGF presentation here: http://www.ogf.org/OGF28/materials/1994/salomoni_ogf28_100316.pdf End users can submit Grid jobs or their own VM. On a worker node there is a bait job that monitors the load. If there is a Grid job in the queue, a virtual grid node is started on the worker node and it accepts the job. If there is a user-submitted VM in the queue, that gets booted. (Slide 96:) It is in production with currently 1400 on-demand virtual machines, O(10) supported virtual images, serving 20 different user communities; on average, more than 20,000 jobs are executed each day through WNoDeS. The plan is to have 4000 virtual machines by April 2010 and progressively integrate all Tier-1 resources. It is fully compatible with the existing Grid infrastructure.
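The bait-job dispatch described above can be sketched as a simple branch on the next queue entry. This is an illustration of the idea, not the actual WNoDeS code; the entry format and return strings are invented for the example:

```python
# Illustrative sketch of the bait-job idea: the worker node inspects the
# next queue entry and boots the matching kind of VM.

def dispatch(queue_entry: dict) -> str:
    """Decide what the worker node boots for the next queue entry.

    A plain Grid job gets a site-managed virtual grid node that then
    accepts the job; a user-submitted VM image is booted directly
    (inside its own VLAN, with user rights only).
    """
    if queue_entry["kind"] == "grid":
        return "boot virtual-gridnode, then accept the job"
    if queue_entry["kind"] == "user-vm":
        return "boot the user's endorsed image: " + queue_entry["image"]
    raise ValueError("unknown queue entry kind: " + str(queue_entry["kind"]))
```

Either way, from the site's perspective the virtualized workload stays inside the existing batch system's accounting and ACLs, which is exactly the property argued for above.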
I think we should seriously reconsider what rights we are willing to give to users...
Kindest regards,
Floris
-----Original Message-----
From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Oscar Koeroo
Sent: Thursday 22 April 2010 17:55
To: ct-grid@nikhef.nl
Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
On 22/4/10 1:52 PM, Sander Klous wrote:
>> in my opinion, one of the main advantages of VMs is that it allows you to make assumptions about many things being exactly the same on all sites. Why else would you go to the trouble? So this:
>>
>> On 22 Apr 2010, at 13:30, Sander Klous wrote:
>>> So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
>>
>> is to me turning off one of the main advantages of VMs, and for a weak reason.
>
> Okay, I will raise this point in the afternoon. As an alternative we could simply not ban anything in the policy related to obtaining a workload. This means that some of the images will try to connect to the batch system, and we have to make sure these images are not run or won't affect the infrastructure at Nikhef. Of course this also means that these images will fail when they are started at Nikhef. If they want, other sites can do the same for images that try to get their work from pilot job frameworks.
>
> I'm not sure how successful this intervention will be. In previous discussions multiple sites did not like the idea of VMs prescribing the way they want to obtain their workload.
Just putting the policy in the background for a minute: if I take the ALICE VO as an example use case, it doesn't make sense to connect back to the batch system once the image has been launched. A service within the VM could launch the AliEn('s have landed) pilot job framework and crunch the data from there. I called this pilot-job mode the cloud approach a while back, i.e. executed on an infrastructure like Claudia, i.e. close your eyes and launch a VM on some hardware (yes, I'm skipping details intentionally).
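The cloud approach above inverts the control flow: instead of the site pushing a job into the VM, a pilot inside the VM pulls work from the VO's own task queue until nothing is left. A minimal sketch of that pull loop, with invented names (this is not AliEn's actual API):

```python
# Sketch of a pilot-job pull loop: the VM boots, a service inside it
# starts the pilot, and the pilot fetches work units from the VO's
# task queue until the queue is empty. The queue is modelled as a
# plain list for illustration.

def run_pilot(task_queue: list) -> list:
    """Pull and 'process' tasks until the VO queue is empty."""
    done = []
    while task_queue:
        task = task_queue.pop(0)  # fetch the next work unit from the VO queue
        done.append(f"processed {task}")  # stand-in for the real payload
    return done
```

The site never needs to know what the workload is; conversely, this is exactly why the thread worries about monitoring and about what rights such a VM should run with.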
If you used this in a batch-system-integrated way, it would be launched by the batch system (the INFN VM approach). That would not really mean a pbs_mom connecting to the site's Torque service from within the VM. If the latter were the case, it would act as a class 1 VM worker node in the batch system, which I doubt you would ever really want, IMHO, as it is made off-site, has VO-specific stuff added to it, and could potentially mix with your regular cluster nodes.
I would change Point 7.4 "Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
to:
7.4a "How a running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
7.4b "The methods by which a VM can be contextualized must adhere to site-local policies."
_______________________________________________
ct-grid mailing list
ct-grid@nikhef.nl
https://mailman.nikhef.nl/mailman/listinfo/ct-grid