Hi, Comments please. Preferably before the meeting starts at 16:00 this afternoon. -- Sander
Begin forwarded message:
Resent-From: hepix-virtualisation@cern.ch
From: david.kelsey@stfc.ac.uk
Date: April 22, 2010 12:32:26 AM GMT+02:00
To: hepix-virtualisation@cern.ch
Subject: Updated draft of VM policy
Dear all,
The updated draft (V1.2) of the Virtualisation Policy may be found at the JSPG wiki...
http://www.jspg.org/wiki/Policy_Trusted_Virtual_Machines
I also attach a PDF version just in case this is not reachable.
Lots of things to be discussed I am sure :=)
For discussion tomorrow.
Regards Dave
Dr David Kelsey
Particle Physics Department, Rutherford Appleton Laboratory
Chilton, DIDCOT, OX11 0QX, UK
e-mail: david.kelsey@stfc.ac.uk
Tel: +44 (0)1235 445746 (direct)
Fax: +44 (0)1235 446733
-- Scanned by iCritical.
Hi
mostly looks ok, but i do not understand two points:
1. why no pre-configured accounts? why should a combined ALICE VO image not contain an "alice" user account, already set up to go?
2. why not pre-configured to retrieve a workload?
Both of these things are valuable from the VO point of view. I would like to understand what the objections are. Other than the obvious one, being if one forbids enough of the advantages, then nobody will submit VMs ...
JT
Hi Jeff, (This time with reply-all, sorry)

1) I think what they mean by pre-configured accounts is actually pre-configured credentials, which is a security concern. I will try to get this point clarified in the working group.

2) The original proposal was to pre-configure the VM to connect back to the site batch system. This is the main mode of operation at CERN. At Nikhef this will never happen, so a different way is required to retrieve a workload (e.g. connect to a pilot job framework). Hence, in order to make an image usable cross-site, the mechanism to obtain the workload cannot be pre-configured.

Thanks for the feedback, Sander
Hi,
i see the issue with having pre-configured credentials, thanks. but accounts in the unix sense, no. good to get clarification.
clarification on the other issue (getting a workload) is also important. i can understand why you would not want the image to get a "workload" (I would call it a job) from the site's batch system. you wouldn't even allow it. configured or not.
on the other hand, workload could also refer to something in the experiment's task queue off site. this would mean that the VM "woke up" as a pilot job. I don't see anything wrong with that at all. and that would be a type of pre-configuration.
JT
Hi Jeff,
clarification on the other issue (getting a workload) is also important. i can understand why you would not want the image to get a "workload" (I would call it a job) from the site's batch system. you wouldn't even allow it. configured or not.
on the other hand, workload could also refer to something in the experiment's task queue off site. this would mean that the VM "woke up" as a pilot job. I don't see anything wrong with that at all. and that would be a type of pre-configuration.
The point is that "you don't see anything wrong with that", but other sites might. So we don't want to specify it in the policy. That's why it is left a site contextualization issue, so each site can apply the mechanisms they are happy with. Another option would be to create X different versions of an image, each with a different way to retrieve the workload, and let sites pick the ones they allow. That would not be in line with the philosophy of generating cross-site images, which is the main purpose of this policy.
Hi
On 22 Apr 2010, at 09:58, Sander Klous wrote:
The point is that "you don't see anything wrong with that", but other sites might. So, we don't want to specify it in the policy. That's why it is left a site contextualization issue, so each
I argue VERY strongly against this. If there is a good reason to ban it, then state the reason and ban it. "Other sites might see something wrong with it". If they see something wrong with it, let them speak up and make a case for it. Otherwise, do not ban it in the policy.
This is a principle we've tried to follow since HEPCAL days ... "it might be" isn't good enough.
JT
Well, you don't want VMs connecting back to the batch system. At CERN this is the common way to get work inside your VM. So, a VM banned at Nikhef would be perfectly fine for CERN. You want to allow VMs connecting to pilot job frameworks, but other sites have severe restrictions on outbound IP. So, a VM banned from these sites, would be perfectly fine for Nikhef. There is no common denominator so there is no way we can specify it in the policy. I don't understand what you mean with: "do not ban it in the policy". It is not banned, it is kept as a site contextualization issue. As far as I can see, it is the only reasonable option for cross-site images.
Hi,
so what you have told me is that the VM should make no assumptions about which site services it can connect to. this is not the same as not being able to pick up a workload. having the VM instantiated so that at boot time it wakes up the "alicepilot" account, which then contacts the alien task queue and picks up a workload ... this is *banned* by your current version of the policy, but there is no good reason to ban it, as it does not rely at all on the VM connecting back to the batch system.
there are some sites that have severe restrictions on outbound ip, but I don't think we should design the policy around this. we already have issues like this, the VO box ... where they are supposed to specify to us which ports will be used, and we make sure they are open. we can do the same thing here.
Does this make it more clear? I think the confusion is that you were equating "getting a workload" with "connecting to the batch system".
JT
Hi Jeff, No sorry, it is not clear to me yet. I don't see how we "ban" the waking up of the "alicepilot" script with the current policy. The way I see it, a user should be able to specify a startup script in the JDL. If a site supports it, it can call this startup script by contextualizing the VM.
Let's make this very concrete:
- The VO software contains a script called /opt/bin/alicepilot.
- The user specifies in the JDL that this script should be called on boot.
- The site contextualizes the image; in this case that means a boot script is inserted that calls /opt/bin/alicepilot with user privileges.
So, it is perfectly possible to get work from a pilot job framework with site contextualization. What am I missing? Thanks, Sander
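To make the mechanism concrete in script form, here is a rough sketch of what such site contextualization could look like (illustrative only: the image-root path, the rc.d-style hook location and the "alice" account name are hypothetical, and the real privilege-drop mechanism would be whatever the site uses):

```shell
#!/bin/sh
# Sketch of site-side contextualization (hypothetical paths throughout).
# The VO image ships /opt/bin/alicepilot; the site, honouring the user's JDL
# request, inserts a boot-time hook that runs it with user privileges.

IMAGE_ROOT=${IMAGE_ROOT:-/tmp/vm-image-root}   # mounted image filesystem
mkdir -p "$IMAGE_ROOT/etc/rc.d" "$IMAGE_ROOT/opt/bin"

# Stand-in for the VO-supplied pilot script (already present in a real image).
cat > "$IMAGE_ROOT/opt/bin/alicepilot" <<'EOF'
#!/bin/sh
echo "alicepilot: contacting task queue as user $1"
EOF
chmod +x "$IMAGE_ROOT/opt/bin/alicepilot"

# The contextualization step proper: insert a boot hook that calls the pilot
# script as an unprivileged user rather than as root.
cat > "$IMAGE_ROOT/etc/rc.d/S99pilot" <<'EOF'
#!/bin/sh
# Inserted by site contextualization; runs the VO pilot as 'alice'.
exec /opt/bin/alicepilot alice
EOF
chmod +x "$IMAGE_ROOT/etc/rc.d/S99pilot"

echo "contextualized image at $IMAGE_ROOT"
```

The point being: the hook is inserted by the site, so the image itself stays generic and each site decides whether, and how, the pilot gets started.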
On 22 Apr 2010, at 11:59, Sander Klous wrote:
Hi Jeff, No sorry, it is not clear to me yet. I don't see how we "ban" the waking up of the "alicepilot" script with the current policy. The way I see it, a user should be able to specify a startup script in the JDL. If a site supports it, it can call this startup script by contextualizing the VM.
To be concrete:
Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated.
That is what is written in the policy. I give you an image that is pre-configured so that, upon instantiation, it contacts the ALICE central task queue to obtain a workload. It does NOT contact the batch system of the site. Explain to me, after re-reading the above excerpt from the policy, how this is not banned!!! The image is pre-configured to obtain a workload!
Let's make this very concrete:
- The VO software contains a script called /opt/bin/alicepilot.
- The user specifies in the JDL that this script should be called on boot.
- The site contextualizes the image; in this case that means a boot script is inserted that calls /opt/bin/alicepilot with user privileges.
So, it is perfectly possible to get work from a pilot job framework with site contextualization. What am I missing?
it is perfectly possible to do it your way; it is also possible to do it my way (the image is simply a pilot job and needs no interaction with the JDL ... specifying the image to run is enough). But my way is banned, why??
JT
it is perfectly possible to do it via your way, it is also possible to do it my way (the image is simply a pilot job and needs no interaction with the JDL ... specifying the image to run is enough). But my way is banned, why??
Because your way of obtaining the workload will be acceptable at some sites but not at others, you cannot demand that all sites support it. With the contextualization proposed by the working group, it is up to the site how the workload is eventually obtained. This procedure will differ from site to site and depends a lot on how VMs are integrated into the site infrastructure. CERN might choose to obtain the workload by connecting back to the batch system; Nikhef might do it in the way I propose above.
So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
Hi Sander
given that we already allow many different forms of outbound connectivity, i do not see a strong objection. i am sure there is somebody out there that objects, on the other hand we have the same thing with glexec ... somebody did not want to deploy any suid program that had ever had a security issue. we are deploying it anyway.
in my opinion, one of the main advantages of VMs, is it allows you to make assumptions about many things being exactly the same, on all sites. Why else would you go to the trouble?? So this:
On 22 Apr 2010, at 13:30, Sander Klous wrote:
So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
is to me turning off one of the main advantages of VMs, and for a weak reason.
JT
I can no longer resist participating in this lengthy thread, sorry.
On 22-04-10 13:37, Jeff Templon wrote:
in my opinion, one of the main advantages of VMs, is it allows you to make assumptions about many things being exactly the same, on all sites. Why else would you go to the trouble?? So this:
On 22 Apr 2010, at 13:30, Sander Klous wrote:
So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
is to me turning off one of the main advantages of VMs, and for a weak reason.
I don't agree. The benefit of using VMs for users lies in a consistent software environment for the job (payload, pilot content, whatever). Of course, it would be *convenient* for those users if the mechanism to get the job started would be the same everywhere, but that is a system thingy.
Reality is that sites (and their admins) have implemented their own policies about network traffic etc. Those policies operate on a different level than the software environment for the job (payload). You should not ignore that if you want a workable common policy for VMs. And we all know that trying to chase site admins into a harness created by end users is not going to work!
So there is still a clear benefit for users, even if they cannot define all the rules.
Ronald
Okay, I will raise this point in the afternoon. As an alternative, we could ban nothing in the policy related to obtaining a workload. This means that some images will try to connect to the batch system, and we would have to make sure those images are not run at Nikhef, or at least won't affect the infrastructure. Of course this also means that such images will fail when they are started at Nikhef. If they want, other sites can do the same for images that try to get their work from pilot job frameworks.
I'm not sure how successful this intervention will be. In previous discussions multiple sites did not like the idea of VMs prescribing the way they wanted to obtain their workload.
On 22/4/10 1:52 PM, Sander Klous wrote:
Just putting the policy in the background for a minute: if I take the ALICE VO as an example use case, it doesn't make sense to connect back to the batch system once the image has been launched. A service within the VM could launch the AliEn('s have landed) pilot job framework and crunch on the data from there. In this pilot job mode (I called this the cloud approach a while back) the VM is executed on an infrastructure like Claudia: close your eyes and launch a VM on some hardware (yes, I'm skipping details intentionally).
If you would use this in a batch-system-integrated way, then it would be launched by the batch system (the INFN VM approach). That would not really mean that a pbs_mom inside the VM connects to the site's Torque service. If that were the case, it would act as a class 1 VM worker node in the batch system, which I doubt you ever want to happen IMHO, as it is made off-site, has VO-specific stuff added to it, and would potentially mix with your regular cluster nodes.
I would change Point 7.4 "Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
to:
7.4a "How a running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
7.4b "The methods by which a VM can be contextualized must adhere to site-local policies."
Dear all,
I could not resist to also share my thoughts on this...
What I understand is that a certain "trusted" VM producer can supply a virtual machine to the community. After an endorser "endorses" its use, any member of the VO can boot this image on any site that trusts the VO and this endorsing procedure. The VM is then booted by the user; however, it is booted with the same "system" rights as a grid node or a VO box.
The catch is: the VM now runs with far more rights than an ordinary job. It runs with system rights. And there is no mechanism in the policy to monitor it. What's more, there is no liability clause for the endorser (and to what extent are they liable for damages; 1M$? 10M$?). I would strongly advise any site against implementing such a policy! It is very unsafe to allow end users to gain rights as system users on your trusted network.
In our HPC Cloud we certainly do allow users to become root within their own VMs inside their own VLAN. However, from a system point of view the whole virtual cluster runs with ONLY user credentials, not with system credentials. So it is perfectly OK for them to set their interfaces in promiscuous mode, to do all kinds of LDAP calls, or even to try and hack every IP address in their own range, etc. However, if users want access to the outside world, we very strictly monitor the traffic and the VM. The more rights they want, the more rules they have to submit to. And the most they can get is a "public" IP in the DMZ, where the security settings will only allow what a specific user requested and needs.
What would possibly be acceptable for the Grid is a VM that runs with user rights inside its own VLAN, with the same security permission settings and ACLs as any other Grid job. Interestingly enough, the group of Davide Salomoni implemented just that; you can find his OGF presentation here: http://www.ogf.org/OGF28/materials/1994/salomoni_ogf28_100316.pdf

End users can submit Grid jobs or their own VM. On a worker node there is a bait job that monitors the load. If there is a Grid job in the queue, a virtual grid node is started on the worker node and it accepts the job. If there is a user-submitted VM in the queue, that gets booted. From slide 96: it is in production with currently 1400 on-demand virtual machines and O(10) supported virtual images, serving 20 different user communities; on average, more than 20,000 jobs are executed each day through WNoDeS. The plan is to have 4000 virtual machines by April 2010 and progressively integrate all Tier-1 resources. It is fully compatible with the existing Grid infrastructure.
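As I read the slides, the bait-job dispatch could be sketched roughly like this (purely illustrative; the helper functions below are invented stand-ins, not actual WNoDeS interfaces):

```shell
#!/bin/sh
# Illustrative sketch of the "bait job" dispatch described above. None of
# these helpers are real WNoDeS code; they are stubs for the idea only.

next_queue_entry() {               # stub: pretend the queue holds one grid job
    echo "grid-job:12345"
}
start_virtual_wn() { echo "booting virtual WN for job $1"; }
boot_user_vm()     { echo "booting user-supplied VM image $1"; }

# The bait job inspects the next queue entry and decides what to boot:
# a virtual grid worker node for an ordinary job, or the user's own image.
dispatch() {
    case "$1" in
        grid-job:*) start_virtual_wn "${1#grid-job:}" ;;
        user-vm:*)  boot_user_vm     "${1#user-vm:}"  ;;
        *)          echo "idle: nothing queued" ;;
    esac
}

dispatch "$(next_queue_entry)"
```

The attraction of this model is that the decision of what runs on the bare hardware always stays with the site, whichever kind of work is queued.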
I think we should seriously reconsider what rights we are willing to give to users...
Kindest regards,
Floris
-----Original Message----- From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Oscar Koeroo Sent: donderdag 22 april 2010 17:55 To: ct-grid@nikhef.nl Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
On 22/4/10 1:52 PM, Sander Klous wrote:
in my opinion, one of the main advantages of VMs, is it allows you to make assumptions about many things being exactly the same, on all sites. Why else would you go to the trouble?? So this:
On 22 Apr 2010, at 13:30, Sander Klous wrote:
So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
is to me turning off one of the main advantages of VMs, and for a weak reason.
Okay, I will raise this point in the afternoon. As an alternative we can not ban anything in the policy related to obtaining a workload. This means that some of the images will try to connect to the batch system and we have to make sure these images are not run or won't affect the infrastructure at Nikhef. Of course this also means that these images will fail when they are started at Nikhef. If they want, other sites can do the same for images that will try to get their work from pilot job frameworks.
I'm not sure how successful this intervention will be. In previous discussions multiple sites did not like the idea of VMs prescribing the way they wanted to obtain their workload.
Just putting the policy in the background for a minute, if I would take the Alice VO as an example use case, then it doesn't make sense to connect back to the batch system when the image has been launched. A service within the VM could launch the AliEn('s have landed) pilot job framework and crunch on the data from there. In this pilot job mode, I called this the cloud-approach a while back i.e. executed on an infrastructure like Claudia i.e. close your eyes and launch a VM on some hardware (yes, I'm skipping details intentionally).
If you used this in a batch-system-integrated way, the VM would be launched by the batch system (the INFN VM approach). That would not necessarily mean a pbs_mom connects to the site's Torque service from within the VM. If the latter were the case, the image would act as a class 1 VM WN in the batch system, which I doubt you ever want to happen IMHO, as it is made off-site, has VO-specific stuff added to it, and would potentially mix with your regular cluster nodes.
I would change Point 7.4 "Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
to:
7.4a "How a running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
7.4b "The methods by which a VM can be contextualized must adhere to site-local policies"
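To make the proposed 7.4a/7.4b concrete, here is a minimal sketch of what "workload retrieval as a site contextualization option" could look like: the image ships with no workload configuration, and the site renders a context file at instantiation time. All names (function, keys, the allowed-methods set) are invented for illustration; they are not part of the policy text.

```python
# Hypothetical site-side contextualization step per the proposed 7.4a/7.4b:
# the site, not the image, decides how a running instance obtains work.

def build_context(site_name, workload_method, endpoint):
    """Render a minimal key=value context file a site might pass to a VM
    (e.g. via a contextualization ISO or a metadata service)."""
    allowed = {"batch", "pilot", "none"}  # site-local policy (7.4b)
    if workload_method not in allowed:
        raise ValueError("method %r not allowed by site policy" % workload_method)
    return "\n".join([
        "SITE_NAME=%s" % site_name,
        "WORKLOAD_METHOD=%s" % workload_method,   # chosen by the site, not the image
        "WORKLOAD_ENDPOINT=%s" % endpoint,
    ])

print(build_context("NIKHEF", "pilot", "https://example.org/pilot"))
```

A site preferring batch integration would simply render `WORKLOAD_METHOD=batch` instead; the image itself stays neutral.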
_______________________________________________ ct-grid mailing list ct-grid@nikhef.nl https://mailman.nikhef.nl/mailman/listinfo/ct-grid
Hi Floris, The VM is not booted by the user, but by the site. The policy is intended to make it possible to share images created by e.g. Davide or others with other sites, and for Davide to be able to run images generated elsewhere. The question is: how do we generate images and make Davide trust them enough to run them on his site?
The line HEPiX is trying to follow is that users should not be able to do more from within the VM than they are doing now from their "normal" jobs. There was a discussion on liability, but since we don't have this now on the grid (at least I am not aware of it), the VM discussion didn't seem the right place to suddenly start with this. Could you explain to me why you think this is necessary for VMs, but not for normal jobs?
The model you follow in Cloudia is a very interesting one. In the HEPiX working group it is becoming more and more clear that Class 2 trusted VMs will be way too difficult for the end-users. Class 3 VMs are much more user-friendly (as was always advocated by Pieter). However, scalability up to the number of jobs and users on the grid remains a concern until proven otherwise by Cloudia (I look forward to those results). As far as I know the WNoDeS of Davide is not a Class 3 facility (yet), but I'll have a look at his latest slides (thanks for the link). Thanks for the feedback, Sander
On Apr 23, 2010, at 3:35 PM, Floris Sluiter wrote:
Dear all,
I could not resist to also share my thoughts on this...
What I understand is that a certain "trusted" VM producer can supply a virtual machine to the community. After an endorser "endorses" its use, any member of the VO can boot this image on any site that trusts the VO and this endorsing procedure. A VM is then booted by the user; however, the virtual machine is booted with the same "system" rights as a grid node or a VO box.
The catch is: the VM now runs with far more rights than an ordinary job. It runs with system rights, and the policy has no mechanism in place to monitor it. What's more, there is no liability clause for the endorser (and to what extent of damages are they liable; 1M$? 10M$??). I would strongly advise any site against implementing such a policy! It is very unsafe to allow end-users to gain rights as system users on your trusted network.
In our HPC Cloud we certainly do allow users to become root within their own VMs inside their own VLAN. However, from a system point of view the whole virtual cluster runs with ONLY user credentials, not with system credentials. So it is perfectly OK for them to set their interfaces in promiscuous mode, to do all kinds of LDAP calls, or even to try and hack every IP address in their own range, etc. However, if users want access to the outside world, we very strictly monitor the traffic and the VM. The more rights they want, the more rules they have to subject themselves to. And the most they can get is a "public" IP in the DMZ, where the security settings only allow what a specific user requested and needs.
What would possibly be acceptable for the Grid is a VM that runs with user rights inside its own VLAN, with the same security permission settings and ACLs as any other Grid job. Interestingly enough, the group of Davide Salomoni implemented just that; you can find his OGF presentation here: http://www.ogf.org/OGF28/materials/1994/salomoni_ogf28_100316.pdf End users can submit Grid jobs or their own VM. On a worker node there is a bait job that monitors the load. If there is a Grid job in the queue, a virtual grid node is started on the worker node and it accepts the job. If there is a user-submitted VM in the queue, that gets booted. (Slide 96): it is in production with currently 1400 on-demand Virtual Machines, O(10) supported Virtual Images, serving 20 different user communities; on average, more than 20,000 jobs are executed each day through WNoDeS. The plan is to have 4000 Virtual Machines by April 2010 and progressively integrate all Tier-1 resources. It is fully compatible with existing Grid infrastructure.
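The WNoDeS-style bait-job dispatch described above can be sketched roughly as follows. This is my reading of the slides, not WNoDeS code; the function name and queue-entry fields are invented for illustration.

```python
# Rough sketch of the bait-job logic: a small job on the worker node
# inspects the queue and either starts a virtual grid node (which then
# accepts the grid job) or boots a user-submitted VM image directly.

def bait_dispatch(queue):
    """Decide what the bait job should start next on this worker node."""
    for entry in queue:
        if entry["type"] == "grid-job":
            # start a virtual grid node, which then accepts the job
            return ("start-virtual-gridnode", entry["id"])
        if entry["type"] == "user-vm":
            # boot the user-submitted image directly
            return ("boot-user-vm", entry["id"])
    return ("idle", None)

print(bait_dispatch([{"type": "user-vm", "id": "alice-42"}]))
```

Either way the payload runs inside a VM with only the rights of the submitting user, which is the point being argued here.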
I think we should seriously reconsider what rights we are willing to give to users...
Kindest regards,
Floris
Hi Sander,
The VM is not booted by the user, but by the site.
Yes, and it runs inside the trusted network with system rights... So you have an "endorsed" user machine running with system rights. This is very unsafe.
The question is: how do we generate images and make Davide trust them enough to run them on his site?
A site admin doesn't need to trust VMs any more than he trusts regular user jobs, because from the system viewpoint they have the same rights. We (Davide and our HPC Cloud) run VMs with USER credentials, as a user job, and as an added feature inside their own virtual network. The only outside connections allowed are those that are allowed for regular user jobs, and they are managed bridged connections, so VMs can indeed only see traffic that is their own.
Could you explain to me why you think liability is necessary for VMs, but not for normal jobs?
If I am to trust an endorser to say that an image is safe for use, I need to know how high the insurance policy is. If the usual policy applies ("yes, I think it is safe, but in case of trouble you pay yourself"), then I will not boot their machines under this policy, because I have no means to block a machine without violating the policy. I could just as easily mail them my root password so they can manage my systems more conveniently; in that case it is also clearer who to blame in case of a mishap.
As far as I know the WNoDeS of Davide is not a Class 3 facility (yet), but I'll have a look at his latest slides.
Please do; it is a "class 3 facility", which does scale to running 20,000 jobs each day in virtual machines, including user-submitted ones and Grid jobs!
My point is still: people can run any VM in our HPC cloud, but only with rights that are proper for an end-user. I do not trust a VM at all, or any more than a regular user job (because we have no control over its contents), and I do not trust users never to make mistakes. Users are actually very much afraid of that kind of trust; they expect us to protect them in case something goes wrong. So in fact we know that we will sometimes get VMs infected by all sorts of malware, or hacking attempts by "a Korean terrorist", severe user mistakes, etc. Every system can and will be hacked. But when that happens the damage should be as close to zero as possible, because we will work hard to detect it as soon as it happens, as we (should) do with all our systems. And I will not help hackers by giving them a head start with root rights on the physical network.
Cheers,
Floris
Hi Floris,
The VM is not booted by the user, but by the site.
Yes, and it runs inside the trusted network with system rights... So you have an "endorsed" user machine running with system rights. This is very unsafe.
Actually, I am not sure it has to run with system rights. The policy says that the applications in the VM should have the same environment as the existing worker nodes. If that can be accomplished with user rights, that would be fine.
The question is: how do we generate images and make Davide trust them enough to run them on his site?
A site admin doesn't need to trust VMs any more than he trusts regular user jobs, because from the system viewpoint they have the same rights. We (Davide and our HPC Cloud) run VMs with USER credentials, as a user job, and as an added feature inside their own virtual network. The only outside connections allowed are those that are allowed for regular user jobs, and they are managed bridged connections, so VMs can indeed only see traffic that is their own.
Sounds good to me. What about performance issues? I'm especially concerned about file I/O and network I/O.
Could you explain to me why you think liability is necessary for VMs, but not for normal jobs?
If I am to trust an endorser to say that an image is safe for use, I need to know how high the insurance policy is. If the usual policy applies ("yes, I think it is safe, but in case of trouble you pay yourself"), then I will not boot their machines under this policy, because I have no means to block a machine without violating the policy.
The policy states that the site can block machines whenever they feel like it, for whatever reason they want, so I don't think this is true. I do see your point about running a VM (even if it is endorsed) as a system-level process, making liability more of a concern. Maybe it is possible to reproduce the Nikhef WN environment in a VM even if it is running as a user process. Would there be ways to give the VMs access to e.g. NFS shared storage? Maybe the hypervisor can get access and then provide it to the VM. Can someone from Nikhef shed some light on this?
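For what it's worth, one possible shape of the "hypervisor mounts the share, then provides it to the VM" idea is a libvirt-style filesystem passthrough element, where the guest only ever sees a directory the host chose to expose. This is an illustrative sketch, not a statement of how Nikhef exposes storage; the paths and mount tag are invented.

```python
# Build a libvirt domain XML fragment that shares a host directory
# (e.g. an NFS mount made by the hypervisor) with the guest, read-only.
# The guest mounts it via the tag name instead of talking NFS itself.

def fs_passthrough_xml(host_dir, guest_tag):
    """Return a <filesystem> passthrough fragment for a libvirt domain."""
    return (
        "<filesystem type='mount' accessmode='squash'>\n"
        "  <source dir='%s'/>\n"   # directory mounted on the host
        "  <target dir='%s'/>\n"   # tag the guest mounts
        "  <readonly/>\n"
        "</filesystem>" % (host_dir, guest_tag)
    )

print(fs_passthrough_xml("/mnt/nfs/software", "vo-software"))
```

The point of the sketch: the NFS credentials stay with the hypervisor, so the VM itself never needs system rights on the storage network.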
As far as I know the WNoDeS of Davide is not a Class 3 facility (yet), but I'll have a look a his latest slides.
Please do, it is a "class 3 facility", which does scale to run 20,000 jobs each day as a virtual machine. Including user submitted ones and Grid-jobs!
I'm sorry, I don't see anywhere in the presentation that it is a class 3 facility. As you quoted, it says O(10) supported images; that's what Davide told me last time. He also told me they are working on a mechanism for users to upload arbitrary images (e.g. via http; I also see that mentioned in this presentation), but that the trust issue wasn't solved to a level where it was deployable on 1400 cores and 20k jobs per day. That would require additional monitoring tools and security infrastructure, which at that time were not in place yet. After carefully looking at the slides I am not convinced they are in place now. Anyway, as I told you, I'll be in Catania in the week of the 17th of May for the INFN annual meeting and I will ask him about it.
My point is still: people can run any VM in our HPC cloud, but only with rights that are proper for an end-user. I do not trust a VM at all, or any more than a regular user job (because we have no control over its contents), and I do not trust users never to make mistakes.
Point taken, we should look into running VMs with end-user rights. But even with end-user rights, if they get access to NFS or other shared resources, we need an endorsement procedure for the VM. I think this is the main difference with Cloudia, where the VMs don't have access to such shared resources.
Users are actually very much afraid of that kind of trust, they expect us to protect them in case something goes wrong.
Interesting insights. Thanks, Sander
Hi all (sorry for the lengthy spam; please do skip this if you are not interested in security related to virtual machines),
Actually, I am not sure it has to run with system rights. The policy says that the applications in the VM should have the same environment as the existing worker nodes. If that can be accomplished with user rights, that would be fine... ... Would there be ways to give the VMs access to e.g. NFS shared storage?...
I understood it like this: the same environment as a WN equals root rights on the system network; application rights equal user rights. Users are not allowed to do NFS mounts, so now I am a bit confused... The whole point of an NFS mount for a WN is to mount the software directory. The whole point of a VM is that users can install their own software and do not need an NFS mount to have that managed for them. So a user-submitted "certified" VM does not need to access anything as a regular WN on the system/cluster level... And allowing it to do so is very unsafe; even booting it in the same network is very unsafe.
But even with end-user rights, if they get access to NFS or other shared resources, we need an endorsement procedure for the VM. I think this is the main difference with Cloudia, where the VMs don't have access to such shared resources.
I re-emphasize that in my opinion the proposed setup for "class 2" Virtual Machines is *unsafe*, and more importantly it is not necessary. When VOs and/or users need a VM, it is mostly because they need an easily configurable OS and software environment. So they stop needing the software directory, because it is more convenient to install the software directly. A VO- and/or user-submitted VM only needs the same rights as any regular grid job and only needs to access resources that are available to any other grid job. Ergo, we can safely treat a user-submitted VM as any other user-submitted job, with one caveat: the virtual network topology needs to contain it inside its own VLAN.
...Sounds good to me. What about performance issues? I'm especially concerned about file I/O and network I/O.
Any virtualization has overhead. The best you can squeeze out is about 90-95% of native performance on I/O and close to 100% for CPU. This is independent of the network topology and of the rights a VM is running with: how your network topology is designed makes no difference to how your virtualization performs compared to the same setup for a native solution.
"Class 2 VMs" should be treated as "Class 3 VMs", and those should have the same or similar rights as any other user-submitted job. In that case there is no difference between a regular grid job and a "class 3 VM" grid job, so they are not more or less safe. This can be accomplished by a few technical measures that both Davide on his Grid and we on our HPC cloud have implemented. The only policy we need is one stating that VMs will be treated as regular user jobs and that our regular rules for those apply. Additionally, on our HPC cloud at SARA we can actually allow VMs to have a public IP provided a few additional rules are observed, which we monitor closely.
Kind Regards,
Floris
On Mon, Apr 26, 2010 at 02:24:27PM +0200, Floris Sluiter wrote:
...Sounds good to me. What about performance issues? I'm especially concerned about file I/O and network I/O.
Any virtualization has overhead. The best you can squeeze it is about 90-95% percent on I/O and close to 100% native for CPU. This is independent from the network topology and with which rights a VM is running: i.e. how your network topology is designed does not make any difference on how your virtualization performs compared to the same setup for a native solution.
Hi Floris,
just a short question with respect to the network topology. Could you give some details on how you would do this? In order to get performance you need to put your VMs on a bridge, so can you get sufficient security, or even a closed-off VLAN, with ebtables?
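To make the ebtables question concrete: the kind of link-layer filtering I have in mind would pin each VM's tap interface to its registered MAC and IP on the bridge, with rules like the ones generated below. These are illustrative command strings (interface and address values invented), not a tested configuration.

```python
# Sketch: per-tap ebtables rules that drop any frame whose source MAC,
# or ARP-announced IP, differs from what the site assigned to that VM.

def ebtables_rules(tap_if, mac, ip):
    """Generate bridge-level filter commands for one VM tap interface."""
    return [
        # only the assigned source MAC may send from this tap
        "ebtables -A FORWARD -i %s -s ! %s -j DROP" % (tap_if, mac),
        # anti-ARP-spoofing: the tap may only claim its own IP
        "ebtables -A FORWARD -i %s -p ARP --arp-ip-src ! %s -j DROP" % (tap_if, ip),
    ]

for rule in ebtables_rules("tap3", "52:54:00:12:34:56", "10.0.3.15"):
    print(rule)
```

Whether this is sufficient on its own, or needs VLAN separation underneath as well, is exactly the question.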
Furthermore, concerning the difference, or as you say equality, between a class-3 and a class-2 VM: how do you see this concerning, for example, syslogging? With a standard WN all logging is done by root in a site-installed WN, which means the logs have a very high level of trust. With a class-2 VM, logging is done by root inside the VM in a way endorsed by the endorser. With a class-3 VM there is no such thing; even if you demand logging to syslog (no idea how to do that with Windows), it's completely untrusted. Maybe you see this differently? The point is, even if the VM has only local user rights, just like a normal grid job, it's not clear to me how you can keep track of what that normal job is doing in the case of a VM. If a normal grid job does strange things, you can see traces in your syslog; in the VM case, you only see the VM process such as kvm, which is much harder to trace.
Cheers, Mischa
Hi all (more skipable spam/details on VMs and security),
In our view a VM is not much different from user jobs. Currently on the Grid we have no idea what a user is in fact calculating when they call their "bank-encryption cracker", for example, "cancer-research". User-generated stuff is always very hard to trace, especially when users are trying to hide their tracks... The first thing that clever hackers do is install a rootkit, turn off all logging (or better yet, filter the illegal stuff out so it seems OK) and encrypt everything else...
So what we minimally need to prevent is, for example, illegal access to data (scientists want to share, but certain data needs to be protected from groups other than their peers); in the case of user-submitted VMs, that means strict network security. And we need to prevent illegal access from and to the Internet (a cluster of VMs could form a perfect botnet).
which means the logs have a very high level of trust.
This is a bit naive; we never trust machines that we do not maintain ourselves (and we certainly do not think that our own work is always trustworthy). Or rather, we trust outside VMs just as much as other user-generated jobs, which is not very much... That said, in case of problems we can freeze a VM and inspect it manually with forensic tools like Sleuth Kit; this works for many OSes and is easier with VMs than with a physical host. On the physical host we can monitor the syslogs etc. to see if the VM tries to break out of its assigned space. On the network we do traffic and possibly packet inspection, port scans, etc. The level of security needed depends on the usage requested by the user. We stack VLANs on VLANs, and on each compute node the physical interface supports a virtual bridge, which connects to the virtual interface of the VM. So separation between VLANs is done both on the physical bridges and on the virtual bridges.
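The freeze-and-inspect workflow mentioned above can be outlined as a short command sequence: suspend the guest, snapshot its disk, and point Sleuth Kit at the copy. Domain and path names are invented; treat this as an outline of the idea, not an incident-response procedure.

```python
# Sketch of host-side forensics on a suspicious VM: pause it, copy its
# disk image, and list the filesystem from the copy with Sleuth Kit.

def freeze_and_inspect(domain, disk_image, evidence_dir):
    """Return the shell steps to freeze a libvirt/KVM VM and examine a
    copy of its disk with Sleuth Kit's fls."""
    copy = "%s/%s-frozen.img" % (evidence_dir, domain)
    return [
        "virsh suspend %s" % domain,                      # pause the guest
        "cp --sparse=always %s %s" % (disk_image, copy),  # work on a copy, never the original
        "fls -r -o 0 %s" % copy,                          # Sleuth Kit: recursive file listing
    ]

for step in freeze_and_inspect("vm-alice-42", "/var/lib/vms/alice.img", "/srv/evidence"):
    print(step)
```

Note that this works whatever the guest OS did to its own logs, which is the point: the host inspects the disk from outside, without trusting anything inside the VM.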
For more details on our security policies you will need to talk to our security officers ;-)
Regards,
Floris
Dear all,
I cannot resist either :-). I agree very much with Floris's analysis. Basically, running a VM should be handled the same as running a job. What is the difference with other jobs anyway? The OS is just another application run by the user, and this application should have the same permissions as other jobs. Then no additional policies are needed.
Of course you can go a step further and allow, in some way, the setup of your own managed nodes and clusters, albeit in a virtual environment. You then more or less delegate the classical administrative responsibilities of sites to other authorities. Of course you can decide to do that; however, the hosting organization can then no longer take any responsibility for misuse or anything else. It is the same as saying that part of the resources is reserved and managed by another organization.
Groet,
Jules
-----Original Message----- From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Floris Sluiter Sent: 23 April 2010 15:35 To: ct-grid@nikhef.nl Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
Dear all,
I could not resist to also share my thoughts on this...
What I understand is that a certain "trusted" VM producer can supply a VirtualMachine to the community. After a endorser "endorses" its use, any member of the VO can boot this image on any site that trusts the VO and this endorsing procedure. A VM machine is then booted by the user, however the virtual machine is booted with the same "system" rights as a GridNode or a VO box.
The catch is: The VM now runs with far more rights then an ordinary job. It runs with system rights. And there is in the policy no mechanism in place to monitor it. And what's more: there is no liability clause for the endorser (and to what extend of damages are they liable; 1M$ 10M$ ??) I would strongly advice any site against implementing such a policy! It is very unsafe to allow end-users to gain rights as system users on your trusted network.
In our HPC Cloud we certainly do allow users to become root within their own VMS inside their own Vlan. However from a system point of view the whole virtual cluster runs with ONLY user credentials, not with system credentials. So it is perfectly OK for them to set their interfaces in promiscuous mode, to do all kinds of LDAP calls or even to try and hack every ip-addres in their own range, etc. However, if users want access to the outside world, we very strictly monitor the traffic and the VM. The more rights they want, the more they have to subject to rules. And the most they can get is a "public" ip in the DMZ, were the security settings will only allow what a specific user requested and needs.
What might be acceptable for the Grid is a VM that runs with user rights inside its own VLAN, with the same security permission settings and ACLs as any other Grid job. Interestingly enough, the group of Davide Salomoni implemented just that; you can find his OGF presentation here: http://www.ogf.org/OGF28/materials/1994/salomoni_ogf28_100316.pdf End users can submit Grid jobs or their own VM. On a worker node there is a bait job that monitors the load. If there is a Grid job in the queue, a virtual grid node is started on the worker node and it accepts the job. If there is a user-submitted VM in the queue, that gets booted. (Slide 96:) It is in production with currently 1400 on-demand virtual machines, O(10) supported virtual images, serving 20 different user communities; on average, more than 20,000 jobs are executed each day through WNoDeS. The plan is to have 4000 virtual machines by April 2010 and to progressively integrate all Tier-1 resources. It is fully compatible with the existing Grid infrastructure.
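The bait-job dispatch described above can be sketched roughly as follows. This is purely illustrative pseudologic, not the actual WNoDeS implementation or API; all names (dispatch, boot_vm, the queue-item layout) are hypothetical.

```python
# Hypothetical sketch of a WNoDeS-style "bait job" loop on one worker node:
# a Grid job in the queue triggers a virtual grid node, which accepts the
# job; a user-submitted VM image in the queue is booted directly.

def dispatch(queue, boot_vm, grid_node_image):
    """Boot the appropriate VM for the next queued item on this worker node."""
    item = queue.pop(0) if queue else None
    if item is None:
        return None  # nothing queued; the bait job keeps watching
    if item["kind"] == "grid_job":
        # Ordinary Grid job: start a virtual grid node, which accepts the job.
        vm = boot_vm(grid_node_image)
        vm["job"] = item["payload"]
    else:
        # User-submitted VM image: boot it as-is.
        vm = boot_vm(item["payload"])
    return vm
```

The point of the design is that the same physical worker node serves both populations without static partitioning.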
I think we should seriously reconsider what rights we are willing to give to users...
Kindest regards,
Floris
-----Original Message----- From: ct-grid-bounces@nikhef.nl [mailto:ct-grid-bounces@nikhef.nl] On Behalf Of Oscar Koeroo Sent: donderdag 22 april 2010 17:55 To: ct-grid@nikhef.nl Subject: Re: [Ct-grid] Fwd: Updated draft of VM policy
On 22/4/10 1:52 PM, Sander Klous wrote:
in my opinion, one of the main advantages of VMs is that they allow you to make assumptions about many things being exactly the same on all sites. Why else would you go to the trouble?? So this:
On 22 Apr 2010, at 13:30, Sander Klous wrote:
So, I see what you mean by banned now: the policy indeed bans the possibility to impose a specific way of obtaining your workload on every site. I think that is a good thing.
is to me turning off one of the main advantages of VMs, and for a weak reason.
Okay, I will raise this point in the afternoon. As an alternative, we could refrain from banning anything in the policy related to obtaining a workload. This means that some of the images will try to connect to the batch system, and we have to make sure these images are not run or won't affect the infrastructure at Nikhef. Of course this also means that these images will fail when they are started at Nikhef. If they want, other sites can do the same for images that try to get their work from pilot job frameworks.
I'm not sure how successful this intervention will be. In previous discussions multiple sites did not like the idea of VMs prescribing the way they wanted to obtain their workload.
Putting the policy in the background for a minute: if I take the ALICE VO as an example use case, then it doesn't make sense to connect back to the batch system once the image has been launched. A service within the VM could launch the AliEn('s have landed) pilot job framework and crunch on the data from there. In this pilot job mode, I called this the cloud approach a while back, i.e. executed on an infrastructure like Claudia: close your eyes and launch a VM on some hardware (yes, I'm skipping details intentionally).
If you used this in a batch-system-integrated way, then it would be launched by the batch system (the INFN VM approach). This would not really mean that a pbs_mom is connecting to the site's Torque service from within the VM. If that were the case, it would act as a class 1 VM WN in the batch system, which I doubt you ever want to happen IMHO, as it is made off-site, has VO-specific stuff added to it, and would potentially mix with your regular cluster nodes.
I would change Point 7.4 "Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
to:
7.4a "How a running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated."
7.4b "The methods by which a VM can be contextualized must adhere to local site policies."
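The split Oscar proposes (7.4a/7.4b) can be sketched as follows: the endorsed image carries no workload mechanism, and the site injects one at instantiation, subject to its local policy. All names here (contextualize, SITE_POLICY, the method names) are hypothetical, chosen only to illustrate the idea.

```python
# Illustrative sketch of site-side contextualization: the image stays
# generic; the workload-retrieval method is an instantiation-time choice
# validated against local site policy.

SITE_POLICY = {"allowed_workload_methods": {"pilot_framework"}}  # e.g. a Nikhef-like site

def contextualize(image, requested_method):
    """Attach a workload-retrieval method to a new instance, if policy allows."""
    if requested_method not in SITE_POLICY["allowed_workload_methods"]:
        raise PermissionError(
            f"workload method {requested_method!r} not allowed by site policy")
    # The endorsed image itself is left untouched; only the running
    # instance is configured, which is what makes it usable cross-site.
    return {"image": image, "workload_method": requested_method}
```

A CERN-like site could list the batch-system method instead; the same image then runs at both sites with different contextualization.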
Hi Sander,
4. Images should not be pre-configured to obtain a workload. How the running instance of an image obtains a workload is a contextualization option left to the site at which the image is instantiated.
The rules of engagement with respect to contextualization are not specified in the policy. Point 7.4 depends heavily on how a site, endorser, and image combiner interpret the ways in which contextualization can be done.
5. Instantiated images should not be any more constrained, notably in terms of network access, than a normal worker node at the site at which the image is instantiated.
Isn't this something that you state to a Site admin, not an Endorser?
6. There should be no installed accounts or user credentials of any form in an image.
I assume (this must be clarified) that the policy refers to personal Unix accounts. Such an account might belong to any of the specified (human) roles. This is what the policy seems to forbid, including credentials.
Nitpicking: I think point number 7.9 should be moved to 7.1 IMHO
8. You must assist the Grid in security incident response and must have a vulnerability assessment process in place.
So the Endorser is the one that gets all the blame, plus the role of acting in the security incidents of 300+ sites. I hope the Endorser role is held by more than the one poor soul who says 'yes' to the job.
10. You recognise that if a Site runs an image which no longer appears on your list of endorsed images no longer endorsed, that you are no longer responsible for any consequences of this.
(crooked sentence)
So, if I was an Endorser and shit has severely hit (and smothered) the fan then I would remove an endorsed image from the endorsed list and claim that it was not to be executed from a certain moment in time. This is probably what you do not wish to imply.
Oscar
On 22/4/10 9:58 AM, Sander Klous wrote:
Hi Jeff,
Clarification on the other issue (getting a workload) is also important. I can understand why you would not want the image to get a "workload" (I would call it a job) from the site's batch system. You wouldn't even allow it, configured or not.
On the other hand, workload could also refer to something in the experiment's task queue off-site. This would mean that the VM "woke up" as a pilot job. I don't see anything wrong with that at all, and that would be a type of pre-configuration.
The point is that "you don't see anything wrong with that", but other sites might. So we don't want to specify it in the policy. That's why it is left as a site contextualization issue, so each site can apply the mechanisms it is happy with. Another option would be to create X different versions of an image, each with a different way of retrieving the workload, and let sites pick the ones they allow. That would not be in line with the philosophy of generating cross-site images, which is the main purpose of this policy.
On 22 Apr 2010, at 09:21, Sander Klous wrote:
Hi Jeff, (This time with reply-all, sorry)
- I think what they mean with pre-configured accounts is actually pre-configured credentials, which is a security concern. I will try to get this point clarified in the working group.
- The original proposal was to pre-configure the VM to connect back to the site batch system. This is the main mode of operation at CERN. At Nikhef this will never happen, so a different way is required to retrieve a workload (e.g. connect to a pilot job framework). Hence, in order to make an image usable cross-site, the mechanism to obtain the workload cannot be pre-configured.
Thanks for the feedback, Sander
Hi, One more specific point I need input for before the discussion this afternoon:
We will introduce an endorser for the base and one for the VO software portions. We need to have a discussion about the requirements put on the role of the base image endorser. A simple "site member versus VO member" is not good enough to distinguish trust levels. So the question is: how do we "certify/qualify" a base image endorser?
Any ideas on the list of requirements before we accept somebody as a base image endorser (i.e. endorsing the root portion of the installation)?
Thanks, Sander
On 22-04-10 10:32, Sander Klous wrote:
Hi, One more specific point I need input for before the discussion this afternoon:
We will introduce an endorser for the base and one for the VO software portions. We need to have a discussion about the requirements put on the role of the base image endorser. A simple "site member versus VO member" is not good enough to distinguish trust levels. So the question is: how do we "certify/qualify" a base image endorser?
I suppose it depends on the scope of where the base will be used. If a base image is only going to be used at a single site, a local site admin can have that role. If an image is for distribution within an NGI, there should be agreement within such an organisation to bestow the responsibility.
For VOs we already have a role system in place. But what you probably also want is the audit trail of when a person gained and lost the role, so that this can be matched against the timestamps of actual endorsements.
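The audit-trail check Dennis suggests can be sketched like this: an endorsement is only valid if its timestamp falls inside an interval during which the endorser actually held the role. The data layout (endorsement dicts, role-history tuples) is hypothetical, chosen only to show the matching logic.

```python
# Sketch of matching endorsement timestamps against a role audit trail.
# role_history: list of (person, gained, lost) intervals; lost is None
# while the role is still held.

def endorsement_valid(endorsement, role_history):
    """True iff the endorser held the role at the endorsement timestamp."""
    ts = endorsement["timestamp"]
    for person, gained, lost in role_history:
        if person != endorsement["endorser"]:
            continue
        if gained <= ts and (lost is None or ts < lost):
            return True
    return False
```

Without such a trail, an endorsement signed just before (or after) a role change is indistinguishable from a valid one.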
Dennis
Hi Dennis, The whole point of the policy is to share images between sites. So, what should the agreement contain from Nikhef perspective? (and what is an NGI?) Thanks, Sander
Hi all, Thanks for the great feedback so far. One point is still not covered very well. Dennis and I discussed it a bit during lunch, but without solid conclusions.
"How do we certify/qualify a base image endorser?"
This person/entity endorses the root part of the image. So the site has to have sufficient trust in the base image endorser to believe that no bad things will happen from the VM on the site's trusted network. We always claimed that just having the VO software manager role is not enough to qualify. So the next question is: what is enough to qualify, and how do we make sure the endorser actually possesses these qualifications? Thanks, Sander
On 22-04-10 09:05, Sander Klous wrote:
Hi, Comments please. Preferably before the meeting starts at 16:00 this afternoon.
Hi Sander,
Point 10 of the endorser policy looks like it contains a typo (Oscar found this already).
What I find missing is a method for revocation of an endorsement. Simply removing it from the endorsed list is not enough; there should be a signed statement saying 'I no longer endorse this combination, for this and this reason' and these should probably be distributed like CRLs.
Perhaps there should be automatic and/or implicit revocation, or at least expiry of a base system. When a security flaw is discovered, all the base systems that suffer it should no longer be used. And neither should any combination derived from them.
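A minimal sketch of the CRL-like scheme proposed here: next to the endorsed list, a revocation list records an id, a reason, and a timestamp, and the pre-boot check refuses anything revoked, or derived from a revoked base. The field names and list shapes are hypothetical, purely to illustrate the check.

```python
# Sketch of a pre-boot check against an endorsed list and a CRL-style
# revocation list. A revoked base system also invalidates every
# combination derived from it.

def may_boot(image, endorsed, revocations):
    """True iff the image is endorsed and neither it nor its base is revoked."""
    if image["id"] not in endorsed:
        return False
    revoked = {r["id"] for r in revocations}
    # Deriving from a revoked base (e.g. after a security flaw) is fatal too.
    return image["id"] not in revoked and image.get("base") not in revoked
```

The revocation entries would carry the signed "I no longer endorse this combination, for this reason" statement, so sites can distinguish routine expiry from a security withdrawal.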
Just my 2ct.
Dennis
Hi Dennis,
What I find missing is a method for revocation of an endorsement. Simply removing it from the endorsed list is not enough; there should be a signed statement saying 'I no longer endorse this combination, for this and this reason' and these should probably be distributed like CRLs.
It was a very conscious choice of the working group to make revocation as simple as removing an image from the endorsed list. Can you elaborate on why you think this is not enough?
Perhaps there should be automatic and/or implicit revocation, or at least expiry of a base system. When a security flaw is discovered, all the base systems that suffer it should no longer be used. And neither should any combination derived from them.
If I remember correctly, expiry is mentioned somewhere in the policy. If an image is no longer on the endorsed list, it won't be used anymore, since the lists are checked before a VM is booted.
Thanks, Sander
-- D.H. van Dok :: Software Engineer :: www.nikhef.nl :: www.biggrid.nl Phone +31 20 592 22 28 :: http://www.nikhef.nl/~dennisvd/
Hi,
To what degree will this policy affect Big Grid and/or our community?
The venue this is discussed in gives the impression that it's outside our scope. (What I want to ensure is that it does not limit the way we may want to use VMs in the future.)
Thanks
Cheers,
Mark
Hi Mark, In the BiG Grid VM working group we distinguished between trusted VMs and untrusted VMs. This policy deals with trusted VMs and will be applicable for BiG Grid. We are also working on an infrastructure to run untrusted VMs. This policy does not apply to untrusted VMs, so it does not limit the way you may want to make use of untrusted VMs in the future.
Trusted VMs will have the same access to resources as the current worker nodes. The untrusted VMs will be sandboxed, most likely causing some (as yet unknown) performance loss compared to the trusted VMs, and e.g. no access to the trusted network infrastructure. So this policy will limit you in the following way:
1) If the performance losses are not acceptable for your future VM, or if you need access to the trusted network infrastructure, your VM has to comply with this policy.
2) If you want to be completely free in the way you create your VM, it cannot make use of the site's trusted network infrastructure and it will suffer some performance loss due to the sandboxing.
These limitations are in line with the conclusions of the BiG Grid VM working group presented to the Executive Team. A detailed progress report of this working group can be found here: https://wiki.nbic.nl/images/f/f6/ProgressReport-1.0.pdf
Hope this answers your question, -- Sander
argggg .... it was not clear to me that the policy applies to trusted VMs and not untrusted ones!
JT
Sander Klous wrote:
and to add to that:
- "Trusted network resources" in this case means access to local NFS-type disks, e.g. the local software installation area, and internal websites.
- For untrusted VMs, outbound connectivity will be possible but severely limited for security reasons. You may or may not get *outbound* access on ports 80, 443, 2811 (GridFTP) and possibly a few others, but don't expect a whole range. Inbound access is always restricted (as is currently the case for worker nodes).
JM2CW,
JJK