Hi all (sorry for the lengthy spam; please skip this if you are not interested in security related to Virtual Machines),
Actually, I am not sure it has to run with system rights. The policy says that the applications in the VM should have the same environment as the existing worker nodes. If that can be accomplished with user rights, that would be fine... ... Would there be ways to give the VMs access to e.g. NFS shared storage?...
As I understand it: the same environment as a WN means root rights on the system network, while application rights means user rights. Users are not allowed to do NFS mounts, so now I am a bit confused... The whole point of an NFS mount on a WN is to mount the software directory. The whole point of a VM is that users can install their own software and do not need an NFS mount to have that managed for them. So a user-submitted "certified" VM does not need the access that a regular WN has at the system/cluster level... Allowing that access is very unsafe; even booting the VM in the same network is very unsafe.
But even with end-user rights, if they get access to NFS or other shared resources, we need an endorsement procedure for the VM. I think this is the main difference from Cloudia, where the VMs don't have access to such shared resources.
I re-emphasize that, in my opinion, the proposed setup for "class 2" Virtual Machines is *unsafe* and, more importantly, not necessary. When VOs and/or users need a VM, it is usually because they need an easily configurable OS and software environment. They then no longer need the software directory, because it is more convenient to install the software directly in the VM. A VO- and/or user-submitted VM only needs the same rights as any regular grid-job and only needs to access resources that are available to any other grid-job. Ergo, we can safely treat a user-submitted VM like any other user-submitted job, with one caveat: the virtual network topology needs to contain it inside its own VLAN.
...Sounds good to me. What about performance issues? I'm especially concerned about file I/O and network I/O.
Any virtualization has overhead. The best you can squeeze out is about 90-95% of native performance for I/O and close to 100% of native for CPU. This is independent of the network topology and of the rights with which a VM runs: i.e. the design of your network topology makes no difference to how virtualization performs compared with the same setup on a native solution.
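To make those percentages concrete, here is a minimal sketch in Python. The function name and the 100 MB/s native rate are made up for illustration; they are not measurements from any of our systems, and the 0.90 and 0.95 factors are just the rough best-case I/O range quoted above.

```python
def virtualized_throughput(native_rate, efficiency):
    """Estimate throughput inside a VM from a native rate and an
    efficiency fraction (e.g. 0.90 means 90% of native performance)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return native_rate * efficiency


if __name__ == "__main__":
    native_io_mb_s = 100.0  # hypothetical native I/O rate, not a measurement
    low = virtualized_throughput(native_io_mb_s, 0.90)
    high = virtualized_throughput(native_io_mb_s, 0.95)
    print(f"Expected best-case I/O inside the VM: {low:.1f} to {high:.1f} MB/s")
```

The same calculation applies whatever the network topology or the rights of the VM, since neither enters the estimate.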
"Class 2 VMs" should be treated as "Class 3 VMs" and those should have the same or similar rights as any other user submitted job. In that case there is no difference between a regular grid-job and a "class 3 VM" grid-job, so they are not more or less safe. This can be accomplished by a few technical measures, that both Davide on his Grid, and we on our HPC cloud have implemented. The only policy we will need is stating that VMs will be treated as regular user jobs and that our regular rules for those apply. Additionally, on our HPC cloud at Sara we can actually allow VMs to have a public ip in case a few additional rules are observed, which we monitor closely.
Kind Regards,
Floris