The psp.c source is restricted in makefile with .if ${MACHINE} ==
"amd64" so use #ifdef __amd64__ around the call to psp_setup(). On
arm64 set vmd_psp_fd to an invalid value.
OK hshoexer@ mlarkin@
Use shutdown and init to reset psp(4) on vmd(8) startup. This helps
when hacking on vmd(8) and crashing it. The psp(4) reset cleans
up all remnants of dead VMs from psp(4). Otherwise one would have
to reboot the machine.
from hshoexer@; OK mlarkin@
To launch a guest with AMD SEV enabled, vmd needs to do a few things:
- retrieve ASID used by guest on VM creation
- provide ASID to psp(4)
- let psp(4) encrypt memory used intially by guest
- run guest
- release resources held by psp(4) on guest shutdown
To enable SEV for a guest use the parameter "sev" in the guest's vm
section in vm.conf.
from hshoexer@; OK mlarkin@
Makes as much of the core of vmd mi, pushing x86-isms into separate
compilation units. Adds build logic for arm64, but no emulation
yet. (You can build vmd, but it won't have a vmm device to connect
to.)
Some more cleanup probably needed around interrupt controller
abstraction, but that can come as we implement more than the i8259.
ok mlarkin@
To prepare for mi/md splitting vmd, need to fixup the dev/vmm/vmm.h
mi header. Move the vm_run_params struct and clean up the includes
in vmd.
"sure", mlarkin@
Remove extraneous fcntl(3) usage for setting fd features that can
be set at time of open(2), pipe2(2), or socketpair(2). Also cleans
up pty creation switching to using functions from libutil instead
of direct ioctl(2) calls.
ok mlarkin@, original diff ok claudio@ as well.
The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.
Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.
The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.
In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.
ok mlarkin@
While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:
1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.
2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.
This commit adds a new vmm(4) ioctl dedicated to allowing a process
request the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.
Tested with help from Mischa Peters.
ok mlarkin@
vm_instance was using the wrong vm instance for checking the
vm_kernel_path member. Switch to using the value from the parent
vm instance in the check for if a kernel is known.
Issue reported by kn@. OK mlarkin@, kn@.
Adding in the ability to override the boot kernel created an edge
case in the ipc message handling logic for the parent process (vmd)
when receiving a "start vm" request. Result was incorrectly responding
to the control process, and as a result the vmctl client, with a
bogus "start vm response" reply with an empty tty name.
This commit rewrites the logic of how vmd goes about processing the
"start vm" request with the aim of making it simpler to understand
while addressing the edge case.
Issue reported by kn@. OK mlarkin@.
vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.
This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.
ok mlarkin@
Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".
Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.
With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.
Testing help from phessler@ and Mischa Peters.
ok mlarkin@
The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.
This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.
ok mlarkin@
/var/log/{messages,daemon} logs ENOENT as error on default configless vmd.
Only complain on explicitly passed files and print a debug hint under `-vv'
in case someone forgot to populate their /etc/vm.conf.
OK dv mlarkin
Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.
This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).
ok mlarkin@
Some mild tidying of fd closing in the vmm process in prep for
landing parts of my fork+exec diff.
With input from guenther@ on the nuances of if/when EINTR may happen
in a call to close(2).
ok mlarkin@
Other structs use a fixed length array already. This allows a vmd_vm
object to be transmitted over an ipc channel, too.
Additionally, solves a segfault caused by a strlcpy(3) in an error
path.
ok mlarkin@
Multiple error paths, specifically the one related to if a guest
cannot allocate memory at start, resulted in a known vm (via
vm.conf(5)) being removed from the vm list. Adjust the error paths
to check if the failing vm is defined in the config before tearing
it down.
Tested with help from beck@ and Mischa Peters.
ok beck@
Have the parent process open /dev/vmm and send the fd to the vmm
child process. Only the vmm process and its resulting children
(guest vms) need it for ioctl calls.
ok kn@
User accounting and enforcement was never finished. tedu the thing
until someone wants to pick it up and finish it.
Originally found by Matthew Martin.
ok mlarkin@, kn@. input from tb@.
Rebooting a received vm resulted in vmd(8) exiting as a result of
flawed state tracking in the parent process.
When stopping a vm, clear the VM_RECEIVE_STATE flag. When starting
a vm, make sure the parent process collapses any existing memory
ranges after the vm is sent to the vmm process (responsible for
launching the vm).
ok mlarkin@
With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.
This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.
While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.
Original diff from tedu@. OK mlarkin@.
macro-build a replacement for sccsid, and was done without any concern
for namespace damage. Unfortunately this practice started infecting
other code as others were unaware they didn't need the file.
ok millert guenther
Refactor config_setvm to directly return error code on failure
instead of returning -1 and setting errno. It was setting unsupported
values not defined in <errno.h>.
OK mlarkin@
vmd(8)'s vm_instance function set unsupported errno values. Change the
api to directly return an error (either errno or custom vmd error).
"go for it" -mlarkin@
Adds queue-based tracking of waiting client state to fix the cause of
state corruption when a vmctl(8) user cancels a wait and restarts it.
The socket fd value for the control process client was being used to
track the waiting party, but this also prevented multiple waiting
clients.
This moves all the state tracking of who to notify of a vm's stopping
to the control process and no longer requires the parent process to
track it in the global environment state.
Future work will be needed to smooth out the difference between the
IMSG_VMDOP_TERMINATE_VM_{EVENT,RESPONSE} events instead of needing to
translate before relaying to the vmctl(8) client.
Tested by Mischa Peters (thanks!)
ok mlarkin@
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.
This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.
OK mlarkin@
This addresses 'thundering herd' problem when a lot of
vms are configured in vm.conf. A lot of vms booting in parallel can
overload the host and also mess up tsc calibration in openbsd guests as
it uses PIT which doesn't fire reliably if the host is overloaded.
We default to starting vms with parallelism of ncpuonline and a delay 30 seconds
between batches. This is configurable in vm.conf.
ok mlarkin@ (also addressed comments from cheloha@)
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.