VM workloads for hypervisors #271
Reference: the-distro/infra#271
This implements an ad-hoc mechanism to provision declarative VMs on our hypervisors based on microvm.nix, cloud-hypervisor and ZFS.
Simple modus operandi: create `vm/$hypervisor_attr_name_in_flake_dot_nix/$vm_name/default.nix` and set `hardware.vm`, as shown by `test01`.
To connect to a VM from the host, you can use `vmsh`, which is a poor man's console access relying on screen and virtio-console (TODO: add patches for screen resizing from SpectrumOS).
If you need to exchange files: `/run/microvm/<vm name>/xchg` exists and is mounted on both sides.
VMs are as stateless as they can be, i.e. they boot from the host Nix store and have access to it, which gives a nice deduplication effect for the OS data of all VMs. Volumes are usually meant ONLY for `/var`. Side effect: `/etc/` gets reset on each reboot. An exception has been allowed for `sshd`, whose directory is also mounted on the host, in the host `/var`. This enables preprovisioning of the SSH host keys for secret provisioning.
A custom kernel is used and enables very fast booting, at the cost of playing whack-a-mole with what is broken or missing in the Kconfig.
One thing is very broken: `/var/lib/microvms/%i/sshd` directory creation. systemd-tmpfiles does NOT always kick in. If you manually remove the inode and re-switch to the configuration, tmpfiles will not be run, as there is no configuration change. A better provisioning technique than systemd-tmpfiles should be adopted here.
Other than that, the approach may be improved a lot. Perhaps we don't need a separate XFS journal on a different block device, and it's premature optimization.
A real-world useful example is provided with n64gw01, meant as a NAT64 gateway via jool. It is perhaps a bit complicated and contains some hacks due to the networking aspects.
Quick initial pass without looking too much into the details.
@ -4,0 +38,4 @@
# TODO(Raito): replace me by a `vmDefinitionsPath` rather.
readVMs = hypervisorName:
mapAttrs (n: _: mkVM ../../../vm/${hypervisorName}/${n}
I'm not entirely convinced by the idea of treating VMs differently from other hosts (which are in `hosts/`). Do you have arguments pro/con?
The way I see it, something like `cp $vm_1 ../$hyp/$vm_1` or `mv` should be the simplest way to handle things, aside from some unavoidable state issues. That works fine for (internal) VMs, but not for baremetal, which can't just come online by moving files around and prepping some state.
The whole point here is to get automatic VM loading based on the directory tree, where the hypervisor is implied. That breaks if we go into `hosts/`.
Maybe this is overkill and we can revisit it later. The upside is (limited for now) simplicity in managing VMs. The downside is that this adds compute in a weird way: it's not your typical VPS, and it's not baremetal either.
@ -0,0 +287,4 @@
let
systemd-openbao = import inputs.systemd-openbao { };
in
[
This will inevitably drift and lead to confusion. Any idea on how we could avoid this?
We would need to extract the modules used by colmena's hive instantiation function and apply them here. I think it's feasible, but I cannot think of the path to achieve it right now, off the top of my head.
It's safe to have the full colmena modules because we don't use any that possess computational meaning.
I implemented it in #279/commits/dbba80616f because I obviously got hit by it.
@ -0,0 +393,4 @@
networking.nftables.enable = true;
services.dbus.implementation = "broker";
systemd.services.systemd-oomd = {
requires = [ "userborn.service" ];
Can you document why?
IIRC, an upstream bug. I need to double check. systemd-oomd depends on the user being ready, but it doesn't actually order itself well.
Not necessary anymore since https://github.com/NixOS/nixpkgs/pull/424035#pullrequestreview-3010253359 which we do have.
@ -0,0 +419,4 @@
systemd.network.enable = true;
systemd.network.networks = mapAttrs' mkNetworks cfg.interfaces;
# Otherwise, it's really annoying at redeployment time.
Any reason to not do it globally for all monitoring agent proms then?
No good reason.
@ -4,0 +28,4 @@
vmOptions = {
options = {
evalModule = mkOption {
some good feedback from pennae: just don't do that.
@ -14,0 +67,4 @@
environment.systemPackages = [
(pkgs.writeShellScriptBin "vmsh" ''
NAME=$1
[[ -d /var/lib/microvms/$NAME ]] || (echo "No such VM '$NAME'"; exit 1)
broken parsing
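One possible reading of this comment (an assumption, since the reviewer does not elaborate): `$NAME` is never checked for being empty, and `exit 1` runs inside a `( ... )` subshell, so it only ends the subshell and the script carries on. A minimal sketch of a stricter check follows, written as a hypothetical helper `vm_dir_or_fail` that takes the state directory as a parameter so it can be tried outside a hypervisor. It is not the code from the PR.

```shell
# Hypothetical helper, not part of the PR: succeed only if a state
# directory for the named VM exists under the given base directory.
vm_dir_or_fail() {
  base=$1
  name=${2:-}
  # An empty name would otherwise match the base directory itself.
  if [ -z "$name" ]; then
    echo "usage: vmsh <vm name>" >&2
    return 2
  fi
  if [ ! -d "$base/$name" ]; then
    echo "No such VM '$name'" >&2
    return 1
  fi
}
```

In the script itself this could be called as `vm_dir_or_fail /var/lib/microvms "$1" || exit $?`, which exits the script rather than a subshell. Inside a Nix `''` string, `${2:-}` would have to be escaped as `''${2:-}`.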
@ -0,0 +1,111 @@
{ lib, pkgs, ... }:
{
microvm.vsock.cid = 5;
automatic numbering by hash of name is better
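A sketch of what hash-based numbering could look like, shown in shell for illustration only. In the module it would presumably be computed at Nix evaluation time instead. The helper name `vsock_cid_for` and the choice of CRC32 via `cksum` are assumptions, not something the PR specifies. vsock reserves CIDs 0 to 2 and 4294967295, so the result is folded into the range 3 to 4294967294.

```shell
# Hypothetical helper: derive a stable vsock CID from a VM name.
vsock_cid_for() {
  # cksum prints "<crc32> <byte count>"; keep only the CRC (decimal).
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  # Fold into 3..4294967294, skipping the reserved CIDs.
  echo $(( crc % 4294967292 + 3 ))
}
```

Two names can still hash to the same CID, so a real implementation would likely want a uniqueness check across the VMs of one hypervisor.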
@ -0,0 +369,4 @@
};
boot.kernelPackages = pkgs.linuxPackages_custom {
version = "6.6.100";
src = pkgs.fetchurl {
use nixpkgs source and infer the ver from there instead
@ -0,0 +4,4 @@
# This is critical to ensure that the host sends IPv4 packets directly to this VM's IPv4 interface.
microvm.binScripts.tap-up = ''
${lib.getExe' pkgs.iproute2 "ip"} replace 57.129.18.76 dev vm-n64gw01-v4 scope link
this is absolutely wrong i think and doesn't work.
e1fad9d828 to c19639a1ad
Title changed from "WIP: VM workloads for hypervisors" to "VM workloads for hypervisors"
c19639a1ad to cb4a70bb44
cb4a70bb44 to 05b60ba890
05b60ba890 to bee1fd09bf
bee1fd09bf to e98152f82e
@ -0,0 +126,4 @@
mkCreateScript = name: { path, pool, size, properties, ... }:
let
max = a: b: if a <= b then b else a;
# journal size: 10MB if <1GB
16
e98152f82e to 21e37eeef1