zulip.{lix.systems,afnix.fr}: init #279

Merged
raito merged 8 commits from zulip into main 2025-08-27 00:37:21 +00:00
Owner

This is an application of #271.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
NixOS vanilla kernels contain many good things but are not optimized
for "cloud" instances, read: VM instances on a hypervisor implementing
modern features, e.g. cloud-hypervisor.

As a result, they cause long boot times for no good reason.

With this commit, we ship a minimal KVM kernel config. The remaining job
is to find a way to maintain it sanely.

[root@test01:~]# systemd-analyze time
Startup finished in 206ms (kernel) + 3.371s (initrd) + 2.307s (userspace) = 5.885s
multi-user.target reached after 2.294s in userspace.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
This is a NAT64 node.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
@ -115,3 +115,3 @@
nixpkgs.overlays = [(self: super: {
prometheus-smartctl-exporter = super.prometheus-smartctl-exporter.overrideAttrs (final: prev: {
- patches = (prev.patches or []) ++ [ ./exporters/smartctl-exporter-i305.patch ];
+ patches = (prev.patches or []) ++ lib.optional (prev.meta.__i305 or false) [ ./exporters/smartctl-exporter-i305.patch ];
Author
Owner

This happens due to double overlaying.
raito marked this conversation as resolved
raito changed title from WIP: zulip.lix.systems: init to WIP: zulip.{lix.systems,afnix.fr}: init 2025-08-25 18:39:15 +00:00
raito changed title from WIP: zulip.{lix.systems,afnix.fr}: init to zulip.{lix.systems,afnix.fr}: init 2025-08-26 22:05:52 +00:00
@ -93,6 +93,7 @@ let
mkInterface = name:
let
interface = cfg.interfaces.${name};
mkIntfName = { intfName, vmName }: "vm-${substring 0 (11 - stringLength intfName) vmName}-${intfName}";
Owner

This seems dangerous to me given that common prefixes seem like a very common case (e.g. superservice01 and superservice02 would both truncate to exactly the same thing).
Author
Owner

Yep, I forgot to add a TODO saying that I'd go for suffixes. @delroth, do you have another suggestion? I'd prefer proper altnames here, but I think this requires patching microvm.nix.
Owner

I don't.

Author
Owner

@delroth Are you fine with suffixes?

Owner

It's slightly better but it still feels super fragile. Like, we're already one letter off from "lix-zulip01" and "afnix-zulip01" having the same 11-character suffix...
Author
Owner

@delroth The easiest way out is to introduce additional evaluation checks to verify that no collision can happen. Would that be sufficient for you? It will cost us evaluation time, though.
Owner

I don't have a strong opinion about how/whether to resolve this.

Author
Owner

OK, I will go for an O(n^2) assertion check and suffixes, thanks.
Author
Owner

In this case, suffixes will fail, but prefixes work (a miracle, we could say).

The prefix calculation is simpler and works, so I'm inclined to go back to prefixes for now, rely on the assertion to catch undesirable events, and leave a TODO to move to altnames and/or hashed interface names based on the full VM name.

I suppose you're still fine with it; if not, please object.
Author
Owner

I implemented the hostId idea, resolving.

raito marked this conversation as resolved
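
A minimal sketch of the trade-off discussed in this thread, assuming the interface suffix is "v4" and using the VM names mentioned in the comments. Only mkIntfName comes from the diff; mkSuffixIntfName, mkHashedIntfName, collisions, and the vms list are hypothetical names introduced for illustration, and the hashed variant is one possible reading of "hashed interface names", not necessarily what was merged.

```nix
# Sketch only. `mkIntfName` mirrors the helper from the diff above; every other
# name here (`mkSuffixIntfName`, `mkHashedIntfName`, `collisions`, `vms`) is
# hypothetical and exists for illustration.
let
  inherit (builtins) concatMap hashString stringLength substring;

  # Linux interface names hold at most 15 characters, so after "vm-" and the
  # "-" separator, the VM part gets 11 characters minus the suffix length.
  budget = intfName: 11 - stringLength intfName;

  # Prefix truncation, as in the diff.
  mkIntfName = { intfName, vmName }:
    "vm-${substring 0 (budget intfName) vmName}-${intfName}";

  # Suffix truncation: keep the last characters of the VM name instead.
  mkSuffixIntfName = { intfName, vmName }:
    let
      keep = budget intfName;
      len = stringLength vmName;
      start = if len > keep then len - keep else 0;
    in
    "vm-${substring start keep vmName}-${intfName}";

  # Hashed variant: derive the VM part from a digest of the full VM name.
  mkHashedIntfName = { intfName, vmName }:
    "vm-${substring 0 (budget intfName) (hashString "sha256" vmName)}-${intfName}";

  # O(n^2) check: every pair of distinct VM names that a given naming
  # function `mk` maps to the same interface name.
  collisions = mk: intfName: vmNames:
    concatMap
      (a:
        concatMap
          (b:
            if a < b && mk { inherit intfName; vmName = a; } == mk { inherit intfName; vmName = b; }
            then [ { inherit a b; } ]
            else [ ])
          vmNames)
      vmNames;

  vms = [ "lix-zulip01" "afnix-zulip01" "superservice01" "superservice02" ];
in
{
  # Expected to report the superservice01/superservice02 pair.
  prefixCollisions = collisions mkIntfName "v4" vms;
  # Expected to report the afnix-zulip01/lix-zulip01 pair.
  suffixCollisions = collisions mkSuffixIntfName "v4" vms;
  # Collisions are unlikely here but still possible, so the check stays useful.
  hashedCollisions = collisions mkHashedIntfName "v4" vms;
}
```

Saved to a file, this can be evaluated with nix-instantiate --eval --strict. Inside a NixOS module, the same collisions result could feed an assertions entry so that a clash fails evaluation instead of surfacing at runtime.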
@ -0,0 +153,4 @@
};
};
networking.firewall.allowedTCPPorts = [ 80 443 28464 ];
Owner

What's 28464?

Author
Owner

It's the port for appservice communication for Matrix servers. We actually don't need it here for AFNix, dropping.

raito marked this conversation as resolved
@ -0,0 +1,158 @@
{ pkgs, secretsPath, config, lib, ... }:
Owner

I'm confused why "zulip01" is not tenant-prefixed and then we have "afnix-zulip01" which is tenant-prefixed?

Also the file path does not match the commit message (but it does in the next commit in the chain?)

Author
Owner

Historical reasons, this can be prefixed.
Same reasons for the filepath stuff.

Author
Owner

Actually, this raises an important question to solve.

When VMs are renamed, their zvol datasets do not follow them: our sole identifier is the VM ID, which is right now the attribute name in the vms attribute set.

So, when renaming, the state is busted and created anew, which is fine.

A manual intervention is simple: zfs rename the dataset and it's all good again.

The open question, though, is what happens when you rename into an existing dataset. This can only happen if the VM for which the dataset existed went away. The current script is safe and will not overwrite the existing zvol dataset, but the VM might run, attach that zvol dataset, and start doing things to the pool.

If a VM goes away, its zvol dataset should be renamed to spiritedaway-$ORIG_NAME, and purging such zvol datasets should be left to the operator.

Does that make sense?
Author
Owner

I implemented the hostId idea, resolving.

raito marked this conversation as resolved
@ -7,3 +4,3 @@
# FIXME(Raito): Please test this and confirm that the route is installed automatically (this requires an hypervisor reboot).
microvm.binScripts.tap-up = ''
- ${lib.getExe' pkgs.iproute2 "ip"} route replace 57.129.18.76 dev vm-n64gw01-v4 scope link
+ ${lib.getExe' pkgs.iproute2 "ip"} route replace 57.129.18.76 dev vm-b8ac-v4 scope link
Author
Owner

The interface name should probably be derived from a helper library.
raito marked this conversation as resolved
requested review from delroth 2025-08-27 00:17:10 +00:00
delroth approved these changes 2025-08-27 00:23:41 +00:00
@ -0,0 +1,157 @@
{ pkgs, secretsPath, config, lib, ... }:
Owner

Commit message still refers to the old path, I think.

raito marked this conversation as resolved
raito merged commit a45a9e1232 into main 2025-08-27 00:37:21 +00:00
raito deleted branch zulip 2025-08-27 00:37:22 +00:00
Reference: the-distro/infra#279