I am interested in running high availability clusters at home. Clusters have some advantages over home servers that I think are under-appreciated:
- They allow for hardware replacement/maintenance on a human-centric schedule of “I’ll get to it next weekend”. People with families or otherwise busy lives could manage server chores the same way they manage minor household repairs, and the services they use keep humming in the background.
- They could allow for easier storage expansion. Imagine if buying your second Synology or QNAP NAS only added capacity, and didn’t add any management overhead beyond the first NAS.
My dad is a photographer, and he’s spent lots of time and money managing storage solutions for his business. Expansion especially is a huge expense, because the best way to expand is to buy a new NAS or DAS and fill it with the biggest drives you can afford all at once. For a hobbyist, upgrading drives one at a time might be feasible, but for a business that’s too time consuming and fragile for critical storage.
Some technical notes about this idea:
- Nodes should be stateless; all state should come from the cluster manager. This should also improve the user experience for backups and restores.
- I’m not sure any of the software is there yet for home users, but it could get there. The feature sets of systems like Ceph for storage and Kubernetes for running apps are nice, but their maintenance, reliability, and error UX are poor even compared to business SaaS, and nowhere near good enough for home users today. This is solvable.
- I think a home cluster could improve both expansion and performance, but the performance gains would likely require home network upgrades (10Gb Ethernet). Systems like LizardFS, which cluster server storage into a single unified filesystem, have proved this idea can work in the enterprise, at least.
I have worked on two projects trying to explore this idea.
used single board computers and Docker Swarm. I got stuck soldering together the overly complicated power system I had designed, and haven’t picked the project up in a while. It’s still on the project back burner… somewhere… but for now my biggest plan for it is decoration for the wall of my new office.
Its key ideas:
- Single board computers are low power
- A dedicated serial server would connect to all systems over RS232 – no KVM required for a console
- A dedicated network with a router to separate from the main home network
- Only normal gigabit Ethernet
- I wanted to build an OS that boots locally into a specialized Linux kernel build, reads its configuration, pulls the real kernel and filesystem image over the network, and then kexecs into the real kernel and rootfs. However, at the time, I couldn’t get kexec to work.
- Used Ansible to configure all nodes
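The netboot-then-kexec flow from the list above can be sketched as a small shell script. This is only an illustration of the idea, not code from the project: the boot server URL, file names, and paths are all hypothetical, and `kexec` comes from the kexec-tools package.

```sh
#!/bin/sh
# Sketch: a minimal local kernel fetches the real kernel + initramfs
# over the network, then kexec's into them, skipping firmware/BIOS.
# Server URL and file names are hypothetical.
set -eu

SERVER="http://boot.cluster.lan"
WORKDIR="$(mktemp -d)"

# Pull the real kernel, initramfs, and kernel command line.
wget -q -O "$WORKDIR/vmlinuz" "$SERVER/vmlinuz"
wget -q -O "$WORKDIR/initrd"  "$SERVER/initrd.img"
wget -q -O "$WORKDIR/cmdline" "$SERVER/cmdline.txt"

# Stage the new kernel in memory...
kexec -l "$WORKDIR/vmlinuz" \
  --initrd="$WORKDIR/initrd" \
  --command-line="$(cat "$WORKDIR/cmdline")"

# ...and jump into it immediately.
kexec -e
```

In practice a flow like this also needs the early kernel to bring up networking and verify what it downloads, which is part of why getting it working is harder than the sketch suggests.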
is using desktop mini-PCs and Kubernetes.
Its key ideas:
- Desktop mini-PCs still draw relatively little power, and are much more powerful than single board computers
- Lack of serial console support is a disappointment; for now I’m using PiKVM
- Uses a simpler system with live kernel/filesystems built from Alpine (psyopsOS)
- Uses a custom Python package instead of Ansible for system configuration (progfiguration)
- Kubernetes is extremely complex and hard to learn from scratch. Everything is optional, meaning there isn’t really a happy path for bare metal clusters. Kubernetes really feels like Linux From Scratch, when what I want is Debian. I’ve been documenting Kubernetes configuration at https://kubernasty-labnotes.micahrl.com.
- I’ve tried Longhorn and Ceph for cluster storage, and haven’t been satisfied with either. Longhorn’s core product is not mature at all, and Ceph’s Kubernetes support is not mature even if its design is. This is where I’ve spent most of my time on this project in 2023 so far.
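For context on what the storage layer involves: whichever backend wins, apps consume it through a Kubernetes StorageClass. A minimal sketch of a Rook-Ceph block StorageClass might look like the fragment below — the pool and class names are assumptions, and a real deployment also needs CSI secret parameters and a running Ceph cluster behind it.

```yaml
# Hypothetical Rook-Ceph block StorageClass (names are illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

The amount of machinery hiding behind those few lines — a CSI driver, an operator, monitors, OSDs — is a big part of why the maturity complaints above matter so much for home use.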