There Has Been a Slight Change with My Lab: An Upgrade

🔧 My Reasoning for the Upgrade

Running Kubernetes on a single machine was a great way to get started. It was easy to manage, simple to test deployments, and perfect for learning. But over time, I ran into several limitations.

With just one node, I couldn’t truly explore:

  • 🔁 Scheduling across multiple nodes
  • 🛰️ Cluster communication and service discovery
  • ⚙️ High Availability (HA) deployments
  • 🔥 Disaster recovery scenarios
  • 📈 Horizontal scaling and load balancing

💸 The Upgrade

I decided to invest in a few low-cost Mini PCs to build out a proper cluster. I found a great option on Amazon: the OUMAX Mini PC. These are currently priced at just $141.54 USD.

They offer a solid set of specs for a home lab:

  • 🧠 CPU: Intel N150 (4 cores, 4 threads)
  • 💾 RAM: 16GB
  • 📦 Storage: 500GB NVMe SSD
  • 🌐 Networking: 2 x 2.5GbE NICs

๐Ÿ—๏ธ The New Lab Design

With three of these units, I’m now running a multi-node Kubernetes cluster where each node acts as both a control plane and a worker. The dual NICs let me physically separate:

  • 🔒 Internal traffic: cluster and storage communications
  • 🌐 External traffic: ingress/egress to the internet

This setup allows me to:

  • 🔄 Simulate node failure and recovery
  • 🛠️ Add/remove nodes dynamically
  • 📡 Test Longhorn and NFS over an isolated backend network
  • 📊 Analyze service behavior under load

๐Ÿ–ฅ๏ธ What About the Old Machine?

The original system isn’t going away. I’ll be dedicating it to GPU-related workloads. It’s perfect for testing things like:

  • 🧠 AI models with the NVIDIA toolkit
  • 🎥 Media workloads like transcoding and inference

🧪 What’s Next?

I’m considering picking up a fourth Mini PC and running Windows Server 2019/2022 on it. This would let me experiment with Windows containers and hybrid clusters.

💡 I know Microsoft recommends Azure or Azure Stack HCI for Windows-based pods, but I’m curious to see what’s possible in a pure local setup. Even if it’s not ideal, the experience alone will be valuable.

🧵 TL;DR

My lab just leveled up. Going from one node to a real multi-node Kubernetes setup opens the door to high availability, better simulation of production-grade environments, and hands-on experimentation with real-world scenarios — all without breaking the bank.

One Tiny Server, Big Plans – Kicking Off My Kubernetes Lab!

Every big project starts small — and this is mine. I’m setting the stage for a full rebuild of my production RKE2 cluster by first rolling up my sleeves and diving deep into a dedicated Kubernetes lab environment. This isn’t just tinkering for fun (though it will be fun); it’s a focused effort to test, learn, and refine the tools and workflows that will power my future production setups.

With an old Lenovo ThinkStation P320 Tiny, I’m going to break things, fix them, and — most importantly — learn why they broke in the first place. This lab is where mistakes become lessons, and lessons become skills.

๐Ÿ› ๏ธ Lab Hardware Specs

The heart of my lab is humble, but it packs enough punch for everything I need:

Lenovo ThinkStation P320 Tiny

  • Ubuntu Server 24.04.2 LTS
  • Rancher/SUSE K3s
  • Intel Core i7-7700T
  • 16GB RAM
  • 256GB NVMe SSD
  • 1GbE NIC
  • NVIDIA Quadro P600

While modest compared to a full-blown production cluster, it’s perfect for lab work, experimentation, and building a solid foundation.

🎯 Learning Goals & Focus Areas

This lab isn’t about just spinning up pods. It’s about understanding the ecosystem that makes Kubernetes production-ready. Here’s where my focus will be:

📡 Networking (CNI)

  • Calico for advanced network policies and scalability.
  • Cilium to explore eBPF-powered networking and observability.
  • Flannel for a lightweight, baseline comparison.
  • Canal to combine Calico’s policy engine with Flannel’s networking backend.
  • Multus to enable attaching multiple network interfaces to pods for complex topologies.
  • Weave Net for simple, encrypted networking with automatic peer discovery.
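Whichever CNI ends up in place, a first exercise will be standard NetworkPolicy objects, which Calico, Cilium, and Canal all enforce. A minimal sketch (namespace and labels here are hypothetical):

```yaml
# Allow ingress to app=backend pods only from app=frontend pods on
# port 8080; everything else inbound is dropped. Names are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Running the same policy under each CNI should make their differences in enforcement and observability very concrete.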

โš–๏ธ Load Balancing & High Availability

  • MetalLB for simple L2/L3 load balancing.
  • Kube-VIP for virtual IPs and HA control plane setup.
  • HAProxy for flexible L4/L7 proxying, ideal for ingress or external load balancing.
  • Traefik as a dynamic reverse proxy with integrated Let’s Encrypt and metrics.
  • NGINX Ingress Controller for production-grade HTTP load balancing and Ingress routing.
  • ExternalDNS – Automatically manages DNS records in external providers (e.g., Cloudflare, Route53) based on Kubernetes service and ingress resources.
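As a taste of what the MetalLB testing will look like, here is a minimal Layer 2 setup sketch (the address range is a placeholder for whatever my lab subnet allows):

```yaml
# MetalLB L2 mode: hand out LoadBalancer IPs from a reserved range
# and announce them via ARP. Range below is illustrative.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
```

Once this is in place, any Service of type LoadBalancer should pick up an IP from the pool automatically.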

๐Ÿ—„๏ธ Storage (CSI)

  • Synology CSI Driver for volume provisioning on Synology NAS systems.
  • NFS and SMB CSI drivers for shared filesystem access over network protocols.
  • S3-compatible storage solutions like JuiceFS for cloud-native object storage access.
  • iSCSI CSI Driver for block-level persistent volumes and advanced workloads.
  • Longhorn – Distributed block storage system designed for Kubernetes, ideal for lab/home clusters.
  • Ceph / Rook – Highly scalable, replicated block/object/filesystem storage using Ceph, managed via the Rook operator.
  • GlusterFS CSI – Shared filesystem storage with high availability and horizontal scaling.
  • Local Path Provisioner – Simple hostPath-based storage for development/testing clusters.
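For the Longhorn experiments, the starting point will be a StorageClass plus a claim along these lines (replica count and size are illustrative choices, not recommendations):

```yaml
# Longhorn-backed StorageClass with 2 replicas per volume, plus a
# claim that uses it. Values are illustrative for a 3-node lab.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-replicated
  resources:
    requests:
      storage: 5Gi
```

With replicas spread across nodes, killing the node hosting a volume is exactly the kind of failure drill the new cluster makes possible.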

๐ŸŒ Connectivity & Access

  • Tailscale to create a secure mesh VPN for remote access.
  • Cloudflare Tunnels for secure, zero-trust ingress.
  • ngrok for quick, developer-friendly secure tunnels to local services (great for demos and dev, but limited for production use).
  • Teleport for secure access to Kubernetes clusters, SSH, databases, and apps โ€” with audit logging and SSO.
  • ZeroTier as an alternative mesh VPN with advanced routing and bridging features.
  • Nebula – Lightweight mesh VPN built by Slack, great for self-hosted, peer-to-peer networks.
  • Traditional VPN tunnels for more controlled scenarios.

🌀 GitOps: FluxCD & ArgoCD

Modern Kubernetes management revolves around GitOps. I’ll be diving into:

  • FluxCD for automated reconciliation and Git-driven deployments.
  • ArgoCD for application visualization and deployment management.
  • Jenkins, GitHub Actions, or Semaphore for CI/CD pipelines and integration testing.

These tools will help me build repeatable, automated workflows — making my cluster self-healing, declarative, and production-ready.
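With FluxCD, for example, the whole workflow boils down to pointing the cluster at a Git repository and reconciling a path from it. A sketch, assuming a hypothetical ./clusters/lab directory in my MyOps repo (the path and intervals are placeholders):

```yaml
# Flux watches the repo and applies manifests from a path, pruning
# anything removed from Git. Path/intervals are illustrative.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: myops
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/EricZarnosky/MyOps
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: myops
  path: ./clusters/lab
  prune: true
```

Once that loop is running, a git push becomes the only deployment mechanism — which is the discipline I want before rebuilding production.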

๐Ÿ” Secrets Management: HashiCorp Vault & OpenBao

Managing secrets securely is non-negotiable in any production environment. I’ll be comparing:

  • HashiCorp Vault, the industry standard for secret management.
  • OpenBao, the open-source fork aimed at long-term community support.

Both will play key roles in managing tokens, passwords, and sensitive configs in my cluster.
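One pattern I plan to test is Vault’s Kubernetes sidecar injector, where pod annotations pull secrets in at runtime. A sketch, assuming a Vault role and secret path that don’t exist yet in my lab (all names are placeholders):

```yaml
# The Vault Agent injector watches for these annotations and mounts
# the rendered secret into the pod. Role and path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "demo-app"
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/demo/db"
spec:
  containers:
    - name: app
      image: nginx:alpine
```

Since OpenBao is a fork of Vault, part of the comparison will be seeing how much of this workflow carries over unchanged.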

🔑 SSO & Authentication

  • Authentik – Lightweight, modern identity provider supporting SSO, MFA, reverse proxy authentication, and LDAP/AD integration.
  • Keycloak – Enterprise-grade open-source identity and access management with robust OIDC, SAML, and user federation support.
  • Authelia – Authentication proxy for web apps with 2FA, ideal for use with NGINX or Traefik.
  • Dex – Kubernetes-native OIDC identity service that integrates with LDAP, GitHub, and others.
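Dex, for instance, is configured with a small YAML file listing upstream connectors. A hedged sketch using a hypothetical GitHub OAuth app (issuer URL, client IDs, and secrets are all placeholders):

```yaml
# Minimal Dex config: GitHub as the upstream identity source, one
# static OIDC client. Every value here is illustrative.
issuer: https://dex.example.com
storage:
  type: kubernetes
  config:
    inCluster: true
connectors:
  - type: github
    id: github
    name: GitHub
    config:
      clientID: $GITHUB_CLIENT_ID
      clientSecret: $GITHUB_CLIENT_SECRET
      redirectURI: https://dex.example.com/callback
staticClients:
  - id: kubernetes
    name: Kubernetes
    redirectURIs:
      - http://localhost:8000
    secret: example-client-secret
```

The appeal of Dex for a lab is exactly this: one small file stands between the cluster's OIDC auth and whatever identity backend I want to swap in.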

๐Ÿ—๏ธ Infrastructure as Code: Terraform & OpenTofu

To manage resources beyond Kubernetes itself, I’ll be using:

  • HashiCorp Terraform to automate cloud resources, DNS records, and infrastructure provisioning.
  • OpenTofu, the community-driven alternative, to see how it compares and integrates into my workflows.

Infrastructure as Code will be a key skill in scaling my homelab into production-grade environments.
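As a flavor of what that looks like, a Terraform sketch managing a single DNS record via the Cloudflare provider (zone, name, and address are placeholders, and newer provider versions may use different resource/attribute names):

```hcl
terraform {
  required_providers {
    cloudflare = {
      source = "cloudflare/cloudflare"
    }
  }
}

# Illustrative only: the zone ID variable and address are placeholders.
resource "cloudflare_record" "lab" {
  zone_id = var.cloudflare_zone_id
  name    = "lab"
  type    = "A"
  value   = "203.0.113.10"
  ttl     = 300
}
```

The same file should apply cleanly under OpenTofu, which is precisely the kind of compatibility claim I want to verify for myself.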

📈 Monitoring & Observability

Capturing metrics, logs, and traces is essential for maintaining system health and debugging issues:

  • Prometheus for collecting metrics and enabling alerting.
  • Grafana for visual dashboards of infrastructure and services.
  • Loki to aggregate and query logs efficiently.
  • Fluent Bit to collect, parse, and forward logs to multiple backends.
  • Fluentd – Full-featured log collector and processor; supports filtering, buffering, and routing logs to multiple backends (Splunk, Elasticsearch, etc.).
  • Elasticsearch – Distributed search and analytics engine, often used for log indexing and full-text search.
  • Kibana – Visualization and dashboard interface for Elasticsearch, used to explore logs, metrics, and APM data in ELK/EFK stacks.
  • Tempo or Jaeger for distributed tracing.
  • Splunk for enterprise-grade log and event aggregation, especially for hybrid or security-focused environments.

Observability is more than just logs โ€” it’s about connecting metrics, traces, and logs to gain real insight.
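With the Prometheus Operator, for example, wiring a service into metrics collection is a single small ServiceMonitor object. A sketch with hypothetical names and labels:

```yaml
# Tells a Prometheus Operator instance to scrape any Service labeled
# app=demo-app on its "metrics" port. All names are illustrative, and
# the release label must match whatever selector your Prometheus uses.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-app
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: demo-app
  endpoints:
    - port: metrics
      interval: 30s
```

From there, the same service's logs (via Loki or Fluent Bit) and traces (via Tempo or Jaeger) can be correlated in Grafana — the connected view described above.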

โœ‰๏ธ Messaging & Queuing

Message brokers and event streaming platforms are essential for decoupling services and enabling scalable architectures:

  • RabbitMQ – Lightweight, reliable message broker supporting AMQP, MQTT, and STOMP. Great for traditional pub/sub or job queues.
  • NATS – High-performance cloud-native messaging system, ideal for microservices and IoT applications.
  • Kafka – Distributed streaming platform for large-scale data pipelines and event-driven systems.
  • Mosquitto – Lightweight MQTT broker, great for home automation and IoT telemetry.
  • Redis Streams / Valkey Streams – Simple stream processing using Redis/Valkey’s in-memory data store.

💾 Data Backups & Snapshots

Backing up your data is critical for disaster recovery, migrations, and testing:

  • Velero – Kubernetes-native backup and restore tool for volumes and cluster state.
  • CloudNativePG Backups – Built-in support for S3/NFS backups and point-in-time recovery (PITR).
  • Kasten K10 – Commercial-grade Kubernetes backup solution (free for small/home clusters).
  • Restic – Fast, efficient backup tool that can integrate with Kubernetes via Stash or Velero plugins.
  • pgBackRest – Reliable PostgreSQL backup and recovery tool, often used with CloudNativePG.
  • Percona XtraBackup – Hot backup utility for MySQL and MariaDB.
  • NFS/SMB snapshotting via Synology, or Longhorn’s native snapshot/backup tools.
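A Velero nightly backup, for example, is a single small Schedule object. A sketch (the cron expression and retention window are arbitrary choices for illustration):

```yaml
# Nightly full-cluster backup at 02:00, retained for 7 days.
# Namespace selection and TTL are illustrative.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - "*"
    ttl: 168h
```

Restoring one of these backups into a freshly wiped node is exactly the disaster-recovery drill the multi-node cluster was bought for.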

๐Ÿ› ๏ธ Tools & Utilities

A collection of essential tools I use to interact with, manage, and customize my Kubernetes environment:

  • K9s – Terminal UI for interacting with Kubernetes clusters in real time.
  • Kustomize – Customize Kubernetes manifests with patches and overlays.
  • Helm – Kubernetes package manager for deploying complex applications via charts.
  • Helmfile – Declarative manager for Helm charts, great for GitOps and multiple environments.
  • Stern – Stream logs from multiple pods with label filtering.
  • Rancher – Web-based Kubernetes management platform with multi-cluster support.
  • Lens – GUI-based Kubernetes IDE for visual insights, workload browsing, and troubleshooting.
  • Kured – the KUbernetes REboot Daemon, which safely handles node reboots after OS updates.
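Helmfile in particular makes the whole stack declarative. A minimal sketch pinning two of the charts mentioned above (namespaces and chart selection are illustrative, not my final layout):

```yaml
# helmfile.yaml — declare chart repos and releases; `helmfile apply`
# reconciles the cluster to match. Contents are illustrative.
repositories:
  - name: metallb
    url: https://metallb.github.io/metallb
  - name: longhorn
    url: https://charts.longhorn.io

releases:
  - name: metallb
    namespace: metallb-system
    chart: metallb/metallb
  - name: longhorn
    namespace: longhorn-system
    chart: longhorn/longhorn
```

Kept in Git, a file like this turns "rebuild the cluster" into a single command — the same repeatability goal driving the GitOps work above.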

🚀 Why This Lab Matters

This lab is my personal training ground — a place where I can explore, break, fix, and automate without the risk of impacting production. By the time I’m ready to rebuild my RKE2 cluster, I’ll have tested and validated these tools in realistic scenarios, with hands-on experience guiding every decision.

But this journey isn’t just for me.

I want to share every step — the successes, the failures, the lessons learned — so that others can learn alongside me. Whether you’re new to Kubernetes, experimenting with homelabs, or preparing for your own production deployments, I hope my experiences can help you avoid common pitfalls and accelerate your own learning.

All my configurations, manifests, and code from this lab will be available in my GitHub repository:

👉 EricZarnosky/MyOps

This will be a living resource as I continue to refine, document, and share what I discover.

📢 What’s Next?

Follow along as I document each step — the wins, the failures, and everything I learn while leveling up my Kubernetes skills. From lab experiments to production-ready deployments, this journey is just getting started.

๐Ÿง‘โ€๐Ÿ’ป Follow My Lab Journey

This lab is more than a personal project — it’s a learning journey I’m sharing with the community.

Mistakes, fixes, lessons — all documented. Let’s learn Kubernetes together.