hyperguild/docs/superpowers/specs/2026-04-20-cd-pipeline-design.md
2026-04-20 20:10:16 +02:00

CD Pipeline Design

Date: 2026-04-20
Status: Approved for implementation

Problem statement

The supervisor and future services on the koala k3s cluster have no automated deployment path after CI passes. Images are not built, the cluster is updated manually, and there is no audit trail of what is running where.

Goal

After a push to main passes CI, automatically build a container image, push it to the Gitea registry, and update the cluster via GitOps — with a design that scales to many repos and services without per-repo kubeconfig or secret sprawl.

Success criteria

  • Successful main push triggers image build and push to gitea.d-ma.be/<org>/<repo>:<git-sha>
  • Infra repo receives a commit updating the image tag for the deployed service
  • Flux reconciles within 60s of the infra repo commit; pod runs the new image
  • Rollback = one commit to infra repo reverting the tag
  • Secrets (app secrets, registry pull credentials) are SOPS-encrypted in the infra repo; no manual kubectl create secret beyond the one-time sops-age key bootstrap
  • Adding a new service requires only: apps/<service>/ plus its entry in clusters/koala/kustomization.yaml in the infra repo, and cd.yml in the app repo
  • Zero changes to the k3s cluster networking or runner configuration

Constraints

  • Gitea Actions self-hosted runner runs as a systemd host process on koala — not a k8s pod; cannot use cluster DNS
  • k3s uses containerd; no Docker daemon, no nerdctl on koala
  • Flux is already running (core controllers only); image-reflector/image-automation are NOT installed and will NOT be added
  • SOPS + age is the secret management standard; no plaintext Secrets in git
  • All org-level Gitea secrets are shared across repos — minimize the set

Out of scope

  • Multi-cluster promotion (koala only for now; infra repo structure supports adding clusters later)
  • Automated rollback on health check failure (manual rollback via infra repo commit)
  • Build caching beyond BuildKit's local disk cache
  • PR preview environments

Architecture

App repo (supervisor, n8n, etc.)
    ↓  push to main
Gitea Actions — ci.yml (lint + test)
    ↓  passes
Gitea Actions — cd.yml
    ├─ 1. buildctl → BuildKit (unix socket on koala host)
    │         → pushes gitea.d-ma.be/<org>/<repo>:<git-sha>
    ├─ 2. Clone infra repo (SSH deploy key)
    │         → set image tag in apps/<service>/deployment.yaml → <git-sha>
    │         → git commit + push
    └─ done

gitea.d-ma.be/mathias/infra (Flux source)
    ↓  Flux source-controller detects new commit (30s interval)
kustomize-controller
    └─ applies apps/<service>/kustomization.yaml → k3s namespace
         ↓
       pod runs new image (pulls from gitea.d-ma.be with imagePullSecret)
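The 30s detection interval in the diagram corresponds to spec.interval on the Flux GitRepository that tracks the infra repo. A minimal sketch, assuming Flux 2.x (v1 source API), the object name flux-system created by bootstrap, and an ssh:// URL on Gitea's default SSH port; none of these names are confirmed by this spec:

```yaml
# Flux source for the infra repo (sketch; name, URL and secret are assumptions)
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 30s          # how often source-controller polls for new commits
  url: ssh://git@gitea.d-ma.be/mathias/infra.git
  ref:
    branch: main
  secretRef:
    name: flux-system    # SSH credentials Flux uses to read the repo
```

kustomize-controller reconciles as soon as the source reports a new revision rather than waiting for its own interval, so end-to-end latency should be roughly this poll interval plus apply time, inside the 60s target.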

Components

1. BuildKit — systemd service on koala

BuildKit runs as a rootless systemd service on the koala host, following the same pattern as the Gitea runner already in use.

  • Socket: unix:///run/user/<uid>/buildkit/buildkitd.sock (rootless) or unix:///run/buildkit/buildkitd.sock (rootful)
  • Cache: local disk at the default BuildKit cache path; persists across builds
  • Access: the runner process on the same host points buildctl at whichever socket the service exposes (--addr or BUILDKIT_HOST); the runner's user needs read/write access to that socket, which in rootless mode means running as the BuildKit user
  • No k3s involvement for builds

2. Gitea Actions — cd.yml

Separate workflow file; triggers on main push after ci.yml succeeds.

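name: cd
on:
  push:
    branches: [main]

jobs:
  deploy:
    # `needs` can only name jobs in the same workflow file: either keep this job
    # in ci.yml next to the `ci` job, or gate cd.yml with a workflow_run
    # trigger instead (see implementation plan).
    needs: [ci]
    runs-on: [self-hosted, koala]
    env:
      # Image references must be lowercase; assumes the repo path already is.
      IMAGE: gitea.d-ma.be/${{ github.repository }}:${{ github.sha }}
      SERVICE_NAME: supervisor   # per repo; must match apps/<service>/ in the infra repo
    steps:
      - uses: actions/checkout@v4
      - name: Build and push
        run: |
          # BUILDKIT_REGISTRY_AUTH is assumed to hold a complete Docker config.json;
          # buildctl reads push credentials from $DOCKER_CONFIG/config.json.
          export DOCKER_CONFIG="$(mktemp -d)"
          trap 'rm -rf "$DOCKER_CONFIG"' EXIT
          printf '%s' "$REGISTRY_AUTH" > "$DOCKER_CONFIG/config.json"
          buildctl build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --output "type=image,name=$IMAGE,push=true"
        env:
          # buildctl falls back to BUILDKIT_HOST when --addr is omitted. Rootful path
          # shown; use unix:///run/user/<uid>/buildkit/buildkitd.sock for rootless.
          BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
          REGISTRY_AUTH: ${{ secrets.BUILDKIT_REGISTRY_AUTH }}
      - name: Update infra repo
        run: |
          WORK="$(mktemp -d)"   # fresh dir per run; /tmp persists on a host runner
          trap 'rm -rf "$WORK"' EXIT
          printf '%s\n' "$DEPLOY_KEY" > "$WORK/deploy-key"
          chmod 600 "$WORK/deploy-key"
          export GIT_SSH_COMMAND="ssh -i $WORK/deploy-key -o IdentitiesOnly=yes -o StrictHostKeyChecking=accept-new"
          git clone git@gitea.d-ma.be:mathias/infra.git "$WORK/infra"
          cd "$WORK/infra"
          # Replace whatever tag is present (IMAGE_TAG on first deploy, the previous
          # sha afterwards); a literal IMAGE_TAG match would only work once.
          sed -i -E "s|(image: gitea\.d-ma\.be/[^:]+):.*|\1:${{ github.sha }}|" "apps/$SERVICE_NAME/deployment.yaml"
          git diff --quiet && { echo "Tag already deployed; nothing to commit"; exit 0; }
          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add "apps/$SERVICE_NAME/deployment.yaml"
          git commit -m "chore(deploy): $SERVICE_NAME → ${{ github.sha }}"
          # Another service may have pushed meanwhile: rebase and retry once.
          git push || { git pull --rebase && git push; }
        env:
          DEPLOY_KEY: ${{ secrets.INFRA_DEPLOY_KEY }}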

SERVICE_NAME is set per-repo (either hardcoded in cd.yml or derived from the repo name).

3. Org-level Gitea secrets

Up to three secrets (two required, one optional), set once at org level and inherited by all repos:

Secret                 | Purpose
BUILDKIT_REGISTRY_AUTH | Credentials for pushing to gitea.d-ma.be, stored as the contents of a Docker config.json (buildctl reads push credentials from ~/.docker/config.json or $DOCKER_CONFIG)
INFRA_DEPLOY_KEY       | SSH private key with write access to gitea.d-ma.be/mathias/infra
KUBECONFIG_KOALA       | (optional) kubeconfig for a scoped ServiceAccount, for manual kubectl steps if ever needed
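KUBECONFIG_KOALA is only described as "scoped". One way to scope it, sketched here as an assumption rather than a decision of this spec, is a ServiceAccount bound to the built-in view ClusterRole through a namespace-local RoleBinding, plus a token Secret the kubeconfig can embed. The name ci-readonly and the choice of the supervisor namespace are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-readonly              # hypothetical name
  namespace: supervisor
---
# A RoleBinding (not a ClusterRoleBinding) confines the built-in "view"
# ClusterRole to this one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-readonly-view
  namespace: supervisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: ci-readonly
    namespace: supervisor
---
# Since Kubernetes 1.24 ServiceAccount tokens are not auto-created; this
# Secret asks the control plane to mint one for ci-readonly.
apiVersion: v1
kind: Secret
metadata:
  name: ci-readonly-token
  namespace: supervisor
  annotations:
    kubernetes.io/service-account.name: ci-readonly
type: kubernetes.io/service-account-token
```

Widening access later means adding a RoleBinding per namespace rather than granting cluster-wide rights.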

4. Infra repo structure

gitea.d-ma.be/mathias/infra
├── clusters/
│   └── koala/
│       └── kustomization.yaml    # lists each ../../apps/<service> explicitly
├── apps/
│   ├── supervisor/
│   │   ├── namespace.yaml
│   │   ├── deployment.yaml       # image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG
│   │   ├── service.yaml
│   │   ├── secrets.enc.yaml      # SOPS-encrypted app secrets (ANTHROPIC_API_KEY, etc.)
│   │   └── kustomization.yaml
│   ├── n8n/
│   │   └── ...
│   └── imagepullsecret/
│       └── secret.enc.yaml       # SOPS-encrypted imagePullSecret for gitea.d-ma.be
└── flux-system/                  # existing Flux bootstrap manifests

Adding a new service means adding an apps/<service>/ directory and one entry in clusters/koala/kustomization.yaml. Kustomize does not expand globs in resources, so that list has to be explicit.
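A sketch of the two kustomization.yaml files involved, shown in one block separated by --- although they are separate files. Contents are inferred from the tree above, and each manifest is assumed to set its own metadata.namespace:

```yaml
# clusters/koala/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/supervisor   # one line per service; globs are not expanded
  - ../../apps/n8n
---
# apps/supervisor/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - secrets.enc.yaml        # SOPS-encrypted; Flux decrypts it, kustomize does not
```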

5. SOPS + age for Flux

Flux decrypts SOPS-encrypted files at apply time using an age key stored as a k8s Secret in the flux-system namespace. Setup:

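  1. Generate an age keypair: age-keygen -o age.agekey (prints the public key; the file holds the private key)
  2. Store the private key: kubectl create secret generic sops-age --from-file=age.agekey -n flux-system
  3. Configure the Flux Kustomization with decryption.provider: sops and decryption.secretRef.name: sops-age
  4. Encrypt secrets before committing: sops --encrypt --age <pubkey> --encrypted-regex '^(data|stringData)$' secret.yaml > secret.enc.yaml; never commit the plaintext secret.yaml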
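Step 3 refers to the Flux Kustomization object (kustomize.toolkit.fluxcd.io), not to the kustomization.yaml files. A sketch assuming the Flux 2.x v1 API, a hypothetical Kustomization named apps targeting ./clusters/koala, and the sops-age Secret from step 2. The wait and timeout fields are additions beyond this spec: they make a failed rollout (for example ImagePullBackOff) mark the Kustomization as not Ready:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                # hypothetical name
  namespace: flux-system
spec:
  interval: 10m             # periodic drift correction; new commits trigger sooner
  path: ./clusters/koala
  prune: true               # delete objects that were removed from git
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true                # wait for applied workloads to become ready
  timeout: 2m
  decryption:
    provider: sops
    secretRef:
      name: sops-age        # Secret created in step 2
```

If the bootstrap flux-system Kustomization already covers this path, the decryption block belongs on that object instead.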

App secrets (e.g., ANTHROPIC_API_KEY) and the registry pull secret live as encrypted files in apps/<service>/ and apps/imagepullsecret/ respectively.
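To avoid passing --age on every call, the infra repo can carry a .sops.yaml at its root; sops picks it up when run from inside the repo. A sketch in which the recipient is a placeholder for the public key printed by age-keygen and the path regex is an assumption matching the apps/ layout:

```yaml
# .sops.yaml at the infra repo root (sketch)
creation_rules:
  - path_regex: apps/.*\.yaml$
    # Encrypt only Secret payloads so metadata stays readable in diffs.
    encrypted_regex: ^(data|stringData)$
    age: age1replace-with-your-public-key   # placeholder, not a real recipient
```

With this file in place, sops --encrypt apps/supervisor/secret.yaml > apps/supervisor/secrets.enc.yaml needs no --age flag.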

6. Image pull secret

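Each app namespace needs its own kubernetes.io/dockerconfigjson Secret for gitea.d-ma.be, because a Pod can only reference pull secrets in its own namespace. The Secret is kept once, SOPS-encrypted, in apps/imagepullsecret/ and applied to each app namespace via the Kustomize namespace field or a shared Kustomize component; each Deployment then lists it under spec.template.spec.imagePullSecrets.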
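A sketch of the plaintext pull Secret before encryption, and the matching reference in a Deployment. The Secret name gitea-registry and the placeholder credentials are hypothetical; a Gitea access token with package read access is assumed as the password:

```yaml
# apps/imagepullsecret/secret.yaml before `sops --encrypt` (sketch)
apiVersion: v1
kind: Secret
metadata:
  name: gitea-registry             # hypothetical; namespace set by the Kustomize wiring
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {"auths": {"gitea.d-ma.be": {"username": "<user>", "password": "<token>"}}}
---
# apps/supervisor/deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supervisor
  namespace: supervisor
spec:
  selector:
    matchLabels:
      app: supervisor
  template:
    metadata:
      labels:
        app: supervisor
    spec:
      imagePullSecrets:
        - name: gitea-registry     # must exist in this Pod's namespace
      containers:
        - name: supervisor
          image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG   # tag rewritten by cd.yml
```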


Data flow: supervisor deploy

  1. Push to supervisor main → CI passes (lint/test/vet)
  2. CD job builds image: gitea.d-ma.be/mathias/supervisor:abc1234
  3. CD job clones infra repo, patches apps/supervisor/deployment.yaml, commits
  4. Flux source-controller detects infra commit within 30s
  5. kustomize-controller applies apps/supervisor/kustomization.yaml
  6. Flux decrypts secrets.enc.yaml → k8s Secret in supervisor namespace
  7. k3s pulls gitea.d-ma.be/mathias/supervisor:abc1234 using imagePullSecret
  8. Pod starts with new image; previous pod terminates

Rollback: git revert <tag-commit> in infra repo → Flux reconciles → old image deployed.


Error handling

Scenario                 | Behaviour
CI fails                 | cd.yml does not run (needs: ci gate)
BuildKit unreachable     | buildctl exits non-zero → workflow fails; infra repo untouched
Image push fails         | Workflow fails; infra repo untouched; cluster unchanged
Infra repo push conflict | Retry once with rebase; fail and alert if still conflicting
Flux reconcile error     | Notification-controller fires alert; pods stay on previous image
Pod image pull fails     | ImagePullBackOff; the rollout stalls and the existing ReplicaSet keeps serving; Flux reports the Kustomization as not Ready only if it sets wait: true or healthChecks
SOPS decrypt fails       | Kustomization fails; Flux reports error; no partial apply
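The "Notification-controller fires alert" row assumes an Alert and a Provider exist; this spec does not define them. A sketch using the notification.toolkit.fluxcd.io/v1beta3 API (recent Flux 2.x; older releases use v1beta2) with a generic webhook, where the provider name and address are placeholders:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ops-webhook                # hypothetical name
  namespace: flux-system
spec:
  type: generic                    # plain JSON POST; swap for slack, matrix, etc.
  address: https://alerts.example.invalid/flux   # placeholder endpoint
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: infra-errors
  namespace: flux-system
spec:
  providerRef:
    name: ops-webhook
  eventSeverity: error             # failures only, not every successful apply
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'
```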

Testing approach

  1. BuildKit smoke test: buildctl build with a trivial one-line Dockerfile; verify the image appears in the Gitea registry
  2. cd.yml dry run — trigger manually on a test branch; verify infra repo commit contains correct sha
  3. Flux reconcile test — push infra commit; verify flux get kustomizations shows Ready and pod runs new image sha
  4. Pull secret test — delete pod, verify it restarts and pulls from Gitea registry without ImagePullBackOff
  5. SOPS round-trip test — encrypt a dummy secret, push to infra repo, verify Flux decrypts and kubectl get secret shows correct data

Risks

Risk | Mitigation
BuildKit socket path varies by user/rootless mode | Confirm path during setup; hardcode in cd.yml
Infra repo concurrent pushes (multiple repos deploying simultaneously) | Git rebase retry handles this; unlikely at current scale
age private key lost | Back up the private key outside the cluster and outside the infra repo; document the recovery procedure
Registry storage fills up | Set a Gitea package cleanup rule (keep the last 20 versions per repo)
Gitea deploy key compromised | Rotate via Gitea UI; the key grants write access to the infra repo only