Yornik Heyl
Five years keeping production calm at Mendix (Siemens). Incident response, rollout design, and observability that means something at 03:00. I build platforms that are safe to change, not just safe to leave alone.
Overview
Practice
I make production safer to change: incident response, rollout design, observability that actually tells you something at 03:00, and alerting that doesn't go off for nothing. Systems should be explainable under pressure, not just work when everything is fine.
The way I work is declarative. Reproducible rollouts, clear ownership, and security folded into the platform rather than bolted on afterwards. AI tooling where it pays — automating the boring parts and catching mistakes early — not as a buzzword.
Runbook · what I operate
- talos linux v1.12: 4-node cluster · 3 control-plane / 1 worker · intel n150
- argo cd v3: app-of-apps · 28 applications · sops + ksops
- prometheus / grafana / loki (stable): alloy log shipping · 48 h retention · custom dashboards
- traefik v3: tls 1.3 only · ecdsa p-384 · acme via let's encrypt
- external-dns v0.21: cloudflare · dane / tlsa cron jobs · sops-encrypted creds
- opentofu v1.10: hetzner edge nodes · haproxy ingress to tailscale · iac end-to-end
All declarative. Every change is a pull request: required review, yaml lint, helm template, and kubeconform validation in CI before merge.
Signals
- TLS policy: tls 1.3 only · hsts preload
- Certificates: ecdsa p-384 · le · 90-day rotation
- Email auth: spf · dkim · dmarc · dane / tlsa
- IPv6: end-to-end
- Audit: ncsc nl · "good" on every category
Contact
Currently at Mendix (Siemens). Open to conversations about platform & reliability work in the Netherlands and remote across Europe.