Operational Runbooks

This section contains step-by-step procedures for common operational tasks and maintenance activities.

FluxCD Operations

Upgrading Flux CRDs and HelmRelease API Version

Procedures for updating FluxCD Custom Resource Definitions and migrating HelmRelease resources to newer API versions.

Git-Pinned Flux Upgrades

Instructions for upgrading FluxCD when using git-pinned component versions.

Infrastructure Maintenance

Proxmox Root Password Reset

Procedure for resetting the root password on Proxmox hosts via GRUB when the existing password is unknown.

Dell R630 BIOS & iDRAC Update

Instructions for updating BIOS, iDRAC, and firmware on Dell PowerEdge R630 servers using the Dell Support Live Image and SUU ISO.

Kubernetes Cluster Operations

  • Node maintenance and updates
  • Certificate rotation
  • Backup and restore procedures
  • Disaster recovery testing

Storage Operations

  • Persistent volume management
  • Ceph cluster maintenance (planned)
  • VolSync backup operations
  • Storage capacity planning

Network Operations

  • VLAN configuration changes
  • Switch maintenance procedures
  • Firewall rule updates
  • DNS configuration changes

Monitoring and Alerting

Prometheus Operations

  • Alerting rule updates
  • Metric retention management
  • Prometheus configuration updates
  • Grafana dashboard management

Log Management

  • Log retention policies
  • Log aggregation setup
  • Alert investigation procedures
  • Performance troubleshooting

Security Operations

Certificate Management

  • TLS certificate renewal
  • Certificate authority operations
  • Secrets rotation procedures
  • Access control updates

Backup Operations

  • Backup verification procedures
  • Disaster recovery testing
  • Off-site backup management
  • Data retention compliance

Emergency Procedures

Incident Response

  • Service outage procedures
  • Data loss recovery
  • Security incident response
  • Communication protocols

Disaster Recovery

  • Complete cluster rebuild
  • Data center failover
  • Cloud failover procedures
  • Business continuity planning

Runbooks are living documents and should be updated whenever a procedure changes or improves.