Backup Power System Design

Note

The info below is here just to record some thoughts around backup power to the homelab rack. Some or all of it may be junk and we may or may not decide to implement any of it.

1) Quick power budget (to size the UPS)

These are conservative "typical" draw numbers (idle-to-medium load), not nameplate. Adjust with your iDRAC/PDUs later.

  • 2× R720xd SFF: ~300–400 W each → ~700 W
  • 1× R720xd LFF: ~300–400 W → ~350 W
  • 3× R630: ~150–200 W each → ~540 W
  • 2× MD1220 shelves (24× 2.5" SAS bays each): ~200–300 W each → ~500 W
  • Networking (firewall, core switch, ToR, AP controller, etc.): ~100–200 W → ~150 W

Estimated continuous load: ~2,240 W. Adding 30% headroom for spikes and growth gives a target UPS output of ~2.9 kW.
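
The arithmetic above can be reproduced with a small sketch. The per-unit figures (350 W, 180 W, 250 W) are the midpoints implied by the totals in the list, not measurements, and the names `LOADS_W` and `HEADROOM` are illustrative only.

```python
# Typical draw per device group in watts, as implied by the totals listed above.
LOADS_W = {
    "2x R720xd SFF": 2 * 350,
    "1x R720xd LFF": 350,
    "3x R630": 3 * 180,
    "2x MD1220": 2 * 250,
    "Networking": 150,
}
HEADROOM = 0.30  # 30% for spikes and growth


def continuous_load_w(loads):
    """Sum of the typical draw of every device group."""
    return sum(loads.values())


def target_output_w(loads, headroom=HEADROOM):
    """Continuous load plus the headroom fraction."""
    return continuous_load_w(loads) * (1 + headroom)


if __name__ == "__main__":
    print(f"Continuous load: {continuous_load_w(LOADS_W)} W")
    print(f"Target UPS output: {target_output_w(LOADS_W) / 1000:.1f} kW")
```

Replacing the dictionary values with iDRAC or PDU readings later updates both numbers in one place.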

Energy needed for 10 minutes: 2.9 kW × (10/60) h ≈ 0.48 kWh delivered to the load; the batteries must store somewhat more than that to cover inverter losses. In practice, you'll choose a UPS by kVA and then add external battery modules to hit ~10 min at your load (vendors publish runtime charts).
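
As a rough sketch of that energy estimate: the inverter efficiency of 0.92 below is a placeholder assumption, not a figure from any particular UPS, and real battery runtime at high discharge rates is not linear, so the vendor runtime chart remains the authority.

```python
def delivered_energy_kwh(load_kw, minutes):
    """Energy the load consumes over the ride-through window."""
    return load_kw * minutes / 60


def stored_energy_kwh(load_kw, minutes, inverter_efficiency=0.92):
    """Battery energy needed once inverter losses are included.

    inverter_efficiency is an assumed placeholder; check the UPS datasheet.
    """
    if not 0 < inverter_efficiency <= 1:
        raise ValueError("inverter_efficiency must be in (0, 1]")
    return delivered_energy_kwh(load_kw, minutes) / inverter_efficiency


if __name__ == "__main__":
    print(f"Delivered: {delivered_energy_kwh(2.9, 10):.2f} kWh")
    print(f"Stored:    {stored_energy_kwh(2.9, 10):.2f} kWh")
```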

2) Topology choices (pick 1)

A) Single, central 240 V double-conversion UPS (simple, robust)

  • Size: 5–8 kVA online UPS (power factor ≈0.9–1.0). A 5 kVA unit gives ~4.5–5 kW usable; plenty for ~2.9 kW plus surge.
  • Runtime: Add 1–3 external battery modules to get ~10 min at ~3 kW (check vendor runtime tables).
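  • Wiring:
      • Input: L6-30P (fits your 30A/240 V branch). Assuming the usual 80% continuous-load limit, a 30 A circuit supports about 24 A × 240 V ≈ 5.8 kVA, so units toward the 8 kVA end of the range would need a larger feed.
      • Output: 240 V to two rack PDUs (L6-30R → (2)× 208/240 V PDUs with C13/C19 outlets).
      • Dell PSUs auto-range (100–240 V), so 208–240 V PDUs are fine.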
  • Pros: Easiest to manage; best brownout immunity; one battery stack.
  • Cons: The UPS itself is a single point of failure. Dual PSUs add no source redundancy here, because both cords land on the same UPS (see Option B).
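
A quick way to sanity-check a UPS size against the 30 A / 240 V branch is sketched below. The 80% continuous-load factor is an assumption based on common North American practice, and the check ignores UPS efficiency and battery-charger draw, so treat a narrow pass as a fail.

```python
def input_current_a(apparent_power_va, volts=240.0):
    """Approximate input current at full rated output."""
    return apparent_power_va / volts


def fits_branch(apparent_power_va, breaker_a=30.0, volts=240.0,
                continuous_factor=0.8):
    """True if full-load current stays within the continuous rating.

    continuous_factor=0.8 is an assumed derating, not a quoted code value.
    """
    return input_current_a(apparent_power_va, volts) <= breaker_a * continuous_factor


if __name__ == "__main__":
    for kva in (5, 6, 8):
        amps = input_current_a(kva * 1000)
        print(f"{kva} kVA: {amps:.1f} A, fits 30 A branch: {fits_branch(kva * 1000)}")
```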

B) Two independent UPS banks (redundancy across dual PSUs)

  • Size: (2)× 3 kVA online UPS, each feeding its own PDU.
  • Cabling: For nodes with dual PSUs, A PSU → PDU-A (UPS-A) and B PSU → PDU-B (UPS-B). If either UPS fails or is in bypass/maintenance, systems keep running.
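  • Runtime: Size each UPS so that either one can carry the full dual-corded load alone for ~10 min, because the surviving unit picks up everything when its partner fails or goes into bypass. In normal operation each UPS then runs at roughly half load. At the ~2.24 kW estimate, a 3 kVA unit (≈2.7 kW at PF 0.9) covers failover, but with little headroom.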
  • Pros: Real redundancy + maintenance flexibility.
  • Cons: More gear and runtime math to balance loads.
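
The load-balancing math for Option B reduces to one worst-case check, sketched here. The power factor of 0.9 and the split between dual-corded and single-corded load are assumptions to replace with real figures.

```python
def ups_capacity_w(kva, power_factor=0.9):
    """Usable real power of a UPS; power_factor is an assumed value."""
    return kva * 1000 * power_factor


def survives_single_failure(dual_corded_w, single_a_w, single_b_w,
                            kva, power_factor=0.9):
    """Check that either UPS can carry the rack if the other drops out.

    The surviving UPS carries all dual-corded load plus whatever
    single-corded load is plugged only into it.
    """
    worst_case_w = dual_corded_w + max(single_a_w, single_b_w)
    return worst_case_w <= ups_capacity_w(kva, power_factor)


if __name__ == "__main__":
    # Estimated continuous load, everything dual-corded.
    print(survives_single_failure(2240, 0, 0, kva=3))
    # Same check at the 2.9 kW target that includes 30% headroom.
    print(survives_single_failure(2912, 0, 0, kva=3))
```

If the second check fails for a chosen size, the options are a larger UPS pair or shedding part of the load when one side is lost.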

C) Tiered approach (cost-effective + graceful shedding)

  • UPS-Core (1.5–2.2 kVA): Networking, firewall, storage control plane, one small hypervisor host for "core services." Aim for 20–30 min here so you always keep the control plane/network up.
  • UPS-Compute (3–5 kVA): R720/R630s and disk shelves. Aim for 10 min. Configure automated load shedding (non-critical nodes off first).
  • Pros: Saves money, keeps core stable longer, easier battery budgets.
  • Cons: Not fully redundant unless you double up PSUs as in B.

Recommendation

If budget/space allow, Option B is the sweet spot: (2)× 3 kVA online UPS with dual PDUs and dual-corded servers split across them. If you want simpler/cheaper, go Option C.

3) Runtime & battery chemistry

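  • Online (double-conversion) UPS: the inverter supplies the load continuously, so brownouts and dirty input are conditioned with no transfer (transfer time ≈ 0 ms). Line-interactive can work, but it switches to battery with a short transfer gap, so online is better for frequent sags.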
  • Lithium-ion UPS: lighter, better cycle life, higher cost. If you've got frequent events or want lower maintenance, Li-ion pays off.
  • VRLA/AGM: cheaper upfront; expect battery replacement every ~3–5 years.

4) Power distribution & startup behavior

  • Rack PDUs: Use metered or switched 208/240 V PDUs (C13/C19). Switched PDUs let you remotely shed loads.
  • Staggered startup: Configure the BIOS/iDRAC "AC Power Recovery" setting with delays, and use PDU outlet sequencing so everything doesn't draw inrush current at once.
  • Disk shelves: Bring MD1220s up before hosts, or pause host boot until storage is ready.
  • Labeling: Color-code A/B power feeds and document which PSU goes where.
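
The staggered startup described above can be planned as a simple delay table before it is entered into the PDU or iDRAC. This is a minimal sketch: the device names, the 60 s gap between groups, the 5 s gap between outlets, and the choice to bring networking up first are all assumptions for illustration.

```python
# Boot groups in power-on order. Storage precedes hosts, as noted above;
# putting networking first is an assumption.
BOOT_GROUPS = [
    ("network", ["firewall", "core-switch"]),
    ("storage", ["md1220-a", "md1220-b"]),
    ("hosts", ["r720xd-1", "r720xd-2", "r720xd-3", "r630-1", "r630-2", "r630-3"]),
]


def power_on_schedule(groups, group_gap_s=60, outlet_gap_s=5):
    """Return {device: delay in seconds after power returns}."""
    schedule = {}
    t = 0
    for _group_name, devices in groups:
        for device in devices:
            schedule[device] = t
            t += outlet_gap_s  # space outlets so inrush does not stack
        t += group_gap_s  # let the whole group settle before the next one
    return schedule


if __name__ == "__main__":
    for device, delay in power_on_schedule(BOOT_GROUPS).items():
        print(f"{delay:4d} s  {device}")
```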

5) Automation: clean shutdowns + brownout tolerance

Use NUT (Network UPS Tools) or the vendor's network card + software:

  • UPS ↔ NUT server: USB/serial to one small always-on box (or UPS network management card).
  • NUT clients: Install on Proxmox nodes and key VMs (Ceph MGR/MON, DNS/DHCP, etc.).
  • Policy:
      • On-battery grace delay: wait 10–15 s before acting, so blips under 10 s are ignored (in NUT this is typically an upssched timer started on ONBATT and cancelled on ONLINE).
      • Start shedding non-critical hosts/services at ~7–8 min remaining.
      • Set Ceph maintenance flags (noout, plus throttled backfill/recovery) well before shutdown.
      • At ~4–5 min remaining, stop noisy services and evacuate VMs if time permits.
      • At ~3 min remaining, cleanly shut down hypervisors and MD1220s (if supported) in an order that preserves data integrity.
      • Keep networking + the NUT server up the longest (Option C's "UPS-Core" makes this easy).
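
The runtime thresholds in this policy can be expressed as a small decision function. The sketch below parses text in the `name: value` form that NUT's `upsc` prints and relies on the standard `ups.status` and `battery.runtime` (seconds) variables. The action names are hypothetical labels, and tying the Ceph flags to the first threshold is an assumption, since the policy only says "well before shutdown".

```python
def parse_upsc(text):
    """Parse 'name: value' lines, as printed by upsc, into a dict."""
    variables = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            variables[name.strip()] = value.strip()
    return variables


# (runtime remaining in seconds, hypothetical action label), earliest first.
POLICY = [
    (8 * 60, "shed-noncritical"),
    (8 * 60, "ceph-set-noout"),  # assumed placement
    (5 * 60, "stop-noisy-services"),
    (3 * 60, "shutdown-hypervisors"),
]


def due_actions(variables):
    """Actions whose threshold has been reached while on battery."""
    status_flags = variables.get("ups.status", "").split()
    if "OB" not in status_flags:  # OB = on battery
        return []
    runtime_s = float(variables["battery.runtime"])
    return [action for threshold, action in POLICY if runtime_s <= threshold]


if __name__ == "__main__":
    sample = "ups.status: OB\nbattery.runtime: 280\n"
    print(due_actions(parse_upsc(sample)))
```

A real deployment would feed this from a periodic `upsc` call or a NUT client library and make each action idempotent, so that repeated polls do not rerun a step.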

6) Brownouts and <30 s cuts

A double-conversion UPS rides through these by design. For frequent micro-cuts:

  • Low-battery start threshold: Keep it conservative so repeated micro-events don't prematurely age batteries.
  • Input voltage window: An online UPS already isolates the load, but set sensible input limits so it does not switch to battery, and drain it, on minor sags.
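
Ignoring short cuts, as in the 10–15 s on-battery delay from section 5, amounts to debouncing the on-battery signal. The class below is a minimal sketch of that idea; the name `OnBatteryDebounce` and the polling-style `update` call are illustrative and not part of NUT.

```python
class OnBatteryDebounce:
    """Report an outage only after it has lasted grace_s seconds."""

    def __init__(self, grace_s=15.0):
        self.grace_s = grace_s
        self._on_battery_since = None

    def update(self, now_s, on_battery):
        """Feed one status sample; return True once the outage is confirmed."""
        if not on_battery:
            # Mains is back: forget the pending outage entirely.
            self._on_battery_since = None
            return False
        if self._on_battery_since is None:
            self._on_battery_since = now_s
        return now_s - self._on_battery_since >= self.grace_s


if __name__ == "__main__":
    debounce = OnBatteryDebounce(grace_s=10)
    samples = [(0, True), (5, True), (6, False), (7, True), (17, True)]
    for t, on_battery in samples:
        print(t, on_battery, debounce.update(t, on_battery))
```

Resetting the timer whenever mains returns means a run of separate micro-cuts never accumulates into a shutdown trigger; only one continuous outage longer than the grace period does.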

7) Future-proofing: generator or solar-battery integration

  • Add an L14-30 / CS6365 inlet + interlock or a small manual transfer switch upstream of the UPS to connect a portable inverter generator later. The UPS then becomes your ride-through and power conditioner.
  • If you ever add house batteries/solar, feed the UPS from the backed-up subpanel so you inherit longer runtime automatically.

8) Practical parts checklist (vendor-agnostic)

  • UPS(es):
      • Option A: 5–8 kVA online UPS (L6-30P input), + external battery modules for 10 min @ ~3 kW.
      • Option B: (2)× 3 kVA online UPS, each with enough battery for ≥10 min at its expected share.
      • Option C: 1.5–2.2 kVA (Core) + 3–5 kVA (Compute).
  • (2) Rack PDUs per UPS (208/240 V, C13/C19, metered/switched).
  • Cables: L6-30 and C13/C19 cords sized properly.
  • NUT server or UPS network cards (SNMP).
  • Labels and a laminated power map in the rack.
  • Environmental: Temp sensors; keep batteries <25 °C for longevity.

9) Commissioning checklist

  1. Measure real load: Plug temporarily into a metered PDU to confirm watts/amps per device. Adjust budget.
  2. Map A/B feeds and tag every PSU.
  3. Program BIOS (power-on after AC loss, delays).
  4. Configure NUT (or vendor agent): test "on battery," "low battery," and full shutdown drill.
  5. Stagger boot order and test full power-return sequence.
  6. Document: single-page runbook taped inside the rack.
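
For step 1, comparing measured draw against the budget can be as simple as the sketch below. The device names, the readings, and the 10% tolerance are placeholders for illustration.

```python
def over_budget(budget_w, measured_w, tolerance=0.10):
    """Return {device: measured watts} for anything above budget + tolerance.

    Devices that were measured but never budgeted are flagged as well.
    """
    flagged = {}
    for device, measured in measured_w.items():
        budget = budget_w.get(device)
        if budget is None or measured > budget * (1 + tolerance):
            flagged[device] = measured
    return flagged


if __name__ == "__main__":
    budget = {"r630-1": 180, "md1220-a": 250}    # placeholder budget
    measured = {"r630-1": 210, "md1220-a": 240}  # placeholder readings
    print(over_budget(budget, measured))
```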