Storage Architecture and Planning

Storage Architecture Planning (emerald)

Current Hardware Available

  • SAS Drives: 16x 600GB 10K RPM enterprise SAS drives
  • SSD Boot Drives: 2x 500GB 2.5" SSDs installed in rear bays
  • Current State: Clean Proxmox host (no existing workloads)

Final Architecture - Native Dell Hardware

Tier 1: SSD Boot Storage (OS)

Hardware: 2x 500GB Samsung 2.5" SSDs in rear bays
Configuration: RAID-1 (mirrored)

Purpose

  • Proxmox host OS and system files
  • Boot partition and system storage
  • Templates and ISO storage

Performance: High reliability, fast boot times

Tier 2: SAS RAID-10 (High Performance)

Hardware: 12x 600GB 10K RPM SAS drives (front bays)
Configuration: RAID-10 (6 mirror pairs)
Usable Capacity: ~3.6TB
Name: VM-FAST

Purpose (Tier 2)

  • Kubernetes control plane VMs
  • Production worker node storage
  • Application persistent volumes
  • Container image storage
  • High-performance workloads

Performance (estimated): 2.4K-4.8K IOPS, 2-5ms latency, 1.2-2.4 GB/s sequential throughput

Tier 3: SAS RAID-6 (High Capacity)

Hardware: 4x 600GB 10K RPM SAS drives (front bays)
Configuration: RAID-6
Usable Capacity: ~1.2TB
Name: VM-BULK

Purpose (Tier 3)

  • Development/testing VMs
  • Backup and snapshot storage
  • Log aggregation and archives
  • Non-critical bulk data

Performance (estimated): 400-800 IOPS, 5-10ms latency, 600 MB/s-1.2 GB/s sequential throughput
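
These tier figures are estimates; once an array is mounted, a short fio run can sanity-check them. A minimal sketch, assuming the RAID-10 tier is mounted at /mnt/sas-raid10 (the path is an assumption, matching the pool setup in Phase 3 below):

# 4K random-read check against the VM-FAST mount (swap in /mnt/sas-raid6 for VM-BULK)
fio --name=tier-check --filename=/mnt/sas-raid10/fio.test \
    --ioengine=libaio --direct=1 --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --size=4G --runtime=60 --group_reporting
rm /mnt/sas-raid10/fio.test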

Capacity Planning

RAID Configuration Scenarios

Different RAID levels trade capacity against redundancy; the figures below assume an 8-drive group of 600GB disks:

RAID 10 (Best performance and redundancy)

  • Usable capacity: ~2.4TB (50% overhead)
  • Fault tolerance: Can lose 1 drive per mirror pair
  • Performance: Excellent read/write performance
  • Use case: High-performance VM storage

RAID 6 (High capacity with dual-drive redundancy)

  • Usable capacity: ~3.6TB (25% overhead)
  • Fault tolerance: Can lose any 2 drives
  • Performance: Good read, slower write
  • Use case: Bulk storage, backups

RAID 5 (Balance of capacity and redundancy)

  • Usable capacity: ~4.2TB (12.5% overhead)
  • Fault tolerance: Can lose 1 drive
  • Performance: Good read, moderate write
  • Use case: General purpose storage
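
The usable-capacity math behind these scenarios is simple enough to script. A quick shell sanity check (drive counts are illustrative; the 12-drive RAID-10 and 4-drive RAID-6 lines match VM-FAST and VM-BULK, the 8-drive RAID-5 line matches the scenario above):

# Usable capacity for n x 600GB drives under each RAID level
n=12; echo "RAID-10: $(( n / 2 * 600 ))GB"     # half the drives hold mirror copies
n=4;  echo "RAID-6:  $(( (n - 2) * 600 ))GB"   # two drives' worth of parity
n=8;  echo "RAID-5:  $(( (n - 1) * 600 ))GB"   # one drive's worth of parity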

Projected Storage Utilization (New Architecture)

Phase 1: Minimal Cluster

Tier 1 (SSD Boot Storage)

  • Proxmox OS: ~50GB
  • Control plane VM: ~30GB (OS only, data on Tier 2)
  • Available: ~420GB (from 500GB SSDs)

Tier 2 (SAS RAID-10)

  • Control plane data: ~20GB
  • Worker 1: ~100GB
  • Worker 2: ~100GB
  • Container storage: ~100GB
  • Total initial usage: ~320GB
  • Available for expansion: ~3TB (of ~3.35TB usable)

Tier 3 (SAS RAID-6)

  • Reserved for backups and development: ~1.2TB available (unused in Phase 1)
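
Once the pools exist, actual usage can be compared against these projections straight from the Proxmox host. A minimal check (storage IDs and paths assume the Phase 3 setup below):

# Per-pool capacity and usage, e.g. sas-fast vs the ~320GB Phase 1 projection
pvesm status
df -h /mnt/sas-raid10 /mnt/sas-raid6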

Kubernetes-Specific Benefits

  • etcd performance: control plane data on the low-latency RAID-10 tier keeps etcd commits fast
  • Pod scheduling: Fast API server responses
  • Container pulls: Fast image storage on RAID-10
  • Persistent volumes: High-performance application storage
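
Because etcd is the most latency-sensitive of these consumers, its storage is worth verifying directly. The upstream etcd docs suggest an fdatasync-latency test along these lines (the directory is an assumption; point it at wherever the control plane disks will live):

# etcd-style fdatasync latency probe; the 99th percentile should be well under 10ms
fio --name=etcd-fsync --directory=/mnt/sas-raid10 \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300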

Storage Strategy

Short-term (Phase 1)

  • Utilize existing RAID configuration
  • Monitor storage performance and capacity
  • Plan for VM growth and additional services

Medium-term (Phase 2 - Add fuji)

  • Mirror storage configuration on fuji
  • Implement cross-node storage replication
  • Consider distributed storage solutions

Long-term (Hybrid Cloud)

  • Implement tiered storage strategy
  • Use cloud for cold storage and backups
  • Optimize for performance vs cost

Distributed Storage Considerations

For 2-Node Setup (emerald + fuji)

Challenges

  • No true quorum (need 3+ nodes for automatic failover)
  • Split-brain scenarios possible
  • Manual intervention required for failures

Solutions

  • External witness/tie-breaker
  • Asynchronous replication
  • Manual failover procedures

Storage Options

  • Ceph: Requires 3+ nodes for optimal operation
  • Longhorn: Works with 2 nodes but limited HA
  • External NAS: Centralized storage accessible by both nodes
  • Proxmox replication: Built-in async replication
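
As a sketch of the last option: Proxmox replication is driven by pvesr, though note it only works with ZFS-backed storage, so it would require a ZFS pool rather than the directory storage used elsewhere in this plan. The VM ID and schedule below are illustrative:

# Hypothetical: replicate VM 100's disks to fuji every 15 minutes
pvesr create-local-job 100-0 fuji --schedule '*/15'
pvesr status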

For 5-Node Setup

Benefits

  • True distributed storage quorum
  • Automatic failover capabilities
  • Better data distribution
  • Higher fault tolerance

Storage Options

  • Ceph RBD: Full distributed block storage
  • Longhorn: Kubernetes-native distributed storage
  • OpenEBS: Multiple storage engines available
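
For the Ceph option, Proxmox bundles its own tooling once the cluster has enough nodes. A hedged sketch (the network CIDR and device name are assumptions):

# Hypothetical Ceph bootstrap on a 3+ node Proxmox cluster
pveceph install
pveceph init --network 10.0.0.0/24
pveceph mon create
pveceph osd create /dev/sdb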

Performance Considerations

SAS Drive Characteristics

  • Sequential Read/Write: ~200-300 MB/s per drive
  • Random IOPS: ~200-400 IOPS per drive (4K blocks)
  • Latency: 2-5ms typical
  • Reliability: Enterprise-grade (high MTBF)

RAID Performance Impact

  • RAID 10: Best overall performance
  • RAID 6: Read performance good, write penalty ~6x
  • RAID 5: Read performance good, write penalty ~4x
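
The write penalties translate directly into effective IOPS. A rough worked example for a 12-drive set at ~300 random IOPS per drive (illustrative numbers):

# Effective random-write IOPS = (drives x per-drive IOPS) / write penalty
raw=$(( 12 * 300 ))
echo "RAID-10: $(( raw / 2 ))"   # each write hits 2 mirrored drives -> 1800
echo "RAID-5:  $(( raw / 4 ))"   # read-modify-write costs 4 I/Os    -> 900
echo "RAID-6:  $(( raw / 6 ))"   # double parity costs 6 I/Os        -> 600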

VM Storage Requirements

Different workloads have varying storage needs:

Database workloads

  • High IOPS requirements
  • Low latency critical
  • Consider RAID 10 for high performance

Web applications

  • Moderate IOPS
  • Sequential read heavy
  • RAID 5/6 acceptable

Backup/archival

  • High capacity priority
  • Lower performance requirements
  • RAID 6 optimal

Monitoring and Maintenance

Key Metrics to Track

  • Capacity utilization: Current usage vs available
  • IOPS performance: Read/write operations per second
  • Latency: Response time for storage operations
  • Drive health: SMART data, error rates
  • RAID status: Array health, rebuild status

Maintenance Tasks

  • Regular SMART monitoring
  • Proactive drive replacement
  • Performance baseline establishment
  • Capacity planning updates
  • Backup validation
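
On a PERC-based R720XD, drives sit behind the RAID controller, so SMART queries need the megaraid passthrough. A minimal sketch (the device node and disk IDs are assumptions, and the MegaCLI binary must be installed separately):

# SMART health for disk ID 0 behind the PERC controller
smartctl -a -d megaraid,0 /dev/sda
# Virtual drive and rebuild status via MegaCLI
megacli -LDInfo -Lall -aALL
megacli -PDList -aALL | grep -E 'Slot|Firmware state|Media Error'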

Future Storage Expansion

Adding More Capacity

  • Additional drives: If empty bays available
  • Storage nodes: Add dedicated storage servers
  • Cloud integration: Hybrid storage tiers
  • SSD cache: Add fast cache layer if needed

Technology Upgrades

  • Additional SSDs: Higher performance options
  • Larger capacity drives: 1TB+ enterprise SAS
  • All-flash arrays: Maximum performance
  • Hybrid arrays: SSD + HDD tiers

Storage Performance Characteristics (Earlier Array Layout)

These estimates predate the VM-FAST/VM-BULK redesign and refer to the earlier DATA/VM/OS split of the SAS drives; they are kept for reference.

RAID-10 Performance (Estimated)

  • Sequential Read: ~1.6-2.4 GB/s (DATA array), ~800 MB/s-1.2 GB/s (VM array)
  • Sequential Write: ~800 MB/s-1.2 GB/s (DATA array), ~400-600 MB/s (VM array)
  • Random IOPS: ~1600-3200 IOPS (DATA array), ~800-1600 IOPS (VM array)
  • Latency: 2-5ms typical for enterprise SAS

Allocation Strategy (Earlier Layout)

  • DATA Array (2.233TB): Primary VM storage for production workloads
  • VM Array (1.116TB): Development, testing, or secondary workloads
  • OS Array (558GB): Proxmox system, templates, ISOs

Implementation Plan

Phase 1: Hardware Preparation

  1. Install 2x 500GB SSDs in rear bays (pending rear bay kit arrival; see status below)
  2. Backup any critical data (emerald is currently clean)

Phase 2: Storage Reconfiguration

  1. Configure SSD RAID-1 in iDRAC/BIOS for boot storage
  2. Reconfigure SAS arrays (see the perccli sketch below):
       • Create 12-drive RAID-10 array (VM-FAST)
       • Create 4-drive RAID-6 array (VM-BULK)
  3. Install Proxmox on SSD RAID-1
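
Should the arrays ever need rebuilding from the CLI instead of iDRAC, perccli can express the same layout. A sketch only; the controller index (/c0) and enclosure:slot IDs (32:0-15) are assumptions to confirm first with "perccli /c0 show":

# 12-drive RAID-10 for VM-FAST, then 4-drive RAID-6 for VM-BULK
perccli /c0 add vd type=raid10 drives=32:0-11 pdperarray=2
perccli /c0 add vd type=raid6 drives=32:12-15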

Phase 3: Proxmox Storage Pools

# Configure storage pools for the three tiers
# (paths assume the arrays are mounted at these locations)

pvesm add dir ssd-boot --path /mnt/ssd-storage --content images,iso,vztmpl   # Tier 1: OS tier also holds ISOs/templates
pvesm add dir sas-fast --path /mnt/sas-raid10 --content images               # Tier 2: VM-FAST disks
pvesm add dir sas-bulk --path /mnt/sas-raid6 --content images,backup         # Tier 3: VM-BULK disks + backups

Phase 4: Update Infrastructure Code

  1. Update Terraform variables for tiered storage
  2. Create storage class definitions for Kubernetes
  3. Test VM provisioning across all tiers

Implementation Status - End of Session

Completed Steps (emerald)

  1. RAID arrays created and initialized:
       • VM-FAST: 12x 600GB RAID-10 (~3.35TB usable)
       • VM-BULK: 4x 600GB RAID-6 (~1.2TB usable)
  2. Foreign configuration cleared - resolved slot 15 hot spare issue
  3. Fast initialization completed - both arrays ready
  4. PCI card boot issue identified - BIOS does not recognize the 1TB SSD as a boot device

Architecture Decision - Native Hardware Approach

Issue: PCI SATA card not bootable in R720XD BIOS
Solution: Switch to native Dell hardware approach

Orders Placed

  • 2x Rear drive bay kits (for emerald + fuji)
  • 4x 500GB Samsung 2.5" SSDs

Revised Final Architecture

Per R720XD server (emerald + fuji)

Rear Bays: 2x 500GB SSD RAID-1 (Proxmox OS)
├── Fully supported Dell hardware
├── Hot-swappable drive carriers
└── Guaranteed BIOS boot compatibility

Front Bays: 16x 600GB 10K RPM SAS drives
├── VM-FAST: 12-drive RAID-10 (~3.6TB)
├── VM-BULK: 4-drive RAID-6 (~1.2TB)
└── Total VM storage: ~4.8TB per server

Current Status

emerald

  • ✅ SAS RAID arrays configured and ready
  • ⏸️ Waiting for rear bay hardware to arrive
  • ⏸️ Proxmox installation pending proper boot drive

fuji

  • 📋 Ready for identical configuration once parts arrive

Pending Tasks (When Hardware Arrives)

  1. Install rear bay kits in both emerald + fuji
  2. Configure SSD RAID-1 for Proxmox OS in rear bays
  3. Install Proxmox on native hardware
  4. Configure storage pools for VM-FAST and VM-BULK
  5. Update Terraform for new storage layout
  6. Deploy minimal Kubernetes cluster

Benefits of This Approach

  • Enterprise-grade reliability - All native Dell hardware
  • Easy replication - Identical setup for emerald + fuji
  • Proper support - Hot-swap, monitoring, management
  • Future-proof - Clean foundation for 5-node expansion
  • Maintainable - Standard procedures for drive replacement

Session paused pending arrival of rear bay hardware. SAS storage arrays are configured and ready for Proxmox installation.


Updated with optimal 3-tier architecture plan