Storage Architecture and Planning¶
Storage Architecture Planning (emerald)¶
Current Hardware Available¶
- SAS Drives: 16x 600GB 10K RPM enterprise SAS drives
- SSD Boot Drives: 2x 500GB 2.5" SSDs installed in rear bays
- Current State: Clean Proxmox host (no existing workloads)
Final Architecture - Native Dell Hardware¶
Tier 1: SSD Boot Storage (OS)¶
Hardware: 2x 500GB Samsung 2.5" SSDs in rear bays
Configuration: RAID-1 (mirrored)
Purpose¶
- Proxmox host OS and system files
- Boot partition and system storage
- Templates and ISO storage
Performance: High reliability, fast boot times
Tier 2: SAS RAID-10 (High Performance)¶
Hardware: 12x 600GB 10K RPM SAS drives (front bays)
Configuration: RAID-10 (6 mirror pairs)
Usable Capacity: ~3.6TB
Name: VM-FAST
Purpose (Tier 2)¶
- Kubernetes control plane VMs
- Production worker node storage
- Application persistent volumes
- Container image storage
- High-performance workloads
Performance: 2.4K-4.8K IOPS, 2-5ms latency, 1.2-2.4 GB/s throughput
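These figures are estimates; once the array is built they can be sanity-checked with fio. A minimal sketch, assuming the fio package is installed and the Tier 2 array is mounted at /mnt/sas-raid10 as in the implementation plan below (the test file name is arbitrary):
# 4K random-read test against the RAID-10 tier (60 seconds, 4 jobs, queue depth 32)
fio --name=vmfast-randread --filename=/mnt/sas-raid10/fio-test \
    --ioengine=libaio --direct=1 --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --size=4G --runtime=60 --time_based \
    --group_reporting
rm /mnt/sas-raid10/fio-test   # remove the test file afterwards
Comparing the reported IOPS and completion latency against the estimates above gives an early warning if a drive or the controller cache is misbehaving.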
Tier 3: SAS RAID-6 (High Capacity)¶
Hardware: 4x 600GB 10K RPM SAS drives (front bays)
Configuration: RAID-6
Usable Capacity: ~1.2TB
Name: VM-BULK
Purpose (Tier 3)¶
- Development/testing VMs
- Backup and snapshot storage
- Log aggregation and archives
- Non-critical bulk data
Performance: 400-800 IOPS, 5-10ms latency, 600MB-1.2GB/s throughput
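The usable-capacity figures for both tiers follow directly from the RAID arithmetic; a quick shell check, using the drive counts and sizes from the layout above:
# Usable capacity per tier from raw drive count and size (GB)
size=600
echo "VM-FAST (12-drive RAID-10): $(( 12 / 2 * size )) GB"   # mirroring halves raw capacity
echo "VM-BULK (4-drive RAID-6):   $(( (4 - 2) * size )) GB"  # two drives' worth of parity
The controller reports slightly less (~3.35TB for VM-FAST) because 600GB drives format to roughly 558GiB each.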
Capacity Planning¶
RAID Configuration Scenarios¶
Different RAID levels provide different capacity and redundancy trade-offs. The example capacities below are illustrative and do not all correspond to the final 16-drive, two-array layout:
RAID 10 (Recommended for performance + redundancy)¶
- Usable capacity: ~2.4TB (50% overhead)
- Fault tolerance: Can lose 1 drive per mirror pair
- Performance: Excellent read/write performance
- Use case: High-performance VM storage
RAID 6 (Maximum capacity with redundancy)¶
- Usable capacity: ~3.6TB (25% overhead)
- Fault tolerance: Can lose any 2 drives
- Performance: Good read, slower write
- Use case: Bulk storage, backups
RAID 5 (Balance of capacity and redundancy)¶
- Usable capacity: ~4.05TB (17% overhead)
- Fault tolerance: Can lose 1 drive
- Performance: Good read, moderate write
- Use case: General purpose storage
Projected Storage Utilization (New Architecture)¶
Phase 1: Minimal Cluster¶
Tier 1 (SSD Boot Storage)¶
- Proxmox OS: ~50GB
- Control plane VM: ~30GB (OS only, data on Tier 2)
- Available: ~420GB (from 500GB SSDs)
Tier 2 (SAS RAID-10)¶
- Control plane data: ~20GB
- Worker 1: ~100GB
- Worker 2: ~100GB
- Container storage: ~100GB
- Total initial usage: ~320GB
- Available for expansion: ~3TB (of the ~3.35TB VM-FAST array)
Tier 3 (SAS RAID-6)¶
- Reserved for backups and development: ~1.2TB available (full VM-BULK array)
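Once the storage pools are defined (see the implementation plan below), actual usage can be compared against these projections from the Proxmox host:
# Per-storage usage and capacity as Proxmox sees it
pvesm status
# Filesystem-level view of the two SAS tiers
df -h /mnt/sas-raid10 /mnt/sas-raid6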
Kubernetes-Specific Benefits¶
- etcd performance: control plane data lives on the low-latency RAID-10 tier, while the SSD mirror keeps the host OS responsive
- Pod scheduling: Fast API server responses
- Container pulls: Fast image storage on RAID-10
- Persistent volumes: High-performance application storage
Storage Strategy¶
Short-term (Phase 1)¶
- Utilize existing RAID configuration
- Monitor storage performance and capacity
- Plan for VM growth and additional services
Medium-term (Phase 2 - Add fuji)¶
- Mirror storage configuration on fuji
- Implement cross-node storage replication
- Consider distributed storage solutions
Long-term (Hybrid Cloud)¶
- Implement tiered storage strategy
- Use cloud for cold storage and backups
- Optimize for performance vs cost
Distributed Storage Considerations¶
For 2-Node Setup (emerald + fuji)¶
Challenges¶
- No true quorum (need 3+ nodes for automatic failover)
- Split-brain scenarios possible
- Manual intervention required for failures
Solutions¶
- External witness/tie-breaker
- Asynchronous replication
- Manual failover procedures
Storage Options¶
- Ceph: Requires 3+ nodes for optimal operation
- Longhorn: Works with 2 nodes but limited HA
- External NAS: Centralized storage accessible by both nodes
- Proxmox replication: Built-in async replication
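If the built-in Proxmox replication route is chosen, note that it only works with ZFS-backed local storage, so it would mean layering ZFS on the arrays rather than the plain directory storage described later. A hedged sketch of a per-VM replication job (VM ID 100 and the 15-minute schedule are assumptions):
# Replicate VM 100 from emerald to fuji every 15 minutes (requires ZFS storage on both nodes)
pvesr create-local-job 100-0 fuji --schedule '*/15' --rate 50   # 50 MB/s bandwidth cap
pvesr status   # list replication jobs and their last run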
For 5-Node Setup¶
Benefits¶
- True distributed storage quorum
- Automatic failover capabilities
- Better data distribution
- Higher fault tolerance
Recommended Solutions¶
- Ceph RBD: Full distributed block storage
- Longhorn: Kubernetes-native distributed storage
- OpenEBS: Multiple storage engines available
Performance Considerations¶
SAS Drive Characteristics¶
- Sequential Read/Write: ~200-300 MB/s per drive
- Random IOPS: ~200-400 IOPS per drive (4K blocks)
- Latency: 2-5ms typical
- Reliability: Enterprise-grade (high MTBF)
RAID Performance Impact¶
- RAID 10: Best overall performance
- RAID 6: Read performance good, write penalty ~6x
- RAID 5: Read performance good, write penalty ~4x
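A back-of-the-envelope view of what those penalties mean for random writes, using the per-drive IOPS estimate from the SAS drive characteristics above (~300 IOPS per drive):
# Effective random-write IOPS = (drives x per-drive IOPS) / write penalty
per_drive=300
echo "VM-FAST (12-drive RAID-10): $(( 12 * per_drive / 2 )) IOPS"   # penalty 2
echo "VM-BULK (4-drive RAID-6):   $(( 4 * per_drive / 6 )) IOPS"    # penalty 6
This gap is why the RAID-6 tier is reserved for backups and bulk data rather than write-heavy VMs.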
VM Storage Requirements¶
Different workloads have varying storage needs:
Database workloads¶
- High IOPS requirements
- Low latency critical
- Consider RAID 10 for high performance
Web applications¶
- Moderate IOPS
- Sequential read heavy
- RAID 5/6 acceptable
Backup/archival¶
- High capacity priority
- Lower performance requirements
- RAID 6 optimal
Monitoring and Maintenance¶
Key Metrics to Track¶
- Capacity utilization: Current usage vs available
- IOPS performance: Read/write operations per second
- Latency: Response time for storage operations
- Drive health: SMART data, error rates
- RAID status: Array health, rebuild status
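SMART data for drives behind the PERC controller is reachable from the Proxmox host with smartmontools. A minimal sketch; the megaraid device IDs are assumptions and may not be contiguous, so enumerate them first with smartctl --scan:
# Health summary for each physical SAS drive behind the RAID controller
# (/dev/sda is only a handle to the controller for passthrough queries)
for id in $(seq 0 17); do
    echo "=== megaraid disk $id ==="
    smartctl -d megaraid,$id -H /dev/sda
done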
Maintenance Tasks¶
- Regular SMART monitoring
- Proactive drive replacement
- Performance baseline establishment
- Capacity planning updates
- Backup validation
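For the performance-baseline task, Proxmox ships pveperf, which reports buffered read throughput and fsync rates for a given path; run it once per tier after setup and keep the output for later comparison (the baseline file names are arbitrary):
# Record a baseline for each storage tier; re-run after changes and compare
pveperf /mnt/ssd-storage | tee /root/baseline-ssd.txt
pveperf /mnt/sas-raid10  | tee /root/baseline-fast.txt
pveperf /mnt/sas-raid6   | tee /root/baseline-bulk.txt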
Future Storage Expansion¶
Adding More Capacity¶
- Additional drives: If empty bays available
- Storage nodes: Add dedicated storage servers
- Cloud integration: Hybrid storage tiers
- SSD cache: Add fast cache layer if needed
Technology Upgrades¶
- Additional SSDs: Higher performance options
- Larger capacity drives: 1TB+ enterprise SAS
- All-flash arrays: Maximum performance
- Hybrid arrays: SSD + HDD tiers
Storage Performance Characteristics¶
Note: the figures in this subsection describe a DATA/VM/OS array split that differs from the final VM-FAST/VM-BULK layout above; they are retained for reference.
Current RAID-10 Performance (Estimated)¶
- Sequential Read: ~1.6-2.4 GB/s (DATA array), ~800MB-1.2GB/s (VM array)
- Sequential Write: ~800MB-1.2GB/s (DATA array), ~400-600MB/s (VM array)
- Random IOPS: ~1600-3200 IOPS (DATA array), ~800-1600 IOPS (VM array)
- Latency: 2-5ms typical for enterprise SAS
Optimal Storage Allocation Strategy¶
- DATA Array (2.233TB): Primary VM storage for production workloads
- VM Array (1.116TB): Development, testing, or secondary workloads
- OS Array (558GB): Proxmox system, templates, ISOs
Implementation Plan¶
Phase 1: Hardware Preparation¶
- Install 2x 500GB SSDs in rear bays (completed)
- Backup any critical data (emerald is currently clean)
Phase 2: Storage Reconfiguration¶
- Configure SSD RAID-1 in iDRAC/BIOS for boot storage
- Reconfigure SAS arrays:
- Create 12-drive RAID-10 array
- Create 4-drive RAID-6 array
- Install Proxmox on SSD RAID-1
Phase 3: Proxmox Storage Pools¶
# Configure storage pools for the three tiers (paths must already be mounted; see below)
pvesm add dir ssd-boot --path /mnt/ssd-storage --content images,iso,vztmpl
pvesm add dir sas-fast --path /mnt/sas-raid10 --content images
pvesm add dir sas-bulk --path /mnt/sas-raid6 --content images,backup
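The dir storages above expect each array to already carry a filesystem and a persistent mount. A minimal sketch, assuming the RAID-10 and RAID-6 virtual disks appear as /dev/sdb and /dev/sdc (verify with lsblk before running):
# Create filesystems on the two SAS virtual disks and mount them persistently
mkfs.ext4 -L vm-fast /dev/sdb
mkfs.ext4 -L vm-bulk /dev/sdc
mkdir -p /mnt/sas-raid10 /mnt/sas-raid6
echo 'LABEL=vm-fast /mnt/sas-raid10 ext4 defaults 0 2' >> /etc/fstab
echo 'LABEL=vm-bulk /mnt/sas-raid6 ext4 defaults 0 2' >> /etc/fstab
mount -a
An LVM-thin pool per array is a reasonable alternative to plain directories if VM snapshots on these tiers are wanted.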
Phase 4: Update Infrastructure Code¶
- Update Terraform variables for tiered storage
- Create storage class definitions for Kubernetes
- Test VM provisioning across all tiers
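As a placeholder for the storage class step, a hedged sketch: two classes named after the tiers, using the static local-volume provisioner until an actual CSI driver (e.g. Longhorn, discussed above) is chosen. The class names are assumptions:
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vm-fast          # backed by the RAID-10 tier
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vm-bulk          # backed by the RAID-6 tier
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF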
Implementation Status - End of Session¶
Completed Steps (emerald)¶
- ✅ RAID arrays created and initialized
- VM-FAST: 12x 600GB RAID-10 (~3.35TB usable)
- VM-BULK: 4x 600GB RAID-6 (~1.2TB usable)
- ✅ Foreign configuration cleared - Resolved slot 15 hot spare issue
- ✅ Fast initialization completed - Both arrays ready
- ❌ PCI card boot issues - BIOS not recognizing 1TB SSD as boot device
Architecture Decision - Native Hardware Approach¶
Issue: PCI SATA card not bootable in R720XD BIOS
Solution: Switch to native Dell hardware approach
Orders Placed¶
- 2x Rear drive bay kits (for emerald + fuji)
- 4x 500GB Samsung 2.5" SSDs
Revised Final Architecture¶
Per R720XD server (emerald + fuji)¶
Rear Bays: 2x 500GB SSD RAID-1 (Proxmox OS)
├── Fully supported Dell hardware
├── Hot-swappable drive carriers
└── Guaranteed BIOS boot compatibility
Front Bays: 16x 600GB 10K RPM SAS drives
├── VM-FAST: 12-drive RAID-10 (~3.6TB)
├── VM-BULK: 4-drive RAID-6 (~1.2TB)
└── Total VM storage: ~4.8TB per server
Current Status¶
emerald¶
- ✅ SAS RAID arrays configured and ready
- ⏸️ Waiting for rear bay hardware to arrive
- ⏸️ Proxmox installation pending proper boot drive
fuji¶
- 📋 Ready for identical configuration once parts arrive
Pending Tasks (When Hardware Arrives)¶
- Install rear bay kits in both emerald + fuji
- Configure SSD RAID-1 for Proxmox OS in rear bays
- Install Proxmox on native hardware
- Configure storage pools for VM-FAST and VM-BULK
- Update Terraform for new storage layout
- Deploy minimal Kubernetes cluster
Benefits of This Approach¶
- Enterprise-grade reliability - All native Dell hardware
- Easy replication - Identical setup for emerald + fuji
- Proper support - Hot-swap, monitoring, management
- Future-proof - Clean foundation for 5-node expansion
- Maintainable - Standard procedures for drive replacement