10GbE Storage Network Implementation Guide¶
Implementation Status¶
Status: ✅ COMPLETE for the first three servers (deployed December 29, 2025)
Deployed Servers: Apollo, Emerald, Fuji (3 of 6 servers)
Network Summary:
- VLAN: 104 (Data-Sync) on 172.16.104.0/24
- Switch: SW-RACK (Cisco WS-C3850-12X48U)
- Speed: 10 Gbps confirmed on all links
- MTU: 9000 (jumbo frames working)
- Latency: ~0.3-0.4ms between servers
- Configuration: Persistent across reboots
Actual Server Configuration:
| Server | Interface | IP Address | Switch Port | Hardware Details |
|---|---|---|---|---|
| Apollo (unRAID) | eth4 | 172.16.104.30 | Te1/0/37 | Dell Y40PH (Broadcom bnx2x) |
| Emerald (Proxmox) | enp68s0f0/vmbr2 | 172.16.104.34 | Te1/0/41 | Built-in Broadcom BCM57810 10GbE |
| Fuji (Proxmox) | enp68s0f0/vmbr2 | 172.16.104.35 | Te1/0/42 | Dell Y40PH + SFP-10G-T-X transceiver |
Key Discoveries:
- Emerald had built-in Broadcom BCM57810 10GbE ports (no additional hardware needed)
- Apollo's Dell Y40PH card interfaces appear as eth4/eth5 in unRAID (not eth2/eth3)
- Fuji requires SFP-10G-T-X transceiver in Dell Y40PH card
- Switch ports already pre-configured on SW-RACK (VLAN 104, MTU 9000, access mode)
- Proxmox bridges don't need VLAN awareness when switch ports are in access mode
Performance Validation:
- ✅ 10Gbps link speed confirmed on all interfaces
- ✅ Jumbo frames (8972-byte ICMP payload) working across all server pairs
- ✅ Sub-millisecond latency between all servers
- ✅ All configurations persistent across reboots
Remaining Servers: Bishop, Castle, Domino (will follow same pattern when deployed)
Overview¶
This guide covers implementing a dedicated 10GbE network for Ceph storage traffic using the Cisco Catalyst 3850 switch and Dell PowerEdge servers (R720XD and R630).
Current Network Architecture¶
Existing Setup¶
Primary Network (1GbE):
- Switch: Cisco SG300-28 (management/client traffic)
- VLAN 103: Kubernetes cluster (172.16.103.0/24)
- All servers connected via integrated 1GbE NICs
Problem: 1GbE is insufficient for Ceph replication traffic with 56 OSDs
Solution: Dedicated 10GbE storage network on VLAN 104
Hardware Inventory¶
Cisco Catalyst 3850 Switch¶
Your Switch: Cisco WS-C3850-12X48U
Specifications:
- 10GbE Ports: 12x 10GBASE-T copper RJ45 (TenGigabitEthernet 1/1/1 through 1/1/12)
- 1GbE Ports: 48x RJ45 copper ports (GigabitEthernet 1/0/1 through 1/0/48)
- Stacking: StackWise-480 capable
- Power: Dual power supply capable
- PoE: PoE+ capable on 48 copper ports (optional)
Perfect for Your Use Case:
- ✅ 12x 10GBASE-T ports (6 needed for servers, 6 spare for expansion)
- ✅ 48 copper ports for management/client traffic
- ✅ Layer 3 routing capable
- ✅ Full VLAN support
- ✅ Supports jumbo frames (MTU 9000)
Important Note: Since these are 10GBASE-T copper ports, and your Dell servers already have network cards with SFP+ module slots, you simply need 10GBASE-T SFP+ transceiver modules that plug into your existing ports. Connect them with Cat6a or Cat7 ethernet cables. No need for full network cards!
Dell Server Existing Network Configuration¶
R720XD (emerald, fuji, apollo):
- Already equipped with network cards featuring 2x SFP+ module ports each
- Ports are typically labeled on the rear NDC (Network Daughter Card) panel
- 1 port will be used for storage, 1 port available for future expansion
R630 (bishop, castle, domino):
- Already equipped with network cards featuring 2x SFP+ module ports each
- Located on the rear NDC panel (1U form factor)
- 1 port will be used for storage, 1 port available for future expansion
Verify Your Network Ports:
# From iDRAC or Proxmox host
lspci | grep -i ethernet
# Shows current network cards
ip link show
# Shows network interfaces - look for interfaces with 2 ports (e.g., ens3f0, ens3f1)
Expected: Each server should show a network card with multiple ports, ready to accept SFP+ modules.
Network Module Requirements¶
10GBASE-T SFP+ Transceiver Modules¶
Since your Dell servers already have network cards with empty 10GbE SFP+ module slots, and your Cisco 3850 has native 10GBASE-T copper ports, you need 10GBASE-T SFP+ transceiver modules to connect them.
What You Need: Small SFP+ modules that plug into your existing server network card ports and provide a 10GBASE-T copper RJ45 connector.
Recommended 10GBASE-T SFP+ Modules:
| Brand | Part Number | Description | Cost | Notes |
|---|---|---|---|---|
| Cisco | SFP-10G-T | Official Cisco 10GBASE-T SFP+ | $150-200 | Best compatibility, expensive |
| TP-Link | TXM431-SR | 10GBASE-T SFP+ module | $40-50 | Good value, widely compatible |
| FS.com | SFP-10G-T | 10GBASE-T SFP+ module | $30-40 | Budget option, good reviews |
| Generic | Various | 10GBASE-T SFP+ copper | $25-35 | Check compatibility reviews |
Recommendation: TP-Link TXM431-SR or FS.com SFP-10G-T for best value
Important Specifications:
- Form factor: SFP+ (not SFP, not QSFP)
- Connector type: 10GBASE-T (RJ45 copper)
- Speed: 10Gbps
- Distance: Up to 30m for Cat6a (typical for in-rack use)
- Power: ~2.5W per module
Cabling: Cat6a or Cat7 Ethernet
| Cable Type | Length | Cost | Use Case |
|---|---|---|---|
| Cat6a | 3m (10ft) | $8-12 | Within rack (recommended) |
| Cat6a | 5m (16ft) | $10-15 | Cross-rack if needed |
| Cat7 | 3-5m | $12-18 | Better shielding (optional) |
Why Cat6a?
- ✅ Supports 10Gbps up to 100 meters
- ✅ Standard Ethernet connector (RJ45)
- ✅ Affordable and widely available
- ✅ Works with 10GBASE-T SFP+ modules
Shopping Estimate:
- 10GBASE-T SFP+ modules: ~$40 each × 6 = $240
- Cat6a cables 3m: ~$10 each × 6 = $60
- Total for 6 servers: ~$300
Much cheaper than full NICs! Since you already have network cards with SFP+ ports, you only need the small transceiver modules, not entire network cards.
Recommended Configuration¶
For Your Setup (6 Servers in Rack)¶
Recommended: 10GBASE-T SFP+ Modules with Cat6a Cables
Since your Dell servers have existing network cards with SFP+ module slots, and your Cisco 3850 has 10GBASE-T copper ports, the solution is simple:
Per Server (emerald, fuji, apollo, bishop, castle, domino):
- 1x 10GBASE-T SFP+ module (plugs into the existing network card's empty SFP+ port)
- 1x Cat6a cable (3m/10ft for in-rack connection)
- Uses 1 of 2 available ports (2nd port available for future expansion)
- Connect module to a Cisco 3850 10GBASE-T port with a Cat6a Ethernet cable
Network Design:
Cisco 3850 (12x 10GBASE-T ports)
├── TenGig 1/1/1: emerald (SFP+ module + Cat6a cable)
├── TenGig 1/1/2: fuji (SFP+ module + Cat6a cable)
├── TenGig 1/1/3: apollo (SFP+ module + Cat6a cable)
├── TenGig 1/1/4: bishop (SFP+ module + Cat6a cable)
├── TenGig 1/1/5: castle (SFP+ module + Cat6a cable)
├── TenGig 1/1/6: domino (SFP+ module + Cat6a cable)
├── TenGig 1/1/7-12: Available for expansion
VLAN Configuration:
- VLAN 104: Storage network (172.16.104.0/24)
- MTU: 9000 (jumbo frames for storage)
- QoS: Storage traffic priority
Shopping List¶
For 6 Servers with Existing Network Cards:
| Item | Quantity | Cost Each | Total |
|---|---|---|---|
| 10GBASE-T SFP+ module (TP-Link TXM431-SR or FS.com) | 6 | $40 | $240 |
| Cat6a cable 3m | 6 | $10 | $60 |
| Estimated Total | | | $300 |
Optional Add-ons:
- Extra Cat6a cables (spares): 2x $10 = $20
- Cable management: $20-40
Grand Total: ~$340-360
Why This is Great:
- ✅ Uses your existing network card SFP+ ports (no new NICs needed!)
- ✅ Much cheaper than buying full network cards (~$300 vs $540-660)
- ✅ Simple installation - just plug modules into empty ports
- ✅ Standard Cat6a cabling (no special cables)
Installation Procedure¶
Phase 1: Hardware Installation¶
Per Server (Much Simpler - No Server Opening Required!):
Since your servers already have network cards with SFP+ ports, installation is very simple:
1. Locate the existing 10GbE SFP+ ports on the rear of the server
   - Usually labeled as "Port 1" and "Port 2" on the network module
   - Typically found on the NDC (Network Daughter Card) area
2. Insert the 10GBASE-T SFP+ module:
   - Remove any dust cover from the empty SFP+ port
   - Align module with port (notch on bottom)
   - Firmly push module into port until it clicks
   - Module should be flush with port (no gaps)
3. Connect the Cat6a cable:
   - Plug one end into the SFP+ module RJ45 port
   - Plug the other end into a Cisco 3850 10GBASE-T port
4. Verify link (see the check below)
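A quick server-side link check, as a sketch (the interface name is an example; use the actual name on each host, e.g. eth4 on Apollo or enp68s0f0 on Emerald/Fuji):
# Confirm the port has link and negotiated 10 Gbps
ip link show enp68s0f0
ethtool enp68s0f0 | grep -E "Speed|Link detected"
# Expected: Speed: 10000Mb/s, Link detected: yes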
Cable Management:
- Use velcro cable ties
- Keep Cat6a cables organized and away from power cables
- Label each cable clearly (e.g., "emerald-10G-storage", "bishop-10G-storage")
Phase 2: Cisco 3850 Switch Configuration¶
Initial Setup:
! Access switch
enable
configure terminal
! Create storage VLAN
vlan 104
name storage-network
exit
! Configure 10GBASE-T ports for storage (ports 1-6)
interface range TenGigabitEthernet1/1/1 - 6
description Storage Network - Ceph Cluster
switchport mode access
switchport access vlan 104
spanning-tree portfast
mtu 9000
no shutdown
exit
! Enable jumbo frames globally
system mtu 9000
! Save configuration
write memory
Verify Port Status:
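For example, from the 3850 CLI (standard IOS-XE show commands; adjust the port range to match your cabling):
show interfaces status | include Te1/1/
show vlan id 104
show interfaces TenGigabitEthernet1/1/1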
Phase 3: Proxmox Host Network Configuration¶
Configure 10GbE Interface on Each Host:
Edit /etc/network/interfaces:
# Physical 10GbE interface (MTU 9000 must be set here as well as on the bridge)
iface ens3f0 inet manual
    mtu 9000
# Storage network bridge (for 10GbE)
auto vmbr2
iface vmbr2 inet static
    address 172.16.104.34/24  # Unique per host (see list below)
    bridge-ports ens3f0       # Adjust interface name
    bridge-stp off
    bridge-fd 0
    mtu 9000
# apollo:  172.16.104.30
# bishop:  172.16.104.31
# castle:  172.16.104.32
# domino:  172.16.104.33
# emerald: 172.16.104.34
# fuji:    172.16.104.35
Apply Configuration:
# Restart networking (or reboot)
systemctl restart networking
# Verify
ip addr show vmbr2
ping 172.16.104.35 # Test connectivity to another host (e.g. fuji)
Test Bandwidth:
# Install iperf3
apt install iperf3
# On one host (server)
iperf3 -s
# On another host (client)
iperf3 -c 172.16.104.34
# Expected: ~9.4 Gbps with 10GbE
Phase 4: Kubernetes VM Network Configuration¶
Add Storage Network Interface to VMs:
# For each Kubernetes VM (example VM 300)
# No VLAN tag is needed: the switch ports are access ports on VLAN 104,
# so vmbr2 already carries the storage network untagged
qm set 300 -net1 virtio,bridge=vmbr2
# This creates a second network interface in the VM
# eth0: Management/cluster network (VLAN 103)
# eth1: Storage network (VLAN 104, untagged)
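To confirm the second NIC was added, the VM configuration can be inspected (a sketch using the example VM ID 300 from above):
# List the network devices attached to the VM
qm config 300 | grep ^net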
Inside Kubernetes VMs (Talos Linux):
Configure via Talos machine config:
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 172.16.103.10/24 # Cluster network
        routes:
          - network: 0.0.0.0/0
            gateway: 172.16.103.1
      - interface: eth1
        addresses:
          - 172.16.104.110/24 # Storage network
        mtu: 9000
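Applying the updated machine config with talosctl might look like the following sketch; the node IP and file name are placeholders for your actual values:
# Push the patched machine config to the node
talosctl apply-config --nodes 172.16.103.10 --file controlplane.yaml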
Rook-Ceph Configuration for Dual Networks¶
CephCluster Network Configuration¶
Update CephCluster CR to use dedicated storage network:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Network configuration
  network:
    provider: host
    selectors:
      # Public network (clients to OSDs)
      public: "172.16.103.0/24" # 1GbE cluster network
      # Cluster network (OSD replication)
      cluster: "172.16.104.0/24" # 10GbE storage network
  # ... rest of configuration
Benefits:
- Client traffic (pods → Ceph) uses the 1GbE cluster network
- OSD replication uses the 10GbE storage network
- Separates concerns and improves performance
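One way to confirm Ceph picked up the networks after applying the CR, using the rook-ceph-tools deployment referenced later in this guide (a sketch; the exact keys shown depend on your Rook version):
# Show the network-related settings Ceph is running with
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph config dump | grep -i network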
Monitoring and Validation¶
Network Interface Checks¶
On Proxmox Hosts:
# Check interface status
ip link show
ethtool ens3f0
# Check speed
ethtool ens3f0 | grep Speed
# Should show: Speed: 10000Mb/s
# Check MTU
ip link show vmbr2 | grep mtu
# Should show: mtu 9000
On Cisco 3850:
show interfaces TenGigabitEthernet1/1/1 status
show interfaces TenGigabitEthernet1/1/1 | include duplex
# Should show: Full-duplex, 10Gb/s
Performance Testing¶
Bandwidth Test (iperf3):
# Expected results:
# - 10GbE: ~9.4 Gbps
# - With jumbo frames: ~9.6 Gbps
# - Latency: <0.1ms within rack
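A jumbo-frame path check matching the 8972-byte ICMP payload mentioned in the implementation status (a sketch; substitute the peer's storage IP):
# 8972-byte payload + 28 bytes of IP/ICMP headers = 9000, with fragmentation disallowed
ping -M do -s 8972 -c 4 172.16.104.35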
Ceph Performance:
# After Ceph deployment
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd perf
# Monitor OSD network traffic
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool stats
Cost Breakdown¶
For Your Setup with Existing Network Cards¶
| Component | Quantity | Unit Cost | Total |
|---|---|---|---|
| 10GBASE-T SFP+ module (TP-Link/FS.com) | 6 | $40 | $240 |
| Cat6a cable 3m | 6 | $10 | $60 |
| Spares (cables) | 2 | $10 | $20 |
| Cable management | 1 | $30 | $30 |
| Total | | | $350 |
Benefits of Using Existing Network Cards:
- ✅ No need to buy full NICs (saves ~$300-400!)
- ✅ No server disassembly required
- ✅ Simple plug-and-play installation
- ✅ Familiar Cat6a cabling
- ✅ Longer cable run capability (up to 100m)
- ✅ Uses existing infrastructure
Net Result: Simple, cost-effective upgrade using your existing hardware
Purchasing Recommendations¶
Where to Buy¶
10GBASE-T SFP+ Modules:
- Amazon: Search "10GBASE-T SFP+" or "TP-Link TXM431-SR"
  - TP-Link TXM431-SR: ~$40-50 each (good value)
  - Generic 10GBASE-T SFP+: ~$30-40 each
- FS.com: Direct from manufacturer, good prices and quality
  - SFP-10G-T: ~$35-40 each
  - Known for reliable networking gear
- eBay: Search "10GBASE-T SFP+ copper"
  - Generic modules: ~$25-35 each
  - Check seller ratings and reviews
Important: Make sure the module is:
- SFP+ form factor (not SFP or QSFP)
- 10GBASE-T (copper RJ45, not fiber)
- Compatible with standard Cat6a cables
Cat6a Cables:
- Amazon: Cable Matters, Monoprice, StarTech brands (reliable)
- Monoprice.com: Direct from manufacturer, good prices
- CableMatters.com: High quality, reasonably priced
- Length: 3m (10ft) is ideal for in-rack, 5m (16ft) for cross-rack
Example Search Terms¶
For Amazon:
- "10GBASE-T SFP+ module"
- "TP-Link TXM431-SR"
- "SFP+ to RJ45 10 gigabit"
- "Cat6a ethernet cable 10ft"
For FS.com:
- "SFP-10G-T"
- "10GBASE-T SFP+ copper"
For eBay:
- "10GBASE-T SFP+ copper module"
- "SFP+ 10G RJ45"
Timeline¶
Estimated Time: ~1 day of hands-on work (plus hardware shipping lead time)
- Hardware procurement: 1-2 weeks (shipping)
- Phase 1 (SFP+ module installation): 30 minutes (all servers - very quick!)
- Phase 2 (Switch config): 1 hour
- Phase 3 (Proxmox config): 2-3 hours
- Phase 4 (VM config): 2-3 hours
- Testing and validation: 2 hours
Total hands-on time: 6-8 hours (much faster with modules vs full NICs!)
Troubleshooting Guide¶
SFP+ Module Not Detected¶
Problem: Network interface not showing link or module not recognized
Solutions:
- Ensure module fully seated in SFP+ port (should click when inserted)
- Remove and reinsert module firmly
- Check module orientation (notch on bottom)
- Verify it's an SFP+ module (not SFP or QSFP)
- Try module in the other SFP+ port on the same card
- Check ethtool output for interface status
- Update server BIOS/iDRAC firmware if module is not recognized
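If the NIC driver supports it, the module EEPROM can also be read to confirm the transceiver is recognized (a sketch; not all drivers implement this, and the interface name is an example):
# Dump SFP+ module identification (vendor, part number, media type)
ethtool -m enp68s0f0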
No Link on 3850 Port¶
Problem: Port shows "down" or "notconnect"
Check:
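From the switch side, for example (standard show commands; adjust the port number to match your cabling):
show interfaces TenGigabitEthernet1/1/1 status
show run interface TenGigabitEthernet1/1/1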
Solutions:
- Verify Cat6a cable fully inserted both ends
- Try different cable (ensure Cat6a or Cat7, not Cat5e/Cat6)
- Try different switch port
- Check if port is enabled (no shutdown)
- Verify cable not damaged (check for kinks or cuts)
Poor Performance (<1 Gbps)¶
Possible Causes:
- Speed autonegotiation failed
- MTU mismatch
- Cable issue (ensure Cat6a or Cat7)
- SFP+ module issue
Fix:
# Force 10G on server side
ethtool -s ens3f0 speed 10000 duplex full
# Check for errors
ethtool -S ens3f0 | grep -i error
Jumbo Frames Not Working¶
Check:
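For example, check the MTU at each hop (a sketch; adjust interface names to your hosts):
# Server side: physical NIC, bridge, and VM interface should all read mtu 9000
ip link show enp68s0f0 | grep mtu
ip link show vmbr2 | grep mtu
# Switch side (3850 CLI): show system mtu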
Fix: Ensure MTU 9000 on all devices in the path:
- Server NIC
- Proxmox bridge
- Switch port
- VM interface
Implementation Notes (December 2025)¶
Actual Deployment Steps¶
Phase 1: Hardware Installation
- ✅ Installed Dell Y40PH 10GbE cards in Apollo and Fuji
- Apollo: Card interfaces appeared as eth4/eth5 (Broadcom bnx2x driver)
- Fuji: Card interfaces appeared as enp68s0f0/enp68s0f1
- Installed SFP-10G-T-X transceivers in Port 1 of each card
- ✅ Emerald: Discovered existing Broadcom BCM57810 10GbE ports
- Interfaces: enp68s0f0/enp68s0f1
- No additional hardware needed
Phase 2: Physical Connections
Connected Cat6a cables from each server to SW-RACK:
- Apollo Port 1 → SW-RACK Te1/0/37
- Emerald Port 1 → SW-RACK Te1/0/41
- Fuji Port 1 → SW-RACK Te1/0/42
Phase 3: Switch Configuration
Switch (SW-RACK) was already configured with:
- VLAN 104 ("Data-Sync") created
- Ports Te1/0/37, 41, 42 in access mode on VLAN 104
- System MTU 9000 globally configured
- Spanning-tree portfast enabled
Phase 4: Server Configuration
Emerald (Proxmox):
# /etc/network/interfaces
iface enp68s0f0 inet manual
mtu 9000
auto vmbr2
iface vmbr2 inet static
address 172.16.104.34/24
bridge-ports enp68s0f0
bridge-stp off
bridge-fd 0
mtu 9000
Fuji (Proxmox):
# /etc/network/interfaces
iface enp68s0f0 inet manual
mtu 9000
auto vmbr2
iface vmbr2 inet static
address 172.16.104.35/24
bridge-ports enp68s0f0
bridge-stp off
bridge-fd 0
mtu 9000
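On the Proxmox hosts the change can be applied without a reboot, assuming the default ifupdown2 stack, and then verified:
# Reload /etc/network/interfaces (ifupdown2)
ifreload -a
# Confirm address and MTU on the storage bridge
ip addr show vmbr2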
Apollo (unRAID):
# /boot/config/network.cfg
SYSNICS="3"
IFNAME[2]="eth4"
DESCRIPTION[2]="10GbE Storage Network"
PROTOCOL[2]="ipv4"
USE_DHCP[2]="no"
IPADDR[2]="172.16.104.30"
NETMASK[2]="255.255.255.0"
MTU[2]="9000"
# Applied with:
/etc/rc.d/rc.inet1
Troubleshooting Encountered¶
Issue 1: Apollo Interface Naming
- Problem: Initially configured eth2, but cable was connected to eth4
- Cause: Dell Y40PH card uses different interface naming on unRAID
- Solution: Identified the correct interface with lspci and ethtool -i, then reconfigured to use eth4
Issue 2: Apollo Firewall Blocking Traffic
- Problem: Emerald/Fuji could ping each other, but not Apollo
- Cause: iptables INPUT chain didn't allow 172.16.104.0/24 subnet
- Solution: Added firewall rule:
iptables -I INPUT 1 -s 172.16.104.0/24 -j ACCEPT
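One way to keep this rule across reboots is to append it to the unRAID startup script; this is an assumption about how persistence could be handled, not necessarily what was done on Apollo:
# /boot/config/go runs at boot on unRAID
echo 'iptables -I INPUT 1 -s 172.16.104.0/24 -j ACCEPT' >> /boot/config/go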
Issue 3: Duplicate Routes on Apollo
- Problem: ARP worked but ICMP failed
- Cause: Duplicate routes for 172.16.104.0/24 on both eth2 (linkdown) and eth4
- Solution: Removed IP from eth2:
ip addr del 172.16.104.30/24 dev eth2
Issue 4: Jumbo Frames Not Working
- Problem: Regular pings worked, but jumbo frame pings failed
- Cause: Physical interfaces had MTU 1500, only bridges had MTU 9000
- Solution: Set MTU 9000 on the physical interfaces and made it persistent in /etc/network/interfaces
Issue 5: VLAN Configuration on Proxmox
- Problem: Initially tried bridge-vlan-aware with bridge-vids 104
- Cause: Switch ports are in access mode (untagged), bridge doesn't need VLAN awareness
- Solution: Removed VLAN awareness from bridge configuration
Lessons Learned¶
- Interface Naming Varies by OS: unRAID uses different naming (ethX) vs Proxmox (enpXsYfZ)
- Check Existing Hardware: Emerald already had 10GbE, saving $90 on hardware
- Access Mode = No VLAN Tags: When switch ports are in access mode, don't configure VLAN awareness on bridges
- MTU Must Match Everywhere: Physical interface, bridge, and switch all need MTU 9000
- Firewall Rules Matter: unRAID needs explicit iptables rules for new subnets
- Use ARP for L2 Testing: arping helped isolate the issue to layer 3
IP Address Scheme¶
Following existing pattern across all subnets:
| Server | VLAN 90 (Mgmt) | VLAN 103 (K8s) | VLAN 104 (Storage) |
|---|---|---|---|
| Apollo | .30 | .30 | .30 |
| Bishop | .31 | .31 | .31 |
| Castle | .32 | .32 | .32 |
| Domino | .33 | .33 | .33 |
| Emerald | .34 | .34 | .34 |
| Fuji | .35 | .35 | .35 |
References¶
- 10GBASE-T SFP+ Modules Guide - Technical specifications and compatibility
- Dell PowerEdge R630 Network Daughter Card Guide
- Cisco 3850 Configuration Guide
- Rook-Ceph Network Configuration
Change Log¶
| Date | Author | Changes |
|---|---|---|
| 2025-10-26 | Claude Code | Initial 10GbE storage network guide created |
| 2025-10-26 | Claude Code | Updated to use 10GBASE-T SFP+ modules instead of full NICs (servers have existing network cards with SFP+ ports) |
| 2025-12-29 | Claude Code | Deployment completed for Apollo, Emerald, and Fuji. Added Implementation Status section with actual configuration, troubleshooting steps, and lessons learned |