Skip to content

10GbE Storage Network Implementation Guide

Implementation Status

Status: ✅ COMPLETE (Deployed December 29, 2025)

Deployed Servers: Apollo, Emerald, Fuji (3 of 6 servers)

Network Summary:

  • VLAN: 104 (Data-Sync) on 172.16.104.0/24
  • Switch: SW-RACK (Cisco WS-C3850-12X48U)
  • Speed: 10 Gbps confirmed on all links
  • MTU: 9000 (jumbo frames working)
  • Latency: ~0.3-0.4ms between servers
  • Configuration: Persistent across reboots

Actual Server Configuration:

Server Interface IP Address Switch Port Hardware Details
Apollo (unRAID) eth4 172.16.104.30 Te1/0/37 Dell Y40PH (Broadcom bnx2x)
Emerald (Proxmox) enp68s0f0/vmbr2 172.16.104.34 Te1/0/41 Built-in Broadcom BCM57810 10GbE
Fuji (Proxmox) enp68s0f0/vmbr2 172.16.104.35 Te1/0/42 Dell Y40PH + SFP-10G-T-X transceiver

Key Discoveries:

  • Emerald had built-in Broadcom BCM57810 10GbE ports (no additional hardware needed)
  • Apollo's Dell Y40PH card interfaces appear as eth4/eth5 in unRAID (not eth2/eth3)
  • Fuji requires SFP-10G-T-X transceiver in Dell Y40PH card
  • Switch ports already pre-configured on SW-RACK (VLAN 104, MTU 9000, access mode)
  • Proxmox bridges don't need VLAN awareness when switch ports are in access mode

Performance Validation:

  • ✅ 10Gbps link speed confirmed on all interfaces
  • ✅ Jumbo frames (8972-byte ICMP payload) working across all server pairs
  • ✅ Sub-millisecond latency between all servers
  • ✅ All configurations persistent across reboots

Remaining Servers: Bishop, Castle, Domino (will follow same pattern when deployed)

Overview

This guide covers implementing a dedicated 10GbE network for Ceph storage traffic using the Cisco Catalyst 3850 switch and Dell PowerEdge servers (R720XD and R630).

Current Network Architecture

Existing Setup

Primary Network (1GbE): - Switch: Cisco SG300-28 (management/client traffic) - VLAN 103: Kubernetes cluster (172.16.103.0/24) - All servers connected via integrated 1GbE NICs

Problem: 1GbE is insufficient for Ceph replication traffic with 56 OSDs

Solution: Dedicated 10GbE storage network on VLAN 104

Hardware Inventory

Cisco Catalyst 3850 Switch

Your Switch: Cisco WS-C3850-12X48U

Specifications: - 10GbE Ports: 12x 10GBASE-T copper RJ45 (TenGigabitEthernet 1/1/1 through 1/1/12) - 1GbE Ports: 48x RJ45 copper ports (GigabitEthernet 1/0/1 through 1/0/48) - Stacking: StackWise-480 capable - Power: Dual power supply capable - PoE: PoE+ capable on 48 copper ports (optional)

Perfect for Your Use Case: - ✅ 12x 10GBASE-T ports (6 needed for servers, 6 spare for expansion) - ✅ 48 copper ports for management/client traffic - ✅ Layer 3 routing capable - ✅ Full VLAN support - ✅ Supports jumbo frames (MTU 9000)

Important Note: Since these are 10GBASE-T copper ports, and your Dell servers already have network cards with SFP+ module slots, you simply need 10GBASE-T SFP+ transceiver modules that plug into your existing ports. Connect them with Cat6a or Cat7 ethernet cables. No need for full network cards!

Dell Server Existing Network Configuration

R720XD (emerald, fuji, apollo): - Already equipped with network cards featuring 2x SFP+ module ports each - Ports are typically labeled on the rear NDC (Network Daughter Card) panel - 1 port will be used for storage, 1 port available for future expansion

R630 (bishop, castle, domino): - Already equipped with network cards featuring 2x SFP+ module ports each - Located on the rear NDC panel (1U form factor) - 1 port will be used for storage, 1 port available for future expansion

Verify Your Network Ports:

# From iDRAC or Proxmox host
lspci | grep -i ethernet
# Shows current network cards

ip link show
# Shows network interfaces - look for interfaces with 2 ports (e.g., ens3f0, ens3f1)

Expected: Each server should show a network card with multiple ports, ready to accept SFP+ modules.

Network Module Requirements

10GBASE-T SFP+ Transceiver Modules

Since your Dell servers already have network cards with empty 10GbE SFP+ module slots, and your Cisco 3850 has native 10GBASE-T copper ports, you need 10GBASE-T SFP+ transceiver modules to connect them.

What You Need: Small SFP+ modules that plug into your existing server network card ports and provide a 10GBASE-T copper RJ45 connector.

Recommended 10GBASE-T SFP+ Modules:

Brand Part Number Description Cost Notes
Cisco SFP-10G-T Official Cisco 10GBASE-T SFP+ $150-200 Best compatibility, expensive
TP-Link TXM431-SR 10GBASE-T SFP+ module $40-50 Good value, widely compatible
FS.com SFP-10G-T 10GBASE-T SFP+ module $30-40 Budget option, good reviews
Generic Various 10GBASE-T SFP+ copper $25-35 Check compatibility reviews

Recommendation: TP-Link TXM431-SR or FS.com SFP-10G-T for best value

Important Specifications: - Form factor: SFP+ (not SFP, not QSFP) - Connector type: 10GBASE-T (RJ45 copper) - Speed: 10Gbps - Distance: Up to 30m for Cat6a (typical for in-rack use) - Power: ~2.5W per module

Cabling: Cat6a or Cat7 Ethernet

Cable Type Length Cost Use Case
Cat6a 3m (10ft) $8-12 Within rack (recommended)
Cat6a 5m (16ft) $10-15 Cross-rack if needed
Cat7 3-5m $12-18 Better shielding (optional)

Why Cat6a? - ✅ Supports 10Gbps up to 100 meters - ✅ Standard ethernet connector (RJ45) - ✅ Affordable and widely available - ✅ Works with 10GBASE-T SFP+ modules

Shopping Estimate: - 10GBASE-T SFP+ modules: ~$40 each × 6 = $240 - Cat6a cables 3m: ~$10 each × 6 = $60 - Total for 6 servers: ~$300

Much cheaper than full NICs! Since you already have network cards with SFP+ ports, you only need the small transceiver modules, not entire network cards.

For Your Setup (6 Servers in Rack)

Recommended: 10GBASE-T SFP+ Modules with Cat6a Cables

Since your Dell servers have existing network cards with SFP+ module slots, and your Cisco 3850 has 10GBASE-T copper ports, the solution is simple:

Per Server (emerald, fuji, apollo, bishop, castle, domino): - 1x 10GBASE-T SFP+ module (plugs into existing network card's empty SFP+ port) - 1x Cat6a cable (3m/10ft for in-rack connection) - Uses 1 of 2 available ports (2nd port available for future expansion) - Connect module to Cisco 3850 10GBASE-T port with Cat6a ethernet cable

Network Design:

Cisco 3850 (12x 10GBASE-T ports)
├── TenGig 1/1/1: emerald (SFP+ module + Cat6a cable)
├── TenGig 1/1/2: fuji (SFP+ module + Cat6a cable)
├── TenGig 1/1/3: apollo (SFP+ module + Cat6a cable)
├── TenGig 1/1/4: bishop (SFP+ module + Cat6a cable)
├── TenGig 1/1/5: castle (SFP+ module + Cat6a cable)
├── TenGig 1/1/6: domino (SFP+ module + Cat6a cable)
├── TenGig 1/1/7-12: Available for expansion

VLAN Configuration: - VLAN 104: Storage network (172.16.104.0/24) - MTU: 9000 (Jumbo frames for storage) - QoS: Storage traffic priority

Shopping List

For 6 Servers with Existing Network Cards:

Item Quantity Cost Each Total
10GBASE-T SFP+ module (TP-Link TXM431-SR or FS.com) 6 $40 $240
Cat6a cable 3m 6 $10 $60
Estimated Total $300

Optional Add-ons: - Extra Cat6a cables (spares): 2x $10 = $20 - Cable management: $20-40

Grand Total: ~$340-360

Why This is Great: - ✅ Uses your existing network card SFP+ ports (no new NICs needed!) - ✅ Much cheaper than buying full network cards (~$300 vs $540-660) - ✅ Simple installation - just plug modules into empty ports - ✅ Standard Cat6a cabling (no special cables)

Installation Procedure

Phase 1: Hardware Installation

Per Server (Much Simpler - No Server Opening Required!):

Since your servers already have network cards with SFP+ ports, installation is very simple:

  1. Locate the existing 10GbE SFP+ ports on the rear of the server
  2. Usually labeled as "Port 1" and "Port 2" on the network module
  3. Typically found on the NDC (Network Daughter Card) area
  4. Insert 10GBASE-T SFP+ module:
  5. Remove any dust cover from empty SFP+ port
  6. Align module with port (notch on bottom)
  7. Firmly push module into port until it clicks
  8. Module should be flush with port (no gaps)
  9. Connect Cat6a cable:
  10. Plug one end into SFP+ module RJ45 port
  11. Plug other end into Cisco 3850 10GBASE-T port
  12. Verify link:
    # From Proxmox host (no reboot needed!)
    ip link show
    # Should show existing interfaces with new link status
    
    # Check link speed
    ethtool <interface-name>  # e.g., ens3f0
    # Should show: Speed: 10000Mb/s
    

Cable Management: - Use velcro cable ties - Keep Cat6a cables organized and away from power cables - Label each cable clearly (e.g., "emerald-10G-storage", "bishop-10G-storage")

Phase 2: Cisco 3850 Switch Configuration

Initial Setup:

! Access switch
enable
configure terminal

! Create storage VLAN
vlan 104
  name storage-network
  exit

! Configure 10GBASE-T ports for storage (ports 1-6)
interface range TenGigabitEthernet1/1/1 - 6
  description Storage Network - Ceph Cluster
  switchport mode access
  switchport access vlan 104
  spanning-tree portfast
  mtu 9000
  no shutdown
  exit

! Enable jumbo frames globally
system mtu 9000

! Save configuration
write memory

Verify Port Status:

show interfaces status
show vlan brief
show interfaces TenGigabitEthernet1/1/1

Phase 3: Proxmox Host Network Configuration

Configure 10GbE Interface on Each Host:

Edit /etc/network/interfaces:

# Storage network bridge (for 10GbE)
auto vmbr2
iface vmbr2 inet static
    address 172.16.104.10/24   # Unique per host
    bridge-ports ens3f0         # Adjust interface name
    bridge-stp off
    bridge-fd 0
    mtu 9000

# emerald: 172.16.104.10
# fuji: 172.16.104.11
# apollo: 172.16.104.12
# bishop: 172.16.104.13
# castle: 172.16.104.14
# domino: 172.16.104.15

Apply Configuration:

# Restart networking (or reboot)
systemctl restart networking

# Verify
ip addr show vmbr2
ping 172.16.104.11   # Test connectivity to another host

Test Bandwidth:

# Install iperf3
apt install iperf3

# On one host (server)
iperf3 -s

# On another host (client)
iperf3 -c 172.16.104.10

# Expected: ~9.4 Gbps with 10GbE

Phase 4: Kubernetes VM Network Configuration

Add Storage Network Interface to VMs:

# For each Kubernetes VM (example VM 300)
qm set 300 -net1 virtio,bridge=vmbr2,tag=104

# This creates a second network interface in the VM
# eth0: Management/cluster network (VLAN 103)
# eth1: Storage network (VLAN 104)

Inside Kubernetes VMs (Talos Linux):

Configure via Talos machine config:

machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 172.16.103.10/24   # Cluster network
        routes:
          - network: 0.0.0.0/0
            gateway: 172.16.103.1
      - interface: eth1
        addresses:
          - 172.16.104.110/24  # Storage network
        mtu: 9000

Rook-Ceph Configuration for Dual Networks

CephCluster Network Configuration

Update CephCluster CR to use dedicated storage network:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Network configuration
  network:
    provider: host
    selectors:
      # Public network (clients to OSDs)
      public: "172.16.103.0/24"   # 1GbE cluster network

      # Cluster network (OSD replication)
      cluster: "172.16.104.0/24"  # 10GbE storage network

  # ... rest of configuration

Benefits: - Client traffic (pods → Ceph) uses 1GbE cluster network - OSD replication uses 10GbE storage network - Separates concerns and improves performance

Monitoring and Validation

Network Interface Checks

On Proxmox Hosts:

# Check interface status
ip link show
ethtool ens3f0

# Check speed
ethtool ens3f0 | grep Speed
# Should show: Speed: 10000Mb/s

# Check MTU
ip link show vmbr2 | grep mtu
# Should show: mtu 9000

On Cisco 3850:

show interfaces TenGigabitEthernet1/1/1 status
show interfaces TenGigabitEthernet1/1/1 | include duplex
# Should show: Full-duplex, 10Gb/s

Performance Testing

Bandwidth Test (iperf3):

# Expected results:
# - 10GbE: ~9.4 Gbps
# - With jumbo frames: ~9.6 Gbps
# - Latency: <0.1ms within rack

Ceph Performance:

# After Ceph deployment
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd perf

# Monitor OSD network traffic
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool stats

Cost Breakdown

For Your Setup with Existing Network Cards

Component Quantity Unit Cost Total
10GBASE-T SFP+ module (TP-Link/FS.com) 6 $40 $240
Cat6a cable 3m 6 $10 $60
Spares (cables) 2 $10 $20
Cable management 1 $30 $30
Total $350

Benefits of Using Existing Network Cards: - ✅ No need to buy full NICs (saves ~$300-400!) - ✅ No server disassembly required - ✅ Simple plug-and-play installation - ✅ Familiar Cat6a cabling - ✅ Longer cable run capability (up to 100m) - ✅ Uses existing infrastructure

Net Result: Simple, cost-effective upgrade using your existing hardware

Purchasing Recommendations

Where to Buy

10GBASE-T SFP+ Modules: - Amazon: Search "10GBASE-T SFP+" or "TP-Link TXM431-SR" - TP-Link TXM431-SR: ~$40-50 each (good value) - Generic 10GBASE-T SFP+: ~$30-40 each - FS.com: Direct from manufacturer, good prices and quality - SFP-10G-T: ~$35-40 each - Known for reliable networking gear - eBay: Search "10GBASE-T SFP+ copper" - Generic modules: ~$25-35 each - Check seller ratings and reviews

Important: Make sure the module is: - SFP+ form factor (not SFP or QSFP) - 10GBASE-T (copper RJ45, not fiber) - Compatible with standard Cat6a cables

Cat6a Cables: - Amazon: Cable Matters, Monoprice, StarTech brands (reliable) - Monoprice.com: Direct from manufacturer, good prices - CableMatters.com: High quality, reasonably priced - Length: 3m (10ft) is ideal for in-rack, 5m (16ft) for cross-rack

Example Search Terms

For Amazon: - "10GBASE-T SFP+ module" - "TP-Link TXM431-SR" - "SFP+ to RJ45 10 gigabit" - "Cat6a ethernet cable 10ft"

For FS.com: - "SFP-10G-T" - "10GBASE-T SFP+ copper"

For eBay: - "10GBASE-T SFP+ copper module" - "SFP+ 10G RJ45"

Timeline

Estimated Time: 1 day

  • Hardware procurement: 1-2 weeks (shipping)
  • Phase 1 (SFP+ module installation): 30 minutes (all servers - very quick!)
  • Phase 2 (Switch config): 1 hour
  • Phase 3 (Proxmox config): 2-3 hours
  • Phase 4 (VM config): 2-3 hours
  • Testing and validation: 2 hours

Total hands-on time: 6-8 hours (much faster with modules vs full NICs!)

Troubleshooting Guide

SFP+ Module Not Detected

Problem: Network interface not showing link or module not recognized

Solutions: - Ensure module fully seated in SFP+ port (should click when inserted) - Remove and reinsert module firmly - Check module orientation (notch on bottom) - Verify it's an SFP+ module (not SFP or QSFP) - Try module in the other SFP+ port on the same card - Check ethtool output for interface status - Update server BIOS/iDRAC firmware if module is not recognized

Problem: Port shows "down" or "notconnect"

Check:

show interfaces TenGigabitEthernet1/1/1 status
show logging | include TenGigabitEthernet1/1/1

Solutions: - Verify Cat6a cable fully inserted both ends - Try different cable (ensure Cat6a or Cat7, not Cat5e/Cat6) - Try different switch port - Check if port is enabled (no shutdown) - Verify cable not damaged (check for kinks or cuts)

Poor Performance (<1 Gbps)

Possible Causes: - Speed autonegotiation failed - MTU mismatch - Cable issue (ensure Cat6a or Cat7) - SFP+ module issue

Fix:

# Force 10G on server side
ethtool -s ens3f0 speed 10000 duplex full

# Check for errors
ethtool -S ens3f0 | grep -i error

Jumbo Frames Not Working

Check:

# Ping with large packet
ping -M do -s 8972 172.16.104.11

# If fails, MTU path issue

Fix: Ensure MTU 9000 on all devices in path: - Server NIC - Proxmox bridge - Switch port - VM interface

Implementation Notes (December 2025)

Actual Deployment Steps

Phase 1: Hardware Installation

  1. ✅ Installed Dell Y40PH 10GbE cards in Apollo and Fuji
  2. Apollo: Card interfaces appeared as eth4/eth5 (Broadcom bnx2x driver)
  3. Fuji: Card interfaces appeared as enp68s0f0/enp68s0f1
  4. Installed SFP-10G-T-X transceivers in Port 1 of each card
  5. ✅ Emerald: Discovered existing Broadcom BCM57810 10GbE ports
  6. Interfaces: enp68s0f0/enp68s0f1
  7. No additional hardware needed

Phase 2: Physical Connections

Connected Cat6a cables from each server to SW-RACK:

  • Apollo Port 1 → SW-RACK Te1/0/37
  • Emerald Port 1 → SW-RACK Te1/0/41
  • Fuji Port 1 → SW-RACK Te1/0/42

Phase 3: Switch Configuration

Switch (SW-RACK) was already configured with:

  • VLAN 104 ("Data-Sync") created
  • Ports Te1/0/37, 41, 42 in access mode on VLAN 104
  • System MTU 9000 globally configured
  • Spanning-tree portfast enabled

Phase 4: Server Configuration

Emerald (Proxmox):

# /etc/network/interfaces
iface enp68s0f0 inet manual
    mtu 9000

auto vmbr2
iface vmbr2 inet static
    address 172.16.104.34/24
    bridge-ports enp68s0f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

Fuji (Proxmox):

# /etc/network/interfaces
iface enp68s0f0 inet manual
    mtu 9000

auto vmbr2
iface vmbr2 inet static
    address 172.16.104.35/24
    bridge-ports enp68s0f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

Apollo (unRAID):

# /boot/config/network.cfg
SYSNICS="3"
IFNAME[2]="eth4"
DESCRIPTION[2]="10GbE Storage Network"
PROTOCOL[2]="ipv4"
USE_DHCP[2]="no"
IPADDR[2]="172.16.104.30"
NETMASK[2]="255.255.255.0"
MTU[2]="9000"

# Applied with:
/etc/rc.d/rc.inet1

Troubleshooting Encountered

Issue 1: Apollo Interface Naming

  • Problem: Initially configured eth2, but cable was connected to eth4
  • Cause: Dell Y40PH card uses different interface naming on unRAID
  • Solution: Identified correct interface with lspci and ethtool -i, reconfigured to use eth4

Issue 2: Apollo Firewall Blocking Traffic

  • Problem: Emerald/Fuji could ping each other, but not Apollo
  • Cause: iptables INPUT chain didn't allow 172.16.104.0/24 subnet
  • Solution: Added firewall rule: iptables -I INPUT 1 -s 172.16.104.0/24 -j ACCEPT

Issue 3: Duplicate Routes on Apollo

  • Problem: ARP worked but ICMP failed
  • Cause: Duplicate routes for 172.16.104.0/24 on both eth2 (linkdown) and eth4
  • Solution: Removed IP from eth2: ip addr del 172.16.104.30/24 dev eth2

Issue 4: Jumbo Frames Not Working

  • Problem: Regular pings worked, but jumbo frame pings failed
  • Cause: Physical interfaces had MTU 1500, only bridges had MTU 9000
  • Solution: Set MTU 9000 on physical interfaces and made persistent in /etc/network/interfaces

Issue 5: VLAN Configuration on Proxmox

  • Problem: Initially tried bridge-vlan-aware with bridge-vids 104
  • Cause: Switch ports are in access mode (untagged), bridge doesn't need VLAN awareness
  • Solution: Removed VLAN awareness from bridge configuration

Lessons Learned

  1. Interface Naming Varies by OS: unRAID uses different naming (ethX) vs Proxmox (enpXsYfZ)
  2. Check Existing Hardware: Emerald already had 10GbE, saving $90 on hardware
  3. Access Mode = No VLAN Tags: When switch ports are in access mode, don't configure VLAN awareness on bridges
  4. MTU Must Match Everywhere: Physical interface, bridge, and switch all need MTU 9000
  5. Firewall Rules Matter: unRAID needs explicit iptables rules for new subnets
  6. Use ARP for L2 Testing: arping helped isolate the issue to layer 3

IP Address Scheme

Following existing pattern across all subnets:

Server VLAN 90 (Mgmt) VLAN 103 (K8s) VLAN 104 (Storage)
Apollo .30 .30 .30
Bishop .31 .31 .31
Castle .32 .32 .32
Domino .33 .33 .33
Emerald .34 .34 .34
Fuji .35 .35 .35

References

Change Log

Date Author Changes
2025-10-26 Claude Code Initial 10GbE storage network guide created
2025-10-26 Claude Code Updated to use 10GBASE-T SFP+ modules instead of full NICs (servers have existing network cards with SFP+ ports)
2025-12-29 Claude Code Deployment completed for Apollo, Emerald, and Fuji. Added Implementation Status section with actual configuration, troubleshooting steps, and lessons learned