10GbE Storage Network Implementation Guide¶

Implementation Status¶

Status: ✅ COMPLETE (Deployed December 29, 2025)

Deployed Servers: Apollo, Emerald, Fuji (3 of 6 servers)

Network Summary:

VLAN: 104 (Data-Sync) on 172.16.104.0/24
Switch: SW-RACK (Cisco WS-C3850-12X48U)
Speed: 10 Gbps confirmed on all links
MTU: 9000 (jumbo frames working)
Latency: ~0.3-0.4ms between servers
Configuration: Persistent across reboots

Actual Server Configuration:

Server	Interface	IP Address	Switch Port	Hardware Details
Apollo (unRAID)	eth4	172.16.104.30	Te1/0/37	Dell Y40PH (Broadcom bnx2x)
Emerald (Proxmox)	enp68s0f0/vmbr2	172.16.104.34	Te1/0/41	Built-in Broadcom BCM57810 10GbE
Fuji (Proxmox)	enp68s0f0/vmbr2	172.16.104.35	Te1/0/42	Dell Y40PH + SFP-10G-T-X transceiver

Key Discoveries:

Emerald had built-in Broadcom BCM57810 10GbE ports (no additional hardware needed)
Apollo's Dell Y40PH card interfaces appear as eth4/eth5 in unRAID (not eth2/eth3)
Fuji requires SFP-10G-T-X transceiver in Dell Y40PH card
Switch ports already pre-configured on SW-RACK (VLAN 104, MTU 9000, access mode)
Proxmox bridges don't need VLAN awareness when switch ports are in access mode

Performance Validation:

✅ 10Gbps link speed confirmed on all interfaces
✅ Jumbo frames (8972-byte ICMP payload) working across all server pairs
✅ Sub-millisecond latency between all servers
✅ All configurations persistent across reboots

Remaining Servers: Bishop, Castle, Domino (will follow same pattern when deployed)

Overview¶

This guide covers implementing a dedicated 10GbE network for Ceph storage traffic using the Cisco Catalyst 3850 switch and Dell PowerEdge servers (R720XD and R630).

Current Network Architecture¶

Existing Setup¶

Primary Network (1GbE): - Switch: Cisco SG300-28 (management/client traffic) - VLAN 103: Kubernetes cluster (172.16.103.0/24) - All servers connected via integrated 1GbE NICs

Problem: 1GbE is insufficient for Ceph replication traffic with 56 OSDs

Solution: Dedicated 10GbE storage network on VLAN 104

Hardware Inventory¶

Cisco Catalyst 3850 Switch¶

Your Switch: Cisco WS-C3850-12X48U

Specifications: - 10GbE Ports: 12x 10GBASE-T copper RJ45 (TenGigabitEthernet 1/1/1 through 1/1/12) - 1GbE Ports: 48x RJ45 copper ports (GigabitEthernet 1/0/1 through 1/0/48) - Stacking: StackWise-480 capable - Power: Dual power supply capable - PoE: PoE+ capable on 48 copper ports (optional)

Perfect for Your Use Case: - ✅ 12x 10GBASE-T ports (6 needed for servers, 6 spare for expansion) - ✅ 48 copper ports for management/client traffic - ✅ Layer 3 routing capable - ✅ Full VLAN support - ✅ Supports jumbo frames (MTU 9000)

Important Note: Since these are 10GBASE-T copper ports, and your Dell servers already have network cards with SFP+ module slots, you simply need 10GBASE-T SFP+ transceiver modules that plug into your existing ports. Connect them with Cat6a or Cat7 ethernet cables. No need for full network cards!

Dell Server Existing Network Configuration¶

R720XD (emerald, fuji, apollo): - Already equipped with network cards featuring 2x SFP+ module ports each - Ports are typically labeled on the rear NDC (Network Daughter Card) panel - 1 port will be used for storage, 1 port available for future expansion

R630 (bishop, castle, domino): - Already equipped with network cards featuring 2x SFP+ module ports each - Located on the rear NDC panel (1U form factor) - 1 port will be used for storage, 1 port available for future expansion

Verify Your Network Ports:

# From iDRAC or Proxmox host
lspci | grep -i ethernet
# Shows current network cards

ip link show
# Shows network interfaces - look for interfaces with 2 ports (e.g., ens3f0, ens3f1)

Expected: Each server should show a network card with multiple ports, ready to accept SFP+ modules.

Network Module Requirements¶

10GBASE-T SFP+ Transceiver Modules¶

Since your Dell servers already have network cards with empty 10GbE SFP+ module slots, and your Cisco 3850 has native 10GBASE-T copper ports, you need 10GBASE-T SFP+ transceiver modules to connect them.

What You Need: Small SFP+ modules that plug into your existing server network card ports and provide a 10GBASE-T copper RJ45 connector.

Recommended 10GBASE-T SFP+ Modules:

Brand	Part Number	Description	Cost	Notes
Cisco	SFP-10G-T	Official Cisco 10GBASE-T SFP+	$150-200	Best compatibility, expensive
TP-Link	TXM431-SR	10GBASE-T SFP+ module	$40-50	Good value, widely compatible
FS.com	SFP-10G-T	10GBASE-T SFP+ module	$30-40	Budget option, good reviews
Generic	Various	10GBASE-T SFP+ copper	$25-35	Check compatibility reviews

Recommendation: TP-Link TXM431-SR or FS.com SFP-10G-T for best value

Important Specifications: - Form factor: SFP+ (not SFP, not QSFP) - Connector type: 10GBASE-T (RJ45 copper) - Speed: 10Gbps - Distance: Up to 30m for Cat6a (typical for in-rack use) - Power: ~2.5W per module

Cabling: Cat6a or Cat7 Ethernet

Cable Type	Length	Cost	Use Case
Cat6a	3m (10ft)	$8-12	Within rack (recommended)
Cat6a	5m (16ft)	$10-15	Cross-rack if needed
Cat7	3-5m	$12-18	Better shielding (optional)

Why Cat6a? - ✅ Supports 10Gbps up to 100 meters - ✅ Standard ethernet connector (RJ45) - ✅ Affordable and widely available - ✅ Works with 10GBASE-T SFP+ modules

Shopping Estimate: - 10GBASE-T SFP+ modules: ~$40 each × 6 = $240 - Cat6a cables 3m: ~$10 each × 6 = $60 - Total for 6 servers: ~$300

Much cheaper than full NICs! Since you already have network cards with SFP+ ports, you only need the small transceiver modules, not entire network cards.

Recommended Configuration¶

For Your Setup (6 Servers in Rack)¶

Recommended: 10GBASE-T SFP+ Modules with Cat6a Cables

Since your Dell servers have existing network cards with SFP+ module slots, and your Cisco 3850 has 10GBASE-T copper ports, the solution is simple:

Per Server (emerald, fuji, apollo, bishop, castle, domino): - 1x 10GBASE-T SFP+ module (plugs into existing network card's empty SFP+ port) - 1x Cat6a cable (3m/10ft for in-rack connection) - Uses 1 of 2 available ports (2nd port available for future expansion) - Connect module to Cisco 3850 10GBASE-T port with Cat6a ethernet cable

Network Design:

Cisco 3850 (12x 10GBASE-T ports)
├── TenGig 1/1/1: emerald (SFP+ module + Cat6a cable)
├── TenGig 1/1/2: fuji (SFP+ module + Cat6a cable)
├── TenGig 1/1/3: apollo (SFP+ module + Cat6a cable)
├── TenGig 1/1/4: bishop (SFP+ module + Cat6a cable)
├── TenGig 1/1/5: castle (SFP+ module + Cat6a cable)
├── TenGig 1/1/6: domino (SFP+ module + Cat6a cable)
├── TenGig 1/1/7-12: Available for expansion

VLAN Configuration: - VLAN 104: Storage network (172.16.104.0/24) - MTU: 9000 (Jumbo frames for storage) - QoS: Storage traffic priority

Shopping List¶

For 6 Servers with Existing Network Cards:

Item	Quantity	Cost Each	Total
10GBASE-T SFP+ module (TP-Link TXM431-SR or FS.com)	6	$40	$240
Cat6a cable 3m	6	$10	$60
Estimated Total			$300

Optional Add-ons: - Extra Cat6a cables (spares): 2x $10 = $20 - Cable management: $20-40

Grand Total: ~$340-360

Why This is Great: - ✅ Uses your existing network card SFP+ ports (no new NICs needed!) - ✅ Much cheaper than buying full network cards (~$300 vs $540-660) - ✅ Simple installation - just plug modules into empty ports - ✅ Standard Cat6a cabling (no special cables)

Installation Procedure¶

Phase 1: Hardware Installation¶

Per Server (Much Simpler - No Server Opening Required!):

Since your servers already have network cards with SFP+ ports, installation is very simple:

Locate the existing 10GbE SFP+ ports on the rear of the server
Usually labeled as "Port 1" and "Port 2" on the network module
Typically found on the NDC (Network Daughter Card) area
Insert 10GBASE-T SFP+ module:
Remove any dust cover from empty SFP+ port
Align module with port (notch on bottom)
Firmly push module into port until it clicks
Module should be flush with port (no gaps)
Connect Cat6a cable:
Plug one end into SFP+ module RJ45 port
Plug other end into Cisco 3850 10GBASE-T port

Verify link:

# From Proxmox host (no reboot needed!)
ip link show
# Should show existing interfaces with new link status

# Check link speed
ethtool <interface-name>  # e.g., ens3f0
# Should show: Speed: 10000Mb/s

Cable Management: - Use velcro cable ties - Keep Cat6a cables organized and away from power cables - Label each cable clearly (e.g., "emerald-10G-storage", "bishop-10G-storage")

Phase 2: Cisco 3850 Switch Configuration¶

Initial Setup:

! Access switch
enable
configure terminal

! Create storage VLAN
vlan 104
  name storage-network
  exit

! Configure 10GBASE-T ports for storage (ports 1-6)
interface range TenGigabitEthernet1/1/1 - 6
  description Storage Network - Ceph Cluster
  switchport mode access
  switchport access vlan 104
  spanning-tree portfast
  mtu 9000
  no shutdown
  exit

! Enable jumbo frames globally
system mtu 9000

! Save configuration
write memory

Verify Port Status:

show interfaces status
show vlan brief
show interfaces TenGigabitEthernet1/1/1

Phase 3: Proxmox Host Network Configuration¶

Configure 10GbE Interface on Each Host:

Edit /etc/network/interfaces:

# Storage network bridge (for 10GbE)
auto vmbr2
iface vmbr2 inet static
    address 172.16.104.10/24   # Unique per host
    bridge-ports ens3f0         # Adjust interface name
    bridge-stp off
    bridge-fd 0
    mtu 9000

# emerald: 172.16.104.10
# fuji: 172.16.104.11
# apollo: 172.16.104.12
# bishop: 172.16.104.13
# castle: 172.16.104.14
# domino: 172.16.104.15

Apply Configuration:

# Restart networking (or reboot)
systemctl restart networking

# Verify
ip addr show vmbr2
ping 172.16.104.11   # Test connectivity to another host

Test Bandwidth:

# Install iperf3
apt install iperf3

# On one host (server)
iperf3 -s

# On another host (client)
iperf3 -c 172.16.104.10

# Expected: ~9.4 Gbps with 10GbE

Phase 4: Kubernetes VM Network Configuration¶

Add Storage Network Interface to VMs:

# For each Kubernetes VM (example VM 300)
qm set 300 -net1 virtio,bridge=vmbr2,tag=104

# This creates a second network interface in the VM
# eth0: Management/cluster network (VLAN 103)
# eth1: Storage network (VLAN 104)

Inside Kubernetes VMs (Talos Linux):

Configure via Talos machine config:

machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 172.16.103.10/24   # Cluster network
        routes:
          - network: 0.0.0.0/0
            gateway: 172.16.103.1
      - interface: eth1
        addresses:
          - 172.16.104.110/24  # Storage network
        mtu: 9000

Rook-Ceph Configuration for Dual Networks¶

CephCluster Network Configuration¶

Update CephCluster CR to use dedicated storage network:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Network configuration
  network:
    provider: host
    selectors:
      # Public network (clients to OSDs)
      public: "172.16.103.0/24"   # 1GbE cluster network

      # Cluster network (OSD replication)
      cluster: "172.16.104.0/24"  # 10GbE storage network

  # ... rest of configuration

Benefits: - Client traffic (pods → Ceph) uses 1GbE cluster network - OSD replication uses 10GbE storage network - Separates concerns and improves performance

Monitoring and Validation¶

Network Interface Checks¶

On Proxmox Hosts:

# Check interface status
ip link show
ethtool ens3f0

# Check speed
ethtool ens3f0 | grep Speed
# Should show: Speed: 10000Mb/s

# Check MTU
ip link show vmbr2 | grep mtu
# Should show: mtu 9000

On Cisco 3850:

show interfaces TenGigabitEthernet1/1/1 status
show interfaces TenGigabitEthernet1/1/1 | include duplex
# Should show: Full-duplex, 10Gb/s

Performance Testing¶

Bandwidth Test (iperf3):

# Expected results:
# - 10GbE: ~9.4 Gbps
# - With jumbo frames: ~9.6 Gbps
# - Latency: <0.1ms within rack

Ceph Performance:

# After Ceph deployment
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd perf

# Monitor OSD network traffic
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool stats

Cost Breakdown¶

For Your Setup with Existing Network Cards¶

Component	Quantity	Unit Cost	Total
10GBASE-T SFP+ module (TP-Link/FS.com)	6	$40	$240
Cat6a cable 3m	6	$10	$60
Spares (cables)	2	$10	$20
Cable management	1	$30	$30
Total			$350

Benefits of Using Existing Network Cards: - ✅ No need to buy full NICs (saves ~$300-400!) - ✅ No server disassembly required - ✅ Simple plug-and-play installation - ✅ Familiar Cat6a cabling - ✅ Longer cable run capability (up to 100m) - ✅ Uses existing infrastructure

Net Result: Simple, cost-effective upgrade using your existing hardware

Purchasing Recommendations¶

Where to Buy¶

10GBASE-T SFP+ Modules: - Amazon: Search "10GBASE-T SFP+" or "TP-Link TXM431-SR" - TP-Link TXM431-SR: ~$40-50 each (good value) - Generic 10GBASE-T SFP+: ~$30-40 each - FS.com: Direct from manufacturer, good prices and quality - SFP-10G-T: ~$35-40 each - Known for reliable networking gear - eBay: Search "10GBASE-T SFP+ copper" - Generic modules: ~$25-35 each - Check seller ratings and reviews

Important: Make sure the module is: - SFP+ form factor (not SFP or QSFP) - 10GBASE-T (copper RJ45, not fiber) - Compatible with standard Cat6a cables

Cat6a Cables: - Amazon: Cable Matters, Monoprice, StarTech brands (reliable) - Monoprice.com: Direct from manufacturer, good prices - CableMatters.com: High quality, reasonably priced - Length: 3m (10ft) is ideal for in-rack, 5m (16ft) for cross-rack

Example Search Terms¶

For Amazon: - "10GBASE-T SFP+ module" - "TP-Link TXM431-SR" - "SFP+ to RJ45 10 gigabit" - "Cat6a ethernet cable 10ft"

For FS.com: - "SFP-10G-T" - "10GBASE-T SFP+ copper"

For eBay: - "10GBASE-T SFP+ copper module" - "SFP+ 10G RJ45"

Timeline¶

Estimated Time: 1 day

Hardware procurement: 1-2 weeks (shipping)
Phase 1 (SFP+ module installation): 30 minutes (all servers - very quick!)
Phase 2 (Switch config): 1 hour
Phase 3 (Proxmox config): 2-3 hours
Phase 4 (VM config): 2-3 hours
Testing and validation: 2 hours

Total hands-on time: 6-8 hours (much faster with modules vs full NICs!)

Troubleshooting Guide¶

SFP+ Module Not Detected¶

Problem: Network interface not showing link or module not recognized

Solutions: - Ensure module fully seated in SFP+ port (should click when inserted) - Remove and reinsert module firmly - Check module orientation (notch on bottom) - Verify it's an SFP+ module (not SFP or QSFP) - Try module in the other SFP+ port on the same card - Check ethtool output for interface status - Update server BIOS/iDRAC firmware if module is not recognized

No Link on 3850 Port¶

Problem: Port shows "down" or "notconnect"

Check:

show interfaces TenGigabitEthernet1/1/1 status
show logging | include TenGigabitEthernet1/1/1

Solutions: - Verify Cat6a cable fully inserted both ends - Try different cable (ensure Cat6a or Cat7, not Cat5e/Cat6) - Try different switch port - Check if port is enabled (no shutdown) - Verify cable not damaged (check for kinks or cuts)

Poor Performance (<1 Gbps)¶

Possible Causes: - Speed autonegotiation failed - MTU mismatch - Cable issue (ensure Cat6a or Cat7) - SFP+ module issue

Fix:

# Force 10G on server side
ethtool -s ens3f0 speed 10000 duplex full

# Check for errors
ethtool -S ens3f0 | grep -i error

Jumbo Frames Not Working¶

Check:

# Ping with large packet
ping -M do -s 8972 172.16.104.11

# If fails, MTU path issue

Fix: Ensure MTU 9000 on all devices in path: - Server NIC - Proxmox bridge - Switch port - VM interface

Implementation Notes (December 2025)¶

Actual Deployment Steps¶

Phase 1: Hardware Installation

✅ Installed Dell Y40PH 10GbE cards in Apollo and Fuji
Apollo: Card interfaces appeared as eth4/eth5 (Broadcom bnx2x driver)
Fuji: Card interfaces appeared as enp68s0f0/enp68s0f1
Installed SFP-10G-T-X transceivers in Port 1 of each card
✅ Emerald: Discovered existing Broadcom BCM57810 10GbE ports
Interfaces: enp68s0f0/enp68s0f1
No additional hardware needed

Phase 2: Physical Connections

Connected Cat6a cables from each server to SW-RACK:

Apollo Port 1 → SW-RACK Te1/0/37
Emerald Port 1 → SW-RACK Te1/0/41
Fuji Port 1 → SW-RACK Te1/0/42

Phase 3: Switch Configuration

Switch (SW-RACK) was already configured with:

VLAN 104 ("Data-Sync") created
Ports Te1/0/37, 41, 42 in access mode on VLAN 104
System MTU 9000 globally configured
Spanning-tree portfast enabled

Phase 4: Server Configuration

Emerald (Proxmox):

# /etc/network/interfaces
iface enp68s0f0 inet manual
    mtu 9000

auto vmbr2
iface vmbr2 inet static
    address 172.16.104.34/24
    bridge-ports enp68s0f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

Fuji (Proxmox):

# /etc/network/interfaces
iface enp68s0f0 inet manual
    mtu 9000

auto vmbr2
iface vmbr2 inet static
    address 172.16.104.35/24
    bridge-ports enp68s0f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

Apollo (unRAID):

# /boot/config/network.cfg
SYSNICS="3"
IFNAME[2]="eth4"
DESCRIPTION[2]="10GbE Storage Network"
PROTOCOL[2]="ipv4"
USE_DHCP[2]="no"
IPADDR[2]="172.16.104.30"
NETMASK[2]="255.255.255.0"
MTU[2]="9000"

# Applied with:
/etc/rc.d/rc.inet1

Troubleshooting Encountered¶

Issue 1: Apollo Interface Naming

Problem: Initially configured eth2, but cable was connected to eth4
Cause: Dell Y40PH card uses different interface naming on unRAID
Solution: Identified correct interface with lspci and ethtool -i, reconfigured to use eth4

Issue 2: Apollo Firewall Blocking Traffic

Problem: Emerald/Fuji could ping each other, but not Apollo
Cause: iptables INPUT chain didn't allow 172.16.104.0/24 subnet
Solution: Added firewall rule: iptables -I INPUT 1 -s 172.16.104.0/24 -j ACCEPT

Issue 3: Duplicate Routes on Apollo

Problem: ARP worked but ICMP failed
Cause: Duplicate routes for 172.16.104.0/24 on both eth2 (linkdown) and eth4
Solution: Removed IP from eth2: ip addr del 172.16.104.30/24 dev eth2

Issue 4: Jumbo Frames Not Working

Problem: Regular pings worked, but jumbo frame pings failed
Cause: Physical interfaces had MTU 1500, only bridges had MTU 9000
Solution: Set MTU 9000 on physical interfaces and made persistent in /etc/network/interfaces

Issue 5: VLAN Configuration on Proxmox

Problem: Initially tried bridge-vlan-aware with bridge-vids 104
Cause: Switch ports are in access mode (untagged), bridge doesn't need VLAN awareness
Solution: Removed VLAN awareness from bridge configuration

Lessons Learned¶

Interface Naming Varies by OS: unRAID uses different naming (ethX) vs Proxmox (enpXsYfZ)
Check Existing Hardware: Emerald already had 10GbE, saving $90 on hardware
Access Mode = No VLAN Tags: When switch ports are in access mode, don't configure VLAN awareness on bridges
MTU Must Match Everywhere: Physical interface, bridge, and switch all need MTU 9000
Firewall Rules Matter: unRAID needs explicit iptables rules for new subnets
Use ARP for L2 Testing: arping helped isolate the issue to layer 3

IP Address Scheme¶

Following existing pattern across all subnets:

Server	VLAN 90 (Mgmt)	VLAN 103 (K8s)	VLAN 104 (Storage)
Apollo	.30	.30	.30
Bishop	.31	.31	.31
Castle	.32	.32	.32
Domino	.33	.33	.33
Emerald	.34	.34	.34
Fuji	.35	.35	.35

References¶

10GBASE-T SFP+ Modules Guide - Technical specifications and compatibility
Dell PowerEdge R630 Network Daughter Card Guide
Cisco 3850 Configuration Guide
Rook-Ceph Network Configuration

Change Log¶

Date	Author	Changes
2025-10-26	Claude Code	Initial 10GbE storage network guide created
2025-10-26	Claude Code	Updated to use 10GBASE-T SFP+ modules instead of full NICs (servers have existing network cards with SFP+ ports)
2025-12-29	Claude Code	Deployment completed for Apollo, Emerald, and Fuji. Added Implementation Status section with actual configuration, troubleshooting steps, and lessons learned