Paperless-NGX Document Management System¶
Deployment Date: December 31, 2025 - January 1, 2026
Current Status: ✅ Operational
Namespace: paperless-ngx
Overview¶
Paperless-NGX is a document management system that scans, indexes, and archives all of your documents with full-text search and automatic OCR processing. The deployment uses a hybrid storage architecture combining Ceph for performance-critical workloads and NFS for bulk document storage.
Architecture¶
Core Components¶
| Component | Version | Purpose | Storage Backend |
|---|---|---|---|
| Paperless-NGX | 2.14.7 | Document management application | Hybrid (Ceph + NFS) |
| PostgreSQL | CloudNativePG | Application database | Ceph RBD |
| Redis | Latest | Task queue and caching | Ephemeral |
| File-Mover Sidecar | Alpine 3.21 | NFS → Ceph file transfer | N/A |
Infrastructure Dependencies¶
- Database: CloudNativePG cluster (10Gi on Ceph)
- Storage: Hybrid architecture (details below)
- Ingress: Traefik IngressRoute with TLS
- DNS: External-DNS for automatic DNS management
- Authentication: Authelia one-factor authentication
- Certificate: Let's Encrypt TLS certificate via cert-manager
Storage Architecture¶
Design Principle¶
Performance-critical data lives on Ceph. Large, write-once documents live on NFS.
The deployment uses a split storage architecture to optimize for both performance and capacity:
Volume Layout¶
| Volume | Size | Storage Class | Access Mode | Purpose |
|---|---|---|---|---|
| paperless-ngx-data | 10Gi | ceph-block | RWO | ML models, search index, internal state |
| paperless-ngx-consume | 5Gi | ceph-block | RWO | Document intake directory (fast, reliable inotify) |
| paperless-ngx-incoming | 10Gi | Static PV (NFS) | RWX | User drop zone for new documents |
| paperless-ngx-media | 500Gi | Static PV (NFS) | RWX | Final document archive storage |
| paperless-ngx-export | 100Gi | Static PV (NFS) | RWX | Bulk export directory |
| paperless-postgresql-1 | 10Gi | ceph-block | RWO | PostgreSQL database |
Static PersistentVolumes (NFS)¶
Instead of dynamic provisioning, the NFS volumes use static PersistentVolumes, which give each volume a human-readable, stable path:
# Example: paperless-media PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-paperless-media
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: 172.16.103.30  # Apollo unRAID server
    path: /mnt/user/data/paperless/media
Benefits:
- Human-readable paths on unRAID: /mnt/user/data/paperless/{export,incoming,media}/
- Retain reclaim policy protects data from accidental deletion
- Easy direct file access for backups and troubleshooting
- Better alignment with GitOps principles
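To confirm that a static PV is bound and still carries the Retain policy, a small helper along these lines may be useful. This is a sketch: `pv_summary` is a hypothetical name, and only `pv-paperless-media` is a PV name taken from the example above.

```shell
# pv_summary NAME: print "<phase> <reclaim policy>" for a PersistentVolume.
# Hypothetical helper; assumes kubectl is already pointed at the cluster.
pv_summary() {
    kubectl get pv "$1" \
        -o jsonpath='{.status.phase}{" "}{.spec.persistentVolumeReclaimPolicy}{"\n"}'
}
```

Running `pv_summary pv-paperless-media` should report the phase as Bound once the matching claim exists, followed by the reclaim policy.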
Document Ingestion Workflow¶
The NFS inotify Problem¶
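Network file systems (NFS/SMB) do not deliver reliable file system event notifications: inotify only reports changes made through the local kernel, so a file written by another client never triggers an event. As a result, Paperless cannot automatically detect new files dropped on NFS shares.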
Solution: File-Mover Sidecar¶
The deployment includes a lightweight Alpine-based sidecar container that solves this limitation:
┌─────────────────────────────────────────────────────────┐
│                      Paperless Pod                      │
│  ┌──────────────────────┐  ┌────────────────────────┐   │
│  │ Paperless Container  │  │ File-Mover Sidecar     │   │
│  │                      │  │ (Alpine 3.21)          │   │
│  │ Watches /consume via │  │                        │   │
│  │ inotify (Ceph)       │  │ Every 15 minutes:      │   │
│  │                      │  │ mv /incoming/* /consume│   │
│  └──────────────────────┘  └────────────────────────┘   │
│         ▲                          │                    │
│         │ inotify works!           │                    │
│         │                          ▼                    │
│  ┌──────────────┐          ┌──────────────┐             │
│  │ /consume     │◀─────────│ /incoming    │             │
│  │ (Ceph RBD)   │   move   │ (NFS)        │             │
│  └──────────────┘          └──────────────┘             │
└─────────────────────────────────────────────────────────┘
                                     ▲
                                     │ Users drop files
                           ┌─────────┴─────────┐
                           │ Apollo NFS Share  │
                           │ /mnt/user/data/   │
                           │ paperless/incoming│
                           └───────────────────┘
Workflow:
- Users drop scanned documents into /mnt/user/data/paperless/incoming/ on Apollo (via SMB/NFS)
- File-mover sidecar runs every 15 minutes, checking for new files
- Any files found are moved to /consume (Ceph)
- Paperless inotify detects files immediately on Ceph
- Documents are processed (OCR, indexing) and moved to /media (NFS)
Implementation:
sidecars:
  file-mover:
    image: alpine:3.21
    command:
      - /bin/sh
      - -c
      - |
        echo "File mover sidecar started - checking /incoming every 15 minutes"
        while true; do
          if [ -n "$(ls -A /incoming 2>/dev/null)" ]; then
            echo "$(date): Found files in /incoming, moving to /consume"
            mv -v /incoming/* /consume/
          fi
          sleep 900  # 15 minutes
        done
Benefits:
- Combines ease of NFS drop zone with reliable Ceph inotify
- Minimal resource usage (Alpine container)
- Simple, auditable shell script (runs under Alpine's /bin/sh, not bash)
- 15-minute interval sufficient for document workflow
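One edge case with a plain `mv /incoming/*` is that a file still being written over SMB/NFS can be moved mid-upload, and hidden files (which `ls -A` counts but `*` does not match) can make the `mv` fail. A possible hardening is sketched below. It is not the deployed script: `move_settled_files` is a hypothetical name, the one-minute age threshold is an assumption, and unlike the loop above it only moves top-level regular files, leaving subdirectories in place.

```shell
# move_settled_files SRC DST [MIN_AGE_MINUTES]
# Move regular, non-hidden files from the top level of SRC into DST, but only
# when find's -mmin test considers them older than MIN_AGE_MINUTES (default 1),
# so files that are still being uploaded stay where they are.
move_settled_files() {
    src=$1
    dst=$2
    min_age=${3:-1}
    find "$src" -maxdepth 1 -type f ! -name '.*' -mmin "+$min_age" \
        -exec mv -v {} "$dst/" \;
}
```

Inside the sidecar loop, the `if`/`mv` pair could then be replaced by `move_settled_files /incoming /consume 1`. Files skipped on one pass are picked up on the next, at the cost of one extra interval of delay.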
Resource Allocation¶
Application Resources¶
Rationale:
- OCR processing is CPU-intensive
- Machine learning models require memory
- Document ingestion involves transcoding and analysis
PostgreSQL Resources¶
CloudNativePG cluster configuration:
- 10Gi storage on Ceph RBD
- Automated backups (configuration TBD)
File-Mover Sidecar¶
Minimal resources (not explicitly limited):
- Runs a simple shell loop every 15 minutes
- Negligible CPU/memory footprint
Network Configuration¶
Ingress¶
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: paperless-ngx
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`paperless.skaggsfamily.us`)
      kind: Rule
      services:
        - name: paperless-ngx
          port: 8000
      middlewares:
        - name: authelia-forwardauth
          namespace: authelia
  tls:
    secretName: paperless-tls
DNS: Managed by external-dns via DNSEndpoint CRD
Certificate: Let's Encrypt TLS via cert-manager ClusterIssuer
Authentication: Authelia forwardauth middleware (one-factor)
Deployment Configuration¶
Helm Chart¶
Chart: gabe565/paperless-ngx v0.24.1
Source: Based on bjw-s common library
Environment Variables (Key Settings)¶
env:
  # Database
  PAPERLESS_DBHOST: paperless-postgresql-rw
  PAPERLESS_DBNAME: paperless
  # URL
  PAPERLESS_URL: https://paperless.skaggsfamily.us
  # OCR
  PAPERLESS_OCR_LANGUAGE: eng
  PAPERLESS_OCR_MODE: skip  # OCR only pages without an existing text layer (redo/force can be set later)
  # Consumption
  PAPERLESS_CONSUMER_RECURSIVE: "true"
  PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "false"
Security Context¶
Operational Procedures¶
Scanning Documents¶
Recommended Scanner Settings (NAPS2):
- Resolution: 300 DPI
- Color: Grayscale
- Duplex: Enabled
- Deskew: Enabled
- Blank page removal: Enabled
- OCR: Disabled (Paperless handles this)
Scan Destination:
- Mac workstation → Mount Apollo SMB/NFS share
- Scan to: /mnt/user/data/paperless/incoming/
- Files are picked up automatically within 15 minutes
Monitoring File Transfer¶
Check file-mover sidecar logs:
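A helper of roughly this shape can fetch the logs of either container in the pod. It is a sketch: `paperless_logs` is a hypothetical name, and it assumes the sidecar container is named `file-mover` after its key in the sidecar definition and that the pod carries the same label used for the consumer logs.

```shell
# paperless_logs CONTAINER [extra kubectl logs flags...]
# Hypothetical helper: show logs for one container of the Paperless pod.
paperless_logs() {
    container=$1
    shift
    kubectl logs -n paperless-ngx \
        -l app.kubernetes.io/name=paperless-ngx \
        -c "$container" "$@"
}
```

For the sidecar, the call would be `paperless_logs file-mover --tail=20`.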
Expected output:
File mover sidecar started - checking /incoming every 15 minutes
Thu Jan 1 22:38:38 UTC 2026: Found files in /incoming, moving to /consume
'/incoming/document.pdf' -> '/consume/document.pdf'
Checking Consumption Status¶
View Paperless consumer logs:
kubectl logs -n paperless-ngx -l app.kubernetes.io/name=paperless-ngx -c paperless-ngx | grep consumer
Accessing NFS Volumes Directly¶
From Apollo unRAID server:
# Incoming directory
ls -la /mnt/user/data/paperless/incoming/
# Media archive
ls -la /mnt/user/data/paperless/media/
# Export directory
ls -la /mnt/user/data/paperless/export/
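Because the sidecar should empty the drop zone on every pass, counting what is still waiting there is a quick health check. The function below is a minimal sketch with a hypothetical name, `pending_files`; it counts only top-level, non-hidden regular files and ignores subdirectories.

```shell
# pending_files DIR: count regular, non-hidden files directly inside DIR.
# Hypothetical helper; file names containing newlines would be miscounted.
pending_files() {
    find "$1" -maxdepth 1 -type f ! -name '.*' | wc -l | tr -d ' '
}
```

On Apollo, `pending_files /mnt/user/data/paperless/incoming` returning a non-zero count for much longer than 15 minutes suggests the file-mover sidecar is not running or cannot write to /consume.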
Data Protection¶
Critical Data¶
- PostgreSQL Database (10Gi Ceph)
  - Application metadata, tags, correspondents, document types
  - Search index mappings
- Media Directory (500Gi NFS)
  - Final archived documents
  - Represents all consumed documents
Backup Strategy¶
Required for Full Restore:
- PostgreSQL database backup (via CloudNativePG or volsync)
- NFS media directory backup (planned via volsync + Garage S3)
Not Critical:
- /data directory (ML models can be re-downloaded)
- /consume directory (temporary staging, should be empty)
- /export directory (regeneratable exports)
Important Operational Notes¶
Media Directory Ownership¶
⚠️ The /media directory is Paperless-owned
- Do not reorganize or rename files outside Paperless
- Do not edit document content directly
- Prefer fixing metadata in Paperless UI + rules
Storage Class Migration¶
The deployment previously used dynamic NFS provisioning with UUID-based directories. Migration to static PVs completed January 1, 2026:
Old (dynamic): /mnt/user/data/pvc-<uuid>/
New (static): /mnt/user/data/paperless/{export,incoming,media}/
Known Limitations¶
- OCR Limited to Pages Without Text (PAPERLESS_OCR_MODE: skip)
  - In skip mode, pages that already carry a text layer are not re-OCRed
  - Full re-OCR can be enabled by changing the mode to redo or force
  - Increases CPU usage significantly
- 15-Minute File Transfer Delay
  - Files dropped in /incoming are moved every 15 minutes
  - Acceptable for document workflow, not instant
- No Email Ingestion
  - Not configured in initial deployment
  - Can be added later
Future Enhancements¶
- Enable OCR processing
- Configure PostgreSQL automated backups
- Implement volsync backup to Garage S3
- Email ingestion configuration
- LLM-assisted auto-tagging
- Barcode-based document splitting
References¶
- GitOps Repository: flux-repo/apps/_bases/paperless-ngx
- Planning Document: Paperless-NGX Deployment & Ingestion Guide
- Helm Chart: gabe565/paperless-ngx
- Official Docs: Paperless-NGX Documentation
Last Updated: January 1, 2026