Deployment
I3K RAG Enterprise is typically installed on a single Linux host with a one-command script. For larger or more constrained environments, multi-server and air-gapped topologies are also supported. Every topology uses the same FastAPI backend, Qdrant vector store and Ollama runtime; only the way the components are wired together changes.
Topologies
Single-host (recommended for most deployments)
All components run on the same server: the FastAPI backend, the React + Vite frontend, Qdrant, Ollama, the SQLite user DB, Apache Tika and Tesseract. Installation takes a single command and roughly one hour, most of which is spent pulling the Qwen3:14b-q4_K_M and Mistral 7B Q4 models plus the BAAI/bge-m3 embedding model (29 languages).
Single-host suits teams of up to a few hundred users and datasets of up to tens of thousands of documents; on commodity hardware, the stack is production-ready at 10,000+ documents.
Air-gapped
Intended for isolated networks with no outbound connectivity, as is typical in defense, healthcare and critical infrastructure. The installer supports offline installation from pre-downloaded packages and model bundles: transfer the bundle through your approved channel, run the installer, and the system comes up without making any outbound connections.
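Whether the offline bundles ship with published checksums is not stated here. If your transfer process records a SHA-256 digest on the connected side, a minimal check on the isolated host could look like the sketch below; `sha256_of` and `verify_bundle` are hypothetical helper names, not part of the installer.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-GB model bundles never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(path: str, expected_hex: str) -> bool:
    """Compare against a digest recorded before transfer (hypothetical workflow)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Run the same function on both sides of the transfer and compare the two hex strings before starting the installer.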
Multi-server (advanced)
For larger workloads, Qdrant and Ollama can be moved onto dedicated GPU nodes, separated from the FastAPI backend. Configuration is manual and applied after the standard install: point the backend at the remote Qdrant endpoint and the remote Ollama endpoint, then restart the API service.
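The configuration keys the backend actually reads are not named in this section, so the sketch below uses hypothetical environment variables `QDRANT_URL` and `OLLAMA_URL`. It resolves both endpoints and probes each with the Python standard library before you restart the API service: Qdrant answers on its REST root (default port 6333) and Ollama lists installed models at `/api/tags` (default port 11434).

```python
import os
import urllib.request
from urllib.error import URLError

# Hypothetical variable names; check the installer's config for the real keys.
DEFAULTS = {
    "QDRANT_URL": "http://localhost:6333",   # Qdrant default REST port
    "OLLAMA_URL": "http://localhost:11434",  # Ollama default API port
}


def resolve_endpoints(env=None) -> dict:
    """Read remote endpoints from the environment, falling back to single-host defaults."""
    env = os.environ if env is None else env
    return {key: env.get(key, default).rstrip("/") for key, default in DEFAULTS.items()}


def probe_urls(endpoints: dict) -> dict:
    """Map each service to a cheap read-only URL that should return HTTP 200."""
    return {
        "QDRANT_URL": endpoints["QDRANT_URL"] + "/",
        "OLLAMA_URL": endpoints["OLLAMA_URL"] + "/api/tags",
    }


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx status; network errors count as unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in probe_urls(resolve_endpoints()).items():
        print(f"{name}: {url} -> {'ok' if is_reachable(url) else 'UNREACHABLE'}")
```

Running this from the backend host catches firewall or port mistakes before the API restart turns them into request failures.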
Hardware requirements
| Resource | Requirement |
|---|---|
| GPU | NVIDIA CUDA with 8–16 GB VRAM (recommended), AMD ROCm, or CPU-only (reduced performance) |
| RAM | 16 GB minimum, 32 GB recommended |
| Storage | 50 GB minimum, scales with the dataset |
| OS | Ubuntu 20.04+ (22.04 recommended) |
| Network | 80+ Mbit/s recommended for initial setup (model download) |
CPU-only is supported and useful for evaluation, but expect noticeably higher latency on generation. For production workloads, plan around a CUDA GPU with at least 12 GB VRAM.
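A back-of-envelope estimate helps explain both the VRAM guidance and the setup-bandwidth figure. The sketch assumes Q4_K_M averages about 4.85 bits per weight (llama.cpp reports roughly 4.8 to 4.9), that Qwen3-14B has about 14.8 B parameters, and that the three models together total about 16 GB on disk; all three numbers are approximations, not values from the installer.

```python
# Assumed figures for illustration; verify against the actual model files.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def quantized_weights_gib(params_billion: float,
                          bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Memory for the quantized weights alone (no KV cache, no runtime overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def download_minutes(total_gb: float, link_mbit_s: float) -> float:
    """Ideal transfer time at full line rate; real downloads are slower."""
    total_bits = total_gb * 1e9 * 8
    return total_bits / (link_mbit_s * 1e6) / 60


print(f"Qwen3-14B Q4_K_M weights: ~{quantized_weights_gib(14.8):.1f} GiB")
print(f"~16 GB of models at 80 Mbit/s: ~{download_minutes(16, 80):.0f} min")
```

About 8.4 GiB of weights leaves little headroom on an 8 GB card once the KV cache and runtime buffers are added, which is consistent with planning around 12 GB or more. The roughly 27-minute ideal download also fits inside the "roughly one hour" install estimate.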
Backup & restore
Backup is built in and uses rclone under the hood, so every one of the 70+ providers rclone supports is available out of the box.
- Object storage: S3, MinIO, Backblaze B2, Wasabi and S3-compatible endpoints.
- Consumer cloud: Google Drive, OneDrive, Dropbox, Mega, pCloud.
- Self-hosted: WebDAV / Nextcloud, ownCloud.
- Traditional: FTP, SFTP.
You can:
- Schedule backups via cron (daily, hourly or a custom cadence).
- Configure a retention policy (keep the last N daily, weekly and monthly snapshots).
- Run zero-downtime backups: Qdrant snapshots and the SQLite user DB are captured consistently without interrupting the API.
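A "last N daily, weekly, monthly" retention policy is usually evaluated grandfather-father-son style: within each time bucket keep the newest snapshot, for the N most recent buckets, and prune anything no rule keeps. The sketch below illustrates that general technique; it is not the product's own pruning code, and `retention_plan` is a hypothetical name.

```python
from datetime import datetime


def _select(snapshots: list[datetime], key, count: int) -> list[datetime]:
    """Keep the newest snapshot in each of the `count` most recent buckets."""
    kept, seen = [], set()
    for snap in sorted(snapshots, reverse=True):  # newest first
        bucket = key(snap)
        if bucket in seen:
            continue  # an even newer snapshot already represents this bucket
        if len(seen) == count:
            break
        seen.add(bucket)
        kept.append(snap)
    return kept


def retention_plan(snapshots: list[datetime], daily: int = 7, weekly: int = 4,
                   monthly: int = 6) -> tuple[list[datetime], list[datetime]]:
    """Return (keep, prune); a snapshot survives if any of the three rules keeps it."""
    keep = set()
    keep.update(_select(snapshots, lambda s: s.date(), daily))
    keep.update(_select(snapshots, lambda s: tuple(s.isocalendar())[:2], weekly))  # ISO (year, week)
    keep.update(_select(snapshots, lambda s: (s.year, s.month), monthly))
    prune = sorted(set(snapshots) - keep)
    return sorted(keep), prune
```

With 60 consecutive nightly snapshots and `daily=7, weekly=4, monthly=3`, the plan keeps 10 snapshots: the last seven days, three extra week-end snapshots and the last snapshot of the previous month.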
Restore is the inverse operation against the same remote: pull the bundle, run the restore command, and the instance comes back at the chosen point in time.
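How the built-in tool captures the SQLite user DB is not spelled out here. One standard way to copy a live SQLite file consistently is SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`; the same call run in the opposite direction performs a restore. (Qdrant has its own snapshot API for the vector data, which this sketch does not cover.) `backup_sqlite` and `restore_sqlite` are hypothetical helpers.

```python
import sqlite3


def backup_sqlite(source_path: str, target_path: str) -> None:
    """Copy a SQLite database page by page with the online backup API.

    Copying in small steps releases the lock between steps, and SQLite restarts
    the copy if another connection writes mid-backup, so the result is consistent.
    """
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target_path)
    try:
        with dst:
            src.backup(dst, pages=256)
    finally:
        dst.close()
        src.close()


def restore_sqlite(backup_path: str, live_path: str) -> None:
    """Restore is the same operation in reverse; stop the API first so nothing writes meanwhile."""
    backup_sqlite(backup_path, live_path)
```

Unlike a plain file copy, the backup API never captures a half-written transaction, which is what makes backups without downtime possible for the user DB.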
Reverse proxy & TLS
The FastAPI backend listens on localhost:8000 and the React frontend on localhost:3000. Both should sit behind a reverse proxy that terminates TLS; Caddy, nginx and Traefik all work. A minimal Caddy configuration:

```caddyfile
rag.example.com {
    reverse_proxy /api/* localhost:8000
    reverse_proxy localhost:3000
}
```

Caddy provisions and renews a Let's Encrypt certificate automatically. For nginx or Traefik, mirror the same routing: /api/* to port 8000, everything else to port 3000.
Production checklist
Before going to production, work through:
- Frontend and backend behind a reverse proxy with TLS 1.3.
- Automatic backups to an external destination configured via rclone.
- Disk quotas and log rotation in place for the data and log directories.
- Monitoring wired up: Prometheus with node_exporter is a reasonable baseline, but any monitoring stack you already run will do.
- Update plan documented, covering how updates from the upstream repository are pulled and applied.
- Disaster recovery plan tested end-to-end on a non-production dataset.
- JWT secret rotated away from the default value.
- Admin user provisioned with a strong password.
- IP-based access restrictions configured if your threat model requires them.
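Several checklist items can be spot-checked with the Python standard library. The sketch below is an illustration, not part of the product: `negotiated_tls_version` refuses anything below TLS 1.3 (the handshake raises `ssl.SSLError` if the proxy cannot offer it), `new_jwt_secret` produces a random value to replace the default secret (where that secret is configured is not covered here), and `disk_free_fraction` is a quick input for quota alerts. All function names are hypothetical.

```python
import secrets
import shutil
import socket
import ssl


def tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to the reverse proxy and return the negotiated protocol, e.g. 'TLSv1.3'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with tls13_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


def new_jwt_secret(num_bytes: int = 64) -> str:
    """URL-safe random secret; 64 random bytes encode to 86 characters."""
    return secrets.token_urlsafe(num_bytes)


def disk_free_fraction(path: str) -> float:
    """Share of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `negotiated_tls_version("rag.example.com")` should return `'TLSv1.3'` once the proxy is configured; any exception means the TLS item is not done yet.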
I3K RAG Enterprise is distributed under AGPL-3.0. The source repository, github.com/I3K-IT/RAG-Enterprise, is the canonical reference for installer scripts, configuration knobs and supported upgrade paths.