For engineers

Reference architecture for a private LLM on trade-secret data.

A buildable design for a self-hosted assistant a small team can use on confidential client material — threat model, three deployment options, model and hardware sizing, the hardening checklist, and a mapping from each control to the legal "reasonable steps" test.

Workload general chat assistant, small team Models open-weight, self-hosted Date 26 May 2026

Jump to the build ← Business assessment

Threat model first

One question decides everything: who can read the data in the clear?

For trade secrets the asset is confidentiality. So the only question that matters is: who or what can technically observe the prompts, the documents, the model's working memory, the disk, the logs, and the outputs — and can you prove the list is short?

Every design decision below shortens that list and makes it auditable. The legal payoff (covered in the business report) is that a short, controlled, documented list is exactly what "reasonable steps to keep it secret" looks like in practice.

Design principle: move the model to the data, never the data to a service. The model is the thing that should travel; the secret stays put.

The leak vectors that actually bite

Prompt / request logs in the model server, the chat UI, or a reverse proxy
Swap, crash dumps, temp files, shell history
Persistent disks / snapshots that outlive the job (rented infra)
Outbound telemetry from inference frameworks, tracing, "observability" or eval libraries
Accidental calls to hosted APIs for embeddings, rerankers, or "helper" features
Multi-tenant exposure: a neighbour or host operator on shared hardware

Deployment options

Three trust boundaries, graded by who's inside them

A · Own hardware (recommended)

On-prem / controlled LAN

Inside the boundary: only you.
No hypervisor, no host operator, no neighbours.
Can be fully air-gapped or LAN/VPN-only.
Residual risk: physical theft, malware, bad backups — all locally controllable.

B · Rented single-tenant

Dedicated / bare-metal, EU

Inside: you + one named provider under a DPA.
No co-tenants; you control the OS and disk.
Provider can physically access hardware — mitigate with encryption + no persistence.
e.g. Hetzner dedicated GPU, AWS dedicated host / bare metal.

C · Managed private endpoint

Bedrock / Azure OpenAI

Inside: you + the AI service, under enterprise terms.
Contractually no training on your data, no input/output storage; KMS, PrivateLink, EU region.
Data leaves your runtime — trust rests on contract + certifications.
Lowest ops; weakest control. Client-approved cases only.

Hard exclusions for entrusted secrets: anonymous GPU marketplaces (unknown host operator, unclear disk lifecycle) and consumer LLM APIs (data may be logged/retained/used for training). Both fail the "reasonable steps" test by default.

The recommended build

Reference architecture — a small-team private assistant

Built for Option A (own hardware); the same stack lifts onto Option B unchanged. Everything runs on one machine, behind a default-deny firewall, reachable only over your LAN or a VPN.

Trust boundary — your machine, no egress for data

Client / usersA handful of team members on the office network or WireGuard VPN — individual accounts, MFA.LAN / VPN only

Chat UISelf-hosted web chat with multi-user accounts & role-based access. Bound to localhost/LAN.Open WebUI

Inference serverServes the model over an OpenAI-compatible API. Request logging disabled.vLLM · or Ollama

Model weightsOpen-weight, permissive licence, downloaded once then kept offline; checksum pinned.30B–70B class

Optional: retrievalLocal embeddings + local vector store for document Q&A. Never a hosted vector DB.bge/e5 · Qdrant/pgvector

Host OSLinux on an encrypted disk (LUKS), no/encrypted swap, hardened SSH, egress firewall.Ubuntu LTS + LUKS

HardwareWorkstation with one 48 GB-class pro GPU; physically secured.RTX 6000 Ada 48 GB

Why vLLM: high-throughput batched serving for several concurrent users, OpenAI-compatible API so Open WebUI plugs straight in. Use Ollama instead if you want the simplest possible single-binary setup and can accept lower concurrency.

Why Open WebUI: a polished self-hosted chat front-end with users, groups, RBAC and document upload — gives the team a ChatGPT-like experience with nothing leaving the box.

Model selection

Pick by licence and VRAM fit, not by hype

The open-weight leaderboard reshuffles monthly, so choose by durable criteria and slot in the current best release at build time.

Permissive licence first. Prefer Apache-2.0 / MIT families (Qwen, Mistral, some Gemma/DeepSeek/GLM releases) to avoid licence entanglement on commercial client work. Check the exact release's terms.
Fit the GPU. Pick the largest model that fits your VRAM at 4-bit with room for context — sizing table at right.
Verify on a live leaderboard. Cross-check the current top instruct models on a neutral index before committing (sources below).
Download once, then air-gap. Pull weights on a connected machine, verify the checksum, move them over, and cut egress.

Good default for a 48 GB GPU: a strong ~30B dense instruct model (or a small MoE) at 4–8-bit gives near-flagship chat quality with comfortable context headroom for a small team.

*Approximate weights-only footprint; add headroom for KV-cache (grows with context length & concurrent users).
Model size	VRAM @ 4-bit*	Fits on
7–9B	~6–8 GB	Any modern GPU
13–14B	~10–12 GB	16–20 GB (e.g. GEX44)
30–34B	~20–24 GB	24–48 GB
70–72B	~40–48 GB	48 GB (RTX 6000 Ada) — tight; 80 GB comfortable
100B+ MoE	varies	Multi-GPU / 80 GB+

Quantisation: 4-bit (e.g. AWQ/GPTQ/GGUF Q4) roughly halves VRAM vs 8-bit with minor quality loss — the standard lever for fitting a bigger, smarter model on one card.

Hardware & hosting

What to buy, or what to rent

Indicative figures, May 2026. The RTX 5090 (32 GB, consumer) is cheaper but not a datacenter/pro part — viable for an owned box on a budget, with less VRAM headroom.
Path	Spec	GPU / VRAM	Cost	Notes
A · Own box	Workstation: 1× pro GPU, 64–128 GB RAM, NVMe (LUKS)	RTX 6000 Ada 48 GB (≈ €6.8k) or RTX PRO 6000 Blackwell 96 GB (≈ €8k)	~€8k–12k once	Runs 30–70B comfortably; full physical control
B · Hetzner GEX44	Dedicated, EU	RTX 4000 SFF Ada 20 GB	€184/mo + €79 setup	Good for ≤14B models / lighter use
B · Hetzner GEX130	Dedicated, EU	RTX 6000 Ada 48 GB	~€838/mo + €79 setup	Matches the owned-box GPU, as OPEX
B · AWS dedicated	Dedicated host / bare-metal GPU, EU region	L4 / L40S / A10G class	Usage-based, higher	Strong isolation primitives; easy to misconfigure
C · Managed	Bedrock / Azure OpenAI, EU region	n/a (service)	Per token	No GPU ops; data leaves runtime

The hardening checklist

The controls that make it "reasonable steps"

Apply all of these for A and B. For C, the network/crypto items become contract + provider-config items (KMS, PrivateLink, region pinning, no-logging).

Network & egress

Firewall default-deny inbound; no public inference endpoint
Access only via LAN or VPN (WireGuard); bind services to localhost/private iface
Default-deny outbound too — allow only OS/package mirrors you actually need
This single rule kills accidental telemetry and hosted-API calls

Cryptography & storage

Full-disk encryption (LUKS) on the model + data volumes
No swap, or encrypted swap only
Disable crash/core dumps; clear temp aggressively
Backups encrypted, access-controlled, or none

Data hygiene & logging

Disable prompt/response logging in the inference server and the chat UI
If any logging is required, keep it short-retention, encrypted, access-controlled
Disable shell history for secret-handling sessions
Ephemeral staging; wipe input/output files after use

Access & audit

SSH keys only — no passwords; restrict source IPs
Individual user accounts; RBAC in the chat UI; MFA on the VPN
Keep an access log: who reached the machine and when
Offboarding checklist: revoke keys/accounts on departure

Supply chain

Audit the inference stack for telemetry (OpenTelemetry/analytics/crash reporting off)
Pin model + container checksums; verify before load
Avoid tools that call out for embeddings, rerankers, tracing or eval
Vendor/pin dependencies; review what each library phones home

Lifecycle & decommission

Separate machine/account/project per sensitive workload
On rented infra: delete volumes & snapshots after use
Crypto-erase or securely wipe disks at end of engagement
Document the wipe — it's evidence of "reasonable steps"

The bridge to the legal test

Each control maps to "reasonable steps to keep it secret"

This table is the hand-off to the business report. It turns engineering into the evidence a court or a client wants: a documented, proportionate set of measures.

Technical control	Maps to the legal requirement…
Self-hosted open-weight model; no external API	No disclosure of the secret outside the trusted circle
LAN/VPN-only, no public endpoint, egress locked	Restricting access; preventing onward transmission
Full-disk encryption, no/encrypted swap	"Appropriate technical measures" to secure the information
Individual accounts, MFA, RBAC, access log	Limiting the number of people with access; demonstrable control
Logging disabled / minimised; temp wiped	Not creating uncontrolled copies of the secret
Single-tenant infra + signed DPA (Option B)	Sufficient guarantees from any third party that is involved
Documented wipe / decommission	Evidence the holder actively maintained secrecy throughout
NDAs + access policy (organisational, not technical)	The contractual half of "reasonable steps" — pair with the above

Operational runbook

From bare machine to locked-down assistant

An outline, not copy-paste commands — adapt to your distro and provider. The ordering matters: bring the data path online after egress is cut.

# 1. Provision & encrypt
install Ubuntu LTS on LUKS-encrypted NVMe; disable swap (or encrypt it)
harden SSH: keys only, no root login, restrict source IPs
ufw default deny incoming; ufw default deny outgoing
allow out only: OS + package mirrors (temporarily, for setup)

# 2. Pull the model (still online), then go dark
download weights on this box or a staging machine
verify sha256 against the published checksum
once installed: remove the temporary outbound allow-rules

# 3. Serve + UI, no logging
run vLLM (or Ollama) bound to 127.0.0.1; disable request logging
run Open WebUI bound to LAN/VPN iface; create per-user accounts + RBAC
confirm: no OpenTelemetry / analytics / crash reporting enabled

# 4. Verify the boundary (the important step)
tcpdump / firewall logs: confirm zero unexpected egress during a real chat
grep the box for prompt text in logs/temp/swap — should find nothing

# 5. Decommission (end of engagement)
stop services; securely wipe data + model volumes
on rented infra: delete the server, volumes, and snapshots
record the wipe (date, method, operator) in the evidence pack

Step 4 is the one people skip. Actually watching the network during a real conversation — and confirming nothing leaves — is what converts "we think it's private" into "we verified it's private." Keep that capture as evidence.

Keep the receipts

The evidence pack

"Reasonable steps" is only worth what you can show. Maintain a short folder, versioned, that a client or a court can be walked through:

Architecture & data-flow diagram (this report's stack) + the egress-verification capture
Access policy + the access log; offboarding checklist
Signed NDAs; and the provider DPA if on Option B/C
Encryption attestation (disk encrypted, swap handled, logging off)
Decommission / wipe records per engagement

← Back to the business assessment

Sources

Open-weight model landscape & leaderboards — Self-hosted LLM leaderboard; HF open-source LLM overview (verify current releases at build time)
Hetzner dedicated GPU servers (GEX44 / GEX130) — hetzner.com GPU matrix; GEX44
GPU pricing (RTX 6000 Ada, RTX PRO 6000 Blackwell, RTX 5090) — NVIDIA RTX 6000 Ada; getdeploying price index
AWS Bedrock data handling (no training, no input/output storage, KMS, PrivateLink) — aws.amazon.com/bedrock/security-privacy
Trade-secret "reasonable steps" standard (legal bridge) — see the business report sources (TRIPS Art. 39, EU 2016/943, UK Regs 2018, PT DL 110/2018)

Engineering guidance, not legal advice. Component names and figures reflect May 2026; the open-weight model field moves monthly, so re-verify the current best release and exact licence before deployment. The control-to-law mapping is a practical aid, not a legal opinion — confirm sufficiency with a qualified lawyer for each client's jurisdiction.