Whose Premise Is It Anyway?
Understanding the three tiers of on-prem AI deployment — from managed cloud services to bare metal — and why the term “on-prem” does not always mean the same thing.
The phrase “on-prem” used to mean one thing: your hardware, your building, your problem. Today it has been stretched to cover at least three fundamentally different deployment models for running language models — each with its own cost structure, security posture, and exposure surface. When you hear “on-prem,” one obvious question should be: whose premise?
All three models described below keep your data off the public internet and away from shared multi-tenant inference endpoints. All three can legitimately be called on-prem. But they are not the same, and treating them as interchangeable is where organizations get into trouble.
Tier 1: Managed Cloud Services
Examples: AWS Bedrock, Azure OpenAI Service, Google Vertex AI
This is the most packaged option. The cloud provider hosts the model, manages the infrastructure, and exposes an API. Your data stays within your cloud tenancy and, under the provider’s terms, is not used for training. No one outside your account is granted access to your prompts or completions.
Cost: Highest per-token cost. You pay for convenience, SLAs, and the provider’s operational overhead.
Security posture: You trust the cloud provider’s isolation guarantees. Your data transits their network fabric, sits in their memory during inference, and is governed by their access controls — not yours. You have no visibility into the host operating system, the GPU memory lifecycle, or whether a hypervisor vulnerability could expose your workload. The provider’s employees operate the infrastructure. You are trusting their policies, their patch cadence, and their insider threat program.
Exposure surface:
- Cloud provider’s internal network and staff
- Hypervisor and multi-tenant isolation boundaries
- API gateway and authentication layer
- Provider’s logging and telemetry pipeline
- Regulatory jurisdiction of the provider’s data centers
For many organizations this is the right trade. The provider’s security team is larger than yours, their compliance certifications are current, and the operational burden is near zero. But “on-prem” this is not — it is on someone else’s premise, with contractual guarantees substituting for physical control.
Tier 2: Self-Managed Cloud VM
Examples: Running llama.cpp, vLLM, or Ollama on an EC2 instance, Azure VM, or GCE instance with attached GPU
Here you provision your own virtual machine in the cloud, install the model weights, and run inference yourself. The cloud provider supplies the hardware and hypervisor. You control everything above that: the operating system, the runtime, the model, and the network configuration.
Cost: Moderate. You pay for GPU compute by the hour. No per-token markup. You absorb the operational cost of patching, monitoring, and managing the instance yourself.
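The per-token versus per-hour distinction is worth making concrete. A minimal sketch, with entirely hypothetical rates — substitute your provider’s actual pricing:

```python
# Back-of-the-envelope comparison of per-token (Tier 1) vs hourly GPU
# (Tier 2) pricing. All figures are hypothetical placeholders.

def tier1_monthly_cost(tokens_per_month: int, usd_per_1k_tokens: float) -> float:
    """Managed service: you pay for every token processed."""
    return tokens_per_month / 1_000 * usd_per_1k_tokens

def tier2_monthly_cost(hours_per_month: float, usd_per_gpu_hour: float) -> float:
    """Self-managed VM: you pay for GPU hours, regardless of token volume."""
    return hours_per_month * usd_per_gpu_hour

# Hypothetical workload: 50M tokens/month at $0.01 per 1K tokens,
# vs a GPU instance at $1.50/hour running around the clock.
t1 = tier1_monthly_cost(50_000_000, 0.01)   # $500.00
t2 = tier2_monthly_cost(24 * 30, 1.50)      # $1080.00

# The crossover is the token volume at which the flat GPU bill
# equals the per-token bill.
crossover_tokens = t2 / 0.01 * 1_000
print(f"Tier 1: ${t1:.2f}  Tier 2: ${t2:.2f}  crossover: {crossover_tokens:,.0f} tokens")
```

Below the crossover volume the managed service is cheaper; above it, the flat-rate VM wins — which is why Tier 2 tends to appeal once workloads become sustained.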
Security posture: Significantly better than Tier 1. Your model weights, your prompts, and your completions never leave your VM. No provider API sits between you and the model. You control the firewall rules, the SSH keys, and the disk encryption. You can air-gap the instance from the internet entirely if your use case allows it.
But the hardware is still not yours. The hypervisor is the provider’s. The physical host is in their data center, maintained by their technicians. A sufficiently motivated state actor or a compromised cloud employee with physical access to the host could theoretically extract data from GPU memory or intercept DMA traffic. This is not a realistic threat for most organizations — but for some, it is the only threat that matters.
Exposure surface:
- Cloud provider’s hypervisor and physical access
- Your own OS patching and configuration discipline
- Network path between your users and the VM
- Disk and memory encryption implementation
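If you do lock the instance down, it is worth verifying the lockdown actually holds. A minimal egress probe, assuming nothing about your tooling — the target addresses are illustrative; probe whatever endpoints matter on your network:

```python
# Minimal egress probe: confirm that an instance that is supposed to be
# isolated cannot actually open outbound TCP connections.
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # On a correctly isolated inference VM, every probe should fail.
    for host, port in [("1.1.1.1", 443), ("8.8.8.8", 53)]:
        status = "OPEN (unexpected!)" if can_reach(host, port) else "blocked"
        print(f"{host}:{port} -> {status}")
```

A probe like this belongs in a scheduled check, not a one-off: firewall rules drift, and the whole point of Tier 2 is that catching that drift is your job.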
Tier 3: Your Own Hardware
Examples: A rack-mounted server with NVIDIA GPUs in your own facility, a workstation under your desk running Ollama, a purpose-built inference appliance in your server room
This is on-prem in the original, unambiguous sense. You own the hardware. You control physical access. The model weights are on your disk, the inference happens on your GPU, and the only network involved is the one you built. No cloud provider, no hypervisor, no shared tenancy.
Cost: Highest capital expenditure, lowest marginal cost. A capable GPU server costs thousands upfront but runs inference at the cost of electricity. Over sustained workloads the economics invert — what costs dollars per hour in the cloud costs pennies. No one sends you a bill when your agent runs overnight.
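The inversion point can be estimated directly. A sketch with hypothetical numbers — plug in your real hardware quote, power draw, and cloud rate:

```python
# Break-even between renting a cloud GPU and buying your own hardware.
# All figures are hypothetical placeholders.

def owned_hourly_cost(power_kw: float, usd_per_kwh: float) -> float:
    """Marginal cost of running your own box: electricity only."""
    return power_kw * usd_per_kwh

def breakeven_hours(capex_usd: float, cloud_usd_per_hour: float,
                    owned_usd_per_hour: float) -> float:
    """Hours of use after which the purchase pays for itself."""
    return capex_usd / (cloud_usd_per_hour - owned_usd_per_hour)

# Hypothetical: an $8,000 server drawing 0.7 kW at $0.15/kWh,
# vs a comparable cloud GPU at $1.50/hour.
marginal = owned_hourly_cost(0.7, 0.15)          # ~$0.105/hour
hours = breakeven_hours(8_000, 1.50, marginal)   # ~5,735 hours
print(f"Owned marginal cost: ${marginal:.3f}/h; "
      f"break-even after {hours:,.0f} hours (~{hours / 24:.0f} days continuous)")
```

Under these assumed numbers the box pays for itself in well under a year of continuous use — the "pennies per hour" claim above is just the electricity term.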
Security posture: Maximum control. The attack surface is reduced to your own physical security, your own network, and your own operational discipline. There is no third party to trust, no contractual guarantee to interpret, no shared infrastructure to worry about. If you can see every cable, every process, and every user on the box, you have achieved the highest assurance level available outside of a SCIF.
The risk shifts entirely to you. If you do not patch, you are exposed. If your physical security is weak, someone can walk out with the drive. If your network is flat, a compromised workstation can reach the inference server. Sovereign control means sovereign responsibility.
Exposure surface:
- Your physical security perimeter
- Your network architecture and segmentation
- Your operational discipline (patching, access control, monitoring)
- Supply chain integrity of the hardware itself
The Point
All three are called “on-prem.” Only one of them means your data never touches infrastructure you do not own. The distinction matters most when the data is sensitive — classified, proprietary, medical, legal, or financial — and the consequences of exposure are not just embarrassing but material.
The right tier depends on your threat model, not your budget. An organization processing open-source research papers can comfortably use Tier 1. An organization processing legal discovery documents should think hard about Tier 2. An organization whose adversaries include nation-states should be running Tier 3 — and should have been for years.
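The selection logic above can be sketched as a checklist. The categories and thresholds here are illustrative assumptions, not policy advice:

```python
# A sketch of tier selection driven by threat model, not budget.
# Sensitivity labels and adversary classes are hypothetical examples.

def recommended_tier(data_sensitivity: str, adversary: str) -> int:
    """Map a coarse threat model to a deployment tier (1-3)."""
    if adversary == "nation-state":
        return 3          # physical control of the hardware is the point
    if data_sensitivity == "classified":
        return 3
    if data_sensitivity in {"legal", "medical", "financial", "proprietary"}:
        return 2          # keep inference inside infrastructure you administer
    return 1              # low-sensitivity data: a managed service is fine

# The examples from the text:
assert recommended_tier("public", "opportunistic") == 1       # research papers
assert recommended_tier("legal", "criminal") == 2             # legal discovery
assert recommended_tier("proprietary", "nation-state") == 3   # sovereign stakes
```

Note what the function never reads: a price. Budget constrains how you reach the required tier, not which tier is required.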
When someone says “on-prem,” ask them: whose premise?

