# Inference Cost Calculator

Calculate the cost of serving LLM inference at scale: self-hosted GPUs vs. API providers.
## Self-Hosted Results

| Metric | Value |
|---|---|
| GPUs needed | 2 (minimum 2 to fit model weights in VRAM) |
| Cost per 1K input tokens | $0.0240 (output tokens priced the same) |
| Daily cost | $168.00 |
| Monthly cost | $5,040.00 |
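The self-hosted figures above can be reproduced with straightforward arithmetic. A minimal sketch, assuming a rental rate of ~$3.50 per GPU-hour for the 2 H100s and a throughput of ~7M tokens/day (values inferred to match the results shown; the actual calculator inputs are not given here):

```python
# Self-hosted cost math. All rates below are assumptions chosen to
# reproduce the calculator's displayed results, not confirmed inputs.
GPU_COUNT = 2               # minimum 2 GPUs to fit model weights in VRAM
GPU_HOURLY_RATE = 3.50      # assumed $/GPU-hour for an H100 SXM
TOKENS_PER_DAY = 7_000_000  # assumed daily throughput (input + output)

daily_cost = GPU_COUNT * GPU_HOURLY_RATE * 24           # GPUs run 24/7
monthly_cost = daily_cost * 30                          # 30-day month
cost_per_1k = daily_cost / (TOKENS_PER_DAY / 1000)      # blended $/1K tokens

print(f"Daily: ${daily_cost:.2f}")        # Daily: $168.00
print(f"Monthly: ${monthly_cost:,.2f}")   # Monthly: $5,040.00
print(f"Per 1K tokens: ${cost_per_1k:.4f}")  # Per 1K tokens: $0.0240
```

Note that self-hosted cost is driven by GPU-hours, not tokens: the per-token rate falls as utilization rises, which is why the break-even against APIs depends heavily on sustained load.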
## vs. API Providers

| Provider | Input $/1K | Output $/1K | Daily Cost | Monthly Cost | vs. Self-Hosted |
|---|---|---|---|---|---|
| Self-hosted (H100 SXM) | $0.0240 | $0.0240 | $168.00 | $5,040.00 | baseline |
| OpenAI GPT-4o | $0.0025 | $0.0100 | $32.50 | $975.00 | -81% |
| Anthropic Claude Sonnet | $0.0030 | $0.0150 | $45.00 | $1,350.00 | -73% |
| Google Gemini Pro | $0.0013 | $0.0050 | $16.25 | $487.50 | -90% |
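The API rows can be reconstructed the same way. A sketch assuming a workload of 5M input and 2M output tokens per day, which is the split consistent with every daily cost in the table (for Gemini, an input rate of $0.00125/1K reproduces the $16.25 daily figure; the table shows it rounded to $0.0013):

```python
# API-provider comparison. The daily token split and the unrounded Gemini
# rate are assumptions inferred from the table, not quoted prices.
INPUT_K_PER_DAY = 5_000    # thousands of input tokens/day (assumed)
OUTPUT_K_PER_DAY = 2_000   # thousands of output tokens/day (assumed)

providers = {
    # name: (input $/1K tokens, output $/1K tokens)
    "OpenAI GPT-4o": (0.0025, 0.0100),
    "Anthropic Claude Sonnet": (0.0030, 0.0150),
    "Google Gemini Pro": (0.00125, 0.0050),  # table shows rounded $0.0013
}
self_hosted_monthly = 168.00 * 30  # $5,040 baseline from the results above

for name, (inp, out) in providers.items():
    daily = inp * INPUT_K_PER_DAY + out * OUTPUT_K_PER_DAY
    monthly = daily * 30
    savings = 1 - monthly / self_hosted_monthly
    print(f"{name}: ${daily:.2f}/day, ${monthly:,.2f}/mo, "
          f"-{savings:.0%} vs. self-hosted")
```

At this volume (~7M tokens/day) the APIs win, because output tokens are only ~29% of the mix and the self-hosted GPUs are billed around the clock regardless of traffic.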