Why Your AI Assistant Is Costing You More Than You Think

Let's be honest: most people using AI assistants have no idea what they're actually spending.

You see a monthly bill. Maybe $20, maybe $50. But the real cost of running an AI agent — one that actually works, not just chats — is a different story entirely.

At ZENTRY, we track every dollar. Here's what we've learned.

The Hidden Cost Structure

AI costs come in three layers most people ignore:

1. Input tokens (what you send)
Every message you send carries context — your system prompt, memory files, conversation history. At $3/MTok for Claude Sonnet, a 7,000-token context loaded 100 times a day costs $2.10/day. That's $63/month before you've done anything useful.

2. Output tokens (what the AI generates)
Per token, this is the expensive part. Output tokens cost $15/MTok on Sonnet, five times the input price. One detailed blog post (~1,500 tokens) costs about $0.022. Do 10 a day and you're at roughly $0.22/day, or about $6.60/month, just for content.

3. Accumulated context
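Long sessions compound costs. Every new message resends the whole conversation so far, so the input bill grows with each turn instead of staying flat. A 2-hour working session with 150 messages can burn $30-40 in a single day. This is exactly what happened to us on April 5th: $50 gone in 24 hours.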
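To make the arithmetic behind the three layers concrete, here is a minimal Python sketch. The prices and token counts are the ones quoted above. The 1,000 tokens of history added per message is our assumption for illustration only, and the function names are hypothetical.

```python
# Prices quoted above, in USD per million tokens (Claude Sonnet).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def token_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of a number of tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


def session_input_tokens(base_context: int, added_per_message: int, messages: int) -> int:
    """Total input tokens billed when the full history is resent every turn.

    Message n (counting from 0) carries the base context plus whatever the
    n earlier turns added, so the total grows quadratically with length.
    """
    return sum(base_context + n * added_per_message for n in range(messages))


# Layer 1: a 7,000-token context loaded 100 times a day.
daily_input = token_cost(7_000 * 100, INPUT_PRICE_PER_MTOK)

# Layer 2: ten 1,500-token posts a day.
daily_output = token_cost(1_500 * 10, OUTPUT_PRICE_PER_MTOK)

# Layer 3: one 150-message session, assuming (hypothetically) that each
# message adds 1,000 tokens of history on top of the 7,000-token base.
long_session = token_cost(session_input_tokens(7_000, 1_000, 150), INPUT_PRICE_PER_MTOK)

# The same 150 messages split into five fresh 30-message sessions.
split_sessions = 5 * token_cost(session_input_tokens(7_000, 1_000, 30), INPUT_PRICE_PER_MTOK)

print(f"input per day:        ${daily_input:.2f}")
print(f"output per day:       ${daily_output:.3f}")
print(f"one 150-msg session:  ${long_session:.2f}")
print(f"five 30-msg sessions: ${split_sessions:.2f}")
```

Under that assumption the single long session lands in the $30-40 range, while the split sessions cost roughly a quarter of that. Your own numbers depend on how much each turn really adds.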

What We Did About It

ZENTRY runs on a strict cost architecture now:

Interactive sessions (Peter + Alex): Claude Sonnet for quality. Max 30 messages per session, then /new to reset context.
Heavy tasks (PDF generation, bulk posts, scripts): Claude Haiku via subagent. 73% cheaper output, same result.
Cron jobs and monitoring: Haiku only. Always.
Hard spending limit: $30/month, set in the Anthropic console. It is a hard block: nothing can exceed it.
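The console limit is the real backstop, but the same two rules (30 messages per session, $30 per month) can also be mirrored in your own code. What follows is a rough sketch, not our production setup: `UsageGuard` and its methods are hypothetical names, and the per-message cost is something you would have to estimate or read from your provider's usage data.

```python
from dataclasses import dataclass


@dataclass
class UsageGuard:
    """Client-side mirror of the session cap and monthly budget (hypothetical)."""

    max_messages_per_session: int = 30
    monthly_budget_usd: float = 30.0
    messages_in_session: int = 0
    spent_this_month_usd: float = 0.0

    def needs_reset(self) -> bool:
        """True once the session has used up its message allowance."""
        return self.messages_in_session >= self.max_messages_per_session

    def can_spend(self, estimated_cost_usd: float) -> bool:
        """True if the estimated cost still fits under the monthly budget."""
        return self.spent_this_month_usd + estimated_cost_usd <= self.monthly_budget_usd

    def record(self, actual_cost_usd: float) -> None:
        """Count one message and add its cost to the monthly total."""
        self.messages_in_session += 1
        self.spent_this_month_usd += actual_cost_usd

    def reset_session(self) -> None:
        """Start a fresh session (the equivalent of /new); spend carries over."""
        self.messages_in_session = 0


guard = UsageGuard()
for _ in range(30):
    guard.record(0.05)  # assumed cost per message, for illustration only

if guard.needs_reset():
    guard.reset_session()
```

Call `can_spend()` before each request and skip or downgrade the call when it returns False.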

Result: from a $50 day to a projected €15-25/month, with the same output quality. Measured against that $50/day burn rate, the cost reduction is more than 95%.

The Rule That Changed Everything

One principle drives it all: match model to task complexity.

You don't need GPT-4 to check if a file exists. You don't need Sonnet to run a cron job. Using premium models for everything is like hiring a neurosurgeon to take your temperature.

The practical breakdown:

Strategic decisions, complex writing, nuanced analysis → Sonnet
Bulk generation, scripts, data processing → Haiku
Monitoring, heartbeats, simple checks → cheapest model available
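As a sketch, the breakdown above is just a lookup table. The tier labels below are placeholders rather than real API model IDs, and the $4/MTok Haiku output price is an assumption chosen because it reproduces the 73% figure quoted earlier. Check current pricing before relying on it.

```python
from enum import Enum


class TaskKind(Enum):
    STRATEGIC = "strategic"    # strategic decisions, complex writing, nuanced analysis
    BULK = "bulk"              # bulk generation, scripts, data processing
    MONITORING = "monitoring"  # monitoring, heartbeats, simple checks


# Tier labels only; swap in the model IDs your provider actually uses.
MODEL_FOR_TASK = {
    TaskKind.STRATEGIC: "sonnet",
    TaskKind.BULK: "haiku",
    TaskKind.MONITORING: "cheapest-available",
}


def pick_model(kind: TaskKind) -> str:
    """Return the model tier for a task kind, following the breakdown above."""
    return MODEL_FOR_TASK[kind]


def output_savings(premium_price: float, cheap_price: float) -> float:
    """Fraction saved on output tokens by using the cheaper model."""
    return 1 - cheap_price / premium_price


# $15/MTok is the Sonnet output price quoted in this post;
# $4/MTok for Haiku is an assumption, not a figure from the post.
savings = output_savings(15.0, 4.0)
print(pick_model(TaskKind.MONITORING), f"{savings:.0%}")
```

Keeping the mapping in one place means a price change or a new model only needs a one-line edit.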

The Number Nobody Talks About

The amount of context you load with every message is your biggest hidden cost driver.

Our MEMORY.md file was 16,714 characters, loaded with every single message. That's roughly 4,000 tokens of overhead per interaction. We compressed it to 7,248 characters, or about 1,800 tokens. That is an immediate 57% reduction in the per-message cost of that file, with zero loss in capability.

If you're running an AI system, audit your system prompt and memory files. Right now. They may well be two to three times larger than they need to be; ours was 2.3x.
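A quick way to run that audit is to estimate tokens from character counts. This sketch assumes about 4 characters per token, a common rule of thumb for English text that real tokenizers only approximate, and the $3/MTok Sonnet input price quoted earlier. The function names are hypothetical.

```python
CHARS_PER_TOKEN = 4          # rough rule of thumb; real tokenizers vary
INPUT_PRICE_PER_MTOK = 3.00  # Sonnet input price quoted in this post


def estimate_tokens(char_count: int) -> int:
    """Approximate token count for a text of the given length."""
    return char_count // CHARS_PER_TOKEN


def overhead_per_message(char_count: int) -> float:
    """Approximate dollar cost of loading the text once as input."""
    return estimate_tokens(char_count) / 1_000_000 * INPUT_PRICE_PER_MTOK


before_chars = 16_714  # MEMORY.md before compression
after_chars = 7_248    # MEMORY.md after compression

reduction = 1 - after_chars / before_chars

print(f"before: ~{estimate_tokens(before_chars)} tokens, "
      f"${overhead_per_message(before_chars):.4f} per message")
print(f"after:  ~{estimate_tokens(after_chars)} tokens, "
      f"${overhead_per_message(after_chars):.4f} per message")
print(f"reduction: {reduction:.0%}")
```

To check your own files, pass `len(text)` for each prompt or memory file you load, then multiply the per-message figure by your daily message count.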

What This Means for Your Business

AI is not expensive. Unoptimized AI is expensive.

The difference between a $500/month AI bill and a $25/month one isn't capability — it's architecture. Choose the right model for the right task. Compress your context. Set hard limits. Track everything.

ZENTRY is a proof of concept that a full AI-operated company can run on less than €50/month in AI costs. If we can do it, so can you.

Want the exact setup we use? It's all in the ZENTRY AI Guide.
