AI Cost Guide
This page explains how AI costs work in Supervertaler for Trados and how to keep them under control. It deliberately avoids quoting exact per-model prices – those change often, and Supervertaler already shows you the real, current cost of every operation in the Reports tab. For exact figures, see Estimates vs actual cost below.
Estimates vs actual cost
The Reports tab shows real billed token counts and cost as reported by your provider’s API – with cache-hit tokens broken out (e.g. 830,000 in (720,000 cached) / 32,000 out · $1.36). When the cost has no ~ prefix, that’s the actual amount your provider will charge. This is the single best place to see what you’re actually spending – it’s live, per-operation, and provider-reported.
The number is cache-aware:
- Anthropic (Claude) native + OpenRouter → Claude: real usage from `usage.input_tokens` + `cache_creation_input_tokens` + `cache_read_input_tokens`. Cache reads are billed at 0.1× the input rate, cache writes at 1.25×.
- OpenAI: real `prompt_tokens` and `completion_tokens` plus `prompt_tokens_details.cached_tokens` for the auto-cache discount (50% off cached input).
- DeepSeek: real `prompt_tokens` / `completion_tokens` with auto-cache (90% off cached input).
- Gemini 2.5+: real `usageMetadata` with implicit cache (75% off cached input).
For these providers the in-app number is the authoritative billable figure (modulo any account-level credits or monthly minimums you may have).
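To make the cache multipliers concrete, here is a minimal sketch of how a single Claude call could be priced from its usage fields. It is an illustration only, not Supervertaler's actual code: `claude_cost` is a hypothetical helper, and the per-million-token rates are placeholder values rather than real prices, so the result will not match the $1.36 in the example above.

```python
def claude_cost(usage: dict, input_rate: float, output_rate: float) -> float:
    """Price one call from Anthropic-style usage fields.

    Rates are dollars per million tokens and are supplied by the caller.
    """
    per_token_in = input_rate / 1_000_000
    per_token_out = output_rate / 1_000_000
    return (
        usage.get("input_tokens", 0) * per_token_in                           # uncached input
        + usage.get("cache_creation_input_tokens", 0) * per_token_in * 1.25  # cache writes
        + usage.get("cache_read_input_tokens", 0) * per_token_in * 0.1       # cache reads
        + usage.get("output_tokens", 0) * per_token_out
    )


# Token counts from the Reports tab example: 830,000 in (720,000 cached) / 32,000 out.
usage = {
    "input_tokens": 110_000,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 720_000,
    "output_tokens": 32_000,
}

# Placeholder rates, NOT real prices: $1 per million input, $5 per million output.
cost = claude_cost(usage, input_rate=1.00, output_rate=5.00)
print(f"${cost:.2f}")
```

With these placeholder rates, the 720,000 cached tokens cost less than the 110,000 uncached ones, which is why the cached share of a run matters so much to the final figure.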
The chars/4 estimate is still used as a fallback when the provider didn’t return usage info – this affects Ollama (local, free anyway), some provider edge cases, and any response shape we couldn’t parse. In those cases the cost still appears with a ~ prefix to flag it as an estimate.
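The fallback logic can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `format_cost` is a hypothetical name, the rate is a placeholder, and the real implementation may round or count differently.

```python
from typing import Optional


def format_cost(text: str, rate_per_million: float,
                reported_tokens: Optional[int] = None) -> str:
    """Return a display string for the cost of sending `text`.

    If the provider reported a token count, use it. Otherwise fall back to
    the chars/4 heuristic and mark the result with a ~ prefix.
    """
    if reported_tokens is not None:
        return f"${reported_tokens * rate_per_million / 1_000_000:.4f}"
    estimated_tokens = len(text) // 4  # chars/4 heuristic
    return f"~${estimated_tokens * rate_per_million / 1_000_000:.4f}"


prompt = "a" * 4000  # 4,000 characters, so about 1,000 estimated tokens

estimated = format_cost(prompt, rate_per_million=2.0)
actual = format_cost(prompt, rate_per_million=2.0, reported_tokens=1200)
print(estimated, actual)
```

The gap between the two results shows why an estimate can drift from the billed amount: the heuristic assumes four characters per token, which does not hold equally for every language or content type.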
If you want to cross-check against your provider’s own dashboard:
| Provider | Where to look |
|---|---|
| Anthropic (Claude) | platform.claude.com – Cost |
| OpenAI (GPT) | platform.openai.com – Usage |
| Google (Gemini) | console.cloud.google.com – Billing reports |
| xAI (Grok) | console.x.ai – Usage |
| Mistral AI | console.mistral.ai – Usage |
| OpenRouter | openrouter.ai – Activity |
| DeepSeek | platform.deepseek.com – Usage |
| Ollama | Free – local execution, no provider console. |
The in-app number and the provider dashboard should agree to within rounding for any given run. If you see a meaningful gap, the most likely causes (in order) are: a provider-side credit or volume discount the in-app calculator can’t see; the in-app pricing table being a little behind a recent rate change; or, for fallback (estimate) cases, the chars/4 heuristic over- or under-counting tokens for that particular language and content type.
How costs are calculated
AI providers charge per token – a unit of text roughly equal to ¾ of a word. Costs depend on:
- Input tokens – the text you send (source segment, system prompt, terminology context)
- Output tokens – the text the model returns (translated segment, proofread text, generated prompt)
Because Supervertaler translates segment by segment, the system prompt and terminology context are included with every segment. For a typical 5,000-word document (~250 segments), the token usage works out roughly like this:
| Task | Input tokens | Output tokens |
|---|---|---|
| Batch Translate | ~125,000 | ~8,000 |
| AI Proofreader | ~140,000 | ~8,000 |
| AutoPrompt | ~10,000 | ~2,000 |
These are estimates for a representative document – actual usage varies with segment length, terminology context size, and prompt complexity. Token counts like these are fairly stable; what changes is the price per token, which is why this guide points you to live figures rather than quoting them.
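A back-of-the-envelope calculation shows where the Batch Translate input tokens go. It uses only the figures above (5,000 words, ~250 segments, ~125,000 input tokens, ¾ of a word per token); the variable names are purely illustrative.

```python
WORDS = 5_000
SEGMENTS = 250
BATCH_INPUT_TOKENS = 125_000
WORDS_PER_TOKEN = 0.75

# Tokens in the source text itself, across the whole document.
source_tokens = WORDS / WORDS_PER_TOKEN

# Average input per segment, and how much of it is the segment's own text.
input_per_segment = BATCH_INPUT_TOKENS / SEGMENTS
source_per_segment = source_tokens / SEGMENTS

# The remainder is the system prompt and terminology context that is
# resent with every segment.
overhead_per_segment = input_per_segment - source_per_segment
overhead_share = overhead_per_segment / input_per_segment

print(f"input per segment:    {input_per_segment:.0f} tokens")
print(f"source per segment:   {source_per_segment:.0f} tokens")
print(f"overhead per segment: {overhead_per_segment:.0f} tokens")
print(f"overhead share:       {overhead_share:.0%}")
```

On these numbers, most of the input is repeated prompt and terminology context rather than source text, which is also why provider-side caching of that repeated context can reduce the bill substantially.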
How much will it cost?
There’s a wide spread between models. As a rough mental model:
- Local models (Ollama) are free – they run on your own computer, with no API charges at all. The trade-off is that quality depends on your hardware, and they’re generally less capable than cloud-hosted models. If you have a computer with 8+ GB of RAM, TranslateGemma 12B delivers surprisingly good results for free.
- Budget cloud models – the “Mini”, “Flash-Lite” and “Small” tier from each provider (e.g. GPT-5.4 Mini, Gemini 3.1 Flash-Lite, Mistral Small, Claude Haiku 4.5) – typically cost a small fraction of a cent per segment. They’re excellent for routine, high-volume translation.
- Flagship models – Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro and the like – can run roughly 10–50× the price of the budget tier. Reserve them for specialised content where the quality difference earns its keep.
To see what a model actually costs for your work, run one operation and check the Reports tab – it shows the real billed cost. For OpenRouter, expect the underlying provider’s rate plus a small platform fee.
Our recommendation
For budget-conscious batch work, GPT-5.4 Mini or Gemini 3.1 Flash-Lite offer excellent quality at a fraction of the price. For the absolute highest quality on specialised content, Claude Opus 4.7 or GPT-5.5 are worth the premium.
Token pricing
Supervertaler’s in-app cost figures come from a built-in per-token pricing table. Because provider prices change regularly, that table is occasionally a little behind a recent rate change – the Reports tab’s provider-reported figures are always the authoritative ones. For the definitive current rates, check the provider’s own pricing page:
OpenAI · Anthropic · Google Gemini · xAI · Mistral · DeepSeek · OpenRouter
Tips for managing costs
- Start with a budget model – GPT-5.4 Mini, Gemini 3.1 Flash-Lite, or Mistral Small are excellent for routine translation at a fraction of the cost of a flagship.
- Use premium models selectively – reserve GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro for specialised content (legal, medical, patents) where the quality difference justifies the cost.
- Try Ollama for zero cost – if you have a computer with 8+ GB of RAM, TranslateGemma 12B delivers surprisingly good results for free.
- Check your usage – the Reports tab in Supervertaler Assistant lists every AI call with its token count and cost, and your provider’s own console (see the Estimates vs actual cost table above) shows the authoritative billable figure.
Built-in cost protection
Supervertaler includes several safeguards to help you avoid unexpected costs:
QuickLauncher prompts are standalone
When you run a prompt from the QuickLauncher menu (Ctrl+Q), only the prompt itself is sent to the AI – not the chat history. This means a simple terminology query costs only what it needs to, even if you have a long conversation in the chat window.
Chat token budget
Regular chat messages include recent conversation history so the AI can follow your discussion. However, Supervertaler automatically trims older messages when the history grows too large (~50,000 tokens). This prevents costs from spiralling when previous messages contained large context blocks (e.g. full document content).
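One way such trimming could work is sketched below. This is an assumption-laden illustration, not the actual implementation: `trim_history` and `estimate_tokens` are hypothetical names, token counts use the chars/4 heuristic, and the choice to always keep the newest message even when it alone exceeds the budget is a guess at sensible behaviour.

```python
from typing import List

TOKEN_BUDGET = 50_000


def estimate_tokens(text: str) -> int:
    """Rough token count using the chars/4 heuristic."""
    return len(text) // 4


def trim_history(messages: List[str], budget: int = TOKEN_BUDGET) -> List[str]:
    """Keep the most recent messages that fit within the token budget.

    Walks from newest to oldest and stops at the first message that would
    push the total over the budget. The newest message is always kept.
    """
    kept: List[str] = []
    total = 0
    for message in reversed(messages):
        tokens = estimate_tokens(message)
        if kept and total + tokens > budget:
            break
        kept.append(message)
        total += tokens
    kept.reverse()  # restore chronological order
    return kept


# Three messages of roughly 30,000, 25,000 and 20,000 tokens (oldest first).
history = ["a" * 120_000, "b" * 100_000, "c" * 80_000]
trimmed = trim_history(history)
print(len(trimmed))
```

Here the oldest message is dropped because including it would bring the total to about 75,000 tokens, while the two newer messages together stay under the budget.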
Cost warning
If a request is estimated to cost more than $0.50 in input tokens, a confirmation dialogue appears showing the estimated token count and cost. You can cancel before the expensive request is sent.
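The check behind the warning amounts to a simple threshold comparison. The sketch below assumes a hypothetical `needs_confirmation` helper and a placeholder input rate of $3 per million tokens; only the $0.50 threshold comes from the description above.

```python
WARNING_THRESHOLD_USD = 0.50


def needs_confirmation(input_tokens: int, input_rate_per_million: float) -> bool:
    """Return True when the estimated input cost exceeds the warning threshold."""
    estimated_cost = input_tokens * input_rate_per_million / 1_000_000
    return estimated_cost > WARNING_THRESHOLD_USD


# Placeholder rate, NOT a real price.
RATE = 3.0

large_request = needs_confirmation(200_000, RATE)  # estimated $0.60
small_request = needs_confirmation(100_000, RATE)  # estimated $0.30
print(large_request, small_request)
```

Because the threshold is in dollars rather than tokens, the same request can trigger the warning on a flagship model and pass silently on a budget one.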
Choosing the right model
For everyday work – chat queries, terminology questions, QuickLauncher prompts – use GPT-5.4 Mini or another budget model. Reserve premium models like GPT-5.5 or Claude Opus 4.7 for AutoPrompt and complex tasks where the quality difference justifies the cost.
See also
- AI Settings – configure your API keys and choose a model
- Batch Translate – translate segments in bulk
- AI Proofreader – proofread translated segments
- AutoPrompt – generate translation prompts
- Licensing & Pricing – Supervertaler subscription plans