OpenClaw Cost Control: Cut API Spending, Keep Your Agent
OpenClaw was previously known as Clawdbot and Moltbot. This guide applies to all versions.
OpenClaw API costs explained: monitor token usage with /status and /context detail, assign cheap models to routine tasks, and set provider-level spend limits
Key takeaways
- Context accumulation accounts for 40-50% of typical token usage; sessions re-send full conversation history on every turn
- Model choice is the biggest lever: Opus costs nearly 19x more per input token than Haiku ($15 vs $0.80 per million), and cron jobs run on whatever model you left as default
- OpenClaw has no built-in hard spend cap; /status and /context detail are your monitoring tools
- Provider-level spend limits (Anthropic console, OpenRouter budgets) are the only hard safety nets
- Thinking/reasoning mode tokens cost 3-5x more than normal tokens; disable it for routine tasks
Always review commands your agent suggests before approving them. Don't paste prompts from sources you don't trust.
Fixes when it breaks. Workflows when it doesn't.
OpenClaw guides, configs, and troubleshooting notes. Every two weeks.
What does OpenClaw actually cost per month?
It depends on two things: which models you're running and how often the agent talks. Someone running occasional tasks pays $10-30/month. Automate a bunch of workflows with a 24/7 agent and you're looking at $70-150. Go harder and it keeps climbing. These ranges come from community-reported figures:
| Usage level | Monthly tokens | Estimated cost |
|---|---|---|
| Light (occasional use) | 5-20M | $10-30 |
| Medium (automated workflows) | 20-50M | $30-70 |
| Heavy (24/7 assistant) | 50-200M | $70-150+ |
| Extreme (fully automated, always-on) | 180M+ | $800-3,600+ |
The extreme end is real. Running OpenClaw 24/7 on API models can cost $800-1,500/month for heavy commercial workloads. The often-cited $3,600/month figure is an outlier, not a typical outcome.
OpenClaw (previously Clawdbot and Moltbot) bills per request. Every conversation turn, every cron job, every sub-agent spawn eats tokens. There's no flat monthly rate. The bill tracks activity.
API key vs subscription
On a Claude Pro or Max subscription? Bad news. OAuth tokens don't show dollar cost in /status, and Anthropic has been restricting third-party OAuth access. Direct API keys are the only billing path that actually works.
Why OpenClaw uses more tokens than you expect
Most people underestimate how many tokens each turn burns. The reason isn't obvious until you look under the hood. Every single request re-injects the full tool list, skills metadata, and workspace files (AGENTS.md, SOUL.md, MEMORY.md, the lot) before your conversation history even enters the picture.
The OpenClaw docs break down where context tokens go. Run /context detail to see your own numbers. Typical distribution:
| Driver | Share of total usage |
|---|---|
| Context accumulation (session history) | 40-50% |
| Tool output storage | 20-30% |
| System prompt (files, tools, metadata) | 10-15% |
| Multi-round reasoning steps | 10-15% |
| Cache misses | 5-10% |
Session history is the big one. The full history travels with every request, so a 40-turn conversation re-sends message 1 forty times; by turn 40, you've paid for 40 copies of your first message.
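The compounding is quadratic, not linear. A toy calculation (assuming a flat per-message token count, which real sessions won't have) shows how re-sent history dominates:

```python
# Toy model: every turn re-sends all prior messages as input tokens.
# Assumes a flat 500 tokens per message; real sizes vary widely.
TOKENS_PER_MESSAGE = 500

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens paid across a session of `turns` turns."""
    # On turn n, the request carries n messages of history.
    return sum(n * TOKENS_PER_MESSAGE for n in range(1, turns + 1))

print(cumulative_input_tokens(10))  # 27500
print(cumulative_input_tokens(40))  # 410000 -- ~15x the 10-turn cost, not 4x
```

Quadrupling the session length roughly fifteen-folds the cumulative input bill, which is why resetting and compacting sessions matters so much.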
Common misconfigurations make it worse:
- Leaving Opus as the default model for everything, including heartbeat cron jobs
- Enabling thinking/reasoning mode globally. Reasoning-mode tokens cost 3-5x more than standard tokens
- Oversized MEMORY.md and AGENTS.md files getting injected into every request
- Browser automation: per-scrape token cost varies by page size and model, from a few cents to tens of cents depending on context length and reasoning mode
One user spent $2 just initializing a fresh instance. Another burned $200 in a day from an agent stuck in a loop. Not edge cases. That's what happens when nobody's watching the meter.
How to diagnose your OpenClaw API spend
Three commands tell you where the money's going. Run them in this order.
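The commands this guide leans on elsewhere, in a plausible diagnostic order (a sketch; exact output depends on your OpenClaw version and auth method):

```
/status          # token totals, plus dollar cost if you're on a direct API key
/context detail  # per-component breakdown: session history, tool outputs, bootstrap files
/compact         # once you've seen where the tokens go, prune the history
```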
OpenClaw model tiering: which model for which task
Simple rule: cheap model by default. Expensive model only when the task actually needs it. Most of what OpenClaw does doesn't need Opus.
Cost comparison as of March 2026, from Anthropic's published pricing:
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| Claude Opus | $15 | $75 |
| Claude Sonnet | ~$3 | ~$15 |
| Claude Haiku | $0.80 | $4 |
| Gemini Flash | $0.30 | $1.20 |
Opus costs nearly 19x as much per input token as Haiku; the gap between Opus and Gemini Flash is 50x. On our setup, tiered routing cut API spend by over 60%.
The practical split:
- Status checks, cron heartbeats, routing decisions: Haiku or Gemini Flash
- Summarization, research, email drafting: Sonnet
- Complex writing, multi-step reasoning, code generation: Opus or Sonnet with reasoning
A heartbeat cron running on Opus costs roughly $0.50/day just to say "nothing to report." That same cron on Haiku costs under $0.02. Over a month, that's $15 vs under $0.60.
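The arithmetic behind those numbers, using the pricing table above and assumed per-heartbeat token counts (the exact counts depend on your bootstrap files and heartbeat prompt):

```python
# USD per 1M tokens (input, output), from Anthropic's published rates.
PRICES = {"opus": (15.00, 75.00), "haiku": (0.80, 4.00)}

def daily_heartbeat_cost(model: str, runs_per_day: int = 48,
                         input_tokens: int = 600, output_tokens: int = 20) -> float:
    """Cost of a heartbeat cron for one day. Token counts are assumptions."""
    in_price, out_price = PRICES[model]
    per_run = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_run * runs_per_day

print(f"Opus:  ${daily_heartbeat_cost('opus'):.2f}/day")   # Opus:  $0.50/day
print(f"Haiku: ${daily_heartbeat_cost('haiku'):.3f}/day")  # Haiku: $0.027/day
```

With these assumed token counts the gap is about 19x, tracking the input-price ratio; heavier output (reasoning mode, long replies) widens it further.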
Here's a model routing config that reflects this split:
```json
{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-haiku-3-5"
    },
    "cron": {
      "model": "anthropic/claude-haiku-3-5"
    },
    "main": {
      "model": "anthropic/claude-sonnet-4-5"
    }
  }
}
```
For tasks that need Opus, trigger it per-session rather than setting it as default. The /model command switches models mid-session.
Thinking/reasoning mode: disable it at the agent level for all tasks except those that genuinely need it. Enable per-session when required.
How to reduce OpenClaw session token costs
Every message you send drags the entire session history along with it. Long sessions get expensive fast. This is the easiest place to claw back spend.
By turn 20, you're paying for 20 copies of your earliest messages. The fix: reset sessions between tasks and run /compact.
The /compact command prunes the session history while keeping essential context intact. Use it when sessions get long, or when you're switching to a different task in the same session. Compacting before the window fills stops costs from compounding.
See the OpenClaw context persistence and compaction guide for full /compact and auto-prune configuration.
Two config settings limit how much workspace content gets injected into the system prompt on every request. From the official token-use docs:
- bootstrapMaxChars (default: 20,000): caps how many characters of each individual file are injected
- bootstrapTotalMaxChars (default: 150,000): caps the total across all injected files
If your AGENTS.md and SOUL.md are verbose, trimming them and lowering these settings cuts costs immediately, with no capability loss.
```json
{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 12000,
      "bootstrapTotalMaxChars": 80000
    }
  }
}
```
The getopenclaw.ai official blog recommends keeping MEMORY.md under 3,000 tokens as a hygiene target. If yours has grown past that, trim sections that aren't actively referenced.
One more lever: imageMaxDimensionPx. The default is 1,200px. Drop it to 800 and vision-token usage from screenshots drops without affecting most use cases.
```json
{
  "imageMaxDimensionPx": 800
}
```
Cache timing matters too. If your heartbeat runs every 30 minutes but your cache TTL is 20 minutes, the cache expires between runs and you pay full re-cache cost on every heartbeat. Keep heartbeat frequency inside the cache window.
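A quick sanity check for that TTL mismatch (the interval and TTL values are illustrative; use your actual config):

```python
def heartbeat_stays_cached(heartbeat_minutes: int, cache_ttl_minutes: int) -> bool:
    """True if each heartbeat lands while the previous request's cache is still warm."""
    return heartbeat_minutes <= cache_ttl_minutes

# A 30-minute heartbeat against a 20-minute TTL re-caches the full context every run.
print(heartbeat_stays_cached(30, 20))  # False: full re-cache cost each time
print(heartbeat_stays_cached(15, 20))  # True: heartbeat keeps the cache warm
```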
How to assign cheaper models to OpenClaw cron jobs
Cron jobs inherit whatever model you set as default. Leave it on Opus and every heartbeat, every daily digest, every monitoring check bills at Opus rates.
Sub-agents aren't free. Routing messages between them, injecting context for each one, collecting results: all of that burns tokens a single-agent workflow wouldn't touch. The exact overhead depends on setup and nobody's published official benchmarks. Spawn sub-agents when the parallelism pays for itself, not by default.
In multi-agent setups, cacheRetention is tunable per agent. If you have a persistent agent for a specific function, setting its cacheRetention to match the task frequency cuts re-cache costs.
Provider-level spend limits: the only hard safety net
OpenClaw won't stop itself from spending. There's no built-in mechanism to halt when costs exceed a threshold. The only guardrails are in your provider console.
Set them up now. Not later.
Log into console.anthropic.com, go to Settings > Limits, and set a monthly spend cap. Anthropic stops accepting API calls when the limit is hit. The agent stops responding, but it won't burn past the cap.
If you route through OpenRouter, each API key supports a budget. Set it per-key so individual agents can't exhaust the full balance.
A daily cron check compares your actual spend against a threshold and alerts you if you're trending over:
The getopenclaw.ai blog suggests this pattern for daily spend monitoring as a lightweight alternative to provider-level alerts.
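A minimal sketch of that pattern. The log path and JSON shape are hypothetical; OpenClaw doesn't ship this file, so substitute however you record per-day spend (for example, parsed from /status output):

```python
import json
from pathlib import Path

SPEND_LOG = Path("spend_log.json")   # hypothetical format: {"2026-03-08": 4.12, ...}
DAILY_THRESHOLD_USD = 5.00           # alert when a single day trends over this

def check_spend(log_path: Path = SPEND_LOG,
                threshold: float = DAILY_THRESHOLD_USD) -> list[str]:
    """Return one alert line for each day whose spend exceeds the threshold."""
    spend = json.loads(log_path.read_text())
    return [f"ALERT: ${cost:.2f} on {day} exceeds ${threshold:.2f}"
            for day, cost in spend.items() if cost > threshold]

if __name__ == "__main__" and SPEND_LOG.exists():
    for line in check_spend():
        print(line)
```

Wire it into a daily cron on a cheap model (or no model at all) and it costs effectively nothing to run.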
When you hit the limit, the agent goes silent. Sessions in progress can fail mid-task. No graceful degradation, just a hard stop. Set the cap high enough that normal heavy use won't trigger it, but low enough to catch runaways.
One documented case involved an agent stuck in a retry loop that burned $200 in a single day before the user noticed. Provider limits are the only protection against this pattern.
When local models via Ollama make sense (honest trade-offs)
Local models via Ollama cost $0 in API fees. You trade that for latency, a lower capability ceiling, and setup time. Not a universal answer, but it works for specific things.
Ollama runs on your hardware. No API calls, no bill beyond electricity. For tasks that don't need strong reasoning, the quality gap is livable.
Where local models work:
- Pre-filtering high volumes of low-signal input (notifications, emails, webhook payloads)
- Simple summarization where precision isn't critical
- Routing decisions between agents ("is this task for the research agent or the writing agent?")
- Any task where you'd normally use Haiku but want to cut the API cost entirely
Where they fall short:
- Complex writing or long-form drafts
- Code generation and debugging
- Multi-step reasoning chains
- Anything where quality mistakes have downstream cost
On our setup, we use a cheap model as the orchestrator for routing and triage, while Sonnet or Opus only gets called for the 20% of tasks that actually need the horsepower.
OpenClaw doesn't abstract this into a single toggle. You'd set it up as model routing rules, deliberately choosing which task types go where.
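One way to express those routing rules in code. The task categories and model IDs here are assumptions mirroring the tiering table earlier, not an OpenClaw API:

```python
# Cheap-by-default routing: escalate only for task types that need it.
ROUTES = {
    "heartbeat": "anthropic/claude-haiku-3-5",
    "triage":    "ollama/llama3",               # local: $0 API cost
    "summarize": "anthropic/claude-sonnet-4-5",
    "code":      "anthropic/claude-opus-4",     # hypothetical Opus model ID
}
DEFAULT_MODEL = "anthropic/claude-haiku-3-5"

def route(task_type: str) -> str:
    """Pick a model for a task type; unknown tasks fall back to the cheap default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)

print(route("code"))     # anthropic/claude-opus-4
print(route("unknown"))  # anthropic/claude-haiku-3-5
```

The important design choice is the fallback direction: an unrecognized task lands on the cheap model, so a misconfigured route costs pennies instead of Opus rates.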
Subscription vs API key: what actually works in 2026
If you were hoping a Claude Pro or Max subscription would dodge API costs, that door closed. OAuth tokens don't report dollar costs, and Anthropic has been restricting third-party OAuth access.
What actually works now:
Anthropic direct API key: full support, dollar cost in /status, every model available. Per-token billing. Anthropic's tier system drops per-token prices at higher usage (Tiers 3-4).
OpenRouter: one API key, multiple providers. Handy if you want Anthropic, Google, and open models without juggling separate credentials. Per-key budget limits come built in.
OAuth (Claude Pro/Max): not recommended. Even when it worked, /status showed tokens but never dollars. Cost tracking was a guess.
The official docs on token use confirm that cost estimation in /status requires a direct API key. There's no workaround for OAuth sessions.
What our setup costs after tuning
Here's what actually changed on our setup after applying all of the above.
We started with Opus on everything. Heartbeats, daily crons, main sessions, sub-agents. The bill was what you'd expect.
Moving the heartbeat cron to Haiku alone dropped that job from ~$0.50/day to under $0.02/day. Trimming MEMORY.md and AGENTS.md (stale sections, redundant rules) cut bootstrap injection by about 30%. We set bootstrapMaxChars to 12,000 and bootstrapTotalMaxChars to 80,000. Added a monthly spend limit in the Anthropic console. Started running /compact at the end of long working sessions.
Our monthly total isn't $20. The agent runs all day with automated workflows. But the cost came down, and now Opus only fires for tasks that actually need it.
Starting from scratch? One move matters more than everything else: set a cheap model as default. Escalate to Opus per-task. The rest is fine-tuning.
Key Terms
Context window: the amount of text, measured in tokens, that a model can process in a single request. Costs scale with how much of the context window you fill on each turn.
Token: roughly 0.75 words in English, per Anthropic's tokenizer documentation. The unit of cost for all API calls. A 1,000-word article is approximately 1,300 tokens.
Prompt caching: a provider feature that reduces cost when the same context is re-sent across multiple requests. When the cache is warm, you pay less to re-send repeated content.
Model routing: directing different task types to different models based on cost and capability. Cheap models for simple tasks, expensive models for complex ones.
bootstrapMaxChars: a config setting that limits how many characters of each workspace file get injected into the system prompt on each request.
OAuth token: an authorization token from a Claude Pro or Max subscription. These tokens don't report dollar costs in OpenClaw, and third-party OAuth access has been restricted.
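The token entry's 0.75 words-per-token ratio makes a handy back-of-envelope converter (an English-prose average; code and other languages tokenize differently):

```python
WORDS_PER_TOKEN = 0.75  # rough English average per Anthropic's tokenizer docs

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from an English word count."""
    return round(word_count / WORDS_PER_TOKEN)

print(estimate_tokens(1000))  # 1333 -- roughly the ~1,300 figure above
```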
FAQ
Does OpenClaw have a built-in spending limit?
No. OpenClaw has no built-in hard spend cap. Spending discipline is manual. The guardrails are in your provider console: Anthropic's monthly spend limit and OpenRouter's per-key budget. Without those set, a runaway agent can burn through significant API credit in hours, as documented in community reports.
How much does OpenClaw cost per month for a typical user?
For a light user running occasional tasks, expect $10-30 per month. Automated workflows with a 24/7 agent run $30-150 per month depending on model choice and session volume. Heavy API-only deployments can reach $800-1,500 per month. The biggest variable is whether you assign cheap models to routine tasks.
Why doesn't /status show my dollar cost?
OpenClaw /status shows dollar cost only when you authenticate with a direct API key. If you're using OAuth (Claude Pro or Max), /status shows tokens only. Cost is estimated from the models.providers pricing config in openclaw.json, not from the provider's billing API.
Can I use Claude Pro or Max with OpenClaw to avoid API costs?
Not reliably as of 2026. OAuth tokens don't show dollar costs in OpenClaw's /status, and Anthropic has been restricting third-party OAuth access. Even when it worked, cost tracking was unreliable. Direct API keys remain the supported path.
Which OpenClaw model setting has the biggest cost impact?
Model assignment. Running Opus for all tasks costs nearly 19x more per input token than Haiku. A heartbeat cron that fires every 30 minutes on Opus costs approximately $0.50 per day. That same cron on Haiku costs under $0.02 per day. Assign cheap models to monitoring, routing, and summarization tasks.
Evidence & Methodology
Model pricing sourced from Anthropic's official pricing page as of March 2026. Token usage breakdowns derived from OpenClaw's token-use documentation and /context detail output from our own running instance. Monthly cost ranges based on community discussion in the OpenClaw GitHub and our own operational data. The getopenclaw.ai official blog provided additional context on heartbeat costs and model routing practices.
The multi-agent token overhead varies by setup and is not officially benchmarked by OpenClaw. The $3,600/month figure represents a documented extreme case, not a typical outcome. Individual costs vary with usage patterns, caching behavior, and model mix.
Related Resources
- OpenClaw Cron Jobs: 8 Automation Templates, Schedules, and Debug Steps
- Fix OpenClaw Context Loss: Savestate, Resume, and Compaction Settings
- OpenClaw CLI Commands Reference (2026)
- Best VPS for OpenClaw in 2026
Changelog
| Date | Change |
|---|---|
| 2026-03-08 | Initial draft published |