Cost Management - Rabbithole Documentation

Honest warning: Rabbithole is not a cost-effective platform for high-traffic sites. Every uncached page view triggers a live LLM API call. At scale, this gets expensive fast. Read this page before deploying to production.

Contents

How Billing Works
Model Tiers & Per-Page Cost
Caching Behavior
Web Tools Cost Multiplier
Cost by Traffic Scenario
Cost Control Strategies
When NOT to Use Rabbithole
Monitoring Your Spend

How Billing Works

Rabbithole generates pages by calling the Anthropic Claude API. Each page generation is a single API call. You are billed by Anthropic directly based on token consumption — both input (the system prompt + page concept) and output (the generated HTML).

You pay per uncached page visit. The first visit to any URL triggers an API call, and you (the site operator) pay for it. Subsequent visitors to the same URL are served from the SQLite cache for free. There is no Rabbithole-specific billing layer; it is a pass-through to Anthropic's token pricing.

A typical page generation involves:

Input tokens: the global system prompt (~1,000–2,000 tokens) + the per-page concept prompt (~200–600 tokens) + any tool results if web tools are enabled
Output tokens: the complete HTML document + the ---MAPPINGS--- JSON block (~3,000–8,000 tokens for a typical page)
Tool calls (optional): each web search or fetch invocation adds extra input/output overhead

See the Architecture page for a detailed breakdown of what goes into each API call.

Model Tiers & Per-Page Cost

You configure which Claude model Rabbithole uses in config.toml. The choice of model is the single biggest lever on cost. Anthropic offers three recommended tiers in 2026: Haiku 4.5 ($1/$5), Sonnet 4.6 ($3/$15), and Opus 4.6 ($5/$25) per million input/output tokens.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Est. cost / page (no tools)	Est. cost / page (with tools)	Quality
`claude-haiku-4-5`	$1.00	$5.00	~$0.02–$0.05	~$0.06–$0.15	Fast; simpler pages; occasional errors in complex layouts
`claude-sonnet-4-6`	$3.00	$15.00	~$0.05–$0.12	~$0.15–$0.40	Good balance; recommended default for most sites
`claude-opus-4-6`	$5.00	$25.00	~$0.08–$0.20	~$0.25–$0.60	Best quality; richest output; highest cost

These are estimates. Actual cost depends on the length of the system prompt, the complexity of the page concept, and how much HTML the model generates. A sparse, minimalist page costs less than a rich data-heavy one.

Token math example (Sonnet 4.6, no tools):
Input: 2,000 tokens × $3.00 / 1,000,000 = $0.006
Output: 5,000 tokens × $15.00 / 1,000,000 = $0.075
Total: ~$0.08 per page generation
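
The same arithmetic can be written as a small helper, which makes it easy to compare tiers for your own token counts. This is a minimal sketch: the function name `page_cost` and the `PRICES` dictionary are illustrative, not part of Rabbithole, and the prices are copied from the table above.

```python
def page_cost(input_tokens: int, output_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Return the estimated USD cost of one page generation."""
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# (input, output) USD per million tokens, taken from the model table above.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (5.00, 25.00),
}

# Same token counts as the worked example: 2,000 in, 5,000 out.
for model, (input_price, output_price) in PRICES.items():
    cost = page_cost(2_000, 5_000, input_price, output_price)
    print(f"{model}: ${cost:.3f} per page")
```

For the Sonnet 4.6 row this reproduces the $0.006 + $0.075 figure from the example; substitute the token counts from your own logs for a more realistic estimate.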

Across model generations, Anthropic API pricing has ranged from $0.25/$1.25 per million input/output tokens (older Haiku generations) to $15/$75 (older Opus generations), a 60x spread. The newer model families are generally more efficient. Don't default to Opus for routine workloads: even in the current lineup, Opus 4.6 costs 5x as much per token as Haiku 4.5.

Caching Behavior

Rabbithole uses SQLite for page caching. Once a URL is generated, the resulting HTML is stored permanently. Subsequent requests for the same URL are served directly from the database with zero API calls and zero cost.

Visit Type	API Call?	Cost	Latency
First visit (cache miss)	Yes	$0.02–$0.60 depending on model & tools	3–30 seconds (LLM generation time)
Repeat visit (cache hit)	No	$0.00	<10ms (SQLite read)
After cache clear / invalidation	Yes	$0.02–$0.60	3–30 seconds

Implications:

A site with 100 unique pages where every page is visited at least once costs 100 API calls. Future traffic to those pages is free.
A site with rapidly shifting URLs (e.g., user-generated paths, query parameters that create new paths) will generate API calls continuously.
Cache clearing is a destructive cost operation. Do not clear caches carelessly in production.
The cache is stored in rabbithole.db (SQLite). Back it up. Losing it means re-paying to regenerate all cached pages.
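
The cache-hit and cache-miss behavior described above follows a get-or-generate pattern, which can be sketched as follows. This is an illustration under assumptions, not Rabbithole's actual code: the `pages` table name appears elsewhere on this page, but the `url` and `html` columns, the `get_page` function, and the `fake_generate` stand-in for the LLM call are all hypothetical.

```python
import sqlite3
from typing import Callable, Tuple


def get_page(conn: sqlite3.Connection, url: str,
             generate: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (html, was_cache_hit). Only a cache miss calls generate()."""
    row = conn.execute("SELECT html FROM pages WHERE url = ?", (url,)).fetchone()
    if row is not None:
        return row[0], True           # cache hit: no API call, no cost
    html = generate(url)              # cache miss: the billable step
    conn.execute("INSERT INTO pages (url, html) VALUES (?, ?)", (url, html))
    conn.commit()
    return html, False


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, html TEXT NOT NULL)")

calls = []

def fake_generate(url: str) -> str:
    """Stand-in for the LLM call; records each invocation."""
    calls.append(url)
    return f"<html>{url}</html>"

get_page(conn, "/about", fake_generate)        # miss
get_page(conn, "/about", fake_generate)        # hit
get_page(conn, "/about?ref=1", fake_generate)  # miss: a different URL string
print(len(calls))  # 2
```

The third request shows why shifting URLs are costly: under this sketch's exact-match lookup, a trivially different URL string counts as a new page.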

Note: Rabbithole's SQLite cache is separate from Anthropic's own prompt caching feature. Prompt caching reuses previously processed portions of your prompt (such as the large, unchanging system prompt) across API calls and bills those cached tokens at a fraction of the standard input price instead of reprocessing them on every request, which reduces both cost and latency. It can be configured on the Anthropic side in addition to the SQLite page cache.

Web Tools Cost Multiplier

When web tools (web search and web fetch) are enabled, each page generation may involve multiple additional round-trips with the API. Each tool invocation causes the model to:

Output a tool call request (tokens charged)
Receive the tool result (injected as input tokens)
Continue generating (more output tokens)

A typical page with 2 web searches + 1 web fetch will add approximately 6,000–15,000 additional input tokens (search results can be verbose) and extend generation time significantly.

Tool Usage	Additional Input Tokens (est.)	Cost Impact (Sonnet 4.6)
No tools	0	baseline
1 web search	+2,000–4,000	+$0.006–$0.012
2 web searches + 1 fetch	+6,000–15,000	+$0.018–$0.045
3+ searches + multiple fetches	+15,000–40,000	+$0.045–$0.12

The combined cost of an Opus-class model with aggressive tool use can reach $0.40–$0.60 per page. This page you are reading likely cost approximately that amount to generate.

Tip: Disable web tools (web_tools = false in config) for sites that don't need real-time information. Static content, fiction, documentation, and other non-news sites rarely benefit enough to justify the added cost.

Cost by Traffic Scenario

The following table models realistic deployment scenarios. These use Sonnet 4.6 without tools as the baseline. Adjust proportionally for other models.

Scenario	Unique Pages	Visitors/day	Cache hit rate	API calls/day	Est. daily cost	Est. monthly cost
Personal blog, small	~50	10–50	~99%	0–2	$0.00–$0.24	~$0–$7
Demo / portfolio site	~100	50–200	~98%	1–4	$0.08–$0.48	~$3–$15
Reference site (pre-generated)	~500	200–1,000	~99.5%	1–5	$0.08–$0.60	~$3–$18
Medium traffic site, many URLs	~5,000	1,000–5,000	~90%	100–500	$8–$60	$240–$1,800
High traffic, dynamic URLs	Unbounded	10,000+	<50%	5,000+	$400+	$12,000+

Reality check: A site with even modest traffic and a large URL space (e.g., generated path parameters, user-specific pages, infinite scroll, search result pages) can run up hundreds of dollars per day. Rabbithole has no built-in rate limiting or spending cap by default. You can burn through your Anthropic API budget in hours if you don't constrain your URL space.
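
The scenario table can be approximated with a simple model: API calls per day equal page views multiplied by the cache miss rate. The sketch below assumes one page view per visitor and a flat per-page cost, both simplifications; the function names are illustrative.

```python
def daily_api_calls(page_views_per_day: float, cache_hit_rate: float) -> float:
    """Uncached page views per day, each of which triggers one generation."""
    return page_views_per_day * (1.0 - cache_hit_rate)


def monthly_cost(page_views_per_day: float, cache_hit_rate: float,
                 cost_per_page: float, days: int = 30) -> float:
    """Estimated USD per month for a given traffic level and hit rate."""
    return daily_api_calls(page_views_per_day, cache_hit_rate) * cost_per_page * days


# "Medium traffic site, many URLs" row: 1,000-5,000 views/day at ~90% hit rate,
# with $0.08-$0.12 per generation (Sonnet 4.6, no tools).
low = monthly_cost(1_000, 0.90, 0.08)
high = monthly_cost(5_000, 0.90, 0.12)
print(f"${low:.0f} to ${high:.0f} per month")
```

Lowering the hit rate from 90% to 50% at the same traffic multiplies the bill by five, which is why the URL space matters more than raw visitor counts.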

Cost Control Strategies

1. Use the cheapest model that meets your quality bar

Use Haiku for lightweight, high-volume workloads, and Sonnet or Opus only when advanced reasoning or coding is required. For most Rabbithole use cases (informational pages, blog posts, documentation), Haiku 4.5 produces acceptable output at roughly one third the per-token cost of Sonnet 4.6 ($1/$5 versus $3/$15 per million tokens). Test Haiku first before upgrading.

2. Disable web tools unless necessary

Set web_tools = false in config.toml. Per the estimates in the model table, enabling web tools roughly triples the per-page cost on every model tier. If your site generates fictional content, static documentation, or creative writing, there is no need for live web searches.

3. Pre-generate important pages at deploy time

Use the rabbithole crawl CLI command to pre-warm your cache before going live. Walk your link graph starting from the homepage and generate all reachable pages. When real users arrive, every page will be a cache hit at $0.00. See Deployment for a pre-generation script.
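
Walking the link graph amounts to a breadth-first traversal with a visited set. The sketch below shows the idea only; it is not how `rabbithole crawl` is implemented. The `fetch_links` callback is hypothetical: in a real script it would request the page from your Rabbithole instance (triggering generation on a miss) and return the internal links it contains. The `max_pages` cap is an assumed safeguard, since every visited page may be a billable generation.

```python
from collections import deque
from typing import Callable, Iterable, List


def warm_cache(start: str, fetch_links: Callable[[str], Iterable[str]],
               max_pages: int = 500) -> List[str]:
    """Breadth-first walk of the link graph; returns URLs in visit order."""
    seen = {start}
    queue = deque([start])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)              # the page is requested here
        for link in fetch_links(url):
            if link not in seen:         # never request the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for a real site.
graph = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": [], "/c": ["/"]}
print(warm_cache("/", lambda url: graph.get(url, [])))
# ['/', '/a', '/b', '/c']
```

Setting `max_pages` to the number of pages you are willing to pay for bounds the cost of a crawl even if the generated link graph keeps growing.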

4. Constrain your URL space

Every distinct URL Rabbithole has never seen before is a billable event. Do not allow user-controlled URL parameters to reach the Rabbithole handler. Use a reverse proxy (nginx, Caddy) to block arbitrary paths, limit URL patterns, and prevent URL enumeration attacks that could generate thousands of API calls.
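
The filtering a reverse proxy would perform can be illustrated with a small allow-list function: drop the query string, normalize the path, and reject anything outside a fixed pattern. This is a sketch of the idea only; the pattern (lowercase slugs, at most three path segments) and the name `canonical_path` are assumptions to adapt to your own site.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical policy: lowercase slug segments, at most three levels deep.
ALLOWED_PATH = re.compile(r"/(?:[a-z0-9-]+/){0,2}[a-z0-9-]*")


def canonical_path(raw_url: str) -> Optional[str]:
    """Return the normalized path to forward, or None to reject the request."""
    path = urlsplit(raw_url).path.lower() or "/"   # query string is discarded
    if len(path) > 1:
        path = path.rstrip("/") or "/"             # /about/ and /about are one page
    if not ALLOWED_PATH.fullmatch(path):
        return None
    return path


print(canonical_path("/About/?utm_source=x"))  # /about
print(canonical_path("/a/b/c/d"))              # None (too deep)
print(canonical_path("/search/../etc"))        # None (disallowed characters)
```

Collapsing variants such as `/About/?utm_source=x` and `/about` into one path means they share a single cache entry instead of each costing a generation.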

5. Set Anthropic spend limits

In your Anthropic Console, set a monthly budget cap and spending alerts. Rabbithole itself does not enforce spending limits, so a hard cap on Anthropic's side is your safety net. Do not deploy without one.
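
Because Rabbithole does not enforce limits itself, some operators may also want a guard in front of it as a second line of defense. The class below is a hypothetical sketch of such a guard, not an existing Rabbithole feature, and it does not replace the cap in the Anthropic Console. It tracks estimated spend in memory only, and the caller is assumed to create a fresh guard (or reset `spent_usd`) at the start of each day.

```python
class SpendGuard:
    """Refuse new generations once an estimated budget would be exceeded."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        """True if one more generation still fits inside the budget."""
        return self.spent_usd + estimated_cost_usd <= self.budget_usd

    def record(self, actual_cost_usd: float) -> None:
        """Add the cost of a completed generation to the running total."""
        self.spent_usd += actual_cost_usd


guard = SpendGuard(budget_usd=0.20)
generated = 0
for _ in range(3):                  # three cache misses at ~$0.08 each
    if guard.allow(0.08):
        guard.record(0.08)
        generated += 1
print(generated)  # 2
```

When `allow` returns False, a front end could serve a static "try again later" page instead of forwarding the request.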

6. Shorten system prompts

Trim unnecessary context, system instructions, or verbose text, because every token adds cost. Review the default system prompt in src/prompt.rs and strip anything that doesn't add value for your specific use case. A 500-token reduction in the system prompt saves about $0.0015 per generation at Sonnet 4.6 input pricing, or roughly $1.50 per thousand generations: small per page, but it compounds on sites that generate many pages.

7. Use the Batch API for pre-generation

The Anthropic Batch API offers a 50% discount on all token pricing for asynchronous workloads. If you are pre-generating a large site offline (not serving live requests), batching saves significant money, and combining it with prompt caching can reduce costs by up to 95%. Batch support is not built into Rabbithole's live server mode but can be used in scripted crawls.
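
To see what the batch discount means for a one-off pre-generation run, the estimate below applies the 50% figure from the paragraph above to a whole site. It is a rough sketch: the function name and the per-page token counts are assumptions, and prompt caching savings are deliberately left out.

```python
BATCH_DISCOUNT = 0.50  # 50% off all token pricing, as stated above


def pregeneration_cost(pages: int, input_tokens: int, output_tokens: int,
                       input_price_per_mtok: float, output_price_per_mtok: float,
                       batch: bool = False) -> float:
    """Estimated USD to generate every page once, with or without batching."""
    per_page = (input_tokens * input_price_per_mtok
                + output_tokens * output_price_per_mtok) / 1_000_000
    total = pages * per_page
    return total * (1.0 - BATCH_DISCOUNT) if batch else total


# 500 pages on Sonnet 4.6, assuming 2,000 input and 5,000 output tokens each.
live = pregeneration_cost(500, 2_000, 5_000, 3.00, 15.00)
batched = pregeneration_cost(500, 2_000, 5_000, 3.00, 15.00, batch=True)
print(f"live: ${live:.2f}, batched: ${batched:.2f}")
```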

8. Cache aggressively, back up your database

Your SQLite database is your primary cost-saving asset. Back it up regularly. Do not run DELETE FROM pages in production unless you are prepared to pay regeneration costs. Consider mounting rabbithole.db on persistent cloud storage.

Tip: Many teams use Haiku for lightweight queries and Sonnet/Opus for more complex reasoning — balancing cost and performance. In Rabbithole's context, you could route simple pages (indexes, stubs) to Haiku and complex, tool-enriched pages (articles, data pages) to Sonnet. This per-route model selection is not built in but can be implemented by running multiple Rabbithole instances with different configs behind a reverse proxy.
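
The routing decision in such a setup is a prefix match from URL path to instance. The sketch below shows that decision in isolation; the route prefixes, the instance labels, and the `pick_instance` name are all hypothetical, and in practice the same rules would live in your reverse proxy configuration.

```python
from typing import List, Tuple

# Hypothetical routing table: (path prefix, instance label). First match wins.
ROUTES: List[Tuple[str, str]] = [
    ("/articles/", "sonnet-instance"),   # rich, tool-enriched pages
    ("/data/", "sonnet-instance"),
    ("/tags/", "haiku-instance"),        # simple index and stub pages
]
DEFAULT_INSTANCE = "haiku-instance"      # cheapest tier unless a rule says otherwise


def pick_instance(path: str) -> str:
    """Return the label of the Rabbithole instance that should serve this path."""
    for prefix, instance in ROUTES:
        if path.startswith(prefix):
            return instance
    return DEFAULT_INSTANCE


print(pick_instance("/articles/llm-costs"))  # sonnet-instance
print(pick_instance("/tags/pricing"))        # haiku-instance
print(pick_instance("/"))                    # haiku-instance
```

Defaulting to the cheaper instance means an unanticipated URL pattern costs Haiku prices rather than Sonnet prices.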

When NOT to Use Rabbithole

To be direct: Rabbithole is a novelty and a research tool. It is not a production CMS for high-traffic sites. The following use cases are poor fits:

Any site expecting >10,000 unique page views/month on uncached URLs. At roughly $0.08 per generation (Sonnet 4.6, no tools), that is about $800/month, whereas a conventional static site generator costs nothing per page view.
E-commerce product pages. Product catalog URLs are typically unique per item. A 50,000-product catalog could cost $4,000+ just to generate once.
User-generated content platforms. If users can create pages, they can also generate unbounded API costs. This is a denial-of-wallet attack surface.
News or real-time data sites requiring sub-second response. Even cached pages may have occasional cold starts; non-cached pages take seconds.
Any site where you need deterministic, auditable content. LLM output is non-deterministic. Clearing the cache and re-generating may produce different pages.

Good fits: personal experiments, demos, low-traffic hobbyist sites, AI research projects, small creative projects where the novelty of live generation is the point.

Monitoring Your Spend

Rabbithole logs each API call to stdout including token counts and estimated cost (if log_cost = true in config). For aggregate tracking:

Use the Anthropic Console usage dashboard to see real-time token consumption by day.
Set email alerts at 50% and 90% of your monthly budget.
Query your SQLite cache to count total cached pages: SELECT COUNT(*) FROM pages;
Monitor your server logs for 404s or unusual URL patterns that may indicate crawlers or bots triggering generation.
Consider adding a robots.txt served statically (bypass Rabbithole entirely) to discourage aggressive crawlers.

# Example: query your cache to estimate remaining generation surface area
$ sqlite3 rabbithole.db "SELECT COUNT(*) FROM pages;"
142

# Compare to total mapped URLs to estimate % cached
$ sqlite3 rabbithole.db "SELECT COUNT(*) FROM url_mappings;"
387

# 142 / 387 ≈ 37% of mapped URLs have been generated; the remaining ~63% (245 URLs) is potential cost exposure
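
The same check can be scripted and turned into a dollar figure. The sketch below uses the `pages` and `url_mappings` table names from the commands above, but the single-column schema it creates is a stand-in for demonstration, and it assumes every mapped URL that has not been generated will eventually be visited. Point `sqlite3.connect` at a copy of your own rabbithole.db to use it for real.

```python
import sqlite3
from typing import Tuple


def cache_coverage(conn: sqlite3.Connection) -> Tuple[int, int, float]:
    """Return (generated pages, mapped URLs, fraction generated)."""
    generated = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    mapped = conn.execute("SELECT COUNT(*) FROM url_mappings").fetchone()[0]
    fraction = generated / mapped if mapped else 0.0
    return generated, mapped, fraction


def exposure_usd(conn: sqlite3.Connection, cost_per_page: float) -> float:
    """Estimated cost if every not-yet-generated mapped URL is visited once."""
    generated, mapped, _ = cache_coverage(conn)
    return max(mapped - generated, 0) * cost_per_page


# Demo database with a hypothetical one-column schema and the counts shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE url_mappings (url TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO pages VALUES (?)", [(f"/p{i}",) for i in range(142)])
conn.executemany("INSERT INTO url_mappings VALUES (?)", [(f"/p{i}",) for i in range(387)])

generated, mapped, fraction = cache_coverage(conn)
print(f"{generated}/{mapped} generated ({fraction:.0%})")
print(f"exposure at $0.08/page: ${exposure_usd(conn, 0.08):.2f}")
```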
