Rabbithole is a Rust HTTP server that generates every page of a website lazily — on first access — by calling the Anthropic Claude API. Pages are cached after generation; repeat visitors are served the cached HTML. The server is single-binary, configured entirely via CLI flags, and has no required external dependencies beyond an Anthropic API key.
The high-level data flow is:
Browser GET /some/path
│
├─ URL cached? ──YES──► Serve cached HTML immediately
│
└─ NO
│
├─ Spawn async task: call Claude API (streaming SSE)
│ │
│ ├─ Tool use rounds (web_search / web_fetch, up to 10)
│ │
│ └─ Parse ---MAPPINGS--- delimiter from response
│ │
│ ├─ Store HTML + mappings in Store
│ └─ Store child URL→prompt mappings
│
└─ Return loading page immediately
│
└─ Browser JS polls /__ready?url=/some/path every 1s
│
└─ /__ready returns 200 → browser redirects to final page
All persistent state is accessed through a Rust trait called Store. There are two implementations, selectable at startup.
pub trait Store: Send + Sync {
fn get_page(&self, url: &str) -> Option<PageRecord>;
fn set_page(&self, url: &str, record: PageRecord);
fn get_prompt(&self, url: &str) -> Option<String>;
fn set_prompt(&self, url: &str, prompt: String);
fn is_generating(&self, url: &str) -> bool;
fn set_generating(&self, url: &str, flag: bool);
}
Each cached page is stored as a PageRecord with the following fields:
| Field | Type | Description |
|---|---|---|
| html | String | The generated HTML content for the page |
| prompt | String | The prompt that was used to generate this page |
| depth | u32 | Generation depth (seed = 1; each linked page = parent depth + 1) |
| input_tokens | u64 | Input token count reported by Anthropic for this page |
| output_tokens | u64 | Output token count reported by Anthropic for this page |
| cost_usd | f64 | Estimated API cost in USD for this page's generation |
| api_rounds | u32 | Number of tool-use round-trips taken to generate this page |
| gen_time_ms | u64 | Wall-clock milliseconds from request to finished HTML |
The default in-memory implementation uses a HashMap<String, PageRecord> wrapped in an Arc<RwLock<...>>. It is fast and zero-configuration. All data is lost when the server process exits. Suitable for local development and ephemeral deployments.
# Default: in-memory
rabbithole --seed "My website about Rust" --seed-prompt "Homepage of..."
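As a concrete illustration of the default store, here is a minimal sketch. It trims PageRecord to three fields for brevity (the real record also carries token counts, cost, api_rounds, and gen_time_ms), and uses one RwLock per map rather than the single Arc<RwLock<HashMap>> described above; the Arc is only needed when the store is shared across handler tasks.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Trimmed PageRecord for illustration; see the field table above for the
// full set of fields.
#[derive(Clone)]
struct PageRecord {
    html: String,
    prompt: String,
    depth: u32,
}

// Illustrative in-memory store. Reads clone the value out so the lock is
// held only briefly.
#[derive(Default)]
struct MemoryStore {
    pages: RwLock<HashMap<String, PageRecord>>,
    prompts: RwLock<HashMap<String, String>>,
    generating: RwLock<HashMap<String, bool>>,
}

impl MemoryStore {
    fn get_page(&self, url: &str) -> Option<PageRecord> {
        self.pages.read().unwrap().get(url).cloned()
    }
    fn set_page(&self, url: &str, record: PageRecord) {
        self.pages.write().unwrap().insert(url.to_string(), record);
    }
    fn get_prompt(&self, url: &str) -> Option<String> {
        self.prompts.read().unwrap().get(url).cloned()
    }
    fn set_prompt(&self, url: &str, prompt: String) {
        self.prompts.write().unwrap().insert(url.to_string(), prompt);
    }
    fn is_generating(&self, url: &str) -> bool {
        *self.generating.read().unwrap().get(url).unwrap_or(&false)
    }
    fn set_generating(&self, url: &str, flag: bool) {
        self.generating.write().unwrap().insert(url.to_string(), flag);
    }
}
```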
Passing --db path/to/site.db enables the SQLite-backed store. It uses rusqlite to maintain a persistent pages table. The database is created and schema-migrated automatically on first run. Generated pages survive server restarts, making this the recommended mode for any public deployment.
# Persistent SQLite store
rabbithole --seed "My website" --seed-prompt "..." --db ./site.db
The SQLite schema looks approximately like:
CREATE TABLE IF NOT EXISTS pages (
url TEXT PRIMARY KEY,
html TEXT,
prompt TEXT,
depth INTEGER,
input_tokens INTEGER,
output_tokens INTEGER,
cost_usd REAL,
api_rounds INTEGER,
gen_time_ms INTEGER
);
CREATE TABLE IF NOT EXISTS prompts (
url TEXT PRIMARY KEY,
prompt TEXT
);
Both MemoryStore and SqliteStore also maintain a separate per-URL "generating" flag. This prevents duplicate in-flight API calls when two users hit the same uncached URL simultaneously.

Rabbithole uses reqwest with its streaming body feature to receive Anthropic API responses as Server-Sent Events (SSE). Rather than waiting for the entire response, the server processes the stream incrementally.
| Event Type | Action |
|---|---|
| content_block_start | Detects whether the block is text or tool_use; initializes an accumulator |
| content_block_delta (text_delta) | Appends delta text to the running string buffer |
| content_block_delta (input_json_delta) | Appends delta JSON to the tool input accumulator |
| content_block_stop | Finalizes the block; dispatches a tool call or appends text |
| message_delta | Captures stop_reason (end_turn vs tool_use) and usage tokens |
| message_stop | Signals the end of this API round; triggers tool execution or returns the final text |
let mut stream = response.bytes_stream();
let mut text_buf = String::new();
let mut tool_calls: Vec<ToolCall> = Vec::new();
let mut current_block: Option<BlockAccumulator> = None;
while let Some(chunk) = stream.next().await {
let bytes = chunk?;
// SSE lines are "data: {...}" or "event: ..."
for line in bytes_to_lines(&bytes) {
if let Some(json) = line.strip_prefix("data: ") {
let event: SseEvent = serde_json::from_str(json)?;
match event.event_type.as_str() {
"content_block_start" => { /* init block */ }
"content_block_delta" => { /* append delta */ }
"content_block_stop" => { /* finalize */ }
"message_delta" => { /* capture stop_reason + usage */ }
_ => {}
}
}
}
}
This approach means that even very long responses (full HTML pages with embedded CSS and JS) are received and assembled incrementally, so parsing begins as soon as the first chunks arrive rather than after the whole payload lands. Token usage and cost metadata are captured from the message_delta event's usage field at stream end.
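The loop above leans on a bytes_to_lines helper. One subtlety any such helper has to handle: an SSE event line can be split across two network chunks, so only lines with a trailing newline can be emitted, and the remainder must be carried over to the next chunk. A stateful sketch, assuming UTF-8 input (the real helper may differ):

```rust
// Hypothetical stand-in for the bytes_to_lines helper referenced above.
// Keeps a carry-over buffer so a "data: {...}" line split across chunks
// is reassembled before being handed to the JSON parser.
struct LineSplitter {
    carry: String,
}

impl LineSplitter {
    fn new() -> Self {
        Self { carry: String::new() }
    }

    // Feed one network chunk; return only the lines completed so far.
    fn feed(&mut self, chunk: &[u8]) -> Vec<String> {
        self.carry.push_str(&String::from_utf8_lossy(chunk));
        let mut lines = Vec::new();
        while let Some(pos) = self.carry.find('\n') {
            // Drain through the newline, then strip trailing \r\n.
            let line: String = self.carry.drain(..=pos).collect();
            lines.push(line.trim_end().to_string());
        }
        lines
    }
}
```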
The tool use loop is the mechanism that allows Claude to make real web searches and page fetches while generating each page. It runs for at most 10 rounds per page generation.
let mut messages: Vec<Message> = vec![initial_user_message];
let mut round = 0;
let max_rounds = 10;
loop {
round += 1;
let response = call_anthropic_streaming(&messages).await?;
if response.stop_reason == "end_turn" {
// Model produced final text — done
return Ok(response.text);
}
if round >= max_rounds || response.stop_reason != "tool_use" {
// Hit round limit or unexpected stop — return whatever text we have
return Ok(response.text);
}
// Execute tool calls
let mut tool_results = Vec::new();
for tool_call in &response.tool_calls {
let result = match tool_call.name.as_str() {
"web_search" => execute_web_search(&tool_call.input).await,
"web_fetch" => execute_web_fetch(&tool_call.input).await,
_ => Err("unknown tool".into()),
};
tool_results.push(ToolResult {
tool_use_id: tool_call.id.clone(),
content: result.unwrap_or_else(|e| format!("Error: {}", e)),
});
}
// Append assistant message + tool results as new user message
messages.push(Message::assistant(response.content_blocks));
messages.push(Message::user_tool_results(tool_results));
}
| Tool Name | Input Schema | Description |
|---|---|---|
| web_search | { query: string } | Performs a web search; returns a list of result snippets with titles and URLs |
| web_fetch | { url: string } | Fetches a URL and returns the page body as plain text (HTML stripped) |
Both tools are defined in the Anthropic tool schema format and passed in every API request. The model decides whether and when to call them. See Web Tools for details on tool implementation.
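Anthropic's custom-tool format takes a name, a description, and a JSON Schema under input_schema. A plausible rendering of the two tool definitions above (the description strings are illustrative, not quoted from the source):

```json
[
  {
    "name": "web_search",
    "description": "Perform a web search and return result snippets with titles and URLs.",
    "input_schema": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  },
  {
    "name": "web_fetch",
    "description": "Fetch a URL and return the page body as plain text.",
    "input_schema": {
      "type": "object",
      "properties": { "url": { "type": "string" } },
      "required": ["url"]
    }
  }
]
```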
Rabbithole's protocol for getting both HTML content and child page definitions from a single model response is a delimiter-based text format. The model is instructed via system prompt to always produce output in this exact structure:
<!DOCTYPE html>
<html>
... complete HTML page ...
</html>
---MAPPINGS---
[
{"url": "/about.html", "prompt": "An about page for..."},
{"url": "/docs/guide.html", "prompt": "Documentation for..."}
]
fn parse_response(raw: &str) -> Result<(String, Vec<Mapping>)> {
const DELIM: &str = "---MAPPINGS---";
if let Some(idx) = raw.find(DELIM) {
let html = raw[..idx].trim().to_string();
let json_str = raw[idx + DELIM.len()..].trim();
let mappings: Vec<Mapping> = serde_json::from_str(json_str)
.unwrap_or_default(); // graceful fallback to empty
Ok((html, mappings))
} else {
// No delimiter found — treat entire output as HTML, no child mappings
Ok((raw.trim().to_string(), vec![]))
}
}
After parsing, each (url, prompt) pair is written to the Store's prompts table. When a browser later visits one of those URLs, the server looks up the stored prompt and uses it to generate that page. The HTML portion is served to the client.
Depth limits prevent infinite recursive generation. The seed URL is assigned depth 1. Every URL registered via a page's ---MAPPINGS--- section is assigned parent_depth + 1.
// In the system prompt, when depth == max_depth:
"You are at the maximum depth limit. Generate the HTML page normally,
but output an empty mappings array: ---MAPPINGS---\n[]"
// In the server:
fn build_system_prompt(depth: u32, max_depth: u32) -> String {
let depth_instruction = if depth >= max_depth {
"IMPORTANT: Output ---MAPPINGS---\n[] (empty array). \
Do not generate any child page links."
} else {
"Generate 5–10 links to child pages with full prompt context."
};
format!("{BASE_SYSTEM_PROMPT}\n\n{depth_instruction}")
}
| Flag | Default | Description |
|---|---|---|
| --max-depth | 5 | Maximum depth at which child URLs are registered. Pages at this depth still generate HTML but produce no further mappings. |
Depth is stored in PageRecord and surfaced in the browser console debug output so developers can see how deep into the tree each page sits.
Because page generation takes 5–30 seconds (depending on tool-use rounds), the server cannot hold the HTTP connection open for the full duration; doing so would risk browser timeouts and make the UX feel broken. Instead, Rabbithole uses an immediate loading page plus a polling pattern.
When an uncached URL is requested, the server sets is_generating = true in the Store, spawns the generation task, and immediately returns a minimal loading page:
<!DOCTYPE html>
<html>
<head><title>Generating...</title>
<script>
(function() {
var path = encodeURIComponent(window.location.pathname);
function poll() {
fetch("/__ready?url=" + path)
.then(function(r) {
if (r.status === 200) {
window.location.reload();
} else {
setTimeout(poll, 1000);
}
})
.catch(function() { setTimeout(poll, 1000); });
}
setTimeout(poll, 1000);
})();
</script>
</head>
<body>
<p>Generating page, please wait...</p>
</body>
</html>
GET /__ready?url=/some/path
// Returns 202 if still generating
// Returns 200 if page is cached and ready
// Returns 500 if generation failed
When the async generation task completes, it writes the result to the Store and sets is_generating = false. The next poll from the browser's JS hits /__ready, gets a 200 back, and the browser calls window.location.reload(). On the reload, the URL is cached and the full HTML is served instantly.
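The /__ready decision reduces to three states derived from the Store. A framework-agnostic sketch (the real handler would consult the Store and wrap this in an actix-web response):

```rust
// Status code for /__ready, given the two Store lookups for the URL.
// Not cached and not generating means the task finished without writing
// a page, i.e. generation failed.
fn ready_status(page_cached: bool, generating: bool) -> u16 {
    if page_cached {
        200
    } else if generating {
        202
    } else {
        500
    }
}
```

Note that the cached check comes first: by the time the task flips is_generating back to false, the page is already in the Store, so a poll racing that transition still sees either 202 or 200, never a spurious 500.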
The system prompt is the most critical piece of the architecture. Because each page is generated in complete isolation — the generator for /docs/api.html has no memory of /index.html — the system prompt must communicate everything the model needs to know to produce coherent, well-linked pages.
| Section | Purpose |
|---|---|
| Role description | Tells the model it is "Rabbithole", an AI web page generator that builds entire websites one page at a time |
| How the system works | Explains the isolation model — each page generated separately, only the prompt carries context |
| Output format | Specifies ---MAPPINGS--- delimiter, JSON array format, absolute paths requirement |
| Link density guidance | Instructs model to generate 5–10 local links per page minimum |
| Prompt quality requirements | Explains that each child prompt must include: full site context, visual style, recurring characters/lore, specific page content, constraints |
| Inline CSS/JS requirement | All CSS in <style> tags, all JS in <script> tags — no external dependencies |
| Depth instruction | Dynamic section — either "generate mappings" or "output empty mappings" depending on current depth vs. max depth |
| Tool use guidance | Instructions on when/how to use web_search and web_fetch to enrich content |
The fundamental challenge is that a page like /characters/villain.html only receives a prompt string — no HTML from the homepage, no CSS from a stylesheet, nothing. The prompt must encode the full visual design system, all character names, the site's color scheme, nav bar structure, tone, and content. Short prompts produce disconnected-looking pages; rich prompts produce coherent sites.
// Poor child prompt (produces disconnected page):
{"url": "/about.html", "prompt": "About page"}
// Good child prompt (produces coherent page):
{"url": "/about.html", "prompt": "About page for GalactiCorp Industries,
a sci-fi corporate satire site. Dark theme: #1a1a2e background, #e94560
accent, Orbitron font headers. Nav: Home | Products | About | Contact.
Describes founding in 2157 by CEO Zara Voss. Tone: dry corporate humor.
Same layout as homepage with sidebar widgets."}
After every successful page generation, the server post-processes the HTML to inject a small diagnostic <script> block just before the closing </body> tag. This script logs generation metadata to the browser's developer console.
<script>
/* rabbithole debug */
console.group("rabbithole: /docs/architecture.html");
console.log("prompt:", "Architecture deep-dive for Rabbithole...");
console.log("depth:", 2);
console.log("input_tokens:", 8432);
console.log("output_tokens:", 3187);
console.log("cost_usd:", 0.04821);
console.log("api_rounds:", 3);
console.log("gen_time_ms:", 12847);
console.groupEnd();
</script>
This is injected server-side as a simple string operation — finding the last occurrence of </body> and inserting before it. The script block is harmless to page rendering and invisible unless the developer opens the console. It is particularly useful when debugging why a page looks different from expected or understanding how many tool rounds were needed.
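A sketch of that injection step: find the last occurrence of </body> and splice the script in front of it. The append-at-end fallback is an assumption; the source only describes the </body> case.

```rust
// Insert the debug script just before the final </body>. If the model's
// HTML somehow lacks a </body> tag, append at the end (assumed fallback).
fn inject_debug(html: &str, script: &str) -> String {
    match html.rfind("</body>") {
        Some(idx) => format!("{}{}{}", &html[..idx], script, &html[idx..]),
        None => format!("{html}{script}"),
    }
}
```

Using rfind rather than find matters here: generated pages may legitimately contain "</body>" inside inline JS strings, and only the last occurrence closes the document.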
To prevent runaway API spend, Rabbithole maintains a global atomic cost accumulator in u64 (storing cost in microdollars — i.e. 1 USD = 1,000,000 units) using Rust's std::sync::atomic::AtomicU64.
static TOTAL_COST_MICRODOLLARS: AtomicU64 = AtomicU64::new(0);
fn record_cost(cost_usd: f64) {
let microdollars = (cost_usd * 1_000_000.0) as u64;
TOTAL_COST_MICRODOLLARS.fetch_add(microdollars, Ordering::Relaxed);
}
fn get_total_cost_usd() -> f64 {
TOTAL_COST_MICRODOLLARS.load(Ordering::Relaxed) as f64 / 1_000_000.0
}
// On each new uncached URL request:
if get_total_cost_usd() > config.max_cost {
// Redirect to /404 instead of generating
return HttpResponse::Found()
.insert_header(("Location", "/404.html"))
.finish();
}
| Flag | Default | Behavior when exceeded |
|---|---|---|
| --max-cost | no limit | New uncached URLs return a redirect to /404.html instead of generating |
The atomic approach is intentional: multiple concurrent page generations may record costs simultaneously, and using Ordering::Relaxed is sufficient here because the cost check is a soft cap, not a hard financial guarantee. Slight over-spend of one page's worth is acceptable.
The model occasionally produces output that either lacks the <!DOCTYPE html> declaration, contains garbled JSON in the mappings section, or terminates unexpectedly. Rabbithole retries generation up to 3 times before giving up and returning an error page.
async fn generate_with_retry(
prompt: &str,
depth: u32,
config: &Config,
) -> Result<(String, Vec<Mapping>)> {
let max_retries = 3;
for attempt in 1..=max_retries {
match generate_page(prompt, depth, config).await {
        Ok((html, mappings)) => {
            // Case-insensitive prefix check, per the validation rules below
            let head: String = html.trim_start().chars().take(9).collect();
            let head = head.to_ascii_lowercase();
            if head.starts_with("<!doctype") || head.starts_with("<html") {
                return Ok((html, mappings));
            }
            eprintln!("Attempt {attempt}: output not valid HTML, retrying...");
}
Err(e) => {
eprintln!("Attempt {attempt}: API error: {e}, retrying...");
}
}
}
// All retries exhausted — return a minimal error page
Ok((error_page(prompt), vec![]))
}
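The mapping-level checks are not shown in the retry snippet above. A sketch of how they might look, assuming a Mapping struct with url and prompt fields (names taken from the JSON format, not from the actual source):

```rust
// Hypothetical Mapping struct matching the JSON objects emitted after
// the ---MAPPINGS--- delimiter.
struct Mapping {
    url: String,
    prompt: String,
}

// A mapping is usable only if its URL is an absolute path and its prompt
// carries real context.
fn valid_mapping(m: &Mapping) -> bool {
    m.url.starts_with('/') && !m.prompt.trim().is_empty()
}

// Drop bad mappings individually rather than failing the whole page.
fn filter_mappings(mappings: Vec<Mapping>) -> Vec<Mapping> {
    mappings.into_iter().filter(valid_mapping).collect()
}
```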
Validation checks on each attempt:
- Output starts with <!DOCTYPE html> or <html (case-insensitive)
- The ---MAPPINGS--- section, if present, parses as a valid JSON array
- Each mapping has url (string, starts with /) and prompt (non-empty string) fields

Putting it all together, here is the complete sequence for a first-time page visit:
1. Browser: GET /wiki/history.html
2. Server: Check Store.get_page("/wiki/history.html") → None
3. Server: Check Store.is_generating("/wiki/history.html") → false
4. Server: Store.set_generating("/wiki/history.html", true)
5. Server: Look up Store.get_prompt("/wiki/history.html") → "Wiki history page for..."
6. Server: Spawn tokio::task::spawn(async { generate("/wiki/history.html", prompt, depth) })
7. Server: Return 200 with loading page HTML immediately
8. Browser: Renders "Generating..." page, JS starts polling
9. Task: Check cost limit → OK
10. Task: Build messages = [system_prompt, user_message(prompt)]
11. Task: POST to Anthropic API (streaming SSE)
12. Task: SSE parser receives stream chunks
13. Task: Model calls web_search("wiki history...") → tool_use stop
14. Task: Execute web_search, get results
15. Task: Append tool result, POST again (round 2)
16. Task: Model calls web_fetch("https://...") → tool_use stop
17. Task: Execute web_fetch, get page body
18. Task: Append tool result, POST again (round 3)
19. Task: Model produces final HTML with ---MAPPINGS--- → end_turn
20. Task: Parse HTML and mappings
21. Task: Validate HTML (starts with DOCTYPE) → OK
22. Task: Inject debug <script> block into HTML
23. Task: Store.set_page("/wiki/history.html", PageRecord{html, cost, tokens, ...})
24. Task: For each mapping: Store.set_prompt(url, prompt)
25. Task: Store.set_generating("/wiki/history.html", false)
26. Task: record_cost(cost_usd)
27. Browser: /__ready?url=/wiki/history.html → 200
28. Browser: window.location.reload()
29. Browser: GET /wiki/history.html (again)
30. Server: Store.get_page → cached HTML
31. Server: Return 200 with full HTML page
| Crate | Version | Role |
|---|---|---|
| actix-web | 4.x | HTTP server, routing, request/response handling |
| reqwest | 0.11+ | Async HTTP client for Anthropic API calls; streaming SSE via bytes_stream() |
| rusqlite | 0.31+ | SQLite bindings for SqliteStore; zero-config, embedded database |
| clap | 4.x | CLI argument parsing; --db, --max-cost, --max-depth, --port, etc. |
| serde / serde_json | 1.x | JSON serialization for Anthropic API requests/responses, mappings parsing |
| tokio | 1.x | Async runtime; background task spawning for page generation |
| futures / futures-util | 0.3 | StreamExt for async iteration over SSE byte chunks |
The full Cargo.toml is visible in the GitHub repository.
See also: Web Tools, for web_search and web_fetch tool implementation details.