
Architecture

Internal design of the Rabbithole on-demand website generation engine — Rust + Actix-web + Anthropic Claude

1. Overview

Rabbithole is a Rust HTTP server that generates every page of a website lazily — on first access — by calling the Anthropic Claude API. Pages are cached after generation; repeat visitors are served the cached HTML. The server is single-binary, configured entirely via CLI flags, and has no required external dependencies beyond an Anthropic API key.

The high-level data flow is:

Browser GET /some/path
    │
    ├─ URL cached? ──YES──► Serve cached HTML immediately
    │
    └─ NO
        │
        ├─ Spawn async task: call Claude API (streaming SSE)
        │       │
        │       ├─ Tool use rounds (web_search / web_fetch, up to 10)
        │       │
        │       └─ Parse ---MAPPINGS--- delimiter from response
        │               │
        │               ├─ Store HTML + mappings in Store
        │               └─ Store child URL→prompt mappings
        │
        └─ Return loading page immediately
            │
            └─ Browser JS polls /__ready?url=/some/path every 1s
                    │
                    └─ /__ready returns 200 → browser redirects to final page

2. The Store Trait

All persistent state is accessed through a Rust trait called Store. There are two implementations, selectable at startup.

The Store Trait Interface

pub trait Store: Send + Sync {
    fn get_page(&self, url: &str) -> Option<PageRecord>;
    fn set_page(&self, url: &str, record: PageRecord);
    fn get_prompt(&self, url: &str) -> Option<String>;
    fn set_prompt(&self, url: &str, prompt: String);
    fn is_generating(&self, url: &str) -> bool;
    fn set_generating(&self, url: &str, flag: bool);
}

PageRecord Fields

Field          Type    Description
html           String  The generated HTML content for the page
prompt         String  The prompt that was used to generate this page
depth          u32     Generation depth (seed = 1, each linked page = parent depth + 1)
input_tokens   u64     Input token count reported by Anthropic for this page
output_tokens  u64     Output token count reported by Anthropic for this page
cost_usd       f64     Estimated API cost in USD for this page's generation
api_rounds     u32     Number of tool-use round-trips taken to generate this page
gen_time_ms    u64     Wall-clock milliseconds from request to finished HTML

MemoryStore (default)

The default in-memory implementation uses a HashMap<String, PageRecord> wrapped in an Arc<RwLock<...>>. It is fast and zero-configuration. All data is lost when the server process exits. Suitable for local development and ephemeral deployments.

# Default: in-memory
rabbithole --seed "My website about Rust" --seed-prompt "Homepage of..."
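
A minimal MemoryStore can be sketched against the trait above. This is a simplified illustration, not Rabbithole's actual source: PageRecord is reduced to two of its fields for brevity, and the "generating" flag set is modeled as a HashSet.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

// Simplified record; the real PageRecord carries more fields (tokens, cost, ...).
#[derive(Clone)]
pub struct PageRecord {
    pub html: String,
    pub depth: u32,
}

pub trait Store: Send + Sync {
    fn get_page(&self, url: &str) -> Option<PageRecord>;
    fn set_page(&self, url: &str, record: PageRecord);
    fn get_prompt(&self, url: &str) -> Option<String>;
    fn set_prompt(&self, url: &str, prompt: String);
    fn is_generating(&self, url: &str) -> bool;
    fn set_generating(&self, url: &str, flag: bool);
}

#[derive(Default)]
pub struct MemoryStore {
    pages: RwLock<HashMap<String, PageRecord>>,
    prompts: RwLock<HashMap<String, String>>,
    generating: RwLock<HashSet<String>>,
}

impl Store for MemoryStore {
    fn get_page(&self, url: &str) -> Option<PageRecord> {
        self.pages.read().unwrap().get(url).cloned()
    }
    fn set_page(&self, url: &str, record: PageRecord) {
        self.pages.write().unwrap().insert(url.to_string(), record);
    }
    fn get_prompt(&self, url: &str) -> Option<String> {
        self.prompts.read().unwrap().get(url).cloned()
    }
    fn set_prompt(&self, url: &str, prompt: String) {
        self.prompts.write().unwrap().insert(url.to_string(), prompt);
    }
    fn is_generating(&self, url: &str) -> bool {
        self.generating.read().unwrap().contains(url)
    }
    fn set_generating(&self, url: &str, flag: bool) {
        let mut set = self.generating.write().unwrap();
        if flag { set.insert(url.to_string()); } else { set.remove(url); }
    }
}
```

Callers hold the store behind an Arc<dyn Store> so the same code path works for either backend.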

SqliteStore (--db flag)

Passing --db path/to/site.db enables the SQLite-backed store. It uses rusqlite to maintain a persistent pages table. The database is created and schema-migrated automatically on first run. Generated pages survive server restarts, making this the recommended mode for any public deployment.

# Persistent SQLite store
rabbithole --seed "My website" --seed-prompt "..." --db ./site.db

The SQLite schema looks approximately like:

CREATE TABLE IF NOT EXISTS pages (
    url          TEXT PRIMARY KEY,
    html         TEXT,
    prompt       TEXT,
    depth        INTEGER,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd     REAL,
    api_rounds   INTEGER,
    gen_time_ms  INTEGER
);

CREATE TABLE IF NOT EXISTS prompts (
    url    TEXT PRIMARY KEY,
    prompt TEXT
);
Note: Both MemoryStore and SqliteStore also maintain a separate "generating" flag set per URL. This prevents duplicate in-flight API calls if two users hit the same uncached URL simultaneously.

3. SSE Streaming Parser

Rabbithole uses reqwest with its streaming body feature to receive Anthropic API responses as Server-Sent Events (SSE). Rather than waiting for the entire response, the server processes the stream incrementally.

SSE Event Types Handled

Event Type                               Action
content_block_start                      Detects whether the block is text or tool_use; initializes the accumulator
content_block_delta (text_delta)         Appends delta text to the running string buffer
content_block_delta (input_json_delta)   Appends delta JSON to the tool input accumulator
content_block_stop                       Finalizes the block; dispatches a tool call or appends text
message_delta                            Captures stop_reason (end_turn vs tool_use) and usage tokens
message_stop                             Signals the end of this API round; triggers tool execution or returns the final text

Stream Processing Loop (simplified)

use futures_util::StreamExt; // brings .next() on the bytes stream

let mut stream = response.bytes_stream();
let mut text_buf = String::new();
let mut tool_calls: Vec<ToolCall> = Vec::new();
let mut current_block: Option<BlockAccumulator> = None;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // SSE lines are "data: {...}" or "event: ..."
    // (a chunk can end mid-line, so a real parser buffers the partial tail)
    for line in bytes_to_lines(&bytes) {
        if let Some(json) = line.strip_prefix("data: ") {
            let event: SseEvent = serde_json::from_str(json)?;
            match event.event_type.as_str() {
                "content_block_start" => { /* init block */ }
                "content_block_delta" => { /* append delta */ }
                "content_block_stop"  => { /* finalize */ }
                "message_delta"       => { /* capture stop_reason + usage */ }
                _ => {}
            }
        }
    }
}

This approach means that even very long responses (full HTML pages with embedded CSS and JS) are received and assembled without loading the entire payload into memory before parsing begins. Token usage and cost metadata are captured from the message_delta event's usage field at stream end.
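One subtlety the simplified loop glosses over: bytes_stream() chunks can split an SSE line (or even a multi-byte UTF-8 sequence) at any byte, so bytes_to_lines must carry the trailing partial line between chunks. A std-only sketch of such a splitter (SseLineBuffer is a hypothetical name for illustration, not Rabbithole's actual type):

```rust
// Carry-buffer line splitter: SSE chunks can split a line anywhere, so the
// trailing incomplete line is kept (as raw bytes) until the next chunk.
pub struct SseLineBuffer {
    carry: Vec<u8>,
}

impl SseLineBuffer {
    pub fn new() -> Self {
        SseLineBuffer { carry: Vec::new() }
    }

    /// Feed one raw chunk; return every line that is now complete.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.carry.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Split off complete lines; the partial tail stays in `carry`.
        while let Some(idx) = self.carry.iter().position(|&b| b == b'\n') {
            let rest = self.carry.split_off(idx + 1);
            let mut line = std::mem::replace(&mut self.carry, rest);
            // Strip the trailing \n (and \r for CRLF streams).
            while line.last() == Some(&b'\n') || line.last() == Some(&b'\r') {
                line.pop();
            }
            lines.push(String::from_utf8_lossy(&line).into_owned());
        }
        lines
    }
}
```

Buffering bytes rather than decoded strings also handles UTF-8 sequences that straddle a chunk boundary.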

4. Tool Use Loop

The tool use loop is the mechanism that allows Claude to make real web searches and page fetches while generating each page. It runs for at most 10 rounds per page generation.

Loop Structure

let mut messages: Vec<Message> = vec![initial_user_message];
let mut round = 0;
let max_rounds = 10;

loop {
    round += 1;
    let response = call_anthropic_streaming(&messages).await?;

    if response.stop_reason == "end_turn" {
        // Model produced final text — done
        return Ok(response.text);
    }

    if round >= max_rounds || response.stop_reason != "tool_use" {
        // Hit round limit or unexpected stop — return whatever text we have
        return Ok(response.text);
    }

    // Execute tool calls
    let mut tool_results = Vec::new();
    for tool_call in &response.tool_calls {
        let result = match tool_call.name.as_str() {
            "web_search" => execute_web_search(&tool_call.input).await,
            "web_fetch"  => execute_web_fetch(&tool_call.input).await,
            _            => Err("unknown tool".into()),
        };
        tool_results.push(ToolResult {
            tool_use_id: tool_call.id.clone(),
            content: result.unwrap_or_else(|e| format!("Error: {}", e)),
        });
    }

    // Append assistant message + tool results as new user message
    messages.push(Message::assistant(response.content_blocks));
    messages.push(Message::user_tool_results(tool_results));
}

Registered Tools

Tool Name    Input Schema       Description
web_search   { query: string }  Performs a web search; returns a list of result snippets with titles and URLs
web_fetch    { url: string }    Fetches a URL and returns the page body as plain text (HTML stripped)

Both tools are defined in the Anthropic tool schema format and passed in every API request. The model decides whether and when to call them. See Web Tools for details on tool implementation.
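The dispatch-and-stringify-errors pattern in the loop above can be shown in isolation. This sketch uses synchronous stubs in place of the real async tool implementations; the key point it demonstrates is that tool failures are fed back to the model as "Error: ..." tool results rather than aborting the generation.

```rust
// Sketch of the tool dispatch step, with stub tool bodies. The real
// web_search / web_fetch are async and hit the network (see Web Tools).
fn dispatch_tool(name: &str, input: &str) -> Result<String, String> {
    match name {
        "web_search" => Ok(format!("results for query: {input}")),  // stub
        "web_fetch"  => Ok(format!("plain-text body of: {input}")), // stub
        other        => Err(format!("unknown tool: {other}")),
    }
}

// Errors are not fatal: they are stringified and returned to the model as
// the tool result content, so it can recover or try a different tool.
fn tool_result_content(name: &str, input: &str) -> String {
    dispatch_tool(name, input).unwrap_or_else(|e| format!("Error: {}", e))
}
```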

5. The ---MAPPINGS--- Format

Rabbithole's protocol for getting both HTML content and child page definitions from a single model response is a delimiter-based text format. The model is instructed via system prompt to always produce output in this exact structure:

<!DOCTYPE html>
<html>
  ... complete HTML page ...
</html>
---MAPPINGS---
[
  {"url": "/about.html", "prompt": "An about page for..."},
  {"url": "/docs/guide.html", "prompt": "Documentation for..."}
]

Server-Side Parsing

#[derive(serde::Deserialize, Default)]
struct Mapping {
    url: String,
    prompt: String,
}

fn parse_response(raw: &str) -> Result<(String, Vec<Mapping>)> {
    const DELIM: &str = "---MAPPINGS---";
    if let Some(idx) = raw.find(DELIM) {
        let html = raw[..idx].trim().to_string();
        let json_str = raw[idx + DELIM.len()..].trim();
        let mappings: Vec<Mapping> = serde_json::from_str(json_str)
            .unwrap_or_default(); // graceful fallback to empty on bad JSON
        Ok((html, mappings))
    } else {
        // No delimiter found — treat entire output as HTML, no child mappings
        Ok((raw.trim().to_string(), vec![]))
    }
}

After parsing, each (url, prompt) pair is written to the Store's prompts table. When a browser later visits one of those URLs, the server looks up the stored prompt and uses it to generate that page. The HTML portion is served to the client.

Design rationale: Putting HTML first ensures that even a truncated response (e.g. token limit hit) still yields a valid, serveable HTML document. The mappings section is optional and gracefully degraded.

6. Depth Tracking

Depth limits prevent infinite recursive generation. The seed URL is assigned depth 1. Every URL registered via a page's ---MAPPINGS--- section is assigned parent_depth + 1.

Depth Limit Enforcement

// In the system prompt, when depth == max_depth:
"You are at the maximum depth limit. Generate the HTML page normally,
but output an empty mappings array: ---MAPPINGS---\n[]"

// In the server:
fn build_system_prompt(depth: u32, max_depth: u32) -> String {
    let depth_instruction = if depth >= max_depth {
        "IMPORTANT: Output ---MAPPINGS---\n[] (empty array). \
         Do not generate any child page links."
    } else {
        "Generate 5–10 links to child pages with full prompt context."
    };
    format!("{BASE_SYSTEM_PROMPT}\n\n{depth_instruction}")
}
Flag         Default  Description
--max-depth  5        Maximum depth at which child URLs are registered. Pages at this depth still generate HTML but produce no further mappings.

Depth is stored in PageRecord and surfaced in the browser console debug output so developers can see how deep into the tree each page sits.
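The registration rule reduces to two small predicates. This is a sketch for clarity; in the actual server the logic is folded into build_system_prompt and mapping storage rather than expressed as standalone helpers.

```rust
// Depth rule: the seed page sits at depth 1; each child is one level deeper.
fn child_depth(parent_depth: u32) -> u32 {
    parent_depth + 1
}

// A page at max_depth still generates HTML, but registers no children.
fn may_register_children(depth: u32, max_depth: u32) -> bool {
    depth < max_depth
}
```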

7. Loading Page & /__ready Polling

Because page generation takes 5–30 seconds (depending on tool use rounds), the server cannot hold the HTTP connection open for the full duration: that would risk browser timeouts and make the UX feel broken. Instead, Rabbithole uses an immediate loading page + polling pattern.

First Visit to an Uncached URL

  1. Server checks Store — URL not cached, not currently generating.
  2. Server sets is_generating = true in Store.
  3. Server spawns an async Tokio task to call the Anthropic API.
  4. Server immediately returns a minimal HTML loading page with status 200.

The Loading Page

<!DOCTYPE html>
<html>
<head><title>Generating...</title>
<script>
  (function() {
    var path = encodeURIComponent(window.location.pathname);
    function poll() {
      fetch("/__ready?url=" + path)
        .then(function(r) {
          if (r.status === 200) {
            window.location.reload();
          } else {
            setTimeout(poll, 1000);
          }
        })
        .catch(function() { setTimeout(poll, 1000); });
    }
    setTimeout(poll, 1000);
  })();
</script>
</head>
<body>
  <p>Generating page, please wait...</p>
</body>
</html>

The /__ready Endpoint

GET /__ready?url=/some/path

// Returns 202 if still generating
// Returns 200 if page is cached and ready
// Returns 500 if generation failed

When the async generation task completes, it writes the result to the Store and sets is_generating = false. The next poll from the browser's JS hits /__ready, gets a 200 back, and the browser calls window.location.reload(). On the reload, the URL is cached and the full HTML is served instantly.
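The endpoint's three-way decision can be sketched independently of Actix. The function name and boolean inputs are illustrative, and treating "neither cached nor in flight" as the failure case is an assumption about how the 500 response is detected:

```rust
// Status decision for /__ready, mirroring Store.get_page / Store.is_generating.
fn ready_status(page_cached: bool, generating: bool) -> u16 {
    if page_cached {
        200 // page is ready; the browser's JS reloads
    } else if generating {
        202 // still generating; the browser polls again in 1s
    } else {
        500 // neither cached nor in flight: generation failed
    }
}
```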

8. System Prompt Design

The system prompt is the most critical piece of the architecture. Because each page is generated in complete isolation — the generator for /docs/api.html has no memory of /index.html — the system prompt must communicate everything the model needs to know to produce coherent, well-linked pages.

System Prompt Sections

Section                      Purpose
Role description             Tells the model it is "Rabbithole", an AI web page generator that builds entire websites one page at a time
How the system works         Explains the isolation model: each page is generated separately, and only the prompt carries context
Output format                Specifies the ---MAPPINGS--- delimiter, the JSON array format, and the absolute-paths requirement
Link density guidance        Instructs the model to generate at least 5–10 local links per page
Prompt quality requirements  Explains that each child prompt must include full site context, visual style, recurring characters/lore, specific page content, and constraints
Inline CSS/JS requirement    All CSS in <style> tags, all JS in <script> tags; no external dependencies
Depth instruction            Dynamic section: either "generate mappings" or "output empty mappings", depending on current depth vs. max depth
Tool use guidance            Instructions on when and how to use web_search and web_fetch to enrich content

Prompt Isolation Problem

The fundamental challenge is that a page like /characters/villain.html only receives a prompt string — no HTML from the homepage, no CSS from a stylesheet, nothing. The prompt must encode the full visual design system, all character names, the site's color scheme, nav bar structure, tone, and content. Short prompts produce disconnected-looking pages; rich prompts produce coherent sites.

// Poor child prompt (produces disconnected page):
{"url": "/about.html", "prompt": "About page"}

// Good child prompt (produces coherent page):
{"url": "/about.html", "prompt": "About page for GalactiCorp Industries,
a sci-fi corporate satire site. Dark theme: #1a1a2e background, #e94560
accent, Orbitron font headers. Nav: Home | Products | About | Contact.
Describes founding in 2157 by CEO Zara Voss. Tone: dry corporate humor.
Same layout as homepage with sidebar widgets."}

9. Debug Script Injection

After every successful page generation, the server post-processes the HTML to inject a small diagnostic <script> block just before the closing </body> tag. This script logs generation metadata to the browser's developer console.

<script>
/* rabbithole debug */
console.group("rabbithole: /docs/architecture.html");
console.log("prompt:", "Architecture deep-dive for Rabbithole...");
console.log("depth:", 2);
console.log("input_tokens:", 8432);
console.log("output_tokens:", 3187);
console.log("cost_usd:", 0.04821);
console.log("api_rounds:", 3);
console.log("gen_time_ms:", 12847);
console.groupEnd();
</script>

This is injected server-side as a simple string operation — finding the last occurrence of </body> and inserting before it. The script block is harmless to page rendering and invisible unless the developer opens the console. It is particularly useful when debugging why a page looks different from expected or understanding how many tool rounds were needed.
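A sketch of that string operation. The match on </body> is case-sensitive, and appending at the end of the document when no closing tag is found is an assumed fallback rather than documented behavior:

```rust
// Inject a snippet just before the last occurrence of </body>; if the tag
// is missing, append at the end so the page is still served unchanged.
fn inject_before_body_close(html: &str, snippet: &str) -> String {
    match html.rfind("</body>") {
        Some(idx) => format!("{}{}{}", &html[..idx], snippet, &html[idx..]),
        None => format!("{}{}", html, snippet),
    }
}
```

Using rfind rather than find matters when the generated page itself contains "</body>" inside a string or code sample.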

10. Atomic Cost Tracker

To prevent runaway API spend, Rabbithole maintains a global atomic cost accumulator in u64 (storing cost in microdollars — i.e. 1 USD = 1,000,000 units) using Rust's std::sync::atomic::AtomicU64.

use std::sync::atomic::{AtomicU64, Ordering};

static TOTAL_COST_MICRODOLLARS: AtomicU64 = AtomicU64::new(0);

fn record_cost(cost_usd: f64) {
    let microdollars = (cost_usd * 1_000_000.0) as u64;
    TOTAL_COST_MICRODOLLARS.fetch_add(microdollars, Ordering::Relaxed);
}

fn get_total_cost_usd() -> f64 {
    TOTAL_COST_MICRODOLLARS.load(Ordering::Relaxed) as f64 / 1_000_000.0
}

Cost Limit Enforcement

// On each new uncached URL request:
if get_total_cost_usd() > config.max_cost {
    // Redirect to /404 instead of generating
    return HttpResponse::Found()
        .insert_header(("Location", "/404.html"))
        .finish();
}
Flag        Default   Behavior when exceeded
--max-cost  no limit  New uncached URLs return a redirect to /404.html instead of generating

The atomic approach is intentional: multiple concurrent page generations may record costs simultaneously, and using Ordering::Relaxed is sufficient here because the cost check is a soft cap, not a hard financial guarantee. Slight over-spend of one page's worth is acceptable.
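To see why fetch_add with Relaxed ordering suffices for a running total, here is a self-contained demo restating the snippet above: ten threads each record $0.05 concurrently, and because fetch_add is a single atomic read-modify-write, no increment is lost regardless of ordering.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

static TOTAL_COST_MICRODOLLARS: AtomicU64 = AtomicU64::new(0);

fn record_cost(cost_usd: f64) {
    let microdollars = (cost_usd * 1_000_000.0) as u64;
    TOTAL_COST_MICRODOLLARS.fetch_add(microdollars, Ordering::Relaxed);
}

fn get_total_cost_usd() -> f64 {
    TOTAL_COST_MICRODOLLARS.load(Ordering::Relaxed) as f64 / 1_000_000.0
}

fn concurrent_demo() -> f64 {
    // Ten "page generations" record $0.05 each from separate threads.
    let handles: Vec<_> = (0..10)
        .map(|_| thread::spawn(|| record_cost(0.05)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    get_total_cost_usd()
}
```

Relaxed only forgoes cross-thread ordering guarantees, which the soft cap does not need; a stricter ordering would buy nothing here.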

11. Retry Logic

The model occasionally produces output that either lacks the <!DOCTYPE html> declaration, contains garbled JSON in the mappings section, or terminates unexpectedly. Rabbithole retries generation up to 3 times before giving up and returning an error page.

async fn generate_with_retry(
    prompt: &str,
    depth: u32,
    config: &Config,
) -> Result<(String, Vec<Mapping>)> {
    let max_retries = 3;
    for attempt in 1..=max_retries {
        match generate_page(prompt, depth, config).await {
            Ok((html, mappings)) => {
                if html.trim_start().starts_with("<!DOCTYPE") ||
                   html.trim_start().starts_with("<html") {
                    return Ok((html, mappings));
                }
                eprintln!("Attempt {attempt}: output not valid HTML, retrying...");
            }
            Err(e) => {
                eprintln!("Attempt {attempt}: API error: {e}, retrying...");
            }
        }
    }
    // All retries exhausted — return a minimal error page
    Ok((error_page(prompt), vec![]))
}

Validation checks on each attempt:

  1. The trimmed output starts with <!DOCTYPE or <html; anything else triggers a retry.
  2. The API call itself completed without error; API errors also trigger a retry.
  3. Malformed mappings JSON never triggers a retry: parsing degrades to an empty array instead (see section 5).

12. Full Request Flow

Putting it all together, here is the complete sequence for a first-time page visit:

1.  Browser:  GET /wiki/history.html
2.  Server:   Check Store.get_page("/wiki/history.html") → None
3.  Server:   Check Store.is_generating("/wiki/history.html") → false
4.  Server:   Store.set_generating("/wiki/history.html", true)
5.  Server:   Look up Store.get_prompt("/wiki/history.html") → "Wiki history page for..."
6.  Server:   Spawn tokio::task::spawn(async { generate("/wiki/history.html", prompt, depth) })
7.  Server:   Return 200 with loading page HTML immediately
8.  Browser:  Renders "Generating..." page, JS starts polling
9.  Task:     Check cost limit → OK
10. Task:     Build messages = [system_prompt, user_message(prompt)]
11. Task:     POST to Anthropic API (streaming SSE)
12. Task:     SSE parser receives stream chunks
13. Task:     Model calls web_search("wiki history...") → tool_use stop
14. Task:     Execute web_search, get results
15. Task:     Append tool result, POST again (round 2)
16. Task:     Model calls web_fetch("https://...") → tool_use stop
17. Task:     Execute web_fetch, get page body
18. Task:     Append tool result, POST again (round 3)
19. Task:     Model produces final HTML with ---MAPPINGS--- → end_turn
20. Task:     Parse HTML and mappings
21. Task:     Validate HTML (starts with DOCTYPE) → OK
22. Task:     Inject debug <script> block into HTML
23. Task:     Store.set_page("/wiki/history.html", PageRecord{html, cost, tokens, ...})
24. Task:     For each mapping: Store.set_prompt(url, prompt)
25. Task:     Store.set_generating("/wiki/history.html", false)
26. Task:     record_cost(cost_usd)
27. Browser:  /__ready?url=/wiki/history.html → 200
28. Browser:  window.location.reload()
29. Browser:  GET /wiki/history.html (again)
30. Server:   Store.get_page → cached HTML
31. Server:   Return 200 with full HTML page

13. Crate Dependencies

Crate                   Version  Role
actix-web               4.x      HTTP server, routing, request/response handling
reqwest                 0.11+    Async HTTP client for Anthropic API calls; streaming SSE via bytes_stream()
rusqlite                0.31+    SQLite bindings for SqliteStore; zero-config, embedded database
clap                    4.x      CLI argument parsing; --db, --max-cost, --max-depth, --port, etc.
serde / serde_json      1.x      JSON serialization for Anthropic API requests/responses, mappings parsing
tokio                   1.x      Async runtime; background task spawning for page generation
futures / futures-util  0.3      StreamExt for async iteration over SSE byte chunks

The full Cargo.toml is visible in the GitHub repository.

