Web Tools

Rabbithole — open-source Rust tool for on-the-fly LLM website generation.
Source: github.com/ajbt200128/rabbithole | Live demo: isarabbithole.com | This page: /docs/web-tools.html

Contents

1. Overview
2. The Two Tools
- 2a. web_search
- 2b. web_fetch
3. The Multi-Turn Tool Loop
4. Disabling Web Tools
5. When to Use vs. Disable
6. Performance & Cost
7. Example Tool Call Exchange
8. Implementation Notes

1. Overview

Web tools give the LLM the ability to perform real-time internet research during page generation. They are enabled by default. When active, the model may issue tool calls that the Rabbithole server intercepts, executes server-side, and feeds back as additional context before the model produces its final HTML output.

The model never directly accesses the internet. All HTTP requests are made by the Rust server using the reqwest crate. The model only receives structured results returned by the server.

Two tools are available:

Tool name	Purpose	Input	Output
`web_search`	Search the web; get titles, URLs, and snippets	`query` (string)	Structured list: title, URL, snippet per result
`web_fetch`	Fetch and read a URL as plain text	`url` (string)	Plain text body, truncated to 50,000 characters

2. The Two Tools

2a. web_search

web_search issues a web search query and returns structured results containing the page title, URL, and a snippet of relevant content for each result. The model can use these results to:

- Cite real facts, figures, and data on generated pages
- Hotlink to real images found in search results (via <img src="...">)
- Find authoritative sources and link to them
- Verify current information (prices, versions, events) that may have changed since training

The search is performed server-side. The model receives a JSON-like array of result objects. It does not see raw HTML or the full content of any result page — only the snippet. To read a full page, the model must follow up with web_fetch.
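As a rough illustration of the result shape described above, the sketch below renders a list of results as a JSON array of title/URL/snippet objects. The `SearchResult` struct and the `json_string` and `render_results` helpers are hypothetical names, not taken from the Rabbithole source. The escaping is done by hand only to keep the example dependency-free; a real server would more likely use a JSON library.

```rust
/// Hypothetical result type: one entry per search hit, as described above.
struct SearchResult {
    title: String,
    url: String,
    snippet: String,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render the results as the array of objects handed back to the model.
fn render_results(results: &[SearchResult]) -> String {
    let items: Vec<String> = results
        .iter()
        .map(|r| {
            format!(
                "{{\"title\": {}, \"url\": {}, \"snippet\": {}}}",
                json_string(&r.title),
                json_string(&r.url),
                json_string(&r.snippet)
            )
        })
        .collect();
    format!("[{}]", items.join(", "))
}

fn main() {
    let results = vec![SearchResult {
        title: "reqwest - docs.rs".to_string(),
        url: "https://docs.rs/reqwest".to_string(),
        snippet: "async and blocking HTTP client for Rust".to_string(),
    }];
    println!("{}", render_results(&results));
}
```

Note that the model sees only these three fields per result; nothing in this payload carries the body of the linked page.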

2b. web_fetch

web_fetch takes a URL, fetches it via reqwest, strips all HTML tags to produce plain text, then truncates the result at 50,000 characters to fit within the model's context window. The plain text is returned to the model as a tool result.
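The strip-then-truncate step can be sketched as follows. This is a minimal illustration, not the project's actual code: `strip_tags` and `truncate_chars` are hypothetical helper names, and the tag stripper is a deliberately naive state machine. The HTTP fetch itself is omitted so the example runs without network access.

```rust
/// Character limit applied to web_fetch output (from the text above).
const MAX_FETCH_CHARS: usize = 50_000;

/// Drop everything between '<' and '>'.
/// Assumption: a production version would also discard the contents of
/// <script> and <style> elements and decode HTML entities; this one does not.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(ch),
            _ => {}
        }
    }
    out
}

/// Keep at most `max_chars` characters, cutting on a character boundary
/// so that multi-byte UTF-8 text is never split mid-character.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text,
    }
}

fn main() {
    let html = "<html><body><h1>reqwest</h1> <p>An HTTP client.</p></body></html>";
    let text = strip_tags(html);
    let clipped = truncate_chars(&text, MAX_FETCH_CHARS);
    println!("{clipped}");
}
```

Counting characters rather than bytes matters here: slicing a Rust string at an arbitrary byte offset panics if the offset falls inside a multi-byte character.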

Typical uses:

- Reading Wikipedia articles for factual content
- Fetching official documentation pages (e.g., to summarize an API)
- Extracting reference material to embed in a generated page
- Following up on a URL found via web_search

Note: The 50,000-character truncation is intentional. Very large pages (e.g., long Wikipedia articles or large source files) are cut off at that limit, so the model receives only the first 50,000 characters. If a needed section appears late in a document, a more targeted web_search query may be more effective than fetching the whole page.

3. The Multi-Turn Tool Loop

Web tool use is implemented as a multi-turn conversation loop inside the Rabbithole server. The server runs up to 10 rounds per page generation. Each round follows this sequence:

1. The model produces a response. If the response contains a tool call, proceed to step 2. If the response is plain text (the final HTML), exit the loop.
2. The server parses the tool call (tool name + arguments).
3. The server executes the tool call (performs the HTTP request via reqwest).
4. The result is appended to the conversation as a user message containing the tool output.
5. The model is called again with the updated conversation. Return to step 1.

The loop exits when either: (a) the model produces a final text response with no further tool calls, or (b) the 10-round maximum is reached, at which point the server uses whatever text the model last produced.

In practice most pages use between 0 and 5 tool call rounds. The 10-round cap prevents runaway loops in pathological cases.

Round 1:  model → [tool_call: web_search("rust reqwest HTTP client")]
          server executes → returns 10 search results
          server appends result as user message

Round 2:  model → [tool_call: web_fetch("https://docs.rs/reqwest/latest/reqwest/")]
          server executes → returns plain text, truncated at 50k chars
          server appends result as user message

Round 3:  model → <!DOCTYPE html>...</html>    ← final output, loop exits
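The control flow of the loop can be sketched in a few lines. Everything named here (`ToolCall`, `ModelReply`, `generate_page`, and the closure parameters) is hypothetical and chosen for illustration; the real server talks to an LLM API and uses reqwest, whereas this sketch takes the model and the tool runner as closures so it can run offline. The conversation is kept as plain strings to avoid guessing at the actual message types.

```rust
/// Maximum number of rounds per page generation (from the text above).
const MAX_ROUNDS: usize = 10;

/// Hypothetical shape of a parsed tool call: tool name plus one argument.
struct ToolCall {
    name: String,
    argument: String,
}

/// Hypothetical shape of one model response.
struct ModelReply {
    text: String,
    tool_call: Option<ToolCall>,
}

/// Run the tool loop: call the model, execute any requested tool, append the
/// result, and repeat until the model answers without a tool call or the
/// round cap is reached. Returns the last text the model produced.
fn generate_page<M, T>(prompt: &str, mut call_model: M, mut run_tool: T) -> String
where
    M: FnMut(&[String]) -> ModelReply,
    T: FnMut(&ToolCall) -> String,
{
    let mut conversation = vec![prompt.to_string()];
    let mut last_text = String::new();
    for _round in 0..MAX_ROUNDS {
        let reply = call_model(conversation.as_slice());
        last_text = reply.text;
        match reply.tool_call {
            // No tool call: this is the final output.
            None => break,
            Some(call) => {
                let result = run_tool(&call);
                conversation.push(format!("[tool_call: {}({:?})]", call.name, call.argument));
                conversation.push(result);
            }
        }
    }
    last_text
}

fn main() {
    // Scripted stand-in for the model: search, then fetch, then final HTML.
    let model = |conversation: &[String]| match conversation.len() {
        1 => ModelReply {
            text: String::new(),
            tool_call: Some(ToolCall {
                name: "web_search".to_string(),
                argument: "rust reqwest HTTP client".to_string(),
            }),
        },
        3 => ModelReply {
            text: String::new(),
            tool_call: Some(ToolCall {
                name: "web_fetch".to_string(),
                argument: "https://docs.rs/reqwest/latest/reqwest/".to_string(),
            }),
        },
        _ => ModelReply {
            text: "<!DOCTYPE html>...</html>".to_string(),
            tool_call: None,
        },
    };
    let tool = |call: &ToolCall| format!("(stub result for {})", call.name);
    println!("{}", generate_page("Generate an HTML page about reqwest.", model, tool));
}
```

Because `last_text` is overwritten on every round, a run that hits the cap returns whatever text accompanied the final tool call, which may be empty.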

4. Disabling Web Tools

Pass the --no-web-tools flag to disable both web_search and web_fetch entirely:

rabbithole --no-web-tools

When this flag is set, the server does not register either tool with the model. The model generates all pages purely from its training knowledge and the prompt. No HTTP requests are made during generation.

This is a global flag — it cannot be toggled per-page or per-request. To enable tools for some pages and not others, you would need to run two separate server instances.
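The effect of the flag can be pictured as a single decision made at startup. The sketch below is an assumption about the shape of that logic, not the project's actual argument handling (which may well use an argument-parsing crate); `registered_tools` is a hypothetical helper.

```rust
/// The flag documented above.
const NO_WEB_TOOLS_FLAG: &str = "--no-web-tools";

/// Decide once, at startup, which tools are registered with the model.
/// With the flag present, neither tool is registered.
fn registered_tools<I, S>(args: I) -> Vec<&'static str>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let disabled = args.into_iter().any(|arg| arg.as_ref() == NO_WEB_TOOLS_FLAG);
    if disabled {
        Vec::new()
    } else {
        vec!["web_search", "web_fetch"]
    }
}

fn main() {
    // Skip argv[0], the program name.
    let tools = registered_tools(std::env::args().skip(1));
    println!("registered tools: {:?}", tools);
}
```

Since the decision is made from process arguments, it applies to every request the process serves, which is why per-page toggling requires a second instance.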

5. When to Use vs. Disable

Scenario	Recommendation	Reason
Documentation site (e.g., this site)	Leave enabled	Real API docs, version numbers, links, and facts improve accuracy
News or current-events site	Leave enabled	Model knowledge cutoff makes recent facts unreliable without search
Fictional / creative site (e.g., ACAPA demo, CGPA demo)	Consider disabling	No real-world facts needed; disabling reduces latency and cost
Static portfolio or marketing site	Either	Depends on whether you want to pull in real external content
High-throughput / low-latency deployment	Disable	Each tool call round adds 2–5 seconds; disabling can halve total latency
Restricted network environment	Disable	Avoids outbound HTTP calls from the server process entirely

6. Performance & Cost

Each tool call round has two components of overhead:

- Latency: Each round requires a full round trip to the LLM API plus the time to execute the tool (the HTTP request). In practice, each round adds roughly 2–5 seconds. A page that makes 3 rounds of tool calls might take 15–20 seconds in total to generate, versus 5–8 seconds with no tool calls.
- Token cost: Tool results are fed back into the conversation as additional tokens. A large web_fetch result (up to 50,000 characters, or roughly 12,000–15,000 tokens) significantly increases the token count for that generation and thus the API cost. web_search results are much smaller (typically a few hundred tokens per search call).

Cost warning: If a page generation hits the 10-round limit and each round includes a large web_fetch, the token count for that request could become very large. Monitor costs when running in production with large numbers of page requests.
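To put a rough number on that warning, the arithmetic below estimates the worst case. It rests on two assumptions that are not stated in the source: about 4 characters per token (which falls inside the 12,000–15,000 token range quoted above), and a provider that counts the full conversation as input on every model call with no prompt caching. The function names are hypothetical.

```rust
/// Limits taken from the text above.
const MAX_FETCH_CHARS: u64 = 50_000;
const MAX_ROUNDS: u64 = 10;

/// Rough token estimate, rounded up. `chars_per_token` is an assumed ratio;
/// real tokenizers vary by model and by content.
fn estimate_tokens(chars: u64, chars_per_token: u64) -> u64 {
    (chars + chars_per_token - 1) / chars_per_token
}

/// Tool-result tokens sent as input across all model calls, assuming every
/// call resends all earlier results in full: call 1 carries 0 results,
/// call 2 carries 1, ..., call n carries n - 1.
fn cumulative_tool_input_tokens(model_calls: u64, tokens_per_result: u64) -> u64 {
    tokens_per_result * model_calls * model_calls.saturating_sub(1) / 2
}

fn main() {
    let per_fetch = estimate_tokens(MAX_FETCH_CHARS, 4);
    let worst_case = cumulative_tool_input_tokens(MAX_ROUNDS, per_fetch);
    println!("one full web_fetch result: ~{} tokens", per_fetch);
    println!(
        "{} model calls with a full fetch between each: ~{} input tokens from tool results",
        MAX_ROUNDS, worst_case
    );
}
```

Under these assumptions a single full fetch is about 12,500 tokens, and ten model calls separated by full fetches resend about 562,500 tokens of tool output in total, because the conversation grows with every round.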

Rough estimates (varies by model and provider):

Configuration	Approx. time per page	Approx. token count
No tool calls (or `--no-web-tools`)	5–8 sec	2,000–5,000
1–2 `web_search` calls	8–12 sec	3,000–7,000
1 `web_fetch` + 1–2 `web_search`	12–18 sec	10,000–20,000
3+ rounds mixed	15–25 sec	15,000–40,000+

7. Example Tool Call Exchange

Below is a simplified representation of the message exchange that occurs inside the Rabbithole server during a single page generation that uses web tools. This is what the server constructs and passes to the LLM API internally — it is not user-facing.

// ── Initial request ────────────────────────────────────────────────────────
{
  "role": "user",
  "content": "Generate an HTML page about the reqwest Rust crate. ..."
}

// ── Round 1: model requests a tool call ────────────────────────────────────
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "web_search",
      "arguments": "{\"query\": \"reqwest Rust HTTP client crate docs\"}"
    }
  }]
}

// ── Server executes web_search, returns results ───────────────────────────
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "[{\"title\": \"seanmonstar/reqwest\", \"url\": \"https://github.com/seanmonstar/reqwest\",
               \"snippet\": \"An ergonomic, batteries-included HTTP Client for Rust.\"},
              {\"title\": \"reqwest - docs.rs\", \"url\": \"https://docs.rs/reqwest\",
               \"snippet\": \"reqwest 0.12.x — async and blocking HTTP client for Rust\"},
              ...]"
}

// ── Round 2: model requests another tool call ─────────────────────────────
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_def456",
    "type": "function",
    "function": {
      "name": "web_fetch",
      "arguments": "{\"url\": \"https://docs.rs/reqwest/latest/reqwest/\"}"
    }
  }]
}

// ── Server fetches URL, strips HTML, truncates at 50,000 chars ───────────
{
  "role": "tool",
  "tool_call_id": "call_def456",
  "content": "reqwest\nAn ergonomic, batteries-included HTTP Client for Rust.\n\nFeatures\n- async/await\n- TLS\n- ..."
  // (truncated at 50,000 chars)
}

// ── Round 3: model produces final output ──────────────────────────────────
{
  "role": "assistant",
  "content": "<!DOCTYPE html>\n<html>\n..."
}
// Loop exits. Server returns HTML to the cache and serves it.

8. Implementation Notes

A few details about how the tools are implemented server-side in Rust:

- HTTP client: Both tools use the reqwest crate, a widely used async HTTP client for Rust. It is built on top of hyper and supports TLS, connection pooling, and async/await via Tokio.
- HTML stripping in web_fetch: After fetching the raw HTTP response body, the server strips all HTML tags using a regex or simple parser, leaving only the text content of the page. This avoids feeding raw HTML (with <script>, <style>, attributes, etc.) into the model's context, which would waste tokens and reduce signal quality.
- Truncation: The plain text result of web_fetch is truncated at 50,000 characters. This is a hard limit applied before the content is appended to the conversation.
- No caching of tool results: Tool results are not cached between page generations. Each page generation that calls web_fetch or web_search makes fresh HTTP requests.
- Tool definitions: The tools are registered with the LLM API as standard function-calling tool definitions (JSON schema). The server parses the model's tool_calls response field and dispatches to the appropriate Rust function.
- Error handling: If a tool call fails (network error, non-200 response, timeout), the server returns an error string as the tool result. The model may choose to retry with a different URL, skip the tool, or proceed with available information.
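The dispatch and error-handling notes above can be sketched together. The `ToolError` enum, its variants and messages, and the `dispatch` and `tool_result_content` functions are all hypothetical; in particular the `UnknownTool` case and the "Error: ..." wording are assumptions made for illustration. The two tools are passed in as plain functions so the example runs without network access.

```rust
use std::fmt;

/// Hypothetical failure cases, mirroring those listed above.
#[derive(Debug)]
enum ToolError {
    Network(String),
    Status(u16),
    Timeout,
    UnknownTool(String),
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolError::Network(msg) => write!(f, "network error: {}", msg),
            ToolError::Status(code) => write!(f, "HTTP status {}", code),
            ToolError::Timeout => write!(f, "request timed out"),
            ToolError::UnknownTool(name) => write!(f, "unknown tool {}", name),
        }
    }
}

/// Route a parsed tool call to the matching implementation by name.
fn dispatch<S, F>(name: &str, argument: &str, search: S, fetch: F) -> Result<String, ToolError>
where
    S: Fn(&str) -> Result<String, ToolError>,
    F: Fn(&str) -> Result<String, ToolError>,
{
    match name {
        "web_search" => search(argument),
        "web_fetch" => fetch(argument),
        other => Err(ToolError::UnknownTool(other.to_string())),
    }
}

/// Turn the outcome into the string handed back to the model. Failures
/// become an error string rather than aborting the page generation.
fn tool_result_content(outcome: Result<String, ToolError>) -> String {
    match outcome {
        Ok(body) => body,
        Err(err) => format!("Error: {}", err),
    }
}

// Stand-ins for the real tools.
fn stub_search(_query: &str) -> Result<String, ToolError> {
    Ok(String::from("[]"))
}

fn stub_fetch(_url: &str) -> Result<String, ToolError> {
    Err(ToolError::Status(404))
}

fn main() {
    let outcome = dispatch("web_fetch", "https://example.com/missing", stub_search, stub_fetch);
    println!("{}", tool_result_content(outcome));
}
```

Returning the error as ordinary tool output keeps the loop running, which is what lets the model decide whether to retry, try another URL, or carry on without the data.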