Best practices for writing page prompts and system prompts that guide the LLM to use web search and fetch tools effectively.
When a Rabbithole instance is configured with web tools (search and/or fetch), the LLM generating each page can access live internet data. This is powerful, but it requires deliberate prompting: without guidance, the model may ignore its tools, overuse them, or use them at the wrong time. This page covers strategies for getting reliable, cost-effective, and well-cited tool use in generated pages.
The model has a large built-in knowledge base. Not every page needs a web search. Unnecessary searches slow down page generation and increase API costs. The key question is: does this page need information that changes over time or that the model is unlikely to know precisely?
| Use web search when… | Rely on model knowledge when… |
|---|---|
| Content depends on current data (prices, versions, statistics, news) | Content is stable and general (explanations, history, concepts) |
| You need a specific fact the model might hallucinate (e.g., exact release dates, changelogs) | The page is primarily creative, structural, or UI-focused |
| You want to hotlink real images from authoritative sources | Placeholder or illustrative images are acceptable |
| The prompt references a specific project, person, or event post-2023 | The topic predates the model's knowledge cutoff and is unlikely to have changed |
| You want citations with real URLs for credibility | The page is internal documentation that does not cite external sources |
The simplest heuristic: if you would open a browser to verify it before publishing, include a search instruction in the prompt.
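The heuristic can be roughly approximated as a lint that runs over page prompts before they are deployed. The sketch below is illustrative only: the keyword lists and the function name `needs_search_instruction` are assumptions made for this example and are not part of Rabbithole.

```python
import re

# Hypothetical keyword lists; tune them for your own site.
VOLATILE_HINTS = ("latest", "current", "price", "pricing", "version",
                  "release", "news", "statistics")
SEARCH_PHRASES = ("search for", "use web search", "look up", "fetch ", "verify")


def needs_search_instruction(prompt: str) -> bool:
    """Return True when a prompt mentions volatile data but never asks for a tool call."""
    text = prompt.lower()
    mentions_volatile = any(
        re.search(rf"\b{re.escape(word)}", text) for word in VOLATILE_HINTS
    )
    asks_for_tool = any(phrase in text for phrase in SEARCH_PHRASES)
    return mentions_volatile and not asks_for_tool
```

A prompt flagged by this check is a candidate for an explicit search instruction; a keyword match is only a hint, so the final call still belongs to the prompt author.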
The model will not always search on its own. To reliably trigger a search, use direct imperative language in your page prompt:
# Good:
Search for the latest stable release of PostgreSQL and include the version number and release date on this page.
Use web search to find the current price of a DigitalOcean Droplet (1 vCPU, 1 GB RAM) and display it in the pricing table.
# Bad:
Show the latest PostgreSQL version.
The "bad" example may cause the model to output a version it memorized during training, which could be stale. The "good" example forces a live lookup.
Useful imperative phrases:
- Search for …
- Use web search to find …
- Look up the current … and display it
- Find a real image of … and hotlink it
- Verify … with a web search before including it

When you need data from a known authoritative source, name it explicitly. The web fetch tool retrieves the plain-text content of a page; this is ideal for reading documentation, changelogs, or structured data that the model can then reformat into HTML.
# Good:
Fetch https://www.rust-lang.org/en-US/what-is-rust.html and summarize the key points in a bulleted list on this page.
Fetch the Rabbithole README from https://raw.githubusercontent.com/ajbt200128/rabbithole/main/README.md and use it as the source of truth for all feature descriptions on this page.
Search for the nginx documentation domain, then fetch the page for the "proxy_pass" directive and include a usage example from it.
# Bad:
Include information from the Rust website about memory safety.
Naming the exact URL removes ambiguity and avoids the model inventing a plausible-looking but incorrect URL. For GitHub repositories, the raw content URL (e.g., https://raw.githubusercontent.com/…/README.md) is usually cleaner to fetch than the rendered HTML page.
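When writing many fetch prompts for GitHub-hosted files, a small helper can build the raw-content URL so it is not typed by hand. This is a minimal sketch; the function name `raw_github_url` is hypothetical, and it assumes a public repository and a known branch name.

```python
from urllib.parse import quote


def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw-content URL for a file in a public GitHub repository."""
    # quote() keeps "/" intact by default, so nested paths survive unchanged.
    safe_path = quote(path.lstrip("/"))
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{safe_path}"


url = raw_github_url("ajbt200128", "rabbithole", "main", "README.md")
prompt = f"Fetch {url} and use it as the source of truth for this page."
```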
If you don't know the exact URL, you can specify a domain to constrain the search:
Search docs.rs for the Tokio crate's async_std compatibility notes and display the relevant section.
Find the GitHub releases page for ajbt200128/rabbithole and display the most recent tag.
By default the model may incorporate fetched information without attributing it. For pages where credibility matters, explicitly require citations:
For every statistic or version number you include, add a visible citation with the source URL as a hyperlink immediately after the claim, in the format: [Source: <url>]
After each paragraph that uses information from a web search or fetch, include a <p><small>Source: <a href="...">...</a></small></p> element.
Specifying the format of citations in the prompt makes them consistent across pages. If you want footnote-style citations, say so; if you want inline parenthetical links, say that instead.
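If you review generated pages after the fact, a citation requirement can be spot-checked with the standard-library HTML parser. The sketch below assumes you already have a list of the URLs the model fetched (for example from your provider's logs); Rabbithole is not known to expose such a list, so treat `fetched_urls` and the function name `uncited_sources` as hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def uncited_sources(html: str, fetched_urls: list[str]) -> list[str]:
    """Return fetched URLs that never appear as a hyperlink in the page."""
    collector = LinkCollector()
    collector.feed(html)
    linked = set(collector.hrefs)
    return [url for url in fetched_urls if url not in linked]
```

An empty result means every fetched source is linked at least once; it does not prove that each individual claim carries its own citation.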
Citation strategies to include in your prompts:
- An inline link: wrap the claim in an <a href="SOURCE_URL"> tag
- A numbered reference list in an <ol> at the bottom of the page
- A title attribute on a superscript to show the source on hover

Each tool call adds round-trip latency and, depending on your LLM provider, API cost. A page that makes five searches before generating HTML may take 15–30 seconds to load. Rabbithole caches generated pages permanently, so the cost is paid once, but it still matters for first-load UX and for sites that generate many pages dynamically (e.g., via MAPPINGS links).
You can cap tool usage directly in the prompt:
Use at most 2 web searches to gather information for this page. Do not search for information you already know with confidence.
Make one fetch call to retrieve the changelog, then generate the page without any additional tool calls.
You can also set a global cap in the system prompt that applies to all pages:
# System Prompt (rabbithole.toml → llm.system_prompt)
You generate HTML pages. You have access to web_search and web_fetch tools.
Use them only when the page prompt explicitly asks for live data or when
information is likely to be out of date. Never make more than 3 tool calls
per page. If you have already found sufficient information, stop searching
and generate the HTML immediately.
A budget in the system prompt sets a default ceiling; per-page prompts can tighten it further for specific pages (e.g., a simple "About" page should never search at all).
| Page type | Recommended max tool calls | Notes |
|---|---|---|
| Static content (About, Contact, FAQ) | 0 | No live data needed; prohibit searches explicitly if desired |
| Documentation page referencing one library | 1–2 | One fetch for the README or changelog is usually sufficient |
| News / current events summary | 2–4 | Multiple searches may be needed for breadth; set a hard cap |
| Product comparison / pricing page | 1 per product | State the cap as "one search per vendor, max 5 total" |
| Image gallery with hotlinked photos | 1–2 | One broad image search is usually enough for 4–8 images |
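If page prompts are assembled by a script, the budgets in the table can be appended automatically. This is a sketch under assumptions: the page-type keys and the function name `with_budget` are invented for illustration, each value is the upper end of the recommended range, and the per-product pricing case is left out because its cap depends on the number of vendors.

```python
# Upper bounds taken from the table above; the keys are hypothetical labels.
MAX_TOOL_CALLS = {"static": 0, "docs": 2, "news": 4, "gallery": 2}


def with_budget(prompt: str, page_type: str) -> str:
    """Append an explicit tool-call budget sentence to a page prompt."""
    limit = MAX_TOOL_CALLS[page_type]
    if limit == 0:
        rule = "Do not use any web search or fetch tools."
    else:
        noun = "tool call" if limit == 1 else "tool calls"
        rule = f"Use at most {limit} {noun} for this page."
    return f"{prompt.rstrip()} {rule}"
```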
Every Rabbithole page generation ends with a ---MAPPINGS--- block that lists child URLs and their prompts. Web tool use interacts with MAPPINGS in two ways:
The model uses its tools, generates the HTML body, then writes the MAPPINGS block. Information retrieved during tool calls can and should influence the MAPPINGS you emit — for example, if you fetch a changelog and discover three new major features, you can generate a dedicated sub-page prompt for each one and include those in MAPPINGS.
# Example page prompt that chains tools into child pages:
Fetch https://raw.githubusercontent.com/ajbt200128/rabbithole/main/CHANGELOG.md
and identify the three most recent releases. Generate a summary page, then
emit MAPPINGS entries for /release-v1.html, /release-v2.html, /release-v3.html
— each prompt should include the relevant changelog text so the child page
generator has all necessary context without needing to fetch again.
Each MAPPINGS prompt is a complete, self-contained instruction set for a future page generation. If a child page needs live data, its prompt must say so explicitly:
# Good:
/pricing.html | Pricing page for Acme SaaS. DESIGN: white bg, Arial, minimal, blue links. NAV: Home|Docs|Pricing|About (paths /, /docs.html, /pricing.html, /about.html). Search for current pricing for AWS EC2 t3.micro and t3.small instances and display in a comparison table. Use at most 2 searches. Cite sources with inline links.
# Bad:
/pricing.html | Pricing page showing AWS EC2 prices.
The bad example omits design context, navigation, search instructions, and tool limits. The child page generator starts from zero — it has none of the parent's context.
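To audit child prompts for missing context, the model's raw output can be split at the ---MAPPINGS--- marker and each entry checked for the labels your site relies on. The sketch below assumes the one-mapping-per-line `/path.html | prompt` layout described on this page; the function names and the DESIGN:/NAV: label check are illustrative choices, not Rabbithole's own parser.

```python
MARKER = "---MAPPINGS---"


def split_output(raw: str) -> tuple[str, dict[str, str]]:
    """Split model output into the HTML document and a {path: prompt} dict."""
    html, _, tail = raw.partition(MARKER)
    mappings = {}
    for line in tail.splitlines():
        # Split on the first "|" only, since prompts may contain "|" themselves.
        path, sep, prompt = line.partition("|")
        if sep and path.strip().startswith("/"):
            mappings[path.strip()] = prompt.strip()
    return html.strip(), mappings


def missing_context(prompt: str, required=("DESIGN:", "NAV:")) -> list[str]:
    """List the required labels that a child prompt does not contain."""
    return [label for label in required if label not in prompt]
```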
If the parent page fetches a resource to build its own content, and then emits a child page about the same topic, pass the relevant text directly into the child prompt rather than telling the child to fetch the same URL. This halves the number of tool calls and eliminates the risk of the child seeing a different version of the document.
# Parent prompt:
Fetch https://example.com/api-docs.json. Use it to build a quick-reference table
on this page. Also emit a MAPPINGS entry for /api-reference.html with a prompt
that embeds the full JSON content (truncated to the 10 most important endpoints)
so that page does not need to re-fetch it.
# Resulting MAPPINGS entry (generated by the parent):
/api-reference.html | Full API reference page. ... CONTENT: The following 10
endpoints were retrieved from the official docs: [GET /users — ..., POST /auth — ...]
Do not fetch any URLs; use only the content provided here.
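In Rabbithole the parent model writes this entry itself, but the shape it should produce can be made concrete in a few lines. The following is only a sketch of that shape: `child_mapping` is a hypothetical helper, and the endpoint strings stand in for whatever data the parent actually retrieved.

```python
def child_mapping(path: str, description: str, endpoints: list[str], limit: int = 10) -> str:
    """Build a single-line MAPPINGS entry that embeds already-fetched data."""
    # Keep only the first `limit` items and collapse whitespace so the
    # entry stays on one line, as the one-mapping-per-line format requires.
    kept = [" ".join(item.split()) for item in endpoints[:limit]]
    content = "; ".join(kept)
    return (
        f"{path} | {description} "
        f"CONTENT: endpoints retrieved from the official docs: [{content}] "
        "Do not fetch any URLs; use only the content provided here."
    )
```

Truncating before embedding keeps the child prompt short, at the cost of the child page knowing nothing about the items that were dropped.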
Think of tool-use instructions as falling into two tiers:
| Instruction type | Put in system prompt | Put in page prompt |
|---|---|---|
| Global tool call budget | ✓ | |
| Default citation format | ✓ | Override if different format needed |
| Domains to always trust / prefer | ✓ | |
| Domains to never fetch (e.g., login-walled sites) | ✓ | |
| Specific URL to fetch for this page | | ✓ |
| Search query for live data on this page | | ✓ |
| "Do not search" for a static page | | ✓ |
| Tighter per-page tool budget | | ✓ |
The following is a full example llm.system_prompt value in rabbithole.toml that sets sensible defaults for web tool use:
You are Rabbithole, an AI-powered web server that generates HTML pages on demand.
Each request gives you a page prompt; your output is served directly as the HTTP
response, so it must be a complete, valid HTML document.
## Tools
You have access to web_search and web_fetch tools.
- Use them only when the page prompt requests live data, or when a fact is
highly likely to be outdated (version numbers, prices, statistics, news).
- Do NOT use tools for general knowledge, definitions, or stable historical facts.
- Make at most 3 tool calls per page unless the prompt explicitly raises the limit.
- Prefer fetching a single authoritative URL over multiple broad searches.
- When you cite external information, include the source URL as a hyperlink
in the generated HTML.
## Output format
1. Emit a complete HTML5 document starting with <!DOCTYPE html>.
2. After </html>, emit ---MAPPINGS--- on its own line, followed by one
mapping per line in the format: /path.html | full self-contained prompt
3. Every mapping prompt must include design, nav, and content instructions
sufficient to generate that page in complete isolation.
4. Embed any data retrieved by tools directly into child prompts where needed;
do not assume child pages will re-fetch the same resources.
# Search + cite pattern
Search for the current stable version of [PROJECT] and the release date.
Display them prominently. Cite the source with an inline hyperlink.
Use at most 1 search.
# Fetch + summarize pattern
Fetch [EXACT_URL]. Summarize the content in no more than 3 short paragraphs.
Do not make any other tool calls.
# Image hotlink pattern
Search for a high-quality photograph of [SUBJECT] (prefer Wikimedia Commons
or Unsplash). Hotlink the image directly with an <img> tag. Use 1 search.
# No-search pattern (for static pages)
Do not use any web search or fetch tools. Generate this page entirely from
the information provided in this prompt.
# Child-page data-passing pattern
Fetch [URL], extract [SPECIFIC DATA], include a summary on this page,
and embed the raw extracted data into the MAPPINGS prompt for /child.html
so that page does not need to re-fetch it.
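The bracketed placeholders in these patterns can be filled programmatically when prompts are generated in bulk. This is a minimal sketch: the function name `fill` is hypothetical, and it assumes placeholders are always upper-case words in square brackets, as in the patterns above.

```python
import re

SEARCH_AND_CITE = (
    "Search for the current stable version of [PROJECT] and the release date. "
    "Display them prominently. Cite the source with an inline hyperlink. "
    "Use at most 1 search."
)


def fill(pattern: str, values: dict[str, str]) -> str:
    """Replace each [PLACEHOLDER] in a pattern, failing loudly if one is missing."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder [{key}]")
        return values[key]

    # Matches placeholders such as [PROJECT], [EXACT_URL], or [SPECIFIC DATA].
    return re.sub(r"\[([A-Z_ ]+)\]", replace, pattern)


prompt = fill(SEARCH_AND_CITE, {"PROJECT": "PostgreSQL"})
```

Raising on a missing value prevents a literal "[PROJECT]" from reaching the model, where it would likely be treated as part of the request.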