Prompting Strategies for Web Tools

Best practices for writing page prompts and system prompts that guide the LLM to use web search and fetch tools effectively.

When a Rabbithole instance is configured with web tools (search and/or fetch), the LLM generating each page can access live internet data. This is powerful, but it requires deliberate prompting: without guidance, the model may ignore its tools, overuse them, or use them at the wrong time. This page covers strategies for getting reliable, cost-effective, and well-cited tool use in generated pages.

1. When to Instruct the LLM to Search

The model has a large built-in knowledge base. Not every page needs a web search. Unnecessary searches slow down page generation and increase API costs. The key question is: does this page need information that changes over time or that the model is unlikely to know precisely?

Use web search when…	Rely on model knowledge when…
Content depends on current data (prices, versions, statistics, news)	Content is stable and general (explanations, history, concepts)
You need a specific fact the model might hallucinate (e.g., exact release dates, changelogs)	The page is primarily creative, structural, or UI-focused
You want to hotlink real images from authoritative sources	Placeholder or illustrative images are acceptable
The prompt references a specific project, person, or event more recent than the model's knowledge cutoff	The topic predates the model's knowledge cutoff and is unlikely to have changed
You want citations with real URLs for credibility	The page is internal documentation that does not cite external sources

The simplest heuristic: if you would open a browser to verify it before publishing, include a search instruction in the prompt.

2. Explicitly Triggering a Search

The model will not always search on its own. To reliably trigger a search, use direct imperative language in your page prompt:

✓ Good

Search for the latest stable release of PostgreSQL and include the version number and release date on this page.

✓ Good

Use web search to find the current price of a DigitalOcean Droplet (1 vCPU, 1 GB RAM) and display it in the pricing table.

✗ Bad

Show the latest PostgreSQL version.

The "bad" example may cause the model to output a version it memorized during training, which could be stale. The "good" examples name the tool action explicitly, which makes a live lookup far more likely.

Useful imperative phrases:

Search for …
Use web search to find …
Look up the current … and display it
Find a real image of … and hotlink it
Verify … with a web search before including it
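If page prompts are written by several people, a small check can flag prompts that ask for live data without one of these phrases. The following is a minimal sketch, not part of Rabbithole: the function name and the pattern list are assumptions derived from the phrases above, and you would extend the list to match your own wording.

```python
import re

# Hypothetical patterns based on the imperative phrases listed above.
TRIGGER_PATTERNS = [
    r"\bsearch for\b",
    r"\buse web search\b",
    r"\blook up\b",
    r"\bfind a real image\b",
    r"\bwith a web search\b",
]


def has_search_trigger(prompt):
    """Return True if the prompt contains an explicit search instruction."""
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in TRIGGER_PATTERNS)
```

A check like this only detects wording. It cannot tell whether the page actually needs live data, so treat a negative result as a prompt to review, not as an error.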

3. Naming Specific URLs or Domains to Fetch

When you need data from a known authoritative source, name it explicitly. The web fetch tool retrieves the plain-text content of a page; this is ideal for reading documentation, changelogs, or structured data that the model can then reformat into HTML.

✓ Good

Fetch https://www.rust-lang.org/en-US/what-is-rust.html and summarize the key points in a bulleted list on this page.

✓ Good

Fetch the Rabbithole README from https://raw.githubusercontent.com/ajbt200128/rabbithole/main/README.md and use it as the source of truth for all feature descriptions on this page.

✓ Good

Search for the nginx documentation domain, then fetch the page for the "proxy_pass" directive and include a usage example from it.

✗ Bad

Include information from the Rust website about memory safety.

Naming the exact URL removes ambiguity and avoids the model inventing a plausible-looking but incorrect URL. For GitHub repositories, the raw content URL (e.g., https://raw.githubusercontent.com/…/README.md) is usually cleaner to fetch than the rendered HTML page.
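If you build prompts programmatically, the conversion from a rendered GitHub file URL to its raw form can be automated. This is a sketch under the assumption that the input looks like https://github.com/OWNER/REPO/blob/BRANCH/PATH; the function name is hypothetical, and branch names that contain a slash are not handled.

```python
from urllib.parse import urlparse


def to_raw_github_url(url):
    """Convert a github.com 'blob' URL to its raw.githubusercontent.com form.

    URLs that do not match the expected shape are returned unchanged.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expected path layout: owner/repo/blob/branch/path/to/file
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        return url
    owner, repo, _, branch, *file_parts = parts
    file_path = "/".join(file_parts)
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"
```

For example, https://github.com/ajbt200128/rabbithole/blob/main/README.md maps to the raw README URL used in the fetch example above.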

Domain Hints

If you don't know the exact URL, you can specify a domain to constrain the search:

Search docs.rs for the Tokio crate's async_std compatibility notes and display the relevant section.

Find the GitHub releases page for ajbt200128/rabbithole and display the most recent tag.

4. Instructing the LLM to Cite Sources

By default the model may incorporate fetched information without attributing it. For pages where credibility matters, explicitly require citations:

✓ Good

For every statistic or version number you include, add a visible citation with the source URL as a hyperlink immediately after the claim, in the format: [Source: <url>]

✓ Good

After each paragraph that uses information from a web search or fetch, include a <p><small>Source: <a href="...">...</a></small></p> element.

Specifying the format of citations in the prompt makes them consistent across pages. If you want footnote-style citations, say so; if you want inline parenthetical links, say that instead.

Citation strategies to include in your prompts:

Inline hyperlinks: wrap the claim text in an <a href="SOURCE_URL"> tag
Footnote list: accumulate source URLs in a numbered <ol> at the bottom of the page
Tooltip attribution: use a title attribute on a superscript to show the source on hover
Explicit prose: "According to [site], …" with the site name hyperlinked
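Whichever strategy you choose, it can help to verify that a generated page actually links to the sources it used. The sketch below assumes you keep your own list of the URLs that were fetched or named in the prompt; the class and function names are hypothetical, and the check only looks at href attributes on <a> tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def missing_citations(html, source_urls):
    """Return the source URLs that never appear as a link in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in source_urls if url not in collector.hrefs]
```

An empty result means every expected source is linked at least once. It does not prove that each individual claim carries its own citation.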

5. Limiting Tool Calls — Latency and Cost Control

Each tool call adds round-trip latency and, depending on your LLM provider, API cost. A page that makes five searches before generating HTML may take 15–30 seconds to load. Rabbithole caches generated pages permanently, so the cost is paid once — but it still matters for first-load UX and for sites that generate many pages dynamically (e.g., via MAPPINGS links).

You can cap tool usage directly in the prompt:

✓ Good

Use at most 2 web searches to gather information for this page. Do not search for information you already know with confidence.

✓ Good

Make one fetch call to retrieve the changelog, then generate the page without any additional tool calls.

You can also set a global cap in the system prompt that applies to all pages:

# System Prompt (rabbithole.toml → llm.system_prompt)

You generate HTML pages. You have access to web_search and web_fetch tools.
Use them only when the page prompt explicitly asks for live data or when
information is likely to be out of date. Never make more than 3 tool calls
per page. If you have already found sufficient information, stop searching
and generate the HTML immediately.

A budget in the system prompt sets a ceiling; per-page prompts can tighten it further for specific pages (e.g., a simple "About" page should never search at all).

Tool Call Budget Guidelines

Page type	Recommended max tool calls	Notes
Static content (About, Contact, FAQ)	0	No live data needed; prohibit searches explicitly if desired
Documentation page referencing one library	1–2	One fetch for the README or changelog is usually sufficient
News / current events summary	2–4	Multiple searches may be needed for breadth; set a hard cap
Product comparison / pricing page	1 per product	State the cap as "one search per vendor, max 5 total"
Image gallery with hotlinked photos	1–2	One broad image search is usually enough for 4–8 images
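One way to apply these guidelines consistently is to append the cap to each page prompt from a lookup table. This is an illustrative sketch only: the page-type keys and the function name are assumptions, the caps use the upper end of each range in the table, and the per-product pricing rule is left out because it depends on the number of vendors.

```python
# Hypothetical page-type labels mapped to the upper bound from the table above.
TOOL_BUDGETS = {
    "static": 0,
    "docs": 2,
    "news": 4,
    "gallery": 2,
}


def with_budget(prompt, page_type):
    """Append an explicit tool call cap to a page prompt."""
    cap = TOOL_BUDGETS[page_type]
    if cap == 0:
        rule = "Do not use any web search or fetch tools."
    else:
        noun = "tool call" if cap == 1 else "tool calls"
        rule = f"Use at most {cap} {noun} for this page."
    return f"{prompt.rstrip()} {rule}"
```

Stating the cap in every prompt keeps the limit visible even when a page is generated from a MAPPINGS entry written long after the system prompt was configured.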

6. Tool Use and the MAPPINGS Output Format

Every Rabbithole page generation ends with a ---MAPPINGS--- block that lists child URLs and their prompts. Web tool use interacts with MAPPINGS in two ways:

6.1 Tool calls happen before MAPPINGS are written

The model uses its tools, generates the HTML body, then writes the MAPPINGS block. Information retrieved during tool calls can and should influence the MAPPINGS the model emits. For example, if it fetches a changelog and discovers three new major features, it can generate a dedicated sub-page prompt for each one and include those in MAPPINGS.

# Example page prompt that chains tools into child pages:

Fetch https://raw.githubusercontent.com/ajbt200128/rabbithole/main/CHANGELOG.md
and identify the three most recent releases. Generate a summary page, then
emit MAPPINGS entries for /release-v1.html, /release-v2.html, /release-v3.html
— each prompt should include the relevant changelog text so the child page
generator has all necessary context without needing to fetch again.

Important: Child pages are generated in complete isolation. They receive only the prompt string from MAPPINGS. If a child page needs data that the parent retrieved via a tool call, that data must be embedded verbatim in the child's prompt string. Do not assume the child will re-fetch it.

6.2 Prompts in MAPPINGS should carry their own tool instructions

Each MAPPINGS prompt is a complete, self-contained instruction set for a future page generation. If a child page needs live data, its prompt must say so explicitly:

✓ Good MAPPINGS prompt for a child page

/pricing.html | Pricing page for Acme SaaS. DESIGN: white bg, Arial, minimal, blue links. NAV: Home|Docs|Pricing|About (paths /, /docs.html, /pricing.html, /about.html). Search for current pricing for AWS EC2 t3.micro and t3.small instances and display in a comparison table. Use at most 2 searches. Cite sources with inline links.

✗ Bad MAPPINGS prompt for a child page

/pricing.html | Pricing page showing AWS EC2 prices.

The bad example omits design context, navigation, search instructions, and tool limits. The child page generator starts from zero — it has none of the parent's context.
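The difference between the two examples can be checked mechanically. The sketch below is a hypothetical linter, not a Rabbithole feature: it assumes your prompts use the DESIGN: and NAV: markers shown in the good example, and it treats either a numeric cap or an explicit "do not" rule as a valid tool limit.

```python
import re


def lint_child_prompt(prompt):
    """Return a list of problems that make a MAPPINGS prompt less self-contained."""
    problems = []
    if "DESIGN:" not in prompt:
        problems.append("missing DESIGN: section")
    if "NAV:" not in prompt:
        problems.append("missing NAV: section")
    # Any mention of searching or fetching counts as a tool instruction,
    # including an explicit prohibition.
    if not re.search(r"\b(search|fetch)\b", prompt, re.IGNORECASE):
        problems.append("no explicit tool instruction")
    # Accept "at most N ..." or "Do not use/fetch/search ..." as a limit.
    limit = r"\bat most \d+\b|\bdo not (use|fetch|search)\b"
    if not re.search(limit, prompt, re.IGNORECASE):
        problems.append("no tool call limit")
    return problems
```

Run against the two examples above, the good prompt produces an empty list and the bad prompt produces all four problems.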

6.3 Avoid redundant fetches across parent and child

If the parent page fetches a resource to build its own content, and then emits a child page about the same topic, pass the relevant text directly into the child prompt rather than telling the child to fetch the same URL. This avoids a duplicate tool call for that resource and eliminates the risk of the child seeing a different version of the document.

# Parent prompt:
Fetch https://example.com/api-docs.json. Use it to build a quick-reference table
on this page. Also emit a MAPPINGS entry for /api-reference.html with a prompt
that embeds the JSON content (truncated to the 10 most important endpoints)
so that page does not need to re-fetch it.

# Resulting MAPPINGS entry (generated by the parent):
/api-reference.html | Full API reference page. ... CONTENT: The following 10
endpoints were retrieved from the official docs: [GET /users — ..., POST /auth — ...]
Do not fetch any URLs; use only the content provided here.
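The same data-passing idea can be expressed as a small helper if you assemble MAPPINGS entries outside the model, for example when seeding a site. This is a sketch under assumptions: the function name and the "METHOD /path: description" item format are invented for illustration, and the only rule taken from this page is that each entry is a single "/path.html | prompt" line.

```python
def child_mapping(path, description, endpoints, limit=10):
    """Build one MAPPINGS line that embeds already-fetched data in the child prompt."""
    kept = endpoints[:limit]  # truncate so the prompt stays a manageable size
    content = "; ".join(kept)
    prompt = (
        f"{description} CONTENT: The following {len(kept)} endpoints were "
        f"retrieved by the parent page: [{content}] "
        "Do not fetch any URLs; use only the content provided here."
    )
    # Collapse all whitespace so the entry stays on a single line.
    return f"{path} | {' '.join(prompt.split())}"
```

Collapsing whitespace matters because a newline inside the embedded data would otherwise split the entry into two lines.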

7. System Prompt vs. Page Prompt Responsibilities

Think of tool-use instructions as falling into two tiers:

Instruction type	Put in system prompt	Put in page prompt
Global tool call budget	✓
Default citation format	✓	Override if different format needed
Domains to always trust / prefer	✓
Domains to never fetch (e.g., login-walled sites)	✓
Specific URL to fetch for this page		✓
Search query for live data on this page		✓
"Do not search" for a static page		✓
Tighter per-page tool budget		✓

8. Complete System Prompt Example

The following is a full example llm.system_prompt value in rabbithole.toml that sets sensible defaults for web tool use:

You are Rabbithole, an AI-powered web server that generates HTML pages on demand.
Each request gives you a page prompt; your output is served directly as the HTTP
response, so it must be a complete, valid HTML document.

## Tools
You have access to web_search and web_fetch tools.
- Use them only when the page prompt requests live data, or when a fact is
  highly likely to be outdated (version numbers, prices, statistics, news).
- Do NOT use tools for general knowledge, definitions, or stable historical facts.
- Make at most 3 tool calls per page unless the prompt explicitly raises the limit.
- Prefer fetching a single authoritative URL over multiple broad searches.
- When you cite external information, include the source URL as a hyperlink
  in the generated HTML.

## Output format
1. Emit a complete HTML5 document starting with <!DOCTYPE html>.
2. After </html>, emit ---MAPPINGS--- on its own line, followed by one
   mapping per line in the format:  /path.html | full self-contained prompt
3. Every mapping prompt must include design, nav, and content instructions
   sufficient to generate that page in complete isolation.
4. Embed any data retrieved by tools directly into child prompts where needed;
   do not assume child pages will re-fetch the same resources.
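To make the output format concrete, here is a sketch of how a consumer could split a response in that format into the HTML document and its mappings. It is not Rabbithole's actual parser. The function name is hypothetical, and the code assumes only what the prompt above states: a ---MAPPINGS--- separator followed by one "/path.html | prompt" entry per line.

```python
def split_output(raw):
    """Split model output into (html, mappings) using the ---MAPPINGS--- separator."""
    html, _, tail = raw.partition("---MAPPINGS---")
    mappings = {}
    for line in tail.splitlines():
        # Split on the first "|" only; prompts may contain "|" themselves,
        # as in "NAV: Home|Docs".
        path, bar, prompt = line.partition("|")
        if bar and path.strip().startswith("/"):
            mappings[path.strip()] = prompt.strip()
    return html.strip(), mappings
```

Because each entry occupies exactly one line, any tool data embedded in a child prompt has to be flattened to a single line before it is written into MAPPINGS.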

9. Quick Reference: Good Prompt Patterns

# Search + cite pattern
Search for the current stable version of [PROJECT] and the release date.
Display them prominently. Cite the source with an inline hyperlink.
Use at most 1 search.

# Fetch + summarize pattern
Fetch [EXACT_URL]. Summarize the content in no more than 3 short paragraphs.
Do not make any other tool calls.

# Image hotlink pattern
Search for a high-quality photograph of [SUBJECT] (prefer Wikimedia Commons
or Unsplash). Hotlink the image directly with an <img> tag. Use 1 search.

# No-search pattern (for static pages)
Do not use any web search or fetch tools. Generate this page entirely from
the information provided in this prompt.

# Child-page data-passing pattern
Fetch [URL], extract [SPECIFIC DATA], include a summary on this page,
and embed the raw extracted data into the MAPPINGS prompt for /child.html
so that page does not need to re-fetch it.