Local Search Skill

Local web search with automatic engine fallback. Direct HTML scraping of DuckDuckGo / Bing / Google · resilient to rate-limiting · canonical result schema.

This skill calls the public search engines directly from your machine: no API key, no cloud SDK, and no intermediate cloud hop. If one engine is rate-limiting you, the orchestrator silently falls through to the next, so a single call usually succeeds without a retry on your side, even from datacenter IPs where Google is blocked.

Each result is a SearchFunctionResultItem with the canonical 7 fields (url, name, snippet, host_name, rank, date, favicon) plus three optional extension fields: source_engine, raw_html, score. Consumer code written against any similarly-shaped web-search skill works here without changes.

✨ Highlights

Zero cloud SDK dependency — only cheerio (HTML parser) and Node's built-in fetch.
Three engines, automatic fallback — DuckDuckGo → Bing → Google when engine: "auto" (default).
Locale-aware — --locale en-US / zh-CN / ja-JP / any BCP-47 tag. Critical for non-US IPs where Bing otherwise serves localized results even for English queries.
Recency filter — --recency-days 7 restricts to past-week results. DDG supports exact N days; Bing/Google use day/week/month buckets.
Canonical result schema — url / name / snippet / host_name / rank / date / favicon fields, identical in shape to other web-search skills in the ClawHub registry.
CLI + SDK — use tsx bin/web-search.ts <query> for one-offs, or import { search } from "local-search" in code.
Pure TypeScript ESM — runs on Node 18+ via tsx, or zero-config via bun.

🚀 Quick Start

# 1. Install (one runtime dep: cheerio)
cd skills/local-search && npm install

# 2. Search (auto-fallback across DDG → Bing → Google)
tsx bin/web-search.ts "artificial intelligence"

# 3. Pin a specific engine + locale
tsx bin/web-search.ts "machine learning" --engine bing --locale en-US --num 5

# 4. Recent news as JSON
tsx bin/web-search.ts "AI breakthroughs" --recency-days 7 --json -o ai_news.json

Programmatic use:

import { search } from "local-search";

const outcome = await search("What is the capital of France?", { num: 5 });
if (outcome.success) {
  console.log(`Answered by ${outcome.engine} in ${outcome.elapsedMs}ms`);
  outcome.results.forEach(r => console.log(`- ${r.name}\n  ${r.url}`));
}

🎯 When to Use This Skill

Use local-search whenever the user needs information that lives on the public web — beyond what's in the model's training data:

Real-time information: current news, stock prices, weather, sports scores
Latest documentation: framework release notes, API changes, recently published papers
Fact-checking: verify a claim against live web sources
Research: gather multiple sources on a topic, compare perspectives
Content discovery: find tutorials, blog posts, recent talks
Competitive / market analysis: competitor announcements, industry trends
Academic lookup: find papers, citations, author pages

🔄 Engine Comparison

| Engine | Endpoint | API key | Residential IP | Datacenter IP | Recency |
|---|---|---|---|---|---|
| DuckDuckGo | `html.duckduckgo.com/html/` (POST) | none | ✅ high | ⚠️ rate-limits under load | `df=d<N>` (exact days) |
| Bing | `www.bing.com/search` (GET) | none | ✅ high | ✅ high | `freshness=d1\|w1\|m1` (bucketed) |
| Google | `www.google.com/search` (GET) | none | ⚠️ medium | ❌ low (enablejs wall) | `tbs=qdr:d\|w\|m\|y` (bucketed) |

engine: "auto" tries them in order DuckDuckGo → Bing → Google, returning the first non-empty result set. From a datacenter IP the effective chain is DDG → Bing (Google is usually blocked); from a residential IP all three are viable. Edit AUTO_ENGINE_ORDER in src/engines/index.ts to change priority.
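
The difference between exact-day and bucketed recency filters can be sketched as a small mapping function. This is only an illustration: the name `recencyParam` and the bucket thresholds (1, 7, and 31 days) are assumptions, not taken from the skill's source. Only the parameter names and bucket values come from the table above.

```typescript
// Hypothetical helper: maps a "last N days" request to each engine's recency parameter.
type Engine = "duckduckgo" | "bing" | "google";

/** Returns [paramName, paramValue], or null when no recency filter applies. */
function recencyParam(engine: Engine, days: number): [string, string] | null {
  if (!Number.isFinite(days) || days <= 0) return null; // 0 means "no filter"
  const n = Math.ceil(days);
  switch (engine) {
    case "duckduckgo":
      // DDG takes the exact day count.
      return ["df", `d${n}`];
    case "bing":
      // Assumed rounding: pick the smallest bucket covering the window; month is the widest.
      return ["freshness", n <= 1 ? "d1" : n <= 7 ? "w1" : "m1"];
    case "google": {
      // Assumed rounding, with a year bucket for anything beyond a month.
      const bucket = n <= 1 ? "d" : n <= 7 ? "w" : n <= 31 ? "m" : "y";
      return ["tbs", `qdr:${bucket}`];
    }
  }
}
```

With this rounding, a 10-day request to Bing or Google widens to the month bucket, so callers that need an exact window would still filter by date afterwards.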

📦 Installation Path

Recommended Location: {project_path}/skills/local-search

Extract this skill package to the above path in your project.

Prerequisites

Node.js >= 18 (uses the built-in global fetch).
One npm dependency: cheerio (HTML parser).
A working internet connection — the skill calls the public search engines directly.

Install once in the skill directory:

cd {project_path}/skills/local-search
npm install            # or: bun install / pnpm install

The skill is pure TypeScript ESM. You can run it via tsx (recommended, dev-friendly) or bun (zero-config). On Node 22.6+, node --experimental-strip-types bin/web-search.ts also works without an extra runner.

Why Local?

| Aspect | Cloud-search skills | This skill (`local-search`) |
|---|---|---|
| Search backend | Cloud function / API | Direct HTTP to DuckDuckGo / Bing / Google |
| API key / SDK | Required | None |
| Network path | Client → cloud → search engine | Client → search engine |
| Failure handling | SDK-dependent | Auto-fallback across 3 engines |
| Result schema | `SearchFunctionResultItem` (7 fields) | Same 7 fields + 3 optional extension fields |
| CLI | SDK-specific | `tsx bin/web-search.ts` |

Architecture

local-search/
├── SKILL.md                 ← you are here
├── LICENSE.txt
├── package.json             ← declares the `cheerio` dep
├── tsconfig.json
├── bin/
│   └── web-search.ts        ← CLI entry
├── src/
│   ├── index.ts             ← public SDK exports
│   ├── search.ts            ← orchestrator with auto-fallback
│   ├── types.ts             ← SearchFunctionResultItem + options
│   └── engines/
│       ├── _shared.ts       ← fetch / parse helpers
│       ├── duckduckgo.ts    ← POST https://html.duckduckgo.com/html/
│       ├── bing.ts          ← GET https://www.bing.com/search
│       ├── google.ts        ← GET https://www.google.com/search
│       └── index.ts         ← engine registry + AUTO_ENGINE_ORDER
└── scripts/
    └── web_search.ts        ← quick-start example

The only runtime dependency is cheerio (HTML parser), declared in package.json. Node 18+'s built-in fetch covers the network layer — no other SDK is imported anywhere in this package.

CLI Usage

The CLI lives at bin/web-search.ts. Run it via tsx, bun, or any TS-aware runner.

Basic Search

# Default: auto engine, 10 results, human-readable output
tsx bin/web-search.ts "artificial intelligence"

# Limit number of results
tsx bin/web-search.ts "machine learning" --num 5
# (short option works too)
tsx bin/web-search.ts "machine learning" -n 5

Pin a Specific Engine

# Force DuckDuckGo only
tsx bin/web-search.ts "latest tech news" --engine duckduckgo
# Force Bing only
tsx bin/web-search.ts "latest tech news" --engine bing
# Force Google only (highest quality, most likely to be rate-limited)
tsx bin/web-search.ts "latest tech news" --engine google
# Default — try DDG → Bing → Google, return first non-empty
tsx bin/web-search.ts "latest tech news" --engine auto

Recency Filter

# Results from last 7 days
tsx bin/web-search.ts "cryptocurrency news" --num 10 --recency-days 7
# (short option)
tsx bin/web-search.ts "cryptocurrency news" -r 7

Save Results to JSON

# Write JSON to a file (human-readable banner is suppressed)
tsx bin/web-search.ts "climate change research" --num 5 --json -o search_results.json

# Or pipe JSON to stdout
tsx bin/web-search.ts "AI breakthroughs" --num 3 --recency-days 1 --json > ai_news.json

Quiet Mode (Scripts / Piping)

# Suppress the "engine / timing" banner — just print the results
tsx bin/web-search.ts "react hooks" --quiet

CLI Parameters

| Flag | Short | Description |
|---|---|---|
| `--num <N>` | `-n` | Number of results (default: 10, max: 50) |
| `--engine <id>` | `-e` | `duckduckgo` \| `bing` \| `google` \| `auto` (default: `auto`) |
| `--recency-days <N>` | `-r` | Restrict to results from the last N days (default: 0 = no filter) |
| `--timeout <ms>` | (none) | Per-engine timeout in ms (default: 8000) |
| `--json` | (none) | Emit JSON instead of human-readable output |
| `--output <path>` | `-o` | Write JSON to a file instead of stdout |
| `--pretty` | (none) | Pretty-print JSON (always on for now; `--no-pretty` is not yet supported) |
| `--quiet` | `-q` | Suppress the engine-info banner |
| `--help` | `-h` | Show help |
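
The defaults and limits in the table could be applied to raw flag values roughly as follows. This is a sketch, not the CLI's actual parser: `normalizeOptions` and its types are hypothetical, and falling back to the default on an invalid value (rather than exiting with an error) is an assumption.

```typescript
// Hypothetical normalizer for raw string flag values.
const ENGINE_IDS = ["duckduckgo", "bing", "google", "auto"] as const;
type EngineId = (typeof ENGINE_IDS)[number];

interface NormalizedOptions {
  num: number;
  engine: EngineId;
  recencyDays: number;
  timeoutMs: number;
}

type RawFlags = Partial<Record<"num" | "engine" | "recencyDays" | "timeout", string>>;

function normalizeOptions(raw: RawFlags): NormalizedOptions {
  // Parse an integer flag, using the documented default when missing or invalid.
  const toInt = (value: string | undefined, fallback: number): number => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };
  const engine = (ENGINE_IDS as readonly string[]).includes(raw.engine ?? "")
    ? (raw.engine as EngineId)
    : "auto"; // assumption: unknown engine ids fall back to auto
  return {
    num: Math.min(50, Math.max(1, toInt(raw.num, 10))), // default 10, max 50
    engine,
    recencyDays: Math.max(0, toInt(raw.recencyDays, 0)), // 0 = no filter
    timeoutMs: Math.max(1, toInt(raw.timeout, 8000)), // default 8000 ms
  };
}
```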

Search Result Structure

Each result item is a SearchFunctionResultItem:

interface SearchFunctionResultItem {
  // ----- Canonical fields (shared shape across ClawHub web-search skills) -----
  url: string;          // Full URL of the result
  name: string;         // Title of the page
  snippet: string;      // Preview text / description
  host_name: string;    // Domain name (e.g. "en.wikipedia.org")
  rank: number;         // 1-indexed ranking within the engine's result set
  date: string;         // Publication / update date; "N/A" when unknown
  favicon: string;      // Favicon URL (Google S2)

  // ----- Extension fields (new in local-search) -----
  source_engine: "duckduckgo" | "bing" | "google";
  raw_html?: string;    // Original HTML of the snippet element (optional)
  score?: number;       // Heuristic relevance score [0, 100] (optional)
}
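
As an illustration of how the derived fields relate to `url`, the sketch below fills `host_name` and `favicon` from a scraped link. The helper name `toCanonicalItem`, the shape of its input, and the exact favicon URL pattern (Google's S2 endpoint with a `domain` and `sz` query) are assumptions. The skill's own construction may differ.

```typescript
// Local copy of the seven canonical fields so the sketch is self-contained.
interface CanonicalItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}

// Hypothetical helper: turns a scraped (url, title, snippet) triple into a canonical item.
function toCanonicalItem(
  raw: { url: string; title: string; snippet: string },
  rank: number,
): CanonicalItem {
  const host_name = new URL(raw.url).hostname; // throws on malformed URLs
  return {
    url: raw.url,
    name: raw.title.trim(),
    snippet: raw.snippet.trim(),
    host_name,
    rank, // 1-indexed, supplied by the caller
    date: "N/A", // unknown unless the engine exposes a date
    // Assumed S2 pattern; the size parameter is illustrative.
    favicon: `https://www.google.com/s2/favicons?domain=${encodeURIComponent(host_name)}&sz=64`,
  };
}
```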

SDK Usage

If you're writing TypeScript/JavaScript code (not running the CLI), import from src/index.ts:

Simple Search

import { search } from "local-search";

const outcome = await search("What is the capital of France?", { num: 5 });

if (outcome.success) {
  console.log(`Engine used: ${outcome.engine}`);
  console.log(`Tried in order: ${outcome.enginesTried.join(" → ")}`);
  console.log(`Elapsed: ${outcome.elapsedMs}ms`);
  for (const item of outcome.results) {
    console.log(`- ${item.name}  [${item.source_engine}]`);
    console.log(`  ${item.url}`);
    console.log(`  ${item.snippet}`);
  }
} else {
  console.error("Search failed:", outcome.error);
  if (outcome.errors) {
    // Auto mode: see which engines tried and what they reported
    for (const e of outcome.errors) {
      console.error(`  ${e.engine}:`, e.error);
    }
  }
}

Throw-on-Failure Variant

import { searchOrThrow, AllEnginesFailedError } from "local-search";

try {
  const results = await searchOrThrow("JavaScript frameworks", { num: 10 });
  console.log(results);
} catch (err) {
  if (err instanceof AllEnginesFailedError) {
    console.error("All engines failed:", err.errors);
  } else {
    console.error("Single-engine error:", err);
  }
}

Pin a Specific Engine

import { search } from "local-search";

// Skip the fallback chain — only use DuckDuckGo.
const outcome = await search("quantum computing applications", {
  num: 8,
  engine: "duckduckgo",
});

Recency Filter

const outcome = await search("genomics research", {
  num: 5,
  recency_days: 30, // past 30 days
});

Custom Fetch (Tests / Edge Runtimes)

const outcome = await search("hello world", {
  fetchImpl: myMockFetch, // any WhatWG-fetch-compatible function
  userAgent: "my-bot/1.0",
  timeoutMs: 5000,
});
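
A `myMockFetch` for tests might look like the sketch below. It relies only on the global `Response` and `Request` classes available in Node 18+. The canned HTML is a placeholder, and whether the skill's parsers would extract results from it is not something this sketch guarantees. In practice you would substitute a saved copy of a real engine page.

```typescript
// Placeholder markup, not real engine output.
const CANNED_HTML =
  '<html><body><a class="result" href="https://example.com/">Example</a></body></html>';

// Records every URL the code under test asked for, so tests can assert on it.
const requestedUrls: string[] = [];

// A fetch-compatible stub: ignores the network and always returns the canned page.
async function myMockFetch(
  input: string | URL | Request,
  _init?: RequestInit,
): Promise<Response> {
  const url =
    typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
  requestedUrls.push(url);
  return new Response(CANNED_HTML, {
    status: 200,
    headers: { "content-type": "text/html; charset=utf-8" },
  });
}
```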

Engine Backends

DuckDuckGo (`duckduckgo`)

Endpoint: https://html.duckduckgo.com/html/ (POST form q=...)
No API key. Subject to rough rate limits — sustained hammering returns empty result sets.
Recency: df=d<N> form field.
Best for: default first try; rarely blocked outright; respects privacy.

Bing (`bing`)

Endpoint: https://www.bing.com/search?q=...&count=...
No API key. HTML layout is stable and easy to parse.
Recency: freshness=d1 / w1 / m1 bucket (Bing has no arbitrary-N-days URL filter).
Best for: first fallback (second in the auto chain); good English-language results; occasional locale quirks (mitigated by ensearch=1).

Google (`google`)

Endpoint: https://www.google.com/search?q=...&num=...
No API key, BUT Google is the most aggressive at blocking scrapers. From datacenter / cloud IPs Google typically returns a "please enable JS" redirect page (/httpservice/retry/enablejs) with zero parseable results — the engine surfaces this as a soft failure and the auto chain falls through. From residential IPs, Google HTML scraping often works fine.
Recency: tbs=qdr:d|w|m|y bucket.
Best for: last-resort fallback on residential IPs; will usually fail on cloud IPs (where DDG+Bing already cover the need).
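
One plausible way to recognise the "enable JS" wall and report it as a soft failure is sketched below. The function name, the outcome type, and the choice to check both the final URL and the body are assumptions. Only the `/httpservice/retry/enablejs` path comes from the description above.

```typescript
// Hypothetical outcome type for a single engine fetch.
type FetchOutcome =
  | { ok: true; html: string }
  | { ok: false; reason: string };

const ENABLE_JS_MARKER = "/httpservice/retry/enablejs";

// Classifies a Google response: the JS wall becomes a soft failure instead of an empty parse.
function classifyGoogleResponse(finalUrl: string, html: string): FetchOutcome {
  if (finalUrl.includes(ENABLE_JS_MARKER) || html.includes(ENABLE_JS_MARKER)) {
    return { ok: false, reason: "Google served the enable-JS page (likely IP-based block)" };
  }
  return { ok: true, html };
}
```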

Auto-Fallback Strategy

When engine: "auto" (the default), the orchestrator tries engines in this order:

1. DuckDuckGo: least aggressive blocking, finest-grained date filter
2. Bing: usually stable from any IP, including datacenter
3. Google: best result quality when reachable; often blocked on datacenter IPs

It returns the first engine that yields ≥ 1 result. If all three fail, the outcome is { success: false, errors: [...] } with every engine's error attached — or, in searchOrThrow mode, an AllEnginesFailedError.
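
The fallback loop described above can be sketched with injected engine functions, which also makes it testable offline. This is a minimal illustration under stated assumptions, not the code in src/search.ts: `searchWithFallback`, `EngineFn`, and the reduced `Item` and `Outcome` types are hypothetical.

```typescript
type EngineId = "duckduckgo" | "bing" | "google";
interface Item { url: string; name: string }
type EngineFn = (query: string) => Promise<Item[]>;
interface EngineError { engine: EngineId; error: string }

type Outcome =
  | { success: true; engine: EngineId; enginesTried: EngineId[]; results: Item[] }
  | { success: false; enginesTried: EngineId[]; errors: EngineError[] };

// Tries each engine in order and returns the first non-empty result set.
async function searchWithFallback(
  query: string,
  order: EngineId[],
  engines: Record<EngineId, EngineFn>,
): Promise<Outcome> {
  const enginesTried: EngineId[] = [];
  const errors: EngineError[] = [];
  for (const engine of order) {
    enginesTried.push(engine);
    try {
      const results = await engines[engine](query);
      if (results.length > 0) return { success: true, engine, enginesTried, results };
      // An empty set is treated as a soft failure so the chain continues.
      errors.push({ engine, error: "No results parsed" });
    } catch (err) {
      errors.push({ engine, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { success: false, enginesTried, errors };
}
```

Because the engines run sequentially, the worst-case latency is the sum of the per-engine timeouts. That is the trade-off for not sending every query to all three engines.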

Practical reliability table (observed during testing):

| Caller IP type | DDG | Bing | Google | Effective auto chain |
|---|---|---|---|---|
| Residential | high | high | medium | DDG → Bing → Google (3 viable) |
| Datacenter / cloud | medium (rate-limits under load) | high | low (enablejs wall) | effectively DDG → Bing |

The order is defined in src/engines/index.ts (AUTO_ENGINE_ORDER); edit that array if you want a different priority.

Common Use Cases

Real-time Information Retrieval: Current news, stock prices, weather
Research & Analysis: Gather information on specific topics
Content Discovery: Find articles, tutorials, documentation
Competitive Analysis: Research competitors and market trends
Fact Checking: Verify information against web sources
SEO & Content Research: Analyze search results for content strategy
News Aggregation: Collect news from various sources
Academic Research: Find papers, studies, and academic content

Troubleshooting

Issue: Required dependency 'cheerio' is not installed

Fix: cd skills/local-search && npm install cheerio

Issue: All search engines failed (auto mode)

Diagnosis: The outcome's errors array lists each engine's failure. Common causes:
- No internet connection
- All three engines rate-limiting you (rare, but happens under sustained load)
- Corporate proxy / firewall blocking search domains
Fix: Wait a minute and retry; or pin a single engine to get a clearer error.

Issue: DuckDuckGo returns "No results parsed"

Cause: DDG silently serves an empty page when rate-limited. The skill surfaces this as a soft failure so auto mode can fall through to Bing.
Fix: Switch to --engine bing or wait.

Issue: Google returns "consent / captcha page"

Cause: Google is blocking your IP / region.
Fix: Use DuckDuckGo or Bing. Google is the most fragile backend by design.

Issue: Results from a single engine look stale / inconsistent

Cause: Each engine ranks results differently. The auto chain returns results from the first engine in the order that yields a non-empty set, not a merged set.
Fix: If you want merged & deduplicated results across engines, post-process results from multiple search() calls (one per engine).
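
A possible post-processing step is sketched below: it merges per-engine result lists and deduplicates by a normalized URL. The normalization rules (dropping `www.`, trailing slashes, and fragments) and the tie-breaking by best rank are assumptions chosen for illustration. `mergeResults` is not part of the skill.

```typescript
interface Item { url: string; name: string; rank: number; source_engine: string }

// Builds a comparison key so trivially different URLs collapse to one entry.
function normalizeUrl(raw: string): string {
  try {
    const u = new URL(raw);
    const host = u.hostname.toLowerCase().replace(/^www\./, "");
    const path = u.pathname.replace(/\/+$/, "");
    return `${host}${path}${u.search}`; // fragment and scheme are ignored
  } catch {
    return raw; // leave unparseable URLs as their own key
  }
}

// Merges lists from several engines, keeping the best-ranked copy of each URL.
function mergeResults(...lists: Item[][]): Item[] {
  const best = new Map<string, Item>();
  for (const list of lists) {
    for (const item of list) {
      const key = normalizeUrl(item.url);
      const seen = best.get(key);
      if (!seen || item.rank < seen.rank) best.set(key, item);
    }
  }
  // Array.prototype.sort is stable, so ties keep first-seen order.
  return [...best.values()].sort((a, b) => a.rank - b.rank);
}
```

Each input list would come from a separate `search()` call pinned to one engine.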

Issue: Cannot find module 'local-search' when importing as SDK

Fix: Either add the skill to your project's package.json workspaces, or import via a relative path: import { search } from "./skills/local-search/src/index.js".

Performance Tips

Reuse the process: Each search() call is stateless, but process startup (TS compile via tsx) costs ~200ms. For batch jobs, write a small Node script that loops over queries instead of spawning the CLI per query.
Pin a single engine when you don't need fallback — saves the latency of probing multiple engines.
Lower num: Each engine over-fetches then truncates; smaller num means slightly less parsing work.
Parallel independent queries: Use Promise.all([search(q1), search(q2), search(q3)]) to run them concurrently. In auto mode every query starts with DuckDuckGo, so keep the batch small to avoid tripping its rate limit.
Result filtering client-side: All engines return ~10 results even if you ask for 5; the orchestrator truncates, but if you want richer filtering (domain, date range, snippet length), do it in your code.
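
Client-side filtering of the kind mentioned in the last tip might look like this. The sketch assumes that `date` is either "N/A" or a string `Date.parse` understands. The skill's actual date format is not specified here, so treat that as an assumption. `filterResults` and `FilterOptions` are hypothetical names.

```typescript
interface Item { url: string; host_name: string; snippet: string; date: string }

interface FilterOptions {
  allowHosts?: string[];      // keep only these hosts and their subdomains
  minSnippetLength?: number;  // drop results with very short snippets
  notBefore?: Date;           // drop results older than this, or with no usable date
}

function filterResults(items: Item[], opts: FilterOptions): Item[] {
  return items.filter((item) => {
    if (
      opts.allowHosts &&
      !opts.allowHosts.some((h) => item.host_name === h || item.host_name.endsWith(`.${h}`))
    ) {
      return false;
    }
    if (opts.minSnippetLength !== undefined && item.snippet.length < opts.minSnippetLength) {
      return false;
    }
    if (opts.notBefore) {
      const ts = Date.parse(item.date); // NaN for "N/A" or unparseable strings
      if (Number.isNaN(ts) || ts < opts.notBefore.getTime()) return false;
    }
    return true;
  });
}
```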

Security Considerations

Input Validation: Sanitize user search queries before passing them in (the engines handle URL encoding, but your downstream code should still validate).
Rate Limiting: The skill itself does not rate-limit. If you wrap it in a service, add a rate limiter at that layer.
No API Key Storage: There are no API keys to leak. The only secret-like thing is your IP address.
Privacy: Your queries go directly to the chosen search engine. Whether that's more or less private than routing through a cloud function depends on your threat model.
URL Validation: Validate result.url before redirecting end users (the skill returns the engine's raw URL; some engines occasionally surface tracking redirects).
User-Agent Spoofing: The skill sends a Chrome desktop User-Agent by default. Override with userAgent if your use case requires honesty.
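
For the URL-validation point, a conservative check before redirecting a user could be sketched as follows. The function name and the specific rules (http/https only, no embedded credentials, optional host allow-list) are illustrative assumptions. Adjust them to your own threat model.

```typescript
// Hypothetical guard: returns true only for plain http(s) URLs without embedded credentials.
function isSafeRedirectTarget(raw: string, allowedHosts?: string[]): boolean {
  let u: URL;
  try {
    u = new URL(raw);
  } catch {
    return false; // not an absolute URL
  }
  if (u.protocol !== "https:" && u.protocol !== "http:") return false; // blocks javascript:, data:, file:
  if (u.username !== "" || u.password !== "") return false; // user:pass@host tricks
  if (allowedHosts && !allowedHosts.includes(u.hostname)) return false;
  return true;
}
```

This does not unwrap tracking redirects. It only rejects URLs that are unsafe to follow at face value.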

Reference Scripts

A minimal example lives at scripts/web_search.ts:

# Default query
tsx scripts/web_search.ts

# Custom query + num
tsx scripts/web_search.ts "your query" 5

It's intentionally tiny — a smoke-test you can eyeball, not a feature showcase.

Remember

Zero cloud SDK dependency — only cheerio and Node's built-in fetch.
Canonical result schema — url / name / snippet / host_name / rank / date / favicon fields, plus three optional extension fields (source_engine, raw_html, score).
Auto-fallback across DuckDuckGo → Bing → Google when engine: "auto" (the default).
CLI: tsx bin/web-search.ts <query> [--num N] [--engine auto|duckduckgo|bing|google] [--recency-days N] [--json] [-o file.json].
SDK: import { search, searchOrThrow } from "local-search".
First-time setup: cd skills/local-search && npm install.
Quick test: tsx scripts/web_search.ts.

Acknowledgements

The canonical SearchFunctionResultItem shape was inspired by z-ai-web-dev-sdk's MIT-licensed web-search skill. All engine backend code in this package is original.
