Community寫作與編輯github.com

Sakurakilove/local-search

Web search skill that runs entirely on the user machine — no Z.AI / cloud SDK dependency. DuckDuckGo / Bing / Google HTML scraping with automatic engine fallback.

相容平台Claude Code~Codex CLI~Cursor
npx skills add Sakurakilove/local-search

Ask in your favorite AI

Open a new chat with this agent skill pre-loaded.

說明文件

Local Search Skill

Local web search with automatic engine fallback. Direct HTML scraping of DuckDuckGo / Bing / Google · resilient to rate-limiting · canonical result schema.

ClawHub Version License: MIT Node

This skill calls the public search engines directly from your machine. No API key, no SDK, no network hop. If one engine is rate-limiting you, the orchestrator silently falls through to the next — so the call usually succeeds on the first try, even from datacenter IPs where Google is blocked.

Each result is a SearchFunctionResultItem with the canonical 7 fields (url, name, snippet, host_name, rank, date, favicon) plus three optional extension fields: source_engine, raw_html, score. Consumer code written against any similarly-shaped web-search skill works here without changes.

✨ Highlights

  • Zero cloud SDK dependency — only cheerio (HTML parser) and Node's built-in fetch.
  • Three engines, automatic fallback — DuckDuckGo → Bing → Google when engine: "auto" (default).
  • Locale-aware--locale en-US / zh-CN / ja-JP / any BCP-47 tag. Critical for non-US IPs where Bing otherwise serves localized results even for English queries.
  • Recency filter--recency-days 7 restricts to past-week results. DDG supports exact N days; Bing/Google use day/week/month buckets.
  • Canonical result schemaurl / name / snippet / host_name / rank / date / favicon fields, identical in shape to other web-search skills in the ClawHub registry.
  • CLI + SDK — use tsx bin/web-search.ts <query> for one-offs, or import { search } from "local-search" in code.
  • Pure TypeScript ESM — runs on Node 18+ via tsx, or zero-config via bun.

🚀 Quick Start

# 1. Install (one runtime dep: cheerio)
cd skills/local-search && npm install

# 2. Search (auto-fallback across DDG → Bing → Google)
tsx bin/web-search.ts "artificial intelligence"

# 3. Pin a specific engine + locale
tsx bin/web-search.ts "machine learning" --engine bing --locale en-US --num 5

# 4. Recent news as JSON
tsx bin/web-search.ts "AI breakthroughs" --recency-days 7 --json -o ai_news.json

Programmatic use:

import { search } from "local-search";

const outcome = await search("What is the capital of France?", { num: 5 });
if (outcome.success) {
  console.log(`Answered by ${outcome.engine} in ${outcome.elapsedMs}ms`);
  outcome.results.forEach(r => console.log(`- ${r.name}\n  ${r.url}`));
}

🎯 When to Use This Skill

Use local-search whenever the user needs information that lives on the public web — beyond what's in the model's training data:

  • Real-time information: current news, stock prices, weather, sports scores
  • Latest documentation: framework release notes, API changes, recently published papers
  • Fact-checking: verify a claim against live web sources
  • Research: gather multiple sources on a topic, compare perspectives
  • Content discovery: find tutorials, blog posts, recent talks
  • Competitive / market analysis: competitor announcements, industry trends
  • Academic lookup: find papers, citations, author pages

🔄 Engine Comparison

EngineEndpointAPI keyResidential IPDatacenter IPRecency
DuckDuckGohtml.duckduckgo.com/html/ (GET)none✅ high⚠️ rate-limits under loaddf=d<N> (exact days)
Bingwww.bing.com/searchnone✅ high✅ highfreshness=d1|w1|m1 (bucketed)
Googlewww.google.com/searchnone⚠️ medium❌ low (enablejs wall)tbs=qdr:d|w|m|y (bucketed)

engine: "auto" tries them in order DuckDuckGo → Bing → Google, returning the first non-empty result set. From a datacenter IP the effective chain is DDG → Bing (Google is usually blocked); from a residential IP all three are viable. Edit AUTO_ENGINE_ORDER in src/engines/index.ts to change priority.

📦 Installation Path

Recommended Location: {project_path}/skills/local-search

Extract this skill package to the above path in your project.

Prerequisites

  • Node.js >= 18 (uses the built-in global fetch).
  • One npm dependency: cheerio (HTML parser).
  • A working internet connection — the skill calls the public search engines directly.

Install once in the skill directory:

cd {project_path}/skills/local-search
npm install            # or: bun install / pnpm install

The skill is pure TypeScript ESM. You can run it via tsx (recommended, dev-friendly) or bun (zero-config). A runtime-agnostic node --experimental-strip-types bin/web-search.ts also works on Node 22+.

Why Local?

AspectCloud-search skillsThis skill (local-search)
Search backendCloud function / APIDirect HTTP to DuckDuckGo / Bing / Google
API key / SDKRequiredNone
Network pathClient → cloud → search engineClient → search engine
Failure handlingSDK-dependentAuto-fallback across 3 engines
Result schemaSearchFunctionResultItem (7 fields)Same 7 fields + 3 optional extension fields
CLISDK-specifictsx bin/web-search.ts

Architecture

local-search/
├── SKILL.md                 ← you are here
├── LICENSE.txt
├── package.json             ← declares the `cheerio` dep
├── tsconfig.json
├── bin/
│   └── web-search.ts        ← CLI entry
├── src/
│   ├── index.ts             ← public SDK exports
│   ├── search.ts            ← orchestrator with auto-fallback
│   ├── types.ts             ← SearchFunctionResultItem + options
│   └── engines/
│       ├── _shared.ts       ← fetch / parse helpers
│       ├── duckduckgo.ts    ← POST https://html.duckduckgo.com/html/
│       ├── bing.ts          ← GET https://www.bing.com/search
│       ├── google.ts        ← GET https://www.google.com/search
│       └── index.ts         ← engine registry + AUTO_ENGINE_ORDER
└── scripts/
    └── web_search.ts        ← quick-start example

The only runtime dependency is cheerio (HTML parser), declared in package.json. Node 18+'s built-in fetch covers the network layer — no other SDK is imported anywhere in this package.

CLI Usage

The CLI lives at bin/web-search.ts. Run it via tsx, bun, or any TS-aware runner.

Basic Search

# Default: auto engine, 10 results, human-readable output
tsx bin/web-search.ts "artificial intelligence"

# Limit number of results
tsx bin/web-search.ts "machine learning" --num 5
# (short option works too)
tsx bin/web-search.ts "machine learning" -n 5

Pin a Specific Engine

# Force DuckDuckGo only
tsx bin/web-search.ts "latest tech news" --engine duckduckgo
# Force Bing only
tsx bin/web-search.ts "latest tech news" --engine bing
# Force Google only (highest quality, most likely to be rate-limited)
tsx bin/web-search.ts "latest tech news" --engine google
# Default — try DDG → Bing → Google, return first non-empty
tsx bin/web-search.ts "latest tech news" --engine auto

Recency Filter

# Results from last 7 days
tsx bin/web-search.ts "cryptocurrency news" --num 10 --recency-days 7
# (short option)
tsx bin/web-search.ts "cryptocurrency news" -r 7

Save Results to JSON

# Write JSON to a file (human-readable banner is suppressed)
tsx bin/web-search.ts "climate change research" --num 5 --json -o search_results.json

# Or pipe JSON to stdout
tsx bin/web-search.ts "AI breakthroughs" --num 3 --recency-days 1 --json > ai_news.json

Quiet Mode (Scripts / Piping)

# Suppress the "engine / timing" banner — just print the results
tsx bin/web-search.ts "react hooks" --quiet

CLI Parameters

FlagShortDescription
--num <N>-nNumber of results (default: 10, max: 50)
--engine <id>-educkduckgo | bing | google | auto (default: auto)
--recency-days <N>-rRestrict to results from last N days (default: 0 = no filter)
--timeout <ms>Per-engine timeout in ms (default: 8000)
--jsonEmit JSON instead of human-readable output
--output <path>-oWrite JSON to file instead of stdout
--prettyPretty-print JSON (default on; pass --no-pretty to disable — not yet supported, omit --json instead)
--quiet-qSuppress engine-info banner
--help-hShow help

Search Result Structure

Each result item is a SearchFunctionResultItem:

interface SearchFunctionResultItem {
  // ----- Canonical fields (shared shape across ClawHub web-search skills) -----
  url: string;          // Full URL of the result
  name: string;         // Title of the page
  snippet: string;      // Preview text / description
  host_name: string;    // Domain name (e.g. "en.wikipedia.org")
  rank: number;         // 1-indexed ranking within the engine's result set
  date: string;         // Publication / update date; "N/A" when unknown
  favicon: string;      // Favicon URL (Google S2)

  // ----- Extension fields (new in local-search) -----
  source_engine: "duckduckgo" | "bing" | "google";
  raw_html?: string;    // Original HTML of the snippet element (optional)
  score?: number;       // Heuristic relevance score [0, 100] (optional)
}

SDK Usage

If you're writing TypeScript/JavaScript code (not running the CLI), import from src/index.ts:

Simple Search

import { search } from "local-search";

const outcome = await search("What is the capital of France?", { num: 5 });

if (outcome.success) {
  console.log(`Engine used: ${outcome.engine}`);
  console.log(`Tried in order: ${outcome.enginesTried.join(" → ")}`);
  console.log(`Elapsed: ${outcome.elapsedMs}ms`);
  for (const item of outcome.results) {
    console.log(`- ${item.name}  [${item.source_engine}]`);
    console.log(`  ${item.url}`);
    console.log(`  ${item.snippet}`);
  }
} else {
  console.error("Search failed:", outcome.error);
  if (outcome.errors) {
    // Auto mode: see which engines tried and what they reported
    for (const e of outcome.errors) {
      console.error(`  ${e.engine}:`, e.error);
    }
  }
}

Throw-on-Failure Variant

import { searchOrThrow, AllEnginesFailedError } from "local-search";

try {
  const results = await searchOrThrow("JavaScript frameworks", { num: 10 });
  console.log(results);
} catch (err) {
  if (err instanceof AllEnginesFailedError) {
    console.error("All engines failed:", err.errors);
  } else {
    console.error("Single-engine error:", err);
  }
}

Pin a Specific Engine

import { search } from "local-search";

// Skip the fallback chain — only use DuckDuckGo.
const outcome = await search("quantum computing applications", {
  num: 8,
  engine: "duckduckgo",
});

Recency Filter

const outcome = await search("genomics research", {
  num: 5,
  recency_days: 30, // past 30 days
});

Custom Fetch (Tests / Edge Runtimes)

const outcome = await search("hello world", {
  fetchImpl: myMockFetch, // any WhatWG-fetch-compatible function
  userAgent: "my-bot/1.0",
  timeoutMs: 5000,
});

Engine Backends

DuckDuckGo (duckduckgo)

  • Endpoint: https://html.duckduckgo.com/html/ (POST form q=...)
  • No API key. Subject to rough rate limits — sustained hammering returns empty result sets.
  • Recency: df=d<N> form field.
  • Best for: default first try; rarely blocked outright; respects privacy.

Bing (bing)

  • Endpoint: https://www.bing.com/search?q=...&count=...
  • No API key. HTML layout is stable and easy to parse.
  • Recency: freshness=d1 / w1 / m1 bucket (Bing has no arbitrary-N-days URL filter).
  • Best for: second fallback; good English-language results; occasional locale quirks (mitigated by ensearch=1).

Google (google)

  • Endpoint: https://www.google.com/search?q=...&num=...
  • No API key, BUT Google is the most aggressive at blocking scrapers. From datacenter / cloud IPs Google typically returns a "please enable JS" redirect page (/httpservice/retry/enablejs) with zero parseable results — the engine surfaces this as a soft failure and the auto chain falls through. From residential IPs, Google HTML scraping often works fine.
  • Recency: tbs=qdr:d|w|m|y bucket.
  • Best for: last-resort fallback on residential IPs; will usually fail on cloud IPs (where DDG+Bing already cover the need).

Auto-Fallback Strategy

When engine: "auto" (the default), the orchestrator tries engines in this order:

  1. DuckDuckGo — least aggressive blocking, finest-grained date filter
  2. Bing — usually stable from any IP, including datacenter
  3. Google — best result quality when reachable; often blocked on datacenter IPs

It returns the first engine that yields ≥ 1 result. If all three fail, the outcome is { success: false, errors: [...] } with every engine's error attached — or, in searchOrThrow mode, an AllEnginesFailedError.

Practical reliability table (observed during testing):

Caller IP typeDDGBingGoogleEffective auto chain
ResidentialhighhighmediumDDG → Bing → Google (3 viable)
Datacenter / cloudmedium (rate-limits under load)highlow (enablejs wall)effectively DDG → Bing

The order is defined in src/engines/index.ts (AUTO_ENGINE_ORDER); edit that array if you want a different priority.

Common Use Cases

  1. Real-time Information Retrieval: Current news, stock prices, weather
  2. Research & Analysis: Gather information on specific topics
  3. Content Discovery: Find articles, tutorials, documentation
  4. Competitive Analysis: Research competitors and market trends
  5. Fact Checking: Verify information against web sources
  6. SEO & Content Research: Analyze search results for content strategy
  7. News Aggregation: Collect news from various sources
  8. Academic Research: Find papers, studies, and academic content

Troubleshooting

Issue: Required dependency 'cheerio' is not installed

  • Fix: cd skills/local-search && npm install cheerio

Issue: All search engines failed (auto mode)

  • Diagnosis: The outcome's errors array lists each engine's failure. Common causes:
    • No internet connection
    • All three engines rate-limiting you (rare, but happens under sustained load)
    • Corporate proxy / firewall blocking search domains
  • Fix: Wait a minute and retry; or pin a single engine to get a clearer error.

Issue: DuckDuckGo returns "No results parsed"

  • Cause: DDG silently serves an empty page when rate-limited. The skill surfaces this as a soft failure so auto mode can fall through to Bing.
  • Fix: Switch to --engine bing or wait.

Issue: Google returns "consent / captcha page"

  • Cause: Google is blocking your IP / region.
  • Fix: Use DuckDuckGo or Bing. Google is the most fragile backend by design.

Issue: Results from a single engine look stale / inconsistent

  • Cause: Each engine ranks results differently. The auto chain returns whichever engine answered first — not a merged set.
  • Fix: If you want merged & deduplicated results across engines, post-process results from multiple search() calls (one per engine).

Issue: Cannot find module 'local-search' when importing as SDK

  • Fix: Either add the skill to your project's package.json workspaces, or import via a relative path: import { search } from "./skills/local-search/src/index.js".

Performance Tips

  1. Reuse the process: Each search() call is stateless, but process startup (TS compile via tsx) costs ~200ms. For batch jobs, write a small Node script that loops over queries instead of spawning the CLI per query.
  2. Pin a single engine when you don't need fallback — saves the latency of probing multiple engines.
  3. Lower num: Each engine over-fetches then truncates; smaller num means slightly less parsing work.
  4. Parallel independent queries: Use Promise.all([...search(q1), search(q2), search(q3)]) — they hit different engines / endpoints concurrently.
  5. Result filtering client-side: All engines return ~10 results even if you ask for 5; the orchestrator truncates, but if you want richer filtering (domain, date range, snippet length), do it in your code.

Security Considerations

  1. Input Validation: Sanitize user search queries before passing them in (the engines handle URL encoding, but your downstream code should still validate).
  2. Rate Limiting: The skill itself does not rate-limit. If you wrap it in a service, add a rate limiter at that layer.
  3. No API Key Storage: There are no API keys to leak. The only secret-like thing is your IP address.
  4. Privacy: Your queries go directly to the chosen search engine. Whether that's more or less private than routing through a cloud function depends on your threat model.
  5. URL Validation: Validate result.url before redirecting end users (the skill returns the engine's raw URL; some engines occasionally surface tracking redirects).
  6. User-Agent Spoofing: The skill sends a Chrome desktop User-Agent by default. Override with userAgent if your use case requires honesty.

Reference Scripts

A minimal example lives at scripts/web_search.ts:

# Default query
tsx scripts/web_search.ts

# Custom query + num
tsx scripts/web_search.ts "your query" 5

It's intentionally tiny — a smoke-test you can eyeball, not a feature showcase.

Remember

  • Zero cloud SDK dependency — only cheerio and Node's built-in fetch.
  • Canonical result schemaurl / name / snippet / host_name / rank / date / favicon fields, plus three optional extension fields (source_engine, raw_html, score).
  • Auto-fallback across DuckDuckGo → Bing → Google when engine: "auto" (the default).
  • CLI: tsx bin/web-search.ts <query> [--num N] [--engine auto|duckduckgo|bing|google] [--recency-days N] [--json] [-o file.json].
  • SDK: import { search, searchOrThrow } from "local-search".
  • First-time setup: cd skills/local-search && npm install.
  • Quick test: tsx scripts/web_search.ts.

Acknowledgements

The canonical SearchFunctionResultItem shape was inspired by z-ai-web-dev-sdk's MIT-licensed web-search skill. All engine backend code in this package is original.

相關技能

steipete/notion

Notion CLI/API for pages, Markdown content, data sources, files, comments, search, Workers, and raw API calls.

community

affaan-m/seo

Audit, plan, and implement SEO improvements across technical SEO, on-page optimization, structured data, Core Web Vitals, and content strategy. Use when the user wants better search visibility, SEO remediation, schema markup, sitemap/robots work, or keyword mapping.

community

affaan-m/brand-voice

Build a source-derived writing style profile from real posts, essays, launch notes, docs, or site copy, then reuse that profile across content, outreach, and social workflows. Use when the user wants voice consistency without generic AI writing tropes.

community

affaan-m/crosspost

Multi-platform content distribution across X, LinkedIn, Threads, and Bluesky. Adapts content per platform using content-engine patterns. Never posts identical content cross-platform. Use when the user wants to distribute content across social platforms.

community

affaan-m/x-api

X/Twitter API integration for posting tweets, threads, reading timelines, search, and analytics. Covers OAuth auth patterns, rate limits, and platform-native content posting. Use when the user wants to interact with X programmatically.

community

affaan-m/content-engine

Create platform-native content systems for X, LinkedIn, TikTok, YouTube, newsletters, and repurposed multi-platform campaigns. Use when the user wants social posts, threads, scripts, content calendars, or one source asset adapted cleanly across platforms.

community