
PenghaoJiang/auto-paper-collecter

📚🔭 Your personal research radar — an LLM-powered tool that auto-aggregates the latest papers for your keywords across arXiv / Crossref / Semantic Scholar / GitHub / RSS.

Compatible with: Claude Code · Codex CLI · ~Cursor
npx skills add PenghaoJiang/auto-paper-collecter


Documentation

auto-paper-collecter (skill)

A self-hosted research-literature radar that runs inside a coding agent. The Python scripts do only the deterministic work (API fetch, dedup, render, email). YOU — the assistant running this skill — do all the judgement work: query expansion, computer-science relevance filtering, Chinese summaries, and hot-topic synthesis. That means no AI API key is needed — whichever model is running this skill (Claude in Claude Code, GPT in Codex, …) is the LLM.

Layout

skill/
├── SKILL.md
├── scripts/   common.py · fetch.py · render.py · notify.py   (stdlib only)
├── state/     config.json · (queries/candidates/curated/trends/seen .json)
└── digests/   YYYY-MM-DD.md  +  .html

Run scripts from scripts/: cd skill/scripts && python3 <script>.py

Config — state/config.json

  • keywords: up to ~3 topic strings to track.
  • domain: the field to constrain relevance to (default computer science).
  • sources: toggle arXiv / Crossref / Semantic Scholar / GitHub / HuggingFace / PapersWithCode / RSS.
  • lookback_days: how many days back to fetch (deduplication prevents repeats across runs regardless).
  • max_per_source, rss_feeds: the per-source item cap and the list of RSS feeds to poll.

When the user asks to change keywords / sources / field, edit this file and confirm the change back to them.
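As a rough illustration of what such a config might look like, here is a minimal sketch. Only the key names come from the list above; every value, the shape of `sources` (a dict of booleans), and the helper `check_config` are assumptions for illustration, not the project's actual schema.

```python
import json

# Key names come from the config list; all values and the dict-of-booleans
# shape of "sources" are assumptions made for this sketch.
EXAMPLE_CONFIG = {
    "keywords": ["C2Rust", "program repair"],
    "domain": "computer science",
    "sources": {"arxiv": True, "crossref": True, "semantic_scholar": True,
                "github": True, "rss": False},
    "lookback_days": 7,
    "max_per_source": 20,
    "rss_feeds": [],
}


def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems = []
    keywords = cfg.get("keywords") or []
    if not keywords:
        problems.append("keywords is empty")
    elif len(keywords) > 3:
        problems.append("more than 3 keywords")
    days = cfg.get("lookback_days")
    if not isinstance(days, int) or days <= 0:
        problems.append("lookback_days must be a positive integer")
    sources = cfg.get("sources") or {}
    if not any(sources.values()):
        problems.append("no source is enabled")
    return problems


print(json.dumps(EXAMPLE_CONFIG, ensure_ascii=False, indent=2))
print(check_config(EXAMPLE_CONFIG))
```

Running a check like this after editing the file gives something concrete to confirm back to the user.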

Optional env vars (never stored in the repo): SEMANTIC_SCHOLAR_KEY (raises Semantic Scholar rate limits), GITHUB_TOKEN (raises GitHub rate limits), SMTP_* / EMAIL_TO (email), and push channels: TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID, SLACK_WEBHOOK_URL, WECHAT_WEBHOOK (WeCom group bot, 企业微信群机器人) or SERVERCHAN_KEY (ServerChan, Server酱).
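To see which notification channels are usable before a run, something like the following sketch could help. The function name `configured_channels` is hypothetical, and so are the pairing rules (email needs `EMAIL_TO` plus at least one `SMTP_*` variable, Telegram needs both variables). notify.py may decide differently.

```python
import os
from collections.abc import Mapping


def configured_channels(env: Mapping[str, str] = os.environ) -> list[str]:
    """List the channels whose environment variables are all set and non-empty."""

    def has(*names: str) -> bool:
        return all(env.get(name) for name in names)

    channels = []
    # The exact SMTP_* names are not given, so any non-empty variable
    # with that prefix counts here (an assumption of this sketch).
    if has("EMAIL_TO") and any(k.startswith("SMTP_") and v for k, v in env.items()):
        channels.append("email")
    if has("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"):
        channels.append("telegram")
    if has("SLACK_WEBHOOK_URL"):
        channels.append("slack")
    if has("WECHAT_WEBHOOK"):
        channels.append("wechat")
    if has("SERVERCHAN_KEY"):
        channels.append("serverchan")
    return channels


print(configured_channels())
```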

The run pipeline — follow IN ORDER

1 · Read config & expand queries (you)

Read state/config.json. For each keyword, think of 2–3 associative English search queries (synonyms, full forms, adjacent sub-topics) so recall isn't limited to the literal term, e.g. C2Rust → ["C2Rust", "C-to-Rust translation", "migrating legacy C code to Rust"]. Write them to state/queries.json as {"<keyword>": ["q1", "q2", ...], ...}.
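Once the queries have been thought up, writing them out can be sketched as below. The helpers `normalize_queries` and `write_queries` are hypothetical names. Keeping the literal keyword as the first query and dropping case-insensitive duplicates are choices of this sketch, modelled on the C2Rust example.

```python
import json
from pathlib import Path


def normalize_queries(expanded: dict[str, list[str]]) -> dict[str, list[str]]:
    """Trim queries, drop blanks and case-insensitive duplicates, keep the keyword first."""
    cleaned = {}
    for keyword, queries in expanded.items():
        seen, unique = set(), []
        for query in [keyword, *queries]:
            text = query.strip()
            if text and text.lower() not in seen:
                seen.add(text.lower())
                unique.append(text)
        cleaned[keyword] = unique
    return cleaned


def write_queries(expanded: dict[str, list[str]], path: Path) -> None:
    """Write the {"<keyword>": ["q1", "q2", ...]} mapping as UTF-8 JSON."""
    payload = json.dumps(normalize_queries(expanded), ensure_ascii=False, indent=2)
    path.write_text(payload, encoding="utf-8")


expanded = {"C2Rust": ["C-to-Rust translation", "migrating legacy C code to Rust"]}
print(normalize_queries(expanded))
```

When the working directory is skill/scripts, the target would presumably be `Path("../state/queries.json")`.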

2 · Fetch candidates (script)

cd skill/scripts && python3 fetch.py

Fetches every enabled source for those queries, drops anything already in state/seen.json or older than lookback_days, and writes state/candidates.json. If it reports 0 candidates, tell the user "暂无新文献" ("no new literature yet") and stop; there is nothing else to do.
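The drop rule described above can be sketched as follows. This is not fetch.py's actual code. It assumes each candidate carries a `url` and an ISO-formatted `published` date and that seen items are identified by URL. `filter_new` is a hypothetical name.

```python
from datetime import date, timedelta


def filter_new(items: list[dict], seen_urls: set[str],
               lookback_days: int, today: date) -> list[dict]:
    """Keep items that are unseen, not older than the lookback window, and not future-dated."""
    cutoff = today - timedelta(days=lookback_days)
    fresh = []
    for item in items:
        if item.get("url") in seen_urls:
            continue  # already shown in an earlier digest
        try:
            # Accept both "2024-05-09" and "2024-05-09T12:00:00Z".
            published = date.fromisoformat(item["published"][:10])
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparseable date
        if published < cutoff or published > today:
            continue  # outside the window, or a garbage future date
        fresh.append(item)
    return fresh


candidates = [
    {"url": "https://example.org/a", "published": "2024-05-09"},
    {"url": "https://example.org/b", "published": "2024-04-01"},
]
print(filter_new(candidates, set(), lookback_days=7, today=date(2024, 5, 10)))
```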

3 · Filter relevance & summarize (you)

Read state/candidates.json. For each item decide: is it (a) computer-science and (b) genuinely on-topic for its topic keyword? Drop the rest (medical "translation", finance "AI", random GitHub star-lists, etc.). For every kept item write a concise Chinese summary and assemble state/curated.json — a list of objects:

{"source","topic","title","url","venue","authors","published",
 "tldr":"一句话核心 (<=60字)","method":"方法简述 (<=80字)",
 "contributions":["核心贡献1","核心贡献2"]}

Keep papers first, GitHub repos last (they are a supplementary signal). If a source gave a tldr already, you may build on it.

GitHub items are repos, not papers — don't over-summarize them. Use the repo description (its abstract) as the tldr and leave method/contributions empty. fetch.py already keeps only repos with ≥10 stars, ranked by stars, so they tend to be substantive (course / framework / awesome-list), not personal noise.
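Before saving curated.json, the ordering and length rules above could be checked with a sketch like this one. It assumes repos are tagged with `source` equal to `"github"` (case-insensitive) and that the length limits count characters. `order_curated` and `check_item` are hypothetical helpers.

```python
REQUIRED_KEYS = ("source", "topic", "title", "url", "venue", "authors",
                 "published", "tldr", "method", "contributions")


def is_github(item: dict) -> bool:
    # Assumption: GitHub repos carry source == "github" in some casing.
    return str(item.get("source", "")).lower() == "github"


def order_curated(items: list[dict]) -> list[dict]:
    # sorted() is stable: papers keep their relative order, repos move to the end.
    return sorted(items, key=is_github)


def check_item(item: dict) -> list[str]:
    """Return a list of rule violations for one curated entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in item]
    if len(item.get("tldr") or "") > 60:
        problems.append("tldr longer than 60 characters")
    if len(item.get("method") or "") > 80:
        problems.append("method longer than 80 characters")
    if is_github(item) and (item.get("method") or item.get("contributions")):
        problems.append("GitHub item should leave method/contributions empty")
    return problems
```

A typical use would be `curated = order_curated(kept)` followed by reporting any non-empty `check_item(entry)` result before writing the file.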

4 · Hot-topic synthesis (you, optional but recommended)

Cluster the kept items into a handful of coarse CS sub-fields (自然语言处理 / 计算机视觉 / 系统与编译 …, i.e. NLP / computer vision / systems and compilers; merge aggressively). Write state/trends.json: {"top": [{"name","delta": <count>, "summary": "<=80字方向总结", "papers": ["title", ...]}, ... up to 3]}, where summary is a direction summary of at most 80 characters.
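Given cluster assignments that the assistant has already decided on, the trends.json payload could be assembled roughly like this. `build_trends` is a hypothetical helper, and ranking clusters by paper count is an assumption of this sketch.

```python
import json


def build_trends(clusters: dict[str, list[str]],
                 summaries: dict[str, str], limit: int = 3) -> dict:
    """Rank non-empty clusters by size and keep at most `limit` of them."""
    non_empty = [(name, titles) for name, titles in clusters.items() if titles]
    # Stable sort: clusters of equal size keep their original order.
    ranked = sorted(non_empty, key=lambda pair: len(pair[1]), reverse=True)
    top = [
        {
            "name": name,
            "delta": len(titles),  # number of kept items in this cluster
            "summary": summaries.get(name, ""),
            "papers": list(titles),
        }
        for name, titles in ranked[:limit]
    ]
    return {"top": top}


clusters = {"系统与编译": ["Paper A", "Paper B"], "自然语言处理": ["Paper C"]}
print(json.dumps(build_trends(clusters, {}), ensure_ascii=False, indent=2))
```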

5 · Render the digest (script)

cd skill/scripts && python3 render.py

Writes digests/YYYY-MM-DD.md + .html from curated.json (+ trends.json) and records everything shown into seen.json so it won't repeat.

6 · Notify (script, optional)

cd skill/scripts && python3 notify.py     # emails the HTML digest if SMTP_* env is set

7 · Report back (you)

Tell the user how many papers were kept, the top hot directions, and the digest path. Offer to open the HTML or adjust keywords.

Notes

  • Scripts are pure Python stdlib — no pip install required.
  • fetch.py already filters garbage future dates and de-duplicates across runs.
  • This skill is the agent-driven counterpart of the project's FastAPI web dashboard; both share the same sources and pipeline philosophy.
