Reddit Skill
Overview
Use this skill to collect reproducible Reddit research from one subreddit or r/all through OAuth-backed PRAW. Favor raw artifacts first, then analysis, so the same run can be re-ranked, summarized, imported, or translated later without re-fetching Reddit.
Language Policy
Mirror the user's language for human-facing outputs. If the user writes in Chinese or the surrounding project context is Chinese, write generated reports, notes, Markdown indexes, review queues, summaries, and recommendations in Chinese.
Keep raw data schemas stable: JSON keys, CSV headers, Reddit titles, post bodies, comments, URLs, and API field names may stay in their original language. When running the bundled script in Chinese context, pass --language zh so index.md and CLI-facing messages use Chinese labels.
Required Setup
Install the Python dependency if it is missing:
python3 -m pip install praw
On first use, make credential setup the next step before collection. The easiest flow is:
python3 scripts/setup_reddit_env.py
This asks the user for:
REDDIT_CLIENT_ID=your_reddit_client_id
REDDIT_CLIENT_SECRET=your_reddit_client_secret
and writes them to ~/.config/reddit-skill/.env, which scripts/reddit_mine_posts.py reads automatically. The collector also reads project-local .env files and process environment variables.
If the user already has a Reddit OAuth app, tell them to reuse the existing app ID as REDDIT_CLIENT_ID and the app secret as REDDIT_CLIENT_SECRET.
If the user does not have credentials yet, explain that https://www.reddit.com/prefs/apps is a legacy OAuth app page that may still work after login, but may not be the only current onboarding path. When it works, create a script app, use any app name, set redirect URI to http://localhost:8080, then copy the app ID as REDDIT_CLIENT_ID and secret as REDDIT_CLIENT_SECRET. If the legacy page does not allow app creation, guide the user to Reddit's current Data API signup/request flow in the official Reddit Help developer pages.
Do not store Reddit credentials inside the skill repo. Store them in env vars, a project .env, or ~/.config/reddit-skill/.env.
Collection Workflow
- Clarify the target subreddit and ranking dimension. Common choices:
--listing top --time-filter allfor all-time winners.--listing top --time-filter monthfor recent proven hits.--listing hotfor what is currently moving.--listing newor--listing risingfor early discovery.--listing search --query "keyword" --search-sort comments --time-filter yearfor topic mining.
- Run
scripts/reddit_mine_posts.pywith an explicit--out-dirunder the current project's research area. - Preserve the generated raw files:
posts.jsonl: one normalized post per line.comments.jsonl: one normalized comment per line when comments are requested.posts.csv: spreadsheet-friendly post table.index.md: human-readable top-post index.summary.json: run metadata and counts.
- Analyze from the generated artifacts. Prefer re-reading JSONL/CSV over re-querying Reddit unless the user asks for fresher data.
Quick Commands
Run these from the directory containing this SKILL.md, or replace scripts/reddit_mine_posts.py with the absolute path to the installed skill's script.
Collect the top all-time posts in one subreddit:
python3 scripts/reddit_mine_posts.py \
--subreddit LocalLLaMA \
--listing top \
--time-filter all \
--language zh \
--limit 100 \
--include-comments \
--max-comments 8 \
--out-dir research/reddit/localllama-top-all
Mine the hottest posts across r/all:
python3 scripts/reddit_mine_posts.py \
--subreddit all \
--listing hot \
--language zh \
--limit 100 \
--out-dir research/reddit/all-hot
Search a subreddit by topic and rank by comment volume:
python3 scripts/reddit_mine_posts.py \
--subreddit SaaS \
--listing search \
--query "voice agent" \
--search-sort comments \
--time-filter year \
--language zh \
--limit 75 \
--include-comments \
--out-dir research/reddit/saas-voice-agent-comments
Analysis Guidance
- Rank posts by both
scoreandnum_comments; high comments can be more useful for pain mining than high score. - For content strategy, inspect title patterns first, then comment patterns.
- Keep Reddit posting or replying as a human-reviewed queue unless the user explicitly asks otherwise.
- Treat public anonymous
.jsonendpoints as unreliable when they return403or sparse data. Prefer PRAW with OAuth credentials. - For DevRel or product lead mining, score intent from the title, selftext, and top comments together.
Resources
scripts/setup_reddit_env.py: interactive setup helper forREDDIT_CLIENT_IDandREDDIT_CLIENT_SECRET.scripts/reddit_mine_posts.py: PRAW collector for subreddit listings, search, comments, and research artifacts.references/reddit-research-patterns.md: field meanings, output conventions, and follow-up analysis patterns.
Read the reference file when the user asks for scoring, research synthesis, reusable workspace layout, or artifact interpretation beyond simple collection.