Skill Miner
Turn the signal buried in past coding-agent sessions into new and existing "Agent Skills". A session earns a skill only when it clears this gate — repeatable AND (room to do it better OR a repeated ask):
- Repeatable — the task or knowledge recurs, or clearly could. Past recurrence is the strongest evidence, but a plausibly-recurring task counts; a genuine one-off never does. (Required.)
- Room to do it better — there's meaningful feedback, so next time beats the
first:
- Environment feedback — what running the work revealed: failures, recurring quirks of this environment, roundabout paths and their shortcuts, hard-won heuristics. These are literally skills.
- User feedback — the user is the expert and their feedback is rare; mine it to pick their brain (a correction, a better way they knew, a fact we got wrong).
- A repeated ask — the user keeps requesting the same thing; extract it once so they don't have to ask again.
Pipeline: discover → distill (cheap LLM, per conversation) → group, judge & generate.
0. Prerequisites
A small LLM must be reachable. Check it first:
python3 scripts/llm.py --check # prints the backend it will use, or how to enable one
Set one of OPENAI_API_KEY / GEMINI_API_KEY / OPENROUTER_API_KEY, or have a
claude / codex / gemini CLI on PATH. Override model with SKILL_MINER_MODEL,
force a backend with SKILL_MINER_LLM.
1–2. Run the pipeline (scripts do the heavy, cheap-LLM work)
Keep your working directory at the launch directory: all artifacts go under ./out.
OUT=./out
python3 scripts/discover.py --since 35 --out $OUT/inventory.jsonl # find transcripts across agents
python3 scripts/extract.py --inventory $OUT/inventory.jsonl --out $OUT/conversations.jsonl --workers 10
discover.pywalks each adapter incommon.py:ADAPTERS(Claude Code, Codex, opencode; add more there) and keeps sessions from the last--sincedays. Start with at most 100 conversations unless the user asks for more. Add--limit 15toextract.pyto sample fast.extract.pyparses each conversation and emits one JSONL row:agent, session_id, path, main_dir, datetime, n_user_turns, topics[], tasks[], skills_used[], env_feedback[], user_feedback[].main_dir= the deepest folder where most work happened (from edited / read / mentioned files), not the launch cwd.topicsare the scope (usually one — doubling as the headline — for grouping).tasks,env_feedback, anduser_feedbackare short arrays the cheap LLM copies from the user's own words (each[]for the typical session) — they map to the gate's three dimensions. The cheap call sees only user turns, soenv_feedbackis what the user said about what ran, not raw tool output — verify it in step 4.
3. Group the conversations
Read conversations.jsonl — one row per session. At ≤100 rows the whole file fits
in context, so read it. (If you mined many more, skim with jq/grep for rows with
non-empty env_feedback / user_feedback first, then read the interesting slices.)
Ignore rows whose topics is <error> or trivial/empty.
Cluster by task archetype — the KIND of work, not directory alone (e.g. "CI deploy debugging", "settings UI iteration", "dependency migration") — to judge dimension 1 (repeatable): which tasks recur or clearly could. Then, across the rows, gather the other two dimensions:
- dimension 2 —
env_feedbackanduser_feedbackthat point at a way to do a task better next time (lead with environment feedback). A single hard-won lesson is a lead even without a cluster around it. - dimension 3 — a detail that recurs across many sessions'
tasks(the user keeps asking for the same thing).
Set aside one-off setup steps and lone offhand remarks. Keep short notes: named
groups, each with its conversation count, dominant main_dir(s), the feedback /
repeated asks it carries, and a few example paths to open as evidence next.
4. Judge & generate
Read references/quality-bar.md and apply its gate: repeatable AND (feedback OR a
repeated ask).
- Know the landscape. Check your existing skills for what can be extended —
especially one whose feedback says it fell short; if a candidate overlaps it, open
that
SKILL.mdand edit it in place. To find skill file paths:
Use it only to find files to edit; your context stays the source of truth for the landscape. If a candidate centers on a specific repo, check what Markdown docs already exist there (python3 scripts/skills_inventory.pyREADME.md, design docs, plans); if the recurring knowledge belongs in those docs instead of a skill, recommend that to the user in your report. - Confirm against transcripts. For each lead, open 1–3 example transcripts (the
pathfield) to confirm the signal is real and to capture the specifics — the exact rule, the exact error and workaround, the better path the user pointed to, the repeated ask. Read the user's words around any correction and around[skill used: X]markers. The condensed rows show only user turns, so the transcript is where you verify what the environment actually did. Separate a recurring quirk (write it down) from one-time setup (a one-line gotcha at most). - Decide, against the gate. A lead becomes a skill only if it is repeatable AND has either room to do it better (feedback) or a repeated ask. The three dimensions carry equal weight — don't drop a hard-won lesson because the task appeared once if it clearly recurs, and don't ship a repeatable task that has nothing to improve and no repeated ask.
- Generate / modify ≤ 5 skills. Write each as concrete content: the rule, the
trigger, the steps/commands, the quirk + workaround — with file paths relative
to the repo root (
src/pages/Run.tsx), never machine-specific absolutes (/home/<user>/…) copied from a transcript. Where it lands:- Extending an existing skill → edit it in place at the skill path. Don't fork or copy it.
- A brand-new skill → write it under
~/.agents/skills/generated-skills/<name>/, staged for the user to review and promote — don't drop new skills straight into a live home.
- Report. List what you created/modified and, for each, which dimension(s) it came from. For each notable rejected candidate, say in one line which dimension it missed. Include any repo docs you think should be updated instead of, or in addition to, the skill changes.
Extending to other agents
Each agent stores transcripts differently. To support a new one (Gemini CLI,
Cursor, …) add a (discover_fn, parse_fn) pair to common.py:ADAPTERS that yields
the normalized record (_norm(...)). The rest of the pipeline is unchanged. Cursor
(sqlite state.vscdb) is not yet wired up — opencode's adapter is a DB-backed
(SQLite) template to follow.
Notes
- Read-only over your transcripts; the only writes are the
--outartifacts and the skills you choose to generate or improve — new skills land staged under~/.agents/skills/generated-skills/, edits go in place to the existing skill. - Sub-agent / review threads (Claude Code sidechains, Codex
thread_source: subagent) are filtered out — only real user conversations are mined.
Viewing the results
Run standalone, the results are plain files: the artifacts under ./out, and the
skills you changed (edited in place, or staged under generated-skills/). This skill
also ships with Skill Studio, a desktop
app where the user watches the run live, reviews each new skill staged under
generated-skills/ as a Proposed card to Accept or Discard, and sees
in-place edits as git-tracked changes to save as a version or revert.