Community艺术与设计github.com

ne11nn/comparison-lab

Generate genuinely distinct, research-backed variants of anything and compare them on one shareable, dependency-free board. A Claude Code Agent Skill.

兼容平台Claude Code~Codex CLI~Cursor
npx skills add ne11nn/comparison-lab

Ask in your favorite AI

Open a new chat with this agent skill pre-loaded.

文档


name: comparison-lab description: Use when someone asks to A/B test, compare approaches/designs/options/variants, build a comparison board or comparison lab, pick the best of several solutions, or evaluate several solutions side by side. Generates genuinely distinct, research-backed variants of anything — solution approaches, UI/component designs, test/eval strategies, generated media (code, images, audio, text) — and renders each as a sticky-note card on one shareable board with objective badges, a pick + feedback loop, and a raw-vs-processed toggle to isolate pipeline confounds. Triggers: "A/B test", "compare", "comparison lab", "side by side", "variants", "which approach/design/option", "pick the best", "evaluate options". argument-hint: [what to compare]

What this skill does

Turns "compare some options for X" into a real instrument. You generate several genuinely distinct, research-backed variants of X, render each in its own panel on one shareable comparison board, and hand it to a human to compare, pick winners, and leave feedback. Their picks come back to you and you deepen the winners.

It works for anything: alternative solution approaches, UI/component designs, test/eval strategies, and generated media (code, images, audio, text). The board is a dependency-free standalone HTML file, so it drops into any project.

$ARGUMENTS is what to compare. If it is vague (compare what, varying which dimension, for which subjects), ask 1-2 sharpening questions before generating.

Core principles (the backbone — do not skip)

  1. Genuine, distinct variants — not knob-tweaks. Each variant is a real, named technique. If two variants differ only by a parameter value, they are one approach sampled twice — collapse them. Test: if you can describe two variants with the same sentence and only swap a number, they are not distinct.
  2. Research IS the calibration step. When the domain has a literature or established practices, research the actual named techniques BEFORE designing variants — use WebSearch / docs / references/. Guessing the menu from imagination over- or under-shoots. (See reference.md §1 for per-domain research prompts.)
  3. Controlled comparison. Freeze every input; vary ONLY the one dimension under test. Same seed/scenario/prompt/copy across every card in a row. State the frozen inputs in the row's goal. One row = one varied dimension.
  4. Objective badges. Where anything is measurable, attach a color-coded metric badge per panel (distance-from-target, pass-rate, latency, bundle size, contrast ratio, …) so picking is evidence-based, not vibes.
  5. Isolate confounds — multiple views of one artifact. When more than one view of the same output is worth seeing, give the card a content.views toggle (e.g. Notation / Piano-roll / Audio / Raw / Diff). The classic case is RAW vs PROCESSED — render the raw artifact next to the processed one so the human can tell which stage of a translate/recolor/normalize/serialize/compress pipeline caused a bad result; content.raw is the 2-view shortcut for it. Never fake a toggle. Only add a view when it genuinely differs — if processed and raw come out identical (a faithful transform), drop the toggle. A no-op toggle is worse than none; the board auto-collapses byte-identical views and auto-hides the toggle at ≤1 view.
  6. Metric-vs-artifact honesty. Render the real artifact next to its metric badges and let disagreements show. If the notation plainly has a lowered 3rd but the "distance" badge reads 0, that contradiction is the finding — it means the metric is ignoring something. Surfacing it is the point; do not hide the artifact behind the number or tune the number to match.
  7. Real-content gate. Every panel shows REAL output — never a stub, placeholder, or byte-size check. Verify the actual content renders. For audio, the content must actually SOUND — verified with the sample source blocked (reference.md §9 offline gate) — not merely render.
  8. One shareable surface. Output a single board the human opens (a file or link), not scattered artifacts.
  9. Checkpoint discipline. Build the riskiest / most representative slice first (one subject, one row, with real content + audio/playback if relevant). Show the human and get feedback BEFORE scaling to every dimension.
  10. Close the loop. The human picks winners and leaves feedback ON the board; that export comes back to you to refine. Then deepen the winners.
  11. Per-subject winners + fusion. When the best variant differs per test subject, surface the per-subject winners and add a fusion/synthesis round that combines the winning traits, rather than forcing one global winner. The point of multiple subjects is to expose that a single approach rarely wins everywhere — honor that instead of averaging it away.

Workflow

1. Clarify the comparison. From $ARGUMENTS, pin down: what is being compared, the single dimension that varies per row, and the controlled subjects/seeds (2 is a good default). Ask only if genuinely ambiguous.

2. Research the real variants. If the domain has established techniques, look them up first (reference.md §1 has research prompts). Produce a short list of 2-6 named, genuinely distinct approaches per row. Reject any two that differ only by a parameter.

3. Generate real content for each variant. For each (subject, row, variant), produce the actual artifact and choose its content kind: html (UI/text), code, image, iframe, audio (incl. WebAudio synth), or passfail (tests/evals). See reference.md §2 for the modality table. To offer several views of one output, set content.views to a list of named views (the first is the default; identical views collapse, the toggle auto-hides at ≤1) — or use the content.raw 2-view shortcut where a pipeline could hide the truth (principle 5). Attach objective badges wherever something is measurable, and show them next to the real artifact so contradictions surface (principles 4, 6).

Rendering real domain artifacts (notation, charts, diagrams): the reliable path is to render to an <svg> in a headless page, take svg.outerHTML, and embed it via the html content kind (inline SVG — NOT an <img> data-URI, which drops the SVG namespace and renders broken). Glyph renderers (e.g. VexFlow/Bravura) need the font inlined as a base64 @font-face on BOTH the render page and the board, with document.fonts.ready awaited before rendering and before screenshotting. template/render-artifacts.example.mjs is a copy-and-adapt harness for exactly this; reference.md §6 has the full recipe and the gotchas.

4. Build the board from the template. Copy template/comparison-lab.html into the project (somewhere servable, or anywhere for file:// use). Write your dataset into a data.json next to it using the schema in template/README.md. Do NOT hand-roll a new UI — the template already ships the sticky-note board, badges, toggle, buttons, and export.

  • Served over http(s): data.json is fetched automatically.
  • Double-clicked (file://): paste the dataset into the <script id="inline-data"> block in the HTML (fetch of a sibling file is blocked on file://).

5. Verify it renders real content. Open the board and confirm every panel shows actual output (not the placeholder), badges are colored, any views/raw toggle works where present (and is absent where there is no genuine second view), and Pick/Feedback/Export work. If you have headless-browser access, screenshot it — on a phone width too (the board is mobile-first). Fix anything that reads as a draft.

6. Checkpoint — share the riskiest slice. Hand the human the board (link or file) with the first subject/row fully built. Get feedback before scaling to the rest. This is principle 9 — do not silently build everything.

7. Collect picks + deepen winners. The human picks winners and leaves notes, then clicks Export picks + feedback (Markdown or JSON). Take that back and deepen the chosen variants — refine, harden, or productionize what they picked.

The board UI (what the template ships)

A clean white board of sticky-note cards — high-contrast black text, subtle shadow, pastel header strip per card, sitting straight and aligned. Calibrated against real sticky-note / kanban boards.

  • Header: title, frozen intro, a subject/seed switcher (buttons that flip the controlled input), a badge legend, Reset, and Export picks + feedback.
  • Each card (sticky note): variant label, a one-line technique blurb, a generic content slot (renders inline HTML/SVG, <img>, sandboxed <iframe>, <audio>/synth, <pre><code>, or a pass/fail checklist — driven by the data), color-coded metric badges, and an optional named-view toggle (a pill row) when the data provides content.views or content.raw.
  • Buttons per card: ★ Pick (toggles a winner highlight), ✎ Feedback (inline note), plus content actions that follow the active view (▶ Play, ⤢ Open, ⧉ Copy).
  • Export dumps the human's picks + notes as copyable Markdown or JSON. Picks + notes persist in localStorage.
  • Zero dependencies, zero network — works by opening the file. Mobile-first: cards collapse to one column on a phone, tap targets are ≥40px, the subject switcher stays reachable in the sticky header, body text stays readable. Mouse- and touch-friendly, respects reduced-motion.

Files

FilePurpose
template/comparison-lab.htmlThe board. Self-contained (inline CSS + vanilla JS), mobile-first. Copy as-is; do not rebuild.
template/data.sample.jsonWorking dataset proving generality (html, code, passfail, image+raw, an N-view content.views toggle, audio across 2 subjects). Copy → data.json, edit.
template/render-artifacts.example.mjsCopy-and-adapt harness for rendering real domain artifacts (notation/charts/diagrams) into the board — bundle your renderer, inline a glyph font, extract inline SVG headless, inject + screenshot. Edit the ADAPT THIS blocks.
template/README.mdHow to populate data.json; the full schema, field by field (incl. content.views); the export format.
reference.mdVariant-design deep dive + per-domain research prompts (§1), the modality rendering table (§2), badges (§3), schema pointer (§4), the worked music example incl. the raw-vs-processed story (§5), rendering real artifacts + the headless harness (§6), the content.views schema + metric-vs-artifact lesson (§7), and a mobile-first checklist (§8).

Notes

  • This is an approach skill (like brainstorming) — slash-invocable AND model-invocable. Do not set disable-model-invocation.
  • Copy the HTML template verbatim — all the UI work is done. Your job is the data: genuine variants + real content + badges.
  • Do NOT ship a board of near-duplicate variants. If you cannot name each variant as a distinct technique, you have not done step 2.
  • Do NOT fake content. A placeholder panel defeats the purpose (principle 7).
  • Do NOT ship a no-op toggle. Only add a views/raw view when it genuinely differs — identical views auto-collapse, but design for a real difference (principle 5).
  • For larger interactive previews, prefer iframe with srcdoc over a giant html string. For rendered SVG, embed via the html kind, not an <img> data-URI (reference.md §6).
  • Keep generated media inline (data: URIs, WebAudio synth) so the board stays a single shareable, offline file.

相关技能