claw-score
Use this skill when working on the OpenClaw maturity scorecard in this repo.
This is the openclaw-local version of the maintainer claw-score workflow:
it keeps the taxonomy and scorecard concepts, but excludes discrawl and the old
committed inventory/ report tree.
Authority
This skill owns the operational workflow for:
- taxonomy.yaml
- docs/maturity-scores.yaml
- docs/concepts/qa-e2e-automation.md
- qa/scenarios/index.yaml
Keep person-specific, maintainer-private, Discord archive, and discrawl facts
out of this repo. If a score needs private evidence, use the redacted
qa-evidence.json artifact shape generated by OpenClaw QA workflows.
Source Model
- taxonomy.yaml is the hand-edited source of truth for surfaces, levels, QA profiles, categories, feature coverage IDs, docs refs, LTS overrides, and completeness-instruction paths.
  - Feature coverageIds are ANDed proof targets, not aliases. A feature may list multiple IDs when each ID proves part of one capability.
  - Coverage IDs use dotted namespace.behavior form, with lowercase alphanumeric/dash segments. Profile, surface, and category IDs may remain dashed or dotted.
  - Keep categories and feature names unique, product-shaped, and broader than raw coverage IDs. Do not promote generic IDs into standalone feature names.
  - Avoid duplicate coverage-ID bundles under different feature names in one category.
- docs/maturity-scores.yaml is the aggregate score source committed in this repo. It is the only committed score data; do not add generated inventory directories.
  - There is no committed maturity-doc renderer or pnpm maturity:* script in this repo. Do not invent generated scorecard files; update the source YAML and current docs directly.
- qa-evidence.json artifacts provide per-run QA scorecard evidence. They can enrich generated artifact docs, but they are not committed as inventory.
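The coverage-ID rules above lend themselves to a mechanical check. The following JavaScript is a minimal sketch, not an existing script in this repo. It assumes a taxonomy category has already been parsed into an object shaped like { features: [{ name, coverageIds }] }. The helper names isValidCoverageId and lintCategory are hypothetical, and so are the example IDs. The sketch checks the dotted namespace.behavior form and flags repeated feature names. It also flags features in one category that share a coverage-ID bundle. Because coverageIds are ANDed proof targets, bundles are compared as unordered sets.

```javascript
// Hypothetical input shape: one parsed taxonomy category as
// { features: [{ name, coverageIds: [...] }] }. Not a documented schema.
const COVERAGE_ID = /^[a-z0-9-]+(\.[a-z0-9-]+)+$/;

// Dotted namespace.behavior form: two or more lowercase alphanumeric/dash segments.
function isValidCoverageId(id) {
  return typeof id === "string" && COVERAGE_ID.test(id);
}

// Returns a list of human-readable problems for one category.
function lintCategory(category) {
  const problems = [];
  const names = new Set();
  const bundleOwners = new Map(); // normalized ID bundle -> first feature name

  for (const feature of category.features) {
    if (names.has(feature.name)) {
      problems.push(`duplicate feature name "${feature.name}"`);
    }
    names.add(feature.name);

    for (const id of feature.coverageIds) {
      if (!isValidCoverageId(id)) {
        problems.push(`${feature.name}: invalid coverage ID "${id}"`);
      }
    }

    // coverageIds are ANDed proof targets, so order and repeats do not matter.
    const bundle = [...new Set(feature.coverageIds)].sort().join(",");
    const owner = bundleOwners.get(bundle);
    if (owner === undefined) {
      bundleOwners.set(bundle, feature.name);
    } else if (owner !== feature.name) {
      problems.push(`${feature.name}: same coverage-ID bundle as "${owner}"`);
    }
  }
  return problems;
}

// Usage with made-up IDs: the second feature repeats the first bundle in a
// different order, and the third uses a non-dotted, uppercase ID.
console.log(
  lintCategory({
    features: [
      { name: "Send messages", coverageIds: ["channel.send", "channel.retry"] },
      { name: "Deliver messages", coverageIds: ["channel.retry", "channel.send"] },
      { name: "Pairing", coverageIds: ["Pairing_Flow"] },
    ],
  }),
);
```

The sketch only validates coverage IDs. Profile, surface, and category IDs are left alone because they may remain dashed or dotted.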
Commands
Run from the openclaw repo root.
Validate YAML structure after source edits:
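```bash
node <<'NODE'
const fs = require("node:fs");
const YAML = require("yaml");
// YAML.parse throws on malformed input; report each bad file by name and keep going.
for (const file of ["taxonomy.yaml", "docs/maturity-scores.yaml", "qa/scenarios/index.yaml"]) {
  try {
    YAML.parse(fs.readFileSync(file, "utf8"));
  } catch (error) {
    console.error(`${file}: ${error.message}`);
    process.exitCode = 1;
  }
}
NODE
```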
Check docs when touching docs prose:
```bash
pnpm check:docs
```
Run focused QA/profile checks when changing coverage IDs or profile membership:
```bash
pnpm openclaw qa coverage --json
```
Scoring Workflow
When asked to score or refresh a surface:
- Read the surface in taxonomy.yaml.
- Read the surface completeness rubric under .agents/skills/claw-score/references/completeness/.
- Gather public repo evidence from docs, source, tests, and QA scenario metadata.
- Prefer existing qa-evidence.json artifacts for executed proof. Do not use discrawl or unredacted private archives.
- Update docs/maturity-scores.yaml only when the score change is backed by public or redacted artifact evidence.
- Run the YAML validation command from this skill.
- Run pnpm check:docs if docs prose changed, and focused QA coverage checks if coverage IDs or profile membership changed.
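The evidence gate and the conditional checks at the end of this workflow can be summarized as a small decision function. This is a hypothetical sketch. requiredSteps and its flags (scoresChanged, hasPublicOrRedactedEvidence, docsProseChanged, coverageIdsChanged, profileMembershipChanged) are illustrative names, not an OpenClaw API. The returned strings simply name the commands listed under Commands.

```javascript
// Hypothetical change summary; these flag names are illustrative, not an
// existing OpenClaw API.
function requiredSteps(change) {
  // Score edits need public or redacted artifact evidence before anything else.
  if (change.scoresChanged && !change.hasPublicOrRedactedEvidence) {
    throw new Error("score changes need public or redacted artifact evidence");
  }
  // The YAML validation command runs after every source edit.
  const steps = ["node YAML validation heredoc"];
  if (change.docsProseChanged) {
    steps.push("pnpm check:docs");
  }
  if (change.coverageIdsChanged || change.profileMembershipChanged) {
    steps.push("pnpm openclaw qa coverage --json");
  }
  return steps;
}

console.log(
  requiredSteps({ scoresChanged: true, hasPublicOrRedactedEvidence: true, docsProseChanged: true }),
);
```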
For subjective score changes, make the smallest defensible edit and leave the
evidence path in the PR or task summary. Keep manual prose in current docs and
keep score data in docs/maturity-scores.yaml.
Default Completeness Process
Completeness is scored against the intended operator-visible workflow for each
category, not against test breadth or implementation quality. The completeness
reference files under references/completeness/ define the category scope and
any surface-specific variation from this default process.
By default, Completeness measures how fully OpenClaw exposes the intended surface capability set to the user, operator, author, or maintainer persona for that surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform, provider, channel, security, or lifecycle variants where they apply.
Treat Surface-Specific Scoring Questions and Surface-Specific Guidance as
higher-priority instructions for that surface. The surface instructions may
flesh out, narrow, or intentionally conflict with the default ideas here; when
they do, follow the surface instructions and make the score rationale reflect
that surface-specific instruction. If a reference file does not include
surface-specific questions or guidance, apply this default process to the
surface's Category Scope.
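One way to picture this precedence rule is a resolver. It prefers a reference file's surface-specific questions and guidance and falls back to the defaults otherwise. This sketch takes the simplest reading, in which surface-specific lists replace the defaults wholesale. Real reference files may instead flesh out or narrow individual default questions. The object shapes and the scoringInstructions name are hypothetical.

```javascript
// Hypothetical shapes: defaults = { questions, guidance }, and a parsed
// reference file = { categoryScope, surfaceQuestions?, surfaceGuidance? }.
function scoringInstructions(defaults, reference) {
  const pick = (surfaceList, defaultList) =>
    Array.isArray(surfaceList) && surfaceList.length > 0 ? surfaceList : defaultList;
  return {
    // Scoring always runs against the surface's own Category Scope.
    scope: reference.categoryScope,
    // Surface-specific questions and guidance take priority over the defaults.
    questions: pick(reference.surfaceQuestions, defaults.questions),
    guidance: pick(reference.surfaceGuidance, defaults.guidance),
  };
}

const defaultProcess = {
  questions: ["Can the intended user or operator complete the category workflow end to end?"],
  guidance: ["Do not lower Completeness because tests are thin; that is Coverage."],
};
// A reference file without surface-specific sections falls back to the defaults.
console.log(scoringInstructions(defaultProcess, { categoryScope: "example scope" }).questions);
```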
For each category, ask:
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
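The lifecycle question can be tracked as a simple checklist while gathering evidence. This is a minimal sketch. It assumes evidence is recorded per stage in a plain object keyed by the stage names from the question above. The missingStages helper and the evidence values are hypothetical. The sketch only lists gaps to weigh and does not compute a score.

```javascript
const LIFECYCLE_STAGES = [
  "setup",
  "normal operation",
  "status/inspection",
  "recovery",
  "upgrade or removal",
];

// evidence: hypothetical map from stage name to an evidence path you gathered.
// relevantStages: drop stages that do not apply (e.g. upgrade or removal).
function missingStages(evidence, relevantStages = LIFECYCLE_STAGES) {
  return relevantStages.filter((stage) => !evidence[stage]);
}

// Placeholder evidence strings; only setup and normal operation are backed.
console.log(missingStages({ setup: "evidence-1", "normal operation": "evidence-2" }));
// -> status/inspection, recovery, upgrade or removal
```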
Default guidance:
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and category evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
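The guidance above routes each kind of gap to exactly one score. The mapping below restates that guidance as data. The gap labels and the dimensionFor helper are hypothetical. Unknown gap kinds are rejected, so they get classified by hand instead of silently lowering Completeness.

```javascript
// Hypothetical gap labels; the mapping mirrors the default guidance above.
const DIMENSION_FOR_GAP = new Map([
  ["happy path only", "Completeness"],
  ["undocumented or unimplemented variant", "Completeness"],
  ["missing recovery or status path", "Completeness"],
  ["thin tests", "Coverage"],
  ["fragile implementation", "Quality"],
]);

function dimensionFor(gap) {
  const dimension = DIMENSION_FOR_GAP.get(gap);
  if (dimension === undefined) {
    throw new Error(`unclassified gap: ${gap}`);
  }
  return dimension;
}

console.log(dimensionFor("thin tests")); // Coverage
```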
Default Completeness bands:
- Lovable (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- Stable (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- Beta (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- Alpha (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- Experimental (0-50): the category exposes only fragments of the intended capability.
Score Semantics
- Coverage: public or redacted proof that the feature is exercised by docs, tests, QA scenarios, live lanes, or release evidence.
- Quality: reliability, maintainability, operator safety, and regression confidence for the category.
- Completeness: how much of the intended operator-visible workflow exists for the category. Use the default completeness process plus any surface-specific variation before changing this score.
- LTS: derived from score thresholds and human_lts_override; do not hand-edit generated Markdown to change LTS status.
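Because LTS is derived, the way to change it is through its inputs (the scores or human_lts_override), not the rendered output. The sketch below illustrates that dependency only. The rule it encodes is an assumption, since this skill does not spell it out. Under that assumption, every score must meet a single threshold passed in by the caller, and a boolean override, when present, replaces the computed result. The ltsStatus name, the scores, and the threshold of 80 are placeholders.

```javascript
// Assumed rule (not specified by this skill): LTS holds when every score meets
// the caller-supplied threshold, unless a boolean human_lts_override is set.
function ltsStatus(scores, threshold, humanLtsOverride) {
  if (typeof humanLtsOverride === "boolean") {
    return humanLtsOverride; // a human decision replaces the computed result
  }
  return ["coverage", "quality", "completeness"].every(
    (dimension) => typeof scores[dimension] === "number" && scores[dimension] >= threshold,
  );
}

// Placeholder scores and threshold.
const scores = { coverage: 88, quality: 84, completeness: 81 };
console.log(ltsStatus(scores, 80)); // true
console.log(ltsStatus(scores, 80, false)); // false
```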
Bands:
- Lovable: 95-100
- Stable: 80-95
- Beta: 70-80
- Alpha: 50-70
- Experimental: 0-50
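The band ranges share their endpoints. For example, 95 appears in both Lovable and Stable, so a tool that maps scores to bands has to pick a side. This sketch assumes a boundary score belongs to the higher band. The bandFor name is hypothetical.

```javascript
// Lower bound of each band, highest first. Assumption: a score on a shared
// endpoint (95, 80, 70, 50) takes the higher band.
const BANDS = [
  ["Lovable", 95],
  ["Stable", 80],
  ["Beta", 70],
  ["Alpha", 50],
  ["Experimental", 0],
];

function bandFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  // First band whose lower bound the score reaches; Experimental's bound of 0
  // guarantees a match for any valid score.
  return BANDS.find(([, min]) => score >= min)[0];
}

console.log(bandFor(95), bandFor(94.5), bandFor(50)); // Lovable Stable Alpha
```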
Artifacts
Do not add the maintainer repo's docs/kevinslin/maturity-scorecard/inventory/
tree to openclaw. Evidence-enriched scorecard outputs belong in short-lived
artifacts, not committed generated docs, unless this repo adds an explicit
renderer/check workflow first.