CommunityVídeo e Animaçãogithub.com

chemny/remotion-short-video

把文章、选题或观点材料转换成带脚本、TTS、字幕、Remotion 模板、封面和发布文案的短视频制作 skill。

Funciona comClaude CodeCodex CLI~Cursor
npx skills add chemny/remotion-short-video

Ask in your favorite AI

Open a new chat with this agent skill pre-loaded.

Documentação

Short Video Maker Skill

Use this skill to convert user-provided articles, ideas, or topics into a publish-ready vertical short video package. The skill should keep creative reasoning and deterministic rendering separate:

  • The active agent handles understanding, research, scripting, storyboarding, and quality judgment.
  • The bundled workflow and Remotion project handle structured files, media assets, timing, rendering, cover export, and packaging.

This separation lets the skill run inside Claude Code, Codex, OpenClaw, or another agent without requiring an internal LLM API. If an LLM API is available, it can be used as an optional backend for standalone or batch runs.

Default Target

Unless the user specifies otherwise, use these defaults:

  • Platform: Xiaohongshu and Douyin compatible
  • Format: vertical video, 1080x1920
  • Duration: 90-130 seconds
  • FPS: 30
  • Language: Chinese
  • Style: knowledge explainer or opinion analysis
  • Visual approach: AI images or sourced stills, animated typography, simple data graphics, keyword-highlight captions
  • Outputs: video.mp4, cover.png, script.md, captions.srt, publish.md, metadata.json

Core Workflow

Follow this sequence:

  1. Parse the input.

    • Identify whether the user gave an article, a rough idea, a topic, or a partial script.
    • Extract topic, audience, platform, tone, expected duration, and constraints.
    • Ask only if the missing information materially changes the output; otherwise use the defaults.
  2. Research and analyze.

    • For current, factual, financial, legal, medical, technical, or news-like topics, verify with reliable sources before writing the script.
    • Capture the angle, audience pain point, core claim, supporting evidence, risk notes, and recommended narrative structure.
    • Save or produce an analysis.json-compatible structure.
    • If the input is a long article, long narration, video podcast script, or podcast.txt, read references/long-to-short-rules.md and extract one short-video angle before scripting.
  3. Write the short-video script.

    • Build for a 90-130 second spoken video, not an article summary.
    • Use a strong hook in the first 3-6 seconds.
    • Keep one central thesis and 2-4 supporting points.
    • Include voiceover, on-screen caption text, visual direction, emotional tone, and estimated duration per scene.
  4. Convert the script into a storyboard.

    • Split the video into scenes.
    • Each scene should include narration, caption text, visual prompt or asset direction, layout type, motion style, and transition intent.
    • Prefer 5-9 scenes for a 2 minute video.
  5. Generate audio and timing.

    • Treat narration audio as the master timeline.
    • Generate TTS audio or instruct the user/agent to generate it using the configured TTS provider.
    • Default to Edge TTS for public-friendly local runs. Use Volcengine or HTTP TTS only when the user provides their own credentials in environment variables.
    • Use transcription or forced alignment to produce word-level or sentence-level timestamps.
    • Use those timestamps to build captions and scene boundaries.
    • Before final TTS for Chinese narration, read references/pronunciation-rules.md and create a job-local phonemes.json when names, polyphones, or English terms need overrides.
  6. Prepare visuals.

    • Use AI-generated still images, sourced images/video, or Remotion-native graphics.
    • For MVP work, prefer still images plus motion, typography, charts, and transitions over AI-generated video clips.
    • Check licensing and factual fidelity when using sourced assets.
  7. Build video-plan.json.

    • Convert analysis, script, captions, audio, visual assets, style, and cover plan into a single Remotion input file.
    • Use the schema in references/video-plan-schema.md.
    • Read references/platform-rules.md before filling publish.
    • Read references/preference-rules.md if a local preference file exists or the user asks to save defaults.
  8. Render and package.

    • Generate a first-frame preview first when the user has not explicitly confirmed full rendering.
    • Render the video through Remotion.
    • Render the cover still.
    • Export the publish package with script, captions, metadata, and platform copy.
  9. Quality check.

    • Run scripts/validate-plan.mjs and scripts/audit-timing.mjs before render.
    • Verify duration, audio presence, caption timing, missing assets, unreadable text, aspect ratio, cover readability, and platform publish rules.
    • If issues are detected, fix the plan or assets and render again.

Recommended File Layout

For each video job, create a job folder:

jobs/<slug>/
  input.md
  analysis.json
  script.json
  storyboard.json
  audio/
    voiceover.mp3
    bgm.mp3
  captions/
    captions.json
    captions.srt
  assets/
    scene-01.png
    scene-02.png
  video-plan.json
  output/
    video.mp4
    cover.png
    script.md
    publish.md
    metadata.json
  phonemes.json

Dependency Policy

Required:

  • Node.js
  • Remotion
  • FFmpeg and ffprobe
  • TTS provider or local TTS. Edge TTS is the public default; Volcengine and HTTP adapters require user-provided credentials.
  • Caption alignment or transcription capability

Recommended:

  • Image generation or image sourcing capability
  • Search API or browser research capability

Optional:

  • LLM API for standalone or batch execution
  • AI video generation API
  • Stock media API
  • Automatic publishing API

Agent vs API Responsibility

When running inside Claude Code, Codex, or OpenClaw:

  • Use the active agent model for content understanding, research synthesis, script writing, and storyboard planning.
  • Do not require an internal LLM API unless the user asks for standalone/batch automation.
  • Prefer deterministic scripts for validation, media copying, Remotion rendering, and packaging.

When running standalone:

  • Read provider keys from environment variables.
  • Keep provider choice configurable.
  • Never hardcode API keys.

Quality Bar

The result is acceptable only if:

  • Video is vertical 1080x1920 or the requested aspect ratio.
  • Duration is close to target, normally 90-130 seconds.
  • Voiceover, captions, and scene timing are aligned.
  • Captions are readable on mobile.
  • Visuals support the narration instead of being generic decoration.
  • Cover is readable at small thumbnail size.
  • publish.md includes platform-ready title, body copy, tags, and optional comment prompt.
  • Pronunciation risks have been handled by rewriting or a job-local phonemes.json.
  • A first-frame preview has been generated before full rendering unless the user explicitly requested a render without preview.

References

  • Read references/mvp-spec.md before designing or implementing the first version.
  • Read references/video-plan-schema.md before generating Remotion input data.
  • Read references/platform-rules.md before generating publish copy.
  • Read references/pronunciation-rules.md before final TTS for Chinese narration.
  • Read references/preference-rules.md before reading or writing reusable defaults.
  • Read references/long-to-short-rules.md when adapting a long script, article, or podcast into short videos.
  • Use examples/input.md as the first smoke-test input.

Habilidades Relacionadas

qianwen-ai/qianwen-image-generation

[QianWen] Generate and edit images using Wan and Qwen Image models. Supports text-to-image, image editing (style transfer, subject consistency, text rendering), and interleaved text-image output. TRIGGER when: user wants to create illustrations, product images, artistic designs, posters, text-to-image generation, edit/transform existing images, apply style transfer, generate images based on reference photos, interleaved text-image content, mentions Wan/Qwen Image models/AI art creation, or explicitly invokes this skill by name (e.g. use qianwen-image-generation). DO NOT TRIGGER when: user wants to understand/analyze existing images or OCR (use qianwen-vision), video generation (use qianwen-video-generation), text-only tasks.

community

gycdsj/accompaniment-generator

OpenClaw skill: 从YouTube或本地音频文件分离人声和伴奏,生成纯伴奏音乐

community

Simuiu/obsidian-inbox

Local-first Codex skill for turning videos, articles, web pages, and local media into structured Obsidian notes with raw archives and clickable indexes.

community

Azdmjiny/html-ppt-creator

Codex skill for creating polished, self-contained HTML slide presentations

community

coreyhaines31/image

When the user wants to create, generate, edit, or optimize images for marketing — blog heroes, social graphics, product mockups, profile banners, listing visuals, or brand assets. Also use when the user mentions 'AI image generation,' 'generate an image,' 'create a graphic,' 'product mockup,' 'hero image,' 'social media graphic,' 'banner image,' 'cover photo,' 'profile banner,' 'listing screenshot,' 'Flux,' 'Flux Kontext,' 'Midjourney,' 'DALL-E,' 'GPT Image,' 'ChatGPT Images,' 'Ideogram,' 'Gemini image,' 'Nano Banana,' 'Recraft,' 'Stable Diffusion,' 'Canva,' 'Figma,' 'image optimization,' 'compress images,' 'WebP,' or 'OG image.' Use this for general-purpose marketing image creation and optimization. For paid ad image creative and platform-specific ad specs, see ad-creative. For video production, see video.

community

Sheshiyer/skill-clusters

Hub-and-spoke agent-skill clusters, one per stack (Astro·GSAP·Remotion, Tauri, …). Installable via skills.sh.

community