
legenxxx/video-digest

Turn any video into a searchable text digest — transcript, on-screen text & links, and contact-sheet montages. 100% local, no cloud.

What is video-digest?

video-digest is a Claude Code agent skill that turns any video into a searchable text digest: transcript, on-screen text and links, and contact-sheet montages. It runs 100% locally, with no cloud services.

Compatible platforms: Claude Code, Codex CLI, Cursor
npx skills add legenxxx/video-digest


Documentation

What does video-digest do?

Overview

Turn any video into a single searchable Markdown digest. For each clip the skill produces three things and stitches them into one digest.md:

  1. Transcript — the spoken words (faster-whisper, on-device).
  2. On-screen text — every unique line of text visible in the frames (OCR), with URLs and @handles pulled out separately.
  3. Contact-sheet montages — grids of frames spanning the whole clip, so you can read its visual story at a glance.
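
As a rough illustration of the second item, URLs and @handles can be separated from OCR text with ordinary regular expressions. The sketch below is not the skill's actual implementation: the function names `extract_urls` and `extract_handles` and both patterns are assumptions chosen for the example. The handle pattern is deliberately simple and would also match the domain part of an email address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pull URLs and @handles out of OCR text lines.
# The regexes are illustrative assumptions, not the skill's own patterns.

extract_urls() {
  # Match http(s) URLs up to the next whitespace, then de-duplicate.
  grep -oE 'https?://[^[:space:]]+' | sort -u
}

extract_handles() {
  # Match @ followed by word characters, allowing inner dots (not a trailing one).
  grep -oE '@[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*' | sort -u
}

# Sample OCR output with a repeated line, as happens across sampled frames.
ocr_text='Follow @video_digest for more
Docs: https://example.com/docs
Follow @video_digest for more'

printf '%s\n' "$ocr_text" | extract_urls
printf '%s\n' "$ocr_text" | extract_handles
```

Piping through `sort -u` mirrors the "every unique line" idea: text that stays on screen for many frames is reported once.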

Everything runs locally. No network calls, no accounts, no telemetry. The core needs only ffmpeg; transcript and OCR are optional and degrade gracefully.

When to use

  • "What's in this video / these 200 videos?" without watching them.
  • Building a searchable catalogue of reels, talks, tutorials, or footage.
  • Recovering the links and handles flashed on screen in a clip.
  • Getting a fast visual overview (montages) of long or many recordings.

Not for: live streams, editing/cutting video, or generating video. This reads video and emits text + image summaries.
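
Because each clip gets its own digest.md under the output root, a catalogue of digests can be searched with standard text tools. The following is a minimal sketch, assuming the `<out>/<clip>/digest.md` layout; `search_digests` is a hypothetical helper, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the digests that mention a term.
# Assumes one digest.md per clip directory under the output root.

search_digests() {
  local root=$1 term=$2
  # -i ignore case, -l print matching file names only,
  # -F treat the term as a fixed string rather than a regex.
  find "$root" -name digest.md -exec grep -ilF -- "$term" {} + | sort
}

# Demo on a throwaway directory standing in for a real output root.
demo=$(mktemp -d)
mkdir -p "$demo/talk" "$demo/reel"
echo "Transcript: welcome to the ffmpeg workshop" > "$demo/talk/digest.md"
echo "Transcript: cooking pasta at home" > "$demo/reel/digest.md"
search_digests "$demo" "FFmpeg"   # prints the path of the talk digest
rm -rf "$demo"
```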

Quick reference

scripts/check-deps.sh                    # what's installed + how to get the rest
scripts/video-digest.sh clip.mp4         # one video → ./video-digest-out/clip/digest.md
scripts/video-digest.sh --out ~/digests *.mp4     # batch a whole folder
scripts/video-digest.sh --lang en talk.mov        # hint the transcript language
scripts/video-digest.sh --no-transcript reel.mp4  # skip audio, just visuals+text
scripts/video-digest.sh --ocr-engine tesseract x.mp4   # force a specific OCR engine
Flag                                       Purpose                              Default
-o, --out DIR                              output root                          ./video-digest-out
--fps N                                    OCR frame sampling rate              2
--montage-frames N                         frames spread across the clip        90
--tile CxR                                 montage grid                         3x3
--ocr-engine E                             auto|vision|tesseract|none           auto
--whisper-model M                          tiny|base|small|medium(+.en)         small
--whisper-python P                         interpreter that has faster-whisper  auto-detected
--lang CODE                                transcript language hint             auto-detect
--no-transcript / --no-ocr / --no-montage  skip a stage
--force                                    re-run stages even if cached         off

Run scripts/video-digest.sh --help for the full list.

How it works

The entry point scripts/video-digest.sh orchestrates three small, independent tools you can also run on their own:

Stage           Script                               Engine
Montages        scripts/montage.sh                   ffmpeg tile filter
On-screen text  scripts/ocr.sh (+ ocr_vision.swift)  macOS Vision (default) or tesseract
Transcript      scripts/transcribe.py                faster-whisper

Each stage is resumable — completed work is marked .done / cached and skipped on re-runs unless you pass --force. The digest is always rebuilt from whatever stages are present, so partial runs still produce a useful file.
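
The resumable behaviour described above can be sketched with marker files. This is an illustration only: `run_stage`, the `FORCE` variable, and the `<stage>.done` naming are assumptions, and the real scripts may track completion differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a resumable stage runner using .done marker files.
# run_stage, FORCE and the marker naming are assumptions for illustration.

FORCE=${FORCE:-0}

run_stage() {
  local name=$1 out_dir=$2
  shift 2
  local marker="$out_dir/$name.done"

  # Skip work that already finished, unless a re-run is forced.
  if [ -f "$marker" ] && [ "$FORCE" != "1" ]; then
    echo "skip $name (cached)"
    return 0
  fi

  mkdir -p "$out_dir"
  # Run the stage command; only mark it done if it succeeded.
  if "$@"; then
    touch "$marker"
    echo "ran $name"
  else
    echo "failed $name" >&2
    return 1
  fi
}

work_dir=$(mktemp -d)
run_stage montage "$work_dir" true           # first run executes the command
run_stage montage "$work_dir" true           # second run is skipped
FORCE=1 run_stage montage "$work_dir" true   # forced re-run
rm -rf "$work_dir"
```

Writing the marker only after the command succeeds means an interrupted or failed stage is retried on the next run rather than treated as cached.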

OCR engine selection is automatic: on macOS it uses the built-in Vision framework (no install, on-device), otherwise tesseract if present, otherwise it skips OCR with a warning. See references/setup.md for installing optional backends and references/output-format.md for the exact output layout.
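
The selection order above can be expressed as a small decision function. The sketch below is an assumption about how such logic might look, not a copy of scripts/ocr.sh; `pick_ocr_engine` is a hypothetical name, and it takes the OS name and tesseract availability as arguments so the decision can be checked in isolation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the automatic OCR engine choice.
# pick_ocr_engine is an illustrative name; the real ocr.sh may differ.

pick_ocr_engine() {
  # $1: OS name as printed by `uname -s`; $2: "yes" if tesseract is on PATH.
  local os=$1 has_tesseract=$2
  if [ "$os" = "Darwin" ]; then
    echo vision
  elif [ "$has_tesseract" = "yes" ]; then
    echo tesseract
  else
    echo "warning: no OCR engine found, skipping OCR" >&2
    echo none
  fi
}

# Probe the current machine and report which engine would be chosen.
if command -v tesseract >/dev/null 2>&1; then have=yes; else have=no; fi
echo "OCR engine: $(pick_ocr_engine "$(uname -s)" "$have")"
```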

Common mistakes

  • Empty transcript → faster-whisper isn't installed in the interpreter being used. Run scripts/check-deps.sh, then pip install faster-whisper, or point at an existing env with --whisper-python /path/to/python.
  • No on-screen text on Linux → install tesseract (apt install tesseract-ocr); the Vision engine is macOS-only.
  • Huge montage count on short clips → lower --montage-frames (e.g. 30).
  • Wrong language transcript → pass --lang (e.g. --lang sk), or use a non-.en model for multilingual audio.
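
For the montage-count item, a quick estimate helps when choosing --montage-frames. The sketch below assumes each sheet holds columns × rows frames from --tile, so the sheet count is the frame count divided by the tile size, rounded up. That relationship is an assumption, not something the documentation states, and `sheet_count` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many montage sheets a run produces,
# assuming each sheet holds columns x rows frames (an unconfirmed assumption).

sheet_count() {
  local frames=$1 tile=$2
  # Split a CxR tile spec such as 3x3 into its two numbers.
  local cols=${tile%x*} rows=${tile#*x}
  local per_sheet=$(( cols * rows ))
  # Integer ceiling division: a partly filled last sheet still counts.
  echo $(( (frames + per_sheet - 1) / per_sheet ))
}

sheet_count 90 3x3   # defaults: 10
sheet_count 30 3x3   # lowered --montage-frames: 4
sheet_count 9 3x3    # one full sheet: 1
```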

Example

examples/sample-digest/ contains a tiny generated clip and the digest.md it produces — transcript, one detected URL, one @handle, and a montage sheet. Reproduce it with (the transcript line needs faster-whisper installed):

scripts/video-digest.sh --montage-frames 9 examples/sample-digest/sample.mp4
