What does video-digest do?

Overview

Turn any video into a single searchable Markdown digest. For each clip the skill produces three things and stitches them into one digest.md:

Transcript — the spoken words (faster-whisper, on-device).
On-screen text — every unique line of text visible in the frames (OCR), with URLs and @handles pulled out separately.
Contact-sheet montages — grids of frames spanning the whole clip, so you can read its visual story at a glance.

Everything runs locally. No network calls, no accounts, no telemetry. The core needs only ffmpeg; transcript and OCR are optional and degrade gracefully.
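To make the on-screen text stage more concrete, here is a minimal Python sketch of how OCR lines could be de-duplicated and how URLs and @handles could be pulled out of them. The regular expressions and the `extract_entities` name are assumptions made for illustration; the skill's actual extraction rules are not shown in this document.

```python
import re

# Hypothetical patterns, not the skill's real rules.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)
# The lookbehind rejects '@' inside e-mail addresses such as a@b.com.
HANDLE_RE = re.compile(r"(?<![\w@.])@([A-Za-z0-9_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?)")


def extract_entities(lines):
    """Return (unique_lines, urls, handles), each de-duplicated in first-seen order."""
    unique_lines, urls, handles = [], [], []
    seen = set()
    for raw in lines:
        line = " ".join(raw.split())  # collapse whitespace left over from OCR
        if not line or line in seen:
            continue
        seen.add(line)
        unique_lines.append(line)
        for match in URL_RE.findall(line):
            url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
        for name in HANDLE_RE.findall(line):
            handle = "@" + name
            if handle not in handles:
                handles.append(handle)
    return unique_lines, urls, handles


if __name__ == "__main__":
    frames = [
        "Follow @some_user today",
        "Follow @some_user today",  # same text seen in a later frame
        "Docs: https://example.com/guide.",
    ]
    print(extract_entities(frames))
```

Keeping first-seen order means the extracted lists read in roughly the same order the text appeared in the clip.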

When to use

"What's in this video / these 200 videos?" without watching them.
Building a searchable catalogue of reels, talks, tutorials, or footage.
Recovering the links and handles flashed on screen in a clip.
Getting a fast visual overview (montages) of long recordings or large batches of them.

Not for: live streams, editing/cutting video, or generating video. This reads video and emits text + image summaries.

Quick reference

scripts/check-deps.sh                    # what's installed + how to get the rest
scripts/video-digest.sh clip.mp4         # one video → ./video-digest-out/clip/digest.md
scripts/video-digest.sh --out ~/digests *.mp4     # batch a whole folder
scripts/video-digest.sh --lang en talk.mov        # hint the transcript language
scripts/video-digest.sh --no-transcript reel.mp4  # skip audio, just visuals+text
scripts/video-digest.sh --ocr-engine tesseract x.mp4   # force a specific OCR engine

| Flag | Purpose | Default |
| --- | --- | --- |
| `-o, --out DIR` | output root | `./video-digest-out` |
| `--fps N` | OCR frame sampling rate | `2` |
| `--montage-frames N` | frames spread across the clip | `90` |
| `--tile CxR` | montage grid | `3x3` |
| `--ocr-engine E` | `auto`\|`vision`\|`tesseract`\|`none` | `auto` |
| `--whisper-model M` | `tiny`\|`base`\|`small`\|`medium`(+`.en`) | `small` |
| `--whisper-python P` | interpreter that has faster-whisper | auto-detected |
| `--lang CODE` | transcript language hint | auto-detect |
| `--no-transcript` / `--no-ocr` / `--no-montage` | skip a stage | n/a |
| `--force` | re-run stages even if cached | off |

Run scripts/video-digest.sh --help for the full list.
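The interaction between `--montage-frames` and `--tile` is easiest to see as arithmetic. The sketch below assumes that each contact sheet holds columns × rows frames and that frames are sampled at evenly spaced timestamps; both are plausible readings of the flag descriptions, not confirmed behaviour, and the function names are hypothetical.

```python
import math


def parse_tile(tile):
    """Split a 'CxR' grid spec such as '3x3' into (columns, rows)."""
    cols, rows = tile.lower().split("x")
    return int(cols), int(rows)


def sheet_count(montage_frames, tile="3x3"):
    """Number of contact sheets needed if each sheet holds cols * rows frames."""
    cols, rows = parse_tile(tile)
    return math.ceil(montage_frames / (cols * rows))


def sample_times(duration_s, montage_frames):
    """Timestamps at the midpoint of each of N equal slices of the clip."""
    step = duration_s / montage_frames
    return [round((i + 0.5) * step, 3) for i in range(montage_frames)]


if __name__ == "__main__":
    print(sheet_count(90, "3x3"))  # defaults: 90 frames on 3x3 sheets
    print(sheet_count(30, "3x3"))  # fewer frames, fewer sheets
    print(sample_times(9.0, 9))
```

Under these assumptions the defaults give 10 sheets per clip, and `--montage-frames 9` gives a single 3x3 sheet.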

How it works

The entry point scripts/video-digest.sh orchestrates three small, independent tools you can also run on their own:

| Stage | Script | Engine |
| --- | --- | --- |
| Montages | `scripts/montage.sh` | `ffmpeg` tile filter |
| On-screen text | `scripts/ocr.sh` (+ `ocr_vision.swift`) | macOS Vision (default) or tesseract |
| Transcript | `scripts/transcribe.py` | faster-whisper |

Each stage is resumable — completed work is marked .done / cached and skipped on re-runs unless you pass --force. The digest is always rebuilt from whatever stages are present, so partial runs still produce a useful file.
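The resumable behaviour can be sketched as a marker-file pattern. This is an illustration in Python rather than the skill's own shell code; the `run_stage` helper and the `<stage>.done` file name are assumptions.

```python
from pathlib import Path


def run_stage(name, out_dir, work, force=False):
    """Run `work(out_dir)` unless a marker says the stage already finished.

    Returns True if the stage ran, False if it was skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    marker = out_dir / f"{name}.done"  # hypothetical marker file name
    if marker.exists() and not force:
        return False
    work(out_dir)
    # Written only after success, so a failed stage is retried on the next run.
    marker.touch()
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(run_stage("ocr", tmp, lambda d: None))              # first run
        print(run_stage("ocr", tmp, lambda d: None))              # cached
        print(run_stage("ocr", tmp, lambda d: None, force=True))  # forced
```

Writing the marker last is the important design choice: an interrupted stage leaves no marker behind, so it is retried instead of being treated as complete.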

OCR engine selection is automatic: on macOS it uses the built-in Vision framework (no install, on-device), otherwise tesseract if present, otherwise it skips OCR with a warning. See references/setup.md for installing optional backends and references/output-format.md for the exact output layout.
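The fallback order described above fits in a few lines. The following Python sketch mirrors that order only; `choose_ocr_engine` is a hypothetical name, and the real logic lives in `scripts/ocr.sh`.

```python
import platform
import shutil


def choose_ocr_engine(requested="auto", system=None, has_tesseract=None):
    """Resolve an --ocr-engine value to 'vision', 'tesseract', or 'none'."""
    if system is None:
        system = platform.system()  # 'Darwin' on macOS
    if has_tesseract is None:
        has_tesseract = shutil.which("tesseract") is not None
    if requested != "auto":
        return requested            # an explicit choice always wins
    if system == "Darwin":
        return "vision"             # built-in, nothing to install
    if has_tesseract:
        return "tesseract"
    return "none"                   # caller should warn and skip OCR


if __name__ == "__main__":
    print(choose_ocr_engine())
```

Passing `system` and `has_tesseract` explicitly makes the decision easy to check without touching the host machine.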

Common mistakes

Empty transcript → faster-whisper isn't installed in the interpreter being used. Run scripts/check-deps.sh, then pip install faster-whisper, or point at an existing env with --whisper-python /path/to/python.
No on-screen text on Linux → install tesseract (apt install tesseract-ocr); the Vision engine is macOS-only.
Huge montage count on short clips → lower --montage-frames (e.g. 30).
Transcript in the wrong language → pass --lang (e.g. --lang sk), or use a non-.en model for multilingual audio.
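For the two transcript issues above, it helps to see how a model name and a language hint reach faster-whisper. The sketch below uses the public `WhisperModel` API; the helper names, the `[HH:MM:SS]` line format, and the CPU/int8 settings are assumptions, and `scripts/transcribe.py` may do all of this differently.

```python
def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def pick_model(size="small", lang=None):
    """Fall back to the multilingual model when an .en model meets a non-English hint."""
    if size.endswith(".en") and lang not in (None, "en"):
        return size[: -len(".en")]
    return size


def transcribe(path, size="small", lang=None):
    """Return (detected_language, transcript_lines) for one media file."""
    # Imported lazily: faster-whisper is optional, so the import fails only when used.
    from faster_whisper import WhisperModel

    model = WhisperModel(pick_model(size, lang), device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, language=lang)  # lang=None auto-detects
    lines = [f"[{format_timestamp(seg.start)}] {seg.text.strip()}" for seg in segments]
    return info.language, lines


if __name__ == "__main__":
    print(pick_model("small.en", "sk"))
    print(format_timestamp(3725.4))
```

If the lazy import raises `ImportError`, that corresponds to the empty-transcript case: the chosen interpreter does not have faster-whisper installed.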

Example

examples/sample-digest/ contains a tiny generated clip and the digest.md it produces — transcript, one detected URL, one @handle, and a montage sheet. Reproduce it with (the transcript line needs faster-whisper installed):

scripts/video-digest.sh --montage-frames 9 examples/sample-digest/sample.mp4

legenxxx/video-digest
