When to Use
User wants to create a talking-head / digital-human video from a portrait photo and an audio file (口播视频: spoken-narration video; 数字人生成: digital-human generation; photo talk). Triggers: "生成口播视频" (generate a talking video), "make this photo talk", "照片说话" (make the photo talk), "数字人视频" (digital-human video), "sadtalker", "talking head", "avatar video from photo". Applies to macOS (Apple Silicon or Intel) only.
Procedure
- Locate the scripts: they live in the same directory as this SKILL.md. Find them with:
  ls "$(dirname <path-to-this-SKILL.md>)"
  You'll see setup.sh and generate.sh alongside this file.
- Check prerequisites: ffmpeg, conda env sadtalker, ~/SadTalker/checkpoints/. If anything is missing, run setup.sh (first time only, ~15 min, ~2.5GB download).
- Confirm the user has provided a portrait photo (.jpg/.png) and speech audio file (.wav/.mp3). If audio is in another format, convert with ffmpeg first.
- Run generation:
  bash generate.sh <photo> <audio> [--enhancer gfpgan] [--still] [--preprocess full|resize|crop]
  This invokes conda run -n sadtalker python inference.py under the hood.
- The output .mp4 appears in ~/SadTalker/results//. Open it with:
  open <path>
  (macOS only). Report the path to the user.
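The prerequisite check in the procedure can be sketched as a small shell function. This is only an illustration, not the contents of setup.sh or generate.sh: the function name missing_prereqs and its optional directory argument are hypothetical, and the default path assumes the ~/SadTalker layout described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one line per missing prerequisite.
# Prints nothing when ffmpeg, the sadtalker conda env, and the
# checkpoints directory are all present.
missing_prereqs() {
  # Optional first argument overrides the assumed install directory.
  sadtalker_dir="${1:-$HOME/SadTalker}"

  command -v ffmpeg >/dev/null 2>&1 || echo "ffmpeg"

  if command -v conda >/dev/null 2>&1; then
    # `conda env list` prints the env name in the first column.
    conda env list 2>/dev/null | awk '{print $1}' | grep -qx "sadtalker" \
      || echo "conda env sadtalker"
  else
    echo "conda"
  fi

  [ -d "$sadtalker_dir/checkpoints" ] || echo "checkpoints directory"
  return 0
}

# Example wiring (left commented out so sourcing this file has no side effects):
# if [ -n "$(missing_prereqs)" ]; then bash setup.sh; fi
```

If the function prints anything, running setup.sh as described in the prerequisites step is the likely next move.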
Pitfalls
- Python MUST be 3.10 inside the conda env. 3.8 lacks Apple Silicon wheels, and 3.11+ has no scikit-image==0.19.3 wheel.
- With 8GB RAM, close Chrome and any IDE before running. SadTalker runs purely on the CPU on Mac; no GPU path exists.
- dlib must be installed separately on Mac:
  conda run -n sadtalker pip install dlib
  This is the #1 error on M1.
- Install torch WITHOUT a CUDA suffix: plain
  pip install torch torchvision torchaudio
  If torch.__version__ contains '+cu', reinstall.
- Install ffmpeg via brew, not conda. Conda's ffmpeg sometimes lacks needed codecs.
- First run downloads ~2GB checkpoints. In China, set HF_ENDPOINT=https://hf-mirror.com first.
- GFPGAN enhancer (--enhancer gfpgan) roughly doubles processing time. Skip it for quick previews.
- --still mode expects full-body source photos when combined with --preprocess full.
- Audio must be .wav or .mp3. Other formats error with 'Header missing'. Convert with ffmpeg:
ffmpeg -i input.m4a output.wav.
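The audio-format pitfall lends itself to a small guard that runs before generate.sh. The following is a minimal sketch under stated assumptions: the function names needs_conversion and ensure_supported_audio are hypothetical, the converted file is written next to the input, and -y is used so ffmpeg overwrites an existing .wav without prompting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (returns 0) when the file extension
# is neither .wav nor .mp3, i.e. when conversion is needed.
needs_conversion() {
  name="${1##*/}"                       # strip any directory part
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3) return 1 ;;
    *)       return 0 ;;
  esac
}

# Hypothetical helper: prints the path to hand to generate.sh,
# converting to .wav with ffmpeg first when the format is unsupported.
ensure_supported_audio() {
  if needs_conversion "$1"; then
    name="${1##*/}"
    case "$name" in
      *.*) out="${1%.*}.wav" ;;         # replace the existing extension
      *)   out="$1.wav" ;;              # no extension: append one
    esac
    # -y overwrites an existing output file (an assumption of this sketch).
    ffmpeg -y -loglevel error -i "$1" "$out" || return 1
    printf '%s\n' "$out"
  else
    printf '%s\n' "$1"
  fi
}

# Example wiring (commented out; <photo> and <audio> are placeholders):
# audio=$(ensure_supported_audio "<audio>") && bash generate.sh <photo> "$audio"
```

Files that are already .wav or .mp3 pass through untouched, so ffmpeg is only invoked when a conversion is actually required.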
Verification
- conda run -n sadtalker python -c 'import torch; print(torch.__version__)' (output must NOT contain '+cu')
- ls ~/SadTalker/checkpoints/ — must contain auido2exp_00300-model.pth, auido2pose_00140-model.pth, epoch_20.pth, and other .pth files
- conda run -n sadtalker python -c 'import dlib; print("ok")' — must print 'ok' (no Illegal Hardware Instruction)
- Smoke test: cd ~/SadTalker && conda run -n sadtalker python inference.py --driven_audio examples/driven_audio/bus_chinese.wav --source_image examples/source_image/full_body_1.png — produces output in results/
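Two of the checks above reduce to simple string and file tests, which can be sketched as shell functions. This is an illustration only: the names is_cpu_torch_version and missing_checkpoints are hypothetical, and the checkpoint list covers just the three files named in the checklist, not the other .pth files it mentions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a torch version string has no
# CUDA suffix such as "+cu118". An empty string fails, so a broken
# conda call is not mistaken for a pass.
is_cpu_torch_version() {
  case "$1" in
    "")    return 1 ;;
    *+cu*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Hypothetical helper: prints each checkpoint named in the checklist
# that is absent from the given directory (default: ~/SadTalker/checkpoints).
missing_checkpoints() {
  dir="${1:-$HOME/SadTalker/checkpoints}"
  for f in auido2exp_00300-model.pth auido2pose_00140-model.pth epoch_20.pth; do
    [ -f "$dir/$f" ] || echo "$f"
  done
  return 0
}

# Example wiring (commented out because it needs the sadtalker conda env):
# ver=$(conda run -n sadtalker python -c 'import torch; print(torch.__version__)')
# is_cpu_torch_version "$ver" || echo "unexpected torch build: $ver"
# missing_checkpoints
```

The "auido" spelling in the file names is intentional here and matches the checklist; correcting it would make the check fail.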