Name: haoyiyin/sadtalker-mac
Author: Community

When to Use

Use this skill when the user wants to create a talking-head or digital-human video (a narrated "photo talk" clip) from a portrait photo and an audio file. Triggers: "生成口播视频" (generate a talking-head video), "make this photo talk", "照片说话" (make the photo talk), "数字人视频" (digital-human video), "sadtalker", "talking head", "avatar video from photo". Applies to macOS only (Apple Silicon or Intel).

Procedure

Locate the scripts: they live in the same directory as this SKILL.md. Find them with: ls "$(dirname <path-to-this-SKILL.md>)" — you'll see setup.sh and generate.sh alongside this file.
Check prerequisites: ffmpeg, conda env sadtalker, ~/SadTalker/checkpoints/. If anything is missing, run setup.sh (first time only, ~15 min, ~2.5GB download).
Confirm the user has provided a portrait photo (.jpg/.png) and speech audio file (.wav/.mp3). If audio is in another format, convert with ffmpeg first.
Run generation: bash generate.sh <photo> <audio> [--enhancer gfpgan] [--still] [--preprocess full|resize|crop]. This invokes conda run -n sadtalker python inference.py under the hood.
The output .mp4 appears in ~/SadTalker/results//. Open it with: open <path> (macOS). Report the path to the user.
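
The prerequisite check in the steps above can be sketched as a small shell function. This is a minimal sketch, not the contents of setup.sh: check_prereqs is a hypothetical helper name, and the assumption that the environment appears by name in the first column of `conda env list` is mine.

```shell
# check_prereqs is a hypothetical helper, not part of setup.sh or generate.sh.
# It prints one line per missing prerequisite and returns non-zero if any is missing.
check_prereqs() {
  local sadtalker_dir="${1:-$HOME/SadTalker}"
  local missing=0

  # ffmpeg must be on PATH.
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "missing: ffmpeg"
    missing=1
  fi

  # Assumption: the env is listed by name in the first column of `conda env list`.
  if ! command -v conda >/dev/null 2>&1 ||
     ! conda env list 2>/dev/null | awk '{print $1}' | grep -qx 'sadtalker'; then
    echo "missing: conda env sadtalker"
    missing=1
  fi

  # Only the directory's existence is checked here, not its contents.
  if [ ! -d "$sadtalker_dir/checkpoints" ]; then
    echo "missing: $sadtalker_dir/checkpoints"
    missing=1
  fi

  return "$missing"
}
```

A caller could then run setup only when needed, for example: check_prereqs || bash setup.sh.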

Pitfalls

Python MUST be 3.10 inside the conda env. Python 3.8 lacks Apple Silicon wheels, and 3.11+ has no scikit-image==0.19.3 wheel.
8GB RAM: close Chrome and your IDE before running. On Mac, SadTalker runs on the CPU only; no GPU path exists.
dlib must be installed separately on Mac: conda run -n sadtalker pip install dlib. A missing dlib is the most common error on M1.
Install torch WITHOUT a CUDA suffix: plain pip install torch torchvision torchaudio. If torch.__version__ contains '+cu', reinstall.
Install ffmpeg via brew, not conda. Conda's ffmpeg sometimes lacks needed codecs.
First run downloads ~2GB checkpoints. In China, set HF_ENDPOINT=https://hf-mirror.com first.
GFPGAN enhancer (--enhancer gfpgan) roughly doubles processing time. Skip for quick previews.
--still mode expects full-body source photos when combined with --preprocess full.
Audio must be .wav or .mp3. Other formats fail with a 'Header missing' error. Convert them with ffmpeg first, for example: ffmpeg -i input.m4a output.wav.
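
The audio-format pitfall can be guarded against before calling generate.sh. The sketch below is illustrative only: needs_conversion, wav_path_for, and prepare_audio are hypothetical helper names, and it assumes the format can be judged from the file extension and that the file name has one.

```shell
# needs_conversion: hypothetical helper; succeeds (exit 0) when the extension is not .wav or .mp3.
needs_conversion() {
  local ext="${1##*.}"
  # Lowercase with tr, since the bash 3.2 shipped with macOS lacks ${var,,}.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    wav|mp3) return 1 ;;
    *)       return 0 ;;
  esac
}

# wav_path_for: hypothetical helper; prints the input path with its extension replaced by .wav.
wav_path_for() {
  printf '%s.wav\n' "${1%.*}"
}

# prepare_audio: hypothetical helper; converts when needed and prints the path to hand to generate.sh.
prepare_audio() {
  local input="$1" output
  if needs_conversion "$input"; then
    output="$(wav_path_for "$input")"
    # -y overwrites an existing output, -nostdin stops ffmpeg from reading the terminal,
    # and -loglevel error keeps its log output to errors only.
    ffmpeg -nostdin -y -loglevel error -i "$input" "$output" || return 1
    printf '%s\n' "$output"
  else
    printf '%s\n' "$input"
  fi
}
```

For example, audio="$(prepare_audio voice.m4a)" would leave voice.wav next to the original and store its path, while a .wav or .mp3 input is passed through untouched.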

Verification

conda run -n sadtalker python -c 'import torch; print(torch.__version__)' (the output must NOT contain '+cu')
ls ~/SadTalker/checkpoints/ (must contain auido2exp_00300-model.pth, auido2pose_00140-model.pth, epoch_20.pth, and other .pth files)
conda run -n sadtalker python -c 'import dlib; print("ok")' (must print 'ok', with no Illegal Hardware Instruction crash)
Smoke test: cd ~/SadTalker && conda run -n sadtalker python inference.py --driven_audio examples/driven_audio/bus_chinese.wav --source_image examples/source_image/full_body_1.png (produces output in results/)
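
The first three checks can be bundled into one function so they run together. This is a rough sketch under stated assumptions: has_cuda_suffix, missing_checkpoints, and verify_install are hypothetical names, the torch version is read from torch.__version__, and the checkpoint file names are copied from the checklist above with their spelling unchanged.

```shell
# has_cuda_suffix: hypothetical helper; succeeds when a version string carries a CUDA tag such as +cu118.
has_cuda_suffix() {
  case "$1" in
    *+cu*) return 0 ;;
    *)     return 1 ;;
  esac
}

# missing_checkpoints: hypothetical helper; prints each expected file absent from the given directory.
missing_checkpoints() {
  local dir="$1" name
  for name in auido2exp_00300-model.pth auido2pose_00140-model.pth epoch_20.pth; do
    [ -f "$dir/$name" ] || echo "$name"
  done
}

# verify_install: hypothetical helper; prints a FAIL line per failed check and returns non-zero on any failure.
verify_install() {
  local failed=0 version missing

  # An empty version means torch (or the env itself) could not be loaded.
  version="$(conda run -n sadtalker python -c 'import torch; print(torch.__version__)' 2>/dev/null)"
  if [ -z "$version" ] || has_cuda_suffix "$version"; then
    echo "FAIL torch: '$version'"
    failed=1
  fi

  # A crash on import (for example an illegal instruction) surfaces as a non-zero exit.
  if ! conda run -n sadtalker python -c 'import dlib' >/dev/null 2>&1; then
    echo "FAIL dlib import"
    failed=1
  fi

  missing="$(missing_checkpoints "$HOME/SadTalker/checkpoints")"
  if [ -n "$missing" ]; then
    echo "FAIL checkpoints: $missing"
    failed=1
  fi

  return "$failed"
}
```

Running verify_install before the smoke test gives a quick pass or fail signal; the smoke test itself is left out because it takes much longer.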
