Communitygithub.com

AaronAust1n/china-productivity-skills

13 Agent Skills for Chinese internet data retrieval, AI research monitoring, and content production — filling the gap in PilotDeck skill ecosystem

Was ist china-productivity-skills?

china-productivity-skills is a Codex agent skill that 13 Agent Skills for Chinese internet data retrieval, AI research monitoring, and content production — filling the gap in PilotDeck skill ecosystem.

Funktioniert mit~Claude CodeCodex CLI~Cursor
npx skills add AaronAust1n/china-productivity-skills

In Ihrer bevorzugten KI fragen

Öffnet einen neuen Chat, in dem dieser Agent-Skill bereits geladen ist.

Dokumentation

VoxCPM Text-to-Speech

Synthesize speech from Chinese/English/multilingual text using OpenBMB's VoxCPM. Two model tiers, three input modes, thirty languages, nine Chinese dialects. Apache-2.0 licensed.

Verified 2026-07-04: voxcpm 2.0.3 on PyPI, dependency graph resolves cleanly (torch ≥ 2.5, transformers ≥ 4.36, gradio 6.x). PyPI reachable. Model weights host on Hugging Face (openbmb/VoxCPM2, openbmb/VoxCPM-0.5B).

This environment (QoderWork sandbox) has no GPU and cannot reach huggingface.co directly — audio generation is not exercised here. Deploy on a machine with CUDA / Apple Silicon / a HF mirror to actually synthesize.

Prerequisites

ComponentRequirement
Python≥ 3.10, < 3.13 (officially tested)
PyTorch≥ 2.5.0
CUDA≥ 12.0 (recommended); MPS or CPU also work (slower)
GPU VRAM5 GB for VoxCPM-0.5B, 8 GB for VoxCPM2 (2B)
Disk~2 GB per model (weights auto-download from HF)
NetworkAccess to huggingface.co (or set HF_ENDPOINT to a mirror)

Install:

pip install voxcpm

For China deployments where HF is slow, use ModelScope mirror:

export HF_ENDPOINT=https://hf-mirror.com
# or use ModelScope directly:
pip install modelscope

Quick Start

CLI (simplest)

# Basic synthesis to WAV
voxcpm design --text "你好,欢迎收听三分钟 AI 快报。" --output greeting.wav

# With voice-style natural-language description
voxcpm design --text "(年轻女声,温柔甜美) 今天天气真好,我们来聊聊 AI。" --output warm.wav

# English
voxcpm design --text "Welcome to today's AI briefing." --output en.wav

# Batch
voxcpm batch --input-file lines.txt --output-dir out/

Python SDK (programmatic)

from voxcpm import VoxCPM
import soundfile as sf

model = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)

wav = model.generate(
    text="你好,这是一段测试语音。",
    cfg_value=2.0,           # classifier-free guidance strength (higher = more expressive)
    inference_timesteps=10,  # diffusion steps (more = better quality, slower)
    seed=42                  # reproducibility
)
sf.write("demo.wav", wav, model.tts_model.sample_rate)  # 24 kHz output

Streaming synthesis (for long text)

for chunk in model.generate_streaming(text="很长的一段文本..."):
    # chunk is np.ndarray of PCM samples
    process(chunk)

Input Modes

1. Design (natural-language voice control)

Wrap voice description in () at the start of the text:

"(年轻女声,语气甜美,稍快语速) 今天要给大家介绍一款有意思的 AI 工具。"
"(中年男声,稳重、专业) 欢迎收听本期节目。"
"(带四川口音的女生,轻松) 这个咋回事嘛!"

Available cues (composable):

  • Voice type: 男声 / 女声 / 童声 / 老年男声 / 中年女声 ...
  • Tone: 温柔 / 严肃 / 兴奋 / 平静 / 忧郁 / 慵懒 ...
  • Pace: 快语速 / 慢语速 / 稍快 / 稍慢
  • Style: 播音腔 / 口语化 / 说书人 / 儿童剧 / 广告腔
  • Dialect: 四川话 / 粤语 / 吴语 / 东北话 / 河南话 / 陕西话 / 山东话 / 天津话 / 闽南话

2. Voice cloning (zero-shot)

Provide a 3-30 second reference audio + its transcript:

wav = model.generate(
    text="要合成的新文本。",
    prompt_wav_path="reference.wav",
    prompt_text="参考音频对应的原文(准确抄写)"
)

The generated voice matches the reference speaker.

3. Combined (clone + style tweak)

wav = model.generate(
    text="(稍快,兴奋) 要合成的新文本。",
    reference_wav_path="reference.wav"  # softer form of clone; keeps timbre but style overrides
)

Common Workflows

End-to-end: script → audio

Pair with podcast-scriptwriter. Given a script:

A(轻松):欢迎收听《三分钟 AI 快报》。
B(认真):本周动作不小。
import re
from voxcpm import VoxCPM
import soundfile as sf, numpy as np

model = VoxCPM.from_pretrained("openbmb/VoxCPM2", load_denoiser=False)
voices = {
    'A': "reference_female.wav",   # or use "(年轻女声,语气理性犀利)" as prefix
    'B': "reference_male.wav",
}

segments = []
for line in open('script.txt', encoding='utf-8'):
    m = re.match(r'^([AB])(([^)]+)):(.+)$', line.strip())
    if not m: continue
    speaker, tone, text = m.groups()
    styled = f"({tone}) {text}"
    wav = model.generate(text=styled, prompt_wav_path=voices[speaker], cfg_value=2.0, inference_timesteps=10)
    segments.append(wav)
    segments.append(np.zeros(int(0.4 * model.tts_model.sample_rate)))  # 0.4s pause

full = np.concatenate(segments)
sf.write('episode.wav', full, model.tts_model.sample_rate)

Multi-language podcast

Feed the same script translated to en/ja/ko/fr; VoxCPM auto-detects language per line. Keep the same reference audio to preserve host identity across languages.

High-throughput vLLM serving

For volume, use the OpenAI-compatible vLLM Omni server:

vllm serve openbmb/VoxCPM2 --omni --port 8000

Then hit /v1/audio/speech like OpenAI TTS.

Fallback: edge-tts (no GPU)

If VoxCPM is impossible in the environment (no GPU, HF blocked, tight time), fall back to Microsoft Edge TTS (free, no API key, 400+ voices):

pip install edge-tts
edge-tts --voice zh-CN-XiaoxiaoNeural --text "你好世界" --write-media hello.mp3

Trade-offs: no voice cloning, fixed voice pool, network-dependent, not open-source. Acceptable stop-gap.

Limitations

  • GPU strongly recommended — CPU works but ~10-50× slower.
  • First-run downloads ~2 GB from HF; use HF_ENDPOINT for a mirror if slow.
  • CUDA 12+ requires recent driver — older drivers may need CUDA 11 build; check torch.cuda.is_available().
  • Long text (> 2 min): use generate_streaming and stitch chunks; pure generate may hit VRAM ceiling.
  • English-only clone: works but sample quality slightly lower than Chinese-native.
  • Dialects require prefix format(粤语) 早晨。 not 粤语:早晨。.
  • No word-level timing output — for karaoke use whisper-like STT alignment on the produced WAV.
  • Sample rate is fixed at 24 kHz — resample downstream if needed.
  • Python 3.13+ untested — pin to 3.10-3.12.

Failure Modes

SymptomCauseAction
torch.cuda.OutOfMemoryErrorModel too big for GPUSwitch to openbmb/VoxCPM-0.5B, or use CPU
Model download stallsHF connectivitySet HF_ENDPOINT=https://hf-mirror.com
Robotic / metallic voicecfg_value too highTry 1.5-2.5; sweet spot is 2.0
Voice clone doesn't matchReference too short/noisyUse 5-15 s clean sample; enable load_denoiser=True
Language auto-detected wrongText mixedSplit by language, generate per-segment
CLI voxcpm: command not foundNot in venv PATHpip install voxcpm in the active env; or python -m voxcpm ...

References

Verwandte Skills