haoyiyin/text-to-bgm

AI Agent Skill: Generate background music from text descriptions, 100% locally on Apple Silicon Macs

Compatible with Claude Code, Codex CLI, Cursor
npx skills add haoyiyin/text-to-bgm

Documentation

text-to-bgm

Generate background music (BGM) from text descriptions using local AI on Apple Silicon Macs. Powered by mlx-audiocraft — Meta's MusicGen ported to Apple's MLX framework.

  • 100% offline — no cloud, no API keys, no internet after model download
  • Private — all generation happens on-device
  • Fast — MLX GPU acceleration on M1/M2/M3/M4

When to Use

Use this skill when the user asks to:

  • Generate background music / BGM / instrumental music
  • Create ambient music, lo-fi beats, cinematic soundtrack
  • Make audio for videos, games, podcasts, or study/focus sessions
  • Any text-to-music request on macOS Apple Silicon

Do NOT use for:

  • Sound effects → use audiogen-mlx (included in mlx-audiocraft)
  • Vocals / singing → MusicGen is weak on vocals
  • Full production songs → consider ACE-Step for longer, structured music

Requirements

  • macOS with Apple Silicon (M1, M2, M3, M4)
  • Python 3.10+ (3.11 recommended)
  • RAM: 8 GB minimum for small model, 16 GB+ for large models
  • Disk: 1.2 GB (small) to ~6.5 GB (stereo-large) per model
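
The platform and Python requirements above can be checked before installing anything. The following is a minimal sketch, assuming a native Apple Silicon Python reports `arm64` from `platform.machine()` (a Python running under Rosetta reports `x86_64`). The name `check_requirements` is illustrative and not part of mlx-audiocraft; RAM and disk space are not checked here.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None):
    """Return a list of unmet requirements; an empty list means the host looks OK.

    The arguments default to the current host and exist so the check is testable.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)

    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if version < (3, 10):
        problems.append("Python 3.10 or newer is required")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current machine; an agent could print the returned list and stop early if it is not empty.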

Installation

# One-line install
pip install mlx-audiocraft

# Verify
musicgen-mlx --help

Or use the included setup script:

bash setup.sh

Usage

Basic

musicgen-mlx "lo-fi chill study beats with soft piano" -d 30 -o bgm.wav

Full CLI Reference

musicgen-mlx [-h] [-m MODEL] [-o FILE] [-d SEC] [--no-open]
             [--top-k K] [--temperature T] [--cfg-coef C]
             prompt

Arguments:
  prompt              Text description of the music to generate

Options:
  -m, --model NAME    HuggingFace model (default: facebook/musicgen-small)
  -o, --output FILE   Output WAV path (default: ./musicgen_output.wav)
  -d, --duration SEC  Duration in seconds (default: 8)
  --no-open           Don't open the file after generation
  --top-k K           Top-k sampling (default: 250)
  --temperature T     Sampling temperature (default: 1.0)
  --cfg-coef C        Classifier-free guidance coefficient (default: 3.0)
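
When the CLI is driven from a script, the flags listed above can be assembled into an argument list rather than a hand-quoted string. This is a sketch only: `build_command` and its defaults (30 s duration, `bgm.wav`, `--no-open` on) are assumptions made for illustration, and only flags from the reference above are used.

```python
import shlex


def build_command(prompt, model="facebook/musicgen-small", duration=30,
                  output="bgm.wav", temperature=None, top_k=None,
                  cfg_coef=None, no_open=True):
    """Assemble a musicgen-mlx argument list from the documented flags."""
    cmd = ["musicgen-mlx", prompt, "-m", model, "-d", str(duration), "-o", output]
    # Sampling flags are only added when set, so the CLI defaults apply otherwise.
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    if cfg_coef is not None:
        cmd += ["--cfg-coef", str(cfg_coef)]
    if no_open:
        cmd.append("--no-open")
    return cmd


cmd = build_command("lo-fi chill study beats with soft piano", cfg_coef=4.0)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To actually generate audio: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with prompts that contain quotes or commas.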

Models

| Model | Size (params / disk) | Quality | Speed (M4 Max) | Use Case |
| --- | --- | --- | --- | --- |
| facebook/musicgen-small | 300M / 1.2 GB | Good | 1.3x realtime | Default choice |
| facebook/musicgen-medium | 1.5B / 3.4 GB | Better | ~0.6x realtime | Better quality |
| facebook/musicgen-large | 3.3B / 5.5 GB | Best (mono) | ~0.4x realtime | High quality |
| facebook/musicgen-stereo-large | 3.3B / 6.5 GB | Best (stereo) | ~0.3x realtime | Final output |
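
The Speed column can be turned into a rough wait-time estimate. The sketch below assumes "Nx realtime" means N seconds of audio per second of compute, so wall-clock time is duration divided by the factor. The figures are the M4 Max numbers from the table and will likely be slower on other chips; `estimate_generation_seconds` is a hypothetical helper.

```python
# Realtime factors copied from the table above (reported for an M4 Max).
REALTIME_FACTOR = {
    "facebook/musicgen-small": 1.3,
    "facebook/musicgen-medium": 0.6,
    "facebook/musicgen-large": 0.4,
    "facebook/musicgen-stereo-large": 0.3,
}


def estimate_generation_seconds(model, duration_sec):
    """Estimate wall-clock seconds as audio duration divided by realtime factor."""
    return duration_sec / REALTIME_FACTOR[model]


for name in REALTIME_FACTOR:
    seconds = estimate_generation_seconds(name, 60)
    print(f"{name}: ~{seconds:.0f}s for 60s of audio")
```

For example, 60 s of audio on stereo-large works out to 60 / 0.3 = 200 s, a little over three minutes, not counting model load or first-run download time.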

Generation Parameters

| Parameter | Range | Effect |
| --- | --- | --- |
| --temperature | 0.5–1.5 | Higher = more creative/varied. Lower = more predictable |
| --top-k | 100–500 | Higher = more diverse note choices |
| --cfg-coef | 1.0–7.0 | Higher = stricter prompt adherence. Default 3.0 is balanced |
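
If parameters come from user input, it can help to keep them inside the ranges listed above before passing them on. This sketch treats the ranges as suggestions only; whether the CLI itself rejects values outside them is not stated here, and `clamp_params` is a hypothetical helper.

```python
# Suggested ranges from the table above; treated as guidance, not hard limits.
SUGGESTED_RANGES = {
    "temperature": (0.5, 1.5),
    "top_k": (100, 500),
    "cfg_coef": (1.0, 7.0),
}


def clamp_params(**params):
    """Clamp each known parameter into its suggested range."""
    clamped = {}
    for name, value in params.items():
        low, high = SUGGESTED_RANGES[name]
        clamped[name] = min(max(value, low), high)
    return clamped


print(clamp_params(temperature=2.0, top_k=50, cfg_coef=3.0))
# {'temperature': 1.5, 'top_k': 100, 'cfg_coef': 3.0}
```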

Using the Wrapper Script

For convenience, use generate_bgm.sh:

bash generate_bgm.sh "cinematic orchestral epic trailer music" -d 45

Options: -m <model>, -d <seconds>, -t <temperature>, -o <output>

Python API

import mlx.core as mx
import numpy as np
import soundfile as sf
from audiocraft_mlx.models.musicgen import MusicGen

# Load model
mg = MusicGen.get_pretrained("facebook/musicgen-small")

# Set generation parameters
mg.set_generation_params(duration=30, temperature=1.0, top_k=250, cfg_coef=3.0)

# Generate
audio = mg.generate(["lo-fi jazz hop with smooth saxophone and chill drums"])

# Save
wav = np.array(audio[0])  # [channels, samples]
sf.write("output.wav", wav.T, mg.sample_rate)

Prompt Engineering

See prompts.md for a comprehensive guide with examples.

Quick tips:

  • Start with genre/style: "lo-fi hip hop", "cinematic orchestral", "ambient electronic"
  • Add instruments: "warm piano", "soft synth pads", "acoustic guitar"
  • Describe mood/atmosphere: "relaxing", "uplifting", "mysterious", "energetic"
  • Include production qualities: "vinyl crackle", "reverb", "80 BPM", "sidechain compression"
  • End with purpose: "perfect for study", "background for podcast", "game menu music"

⚠️ MusicGen works best with English prompts and instrumental music.
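
The quick tips above describe an ordering (genre, instruments, mood, production, purpose) that can be applied mechanically. The helper below is a sketch, not something shipped with this skill; `build_prompt` and its comma-separated output format are assumptions, and prompts.md remains the reference for wording.

```python
def build_prompt(genre, instruments=(), mood=(), production=(), purpose=""):
    """Join prompt parts in the order suggested by the tips: genre first, purpose last."""
    parts = [genre]
    if instruments:
        parts.append("with " + " and ".join(instruments))
    parts.extend(mood)
    parts.extend(production)
    if purpose:
        parts.append(purpose)
    return ", ".join(parts)


prompt = build_prompt(
    "lo-fi hip hop",
    instruments=["warm piano", "soft synth pads"],
    mood=["relaxing"],
    production=["vinyl crackle", "80 BPM"],
    purpose="perfect for study",
)
print(prompt)
print(len(prompt.split()), "words")
```

Printing the word count makes it easy to confirm the result stays within a 10 to 30 word target.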

Procedure (for AI Agents)

  1. Check installation: Run musicgen-mlx --help. If missing, install via pip install mlx-audiocraft.
  2. Gather requirements: Ask the user about genre, mood, duration. If vague, suggest popular BGM styles.
  3. Select model: Default to facebook/musicgen-small (fast, 300M). Use facebook/musicgen-medium for better quality or facebook/musicgen-stereo-large for best results.
  4. Craft prompt: Build a 10–30 word English description following the prompt guide in prompts.md.
  5. Set duration: 15–30s for loops, 30–60s for short BGM, 60–120s for longer pieces.
  6. Generate: Run musicgen-mlx "<prompt>" -m <model> -d <sec> -o <path>.wav
  7. Report: Tell the user the output file path. Offer to regenerate with adjusted parameters if needed.
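
Steps 3 and 5 of the procedure above amount to two small lookups. The sketch below encodes them under stated assumptions: the labels (`loop`, `short`, `long`, `fast`, `better`, `best`) are invented for illustration, and picking the midpoint of each duration range is an arbitrary choice.

```python
# Duration ranges from step 5 and model choices from step 3.
DURATION_RANGES = {
    "loop": (15, 30),
    "short": (30, 60),
    "long": (60, 120),
}
MODEL_BY_QUALITY = {
    "fast": "facebook/musicgen-small",
    "better": "facebook/musicgen-medium",
    "best": "facebook/musicgen-stereo-large",
}


def plan_generation(kind="short", quality="fast"):
    """Pick a model and a duration (midpoint of the range) for a request."""
    low, high = DURATION_RANGES[kind]
    return {
        "model": MODEL_BY_QUALITY[quality],
        "duration": (low + high) // 2,
    }


print(plan_generation("loop", "best"))
```

An agent would still confirm the choice with the user, especially before picking a large model that triggers a multi-gigabyte download.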

Pitfalls

  • ⚠️ First run downloads models — 1.2–6.5 GB from HuggingFace. Warn the user.
  • ⚠️ Apple Silicon only — does NOT work on Intel Macs.
  • ⚠️ Default 8s duration — always specify -d for BGM.
  • ⚠️ Instrumental only — weak on vocals/lyrics.
  • ⚠️ Non-commercial license — CC-BY-NC 4.0. Not for commercial use.
  • ⚠️ RAM hungry — 8GB machines should use small/medium models only.
  • ⚠️ Slow on large models — stereo-large is ~0.3x realtime. A 60s generation takes ~3 minutes.

Verification

# 1. Check CLI
musicgen-mlx --help

# 2. Quick test
musicgen-mlx "piano melody" -d 5 -o /tmp/test_bgm.wav

# 3. Verify output
file /tmp/test_bgm.wav
# Expected: RIFF (little-endian) data, WAVE audio

# 4. Listen
open /tmp/test_bgm.wav
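
Step 3 above can also be done without the `file` utility, for instance inside an agent script. This sketch only inspects the 12-byte RIFF/WAVE header and does not validate the audio data; `looks_like_wav` is a hypothetical helper name.

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF header tagged as WAVE."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3 are the RIFF magic, bytes 8-11 the WAVE form type.
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

After the quick test, `looks_like_wav("/tmp/test_bgm.wav")` should return `True` if generation succeeded.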

Files

| File | Purpose |
| --- | --- |
| SKILL.md | This file: skill definition and reference |
| setup.sh | One-command installation script |
| generate_bgm.sh | Wrapper for easy agent-friendly generation |
| prompts.md | Prompt engineering guide with genre templates |
