text-to-bgm
Generate background music (BGM) from text descriptions using local AI on Apple Silicon Macs. Powered by mlx-audiocraft — Meta's MusicGen ported to Apple's MLX framework.
- 100% offline — no cloud, no API keys, no internet after model download
- Private — all generation happens on-device
- Fast — MLX GPU acceleration on M1/M2/M3/M4
When to Use
Use this skill when the user asks to:
- Generate background music / BGM / instrumental music
- Create ambient music, lo-fi beats, cinematic soundtrack
- Make audio for videos, games, podcasts, or study/focus sessions
- Any text-to-music request on macOS Apple Silicon
Do NOT use for:
- Sound effects → use `audiogen-mlx` (included in mlx-audiocraft)
- Vocals / singing → MusicGen is weak on vocals
- Full production songs → consider ACE-Step for longer, structured music
Requirements
- macOS with Apple Silicon (M1, M2, M3, M4)
- Python 3.10+ (3.11 recommended)
- RAM: 8 GB minimum for the small model, 16 GB+ for the large models
- Disk: 1.2 GB (small) to ~6.5 GB (stereo-large) per model
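You can check the platform requirements above before installing. The sketch below is an illustration and is not part of mlx-audiocraft. `check_requirements` is a hypothetical helper built only on the standard `platform` and `sys` modules. A Python interpreter running under Rosetta reports `x86_64` even on an Apple Silicon Mac, so it fails this check. That outcome is intended, because MLX expects a native arm64 interpreter.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None):
    """Return a list of problems; an empty list means the host looks suitable.

    Hypothetical helper: arguments default to the running interpreter's
    values but can be passed explicitly (useful for testing).
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    version = version if version is not None else sys.version_info[:2]

    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        # Intel Macs, and Python running under Rosetta, report x86_64.
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```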
Installation
```bash
# One-line install
pip install mlx-audiocraft
# Verify
musicgen-mlx --help
```
Or use the included setup script:
```bash
bash setup.sh
```
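An agent can also confirm from Python that the CLI is on `PATH` without running it. Below is a minimal sketch that uses only the standard library. `musicgen_cli_path` is a hypothetical name. The check only shows that the entry point is installed. It does not show that any model has been downloaded.

```python
import shutil
from typing import Optional


def musicgen_cli_path(name: str = "musicgen-mlx") -> Optional[str]:
    """Return the full path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = musicgen_cli_path()
    if path is None:
        print("musicgen-mlx not found; install with: pip install mlx-audiocraft")
    else:
        print(f"found {path}")
```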
Usage
Basic
```bash
musicgen-mlx "lo-fi chill study beats with soft piano" -d 30 -o bgm.wav
```
Full CLI Reference
```text
musicgen-mlx [-h] [-m MODEL] [-o FILE] [-d SEC] [--no-open]
             [--top-k K] [--temperature T] [--cfg-coef C]
             prompt

Arguments:
  prompt                 Text description of the music to generate

Options:
  -m, --model NAME       HuggingFace model (default: facebook/musicgen-small)
  -o, --output FILE      Output WAV path (default: ./musicgen_output.wav)
  -d, --duration SEC     Duration in seconds (default: 8)
  --no-open              Don't open the file after generation
  --top-k K              Top-k sampling (default: 250)
  --temperature T        Sampling temperature (default: 1.0)
  --cfg-coef C           Classifier-free guidance coefficient (default: 3.0)
```
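When you call the CLI from Python, building the argument list explicitly avoids shell-quoting problems with prompts that contain quotes or apostrophes. The sketch below uses only the flags listed in the reference above. `build_musicgen_args` is a hypothetical helper. Its defaults mirror the documented CLI defaults, with one exception: `duration` is set to 30 s, because the 8 s default is usually too short for BGM.

```python
from typing import List


def build_musicgen_args(
    prompt: str,
    model: str = "facebook/musicgen-small",
    duration: int = 30,
    output: str = "bgm.wav",
    top_k: int = 250,
    temperature: float = 1.0,
    cfg_coef: float = 3.0,
    open_file: bool = False,
) -> List[str]:
    """Build an argv list for musicgen-mlx using only documented flags."""
    args = [
        "musicgen-mlx",
        "-m", model,
        "-d", str(duration),
        "-o", output,
        "--top-k", str(top_k),
        "--temperature", str(temperature),
        "--cfg-coef", str(cfg_coef),
    ]
    if not open_file:
        args.append("--no-open")  # keep unattended runs from launching a player
    args.append(prompt)  # positional prompt goes last, as in the usage line
    return args
```

Passing the list to the standard library, e.g. `subprocess.run(build_musicgen_args("ambient pads", duration=60), check=True)`, runs the CLI without involving a shell.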
Models
| Model | Params / Download | Quality | Speed (M4 Max) | Use Case |
|---|---|---|---|---|
| `facebook/musicgen-small` | 300M / 1.2 GB | Good | 1.3x realtime | Default choice |
| `facebook/musicgen-medium` | 1.5B / 3.4 GB | Better | ~0.6x realtime | Better quality |
| `facebook/musicgen-large` | 3.3B / 5.5 GB | Best (mono) | ~0.4x realtime | High quality |
| `facebook/musicgen-stereo-large` | 3.3B / 6.5 GB | Best (stereo) | ~0.3x realtime | Final output |
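The speed column gives a rough wall-clock estimate: generation time ≈ clip duration ÷ realtime factor. The sketch below hard-codes the approximate M4 Max factors from the table. Treat its output as a ballpark for that machine only. `REALTIME_FACTOR` and `estimate_seconds` are hypothetical names.

```python
# Approximate realtime factors on M4 Max, copied from the table above.
# A factor of 0.3 means 1 s of audio takes about 1 / 0.3 ≈ 3.3 s to generate.
REALTIME_FACTOR = {
    "facebook/musicgen-small": 1.3,
    "facebook/musicgen-medium": 0.6,
    "facebook/musicgen-large": 0.4,
    "facebook/musicgen-stereo-large": 0.3,
}


def estimate_seconds(model: str, duration_s: float) -> float:
    """Rough generation time in seconds (excludes the first-run model download)."""
    return duration_s / REALTIME_FACTOR[model]


if __name__ == "__main__":
    for name in REALTIME_FACTOR:
        print(f"{name}: ~{estimate_seconds(name, 60):.0f} s for a 60 s clip")
```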
Generation Parameters
| Parameter | Range | Effect |
|---|---|---|
| `--temperature` | 0.5–1.5 | Higher = more creative/varied. Lower = more predictable |
| `--top-k` | 100–500 | Higher = more diverse note choices |
| `--cfg-coef` | 1.0–7.0 | Higher = stricter prompt adherence. Default 3.0 is balanced |
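The ranges above read as suggested working ranges. This sketch assumes the CLI does not enforce them as hard limits; the reference does not say either way. A small check such as the hypothetical `out_of_range` below can flag values that fall outside these ranges before an agent passes them.

```python
from typing import Dict, List, Tuple

# Suggested ranges from the table above (assumed not to be CLI-enforced).
SUGGESTED_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (0.5, 1.5),
    "top_k": (100, 500),
    "cfg_coef": (1.0, 7.0),
}


def out_of_range(**params: float) -> List[str]:
    """Return warnings for parameters outside the suggested ranges."""
    warnings = []
    for name, value in params.items():
        if name not in SUGGESTED_RANGES:
            continue  # unknown names are ignored rather than rejected
        low, high = SUGGESTED_RANGES[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside the suggested {low}-{high}")
    return warnings


if __name__ == "__main__":
    print(out_of_range(temperature=2.0, top_k=250, cfg_coef=3.0))
```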
Using the Wrapper Script
For convenience, use `generate_bgm.sh`:
```bash
bash generate_bgm.sh "cinematic orchestral epic trailer music" -d 45
```
Options: `-m <model>`, `-d <seconds>`, `-t <temperature>`, `-o <output>`
Python API
```python
import numpy as np
import soundfile as sf
from audiocraft_mlx.models.musicgen import MusicGen

# Load model (weights are downloaded from Hugging Face on first run)
mg = MusicGen.get_pretrained("facebook/musicgen-small")

# Set generation parameters (duration is in seconds)
mg.set_generation_params(duration=30, temperature=1.0, top_k=250, cfg_coef=3.0)

# Generate: one clip per prompt in the list
audio = mg.generate(["lo-fi jazz hop with smooth saxophone and chill drums"])

# Save: each clip is [channels, samples]; soundfile expects [samples, channels]
wav = np.array(audio[0])
sf.write("output.wav", wav.T, mg.sample_rate)
```
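The save step in the Python API example transposes the array because, according to its comment, each generated clip is `[channels, samples]`, while `soundfile.write` expects one row per frame. The sketch below shows that conversion using only numpy, plus a duration check you can compare against the requested `duration`. `to_frames` and `duration_seconds` are hypothetical helpers.

```python
import numpy as np


def to_frames(clip: np.ndarray) -> np.ndarray:
    """Convert a [channels, samples] clip to the [samples, channels] layout
    that soundfile.write expects."""
    if clip.ndim != 2:
        raise ValueError(f"expected a 2-D [channels, samples] array, got {clip.shape}")
    return np.ascontiguousarray(clip.T)


def duration_seconds(clip: np.ndarray, sample_rate: int) -> float:
    """Length of a [channels, samples] clip in seconds."""
    return clip.shape[-1] / sample_rate


if __name__ == "__main__":
    stereo = np.zeros((2, 64000), dtype=np.float32)  # stand-in for np.array(audio[0])
    print(to_frames(stereo).shape, duration_seconds(stereo, 32000))
```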
Prompt Engineering
See prompts.md for a comprehensive guide with examples.
Quick tips:
- Start with genre/style: "lo-fi hip hop", "cinematic orchestral", "ambient electronic"
- Add instruments: "warm piano", "soft synth pads", "acoustic guitar"
- Describe mood/atmosphere: "relaxing", "uplifting", "mysterious", "energetic"
- Include production qualities: "vinyl crackle", "reverb", "80 BPM", "sidechain compression"
- End with purpose: "perfect for study", "background for podcast", "game menu music"
⚠️ MusicGen works best with English prompts and instrumental music.
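The tips above suggest a fixed slot order: style, instruments, mood, production, purpose. The hypothetical `build_prompt` helper below assembles those slots into one comma-separated prompt. It also returns the word count, so an agent can keep prompts inside the 10–30 word range given in the Procedure section.

```python
from typing import Iterable, Tuple


def build_prompt(
    style: str,
    instruments: Iterable[str] = (),
    mood: Iterable[str] = (),
    production: Iterable[str] = (),
    purpose: str = "",
) -> Tuple[str, int]:
    """Join prompt slots in the order style, instruments, mood, production,
    purpose. Returns the prompt and its whitespace-separated word count."""
    parts = [style]
    parts += list(instruments)
    parts += list(mood)
    parts += list(production)
    if purpose:
        parts.append(purpose)
    prompt = ", ".join(p.strip() for p in parts if p.strip())
    return prompt, len(prompt.split())


if __name__ == "__main__":
    print(build_prompt("ambient electronic", ["soft synth pads"], ["mysterious"],
                       ["reverb"], "game menu music"))
```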
Procedure (for AI Agents)
- Check installation: Run `musicgen-mlx --help`. If it is missing, install via `pip install mlx-audiocraft`.
- Gather requirements: Ask the user about genre, mood, and duration. If the request is vague, suggest popular BGM styles.
- Select model: Default to `facebook/musicgen-small` (fast, 300M). Use `facebook/musicgen-medium` for better quality or `facebook/musicgen-stereo-large` for best results.
- Craft prompt: Build a 10–30 word English description following the prompt guide in prompts.md.
- Set duration: 15–30 s for loops, 30–60 s for short BGM, 60–120 s for longer pieces.
- Generate: Run `musicgen-mlx "<prompt>" -m <model> -d <sec> -o <path>.wav`.
- Report: Tell the user the output file path. Offer to regenerate with adjusted parameters if needed.
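The model and duration steps above reduce to two lookups. In the sketch below, `plan`, the `kind` keys (`loop`, `short`, `long`) and the `quality` keys (`fast`, `better`, `best`) are hypothetical names. Starting at the lower end of each duration range is also an assumption, chosen so the first attempt returns quickly.

```python
from typing import Dict, Tuple

# Duration ranges in seconds, from the procedure above.
DURATION_RANGES: Dict[str, Tuple[int, int]] = {
    "loop": (15, 30),
    "short": (30, 60),
    "long": (60, 120),
}

# Model choice from the procedure above: small by default, medium for
# better quality, stereo-large for best results.
MODELS: Dict[str, str] = {
    "fast": "facebook/musicgen-small",
    "better": "facebook/musicgen-medium",
    "best": "facebook/musicgen-stereo-large",
}


def plan(kind: str = "short", quality: str = "fast") -> Tuple[str, int]:
    """Return (model, duration_s), using the lower end of the duration range."""
    low, _high = DURATION_RANGES[kind]
    return MODELS[quality], low
```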
Pitfalls
- ⚠️ First run downloads models — 1.2–6.5 GB from HuggingFace. Warn the user.
- ⚠️ Apple Silicon only — does NOT work on Intel Macs.
- ⚠️ Default 8s duration — always specify `-d` for BGM.
- ⚠️ Instrumental only — weak on vocals/lyrics.
- ⚠️ Non-commercial license — the MusicGen model weights are released under CC-BY-NC 4.0. Not for commercial use.
- ⚠️ RAM hungry — 8GB machines should use small/medium models only.
- ⚠️ Slow on large models — stereo-large is ~0.3x realtime, so a 60 s clip takes about 60 / 0.3 ≈ 200 s (over 3 minutes).
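The RAM guidance can be checked before picking a model. The Requirements section sets an 8 GB minimum and asks for 16 GB+ for the large models, and 8 GB machines should stay on small or medium. The sketch below uses a hypothetical `models_for_ram`. It assumes that anything between 8 and 16 GB follows the 8 GB rule.

```python
from typing import List

SMALL_MEDIUM = ["facebook/musicgen-small", "facebook/musicgen-medium"]
LARGE = ["facebook/musicgen-large", "facebook/musicgen-stereo-large"]


def models_for_ram(ram_gb: float) -> List[str]:
    """Models that fit the RAM guidance in this document.

    Below 8 GB: none (under the stated minimum).
    8 GB up to (but not including) 16 GB: small and medium only (assumed).
    16 GB and above: all four models.
    """
    if ram_gb < 8:
        return []
    if ram_gb < 16:
        return list(SMALL_MEDIUM)
    return SMALL_MEDIUM + LARGE
```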
Verification
```bash
# 1. Check CLI
musicgen-mlx --help
# 2. Quick test
musicgen-mlx "piano melody" -d 5 -o /tmp/test_bgm.wav
# 3. Verify output
file /tmp/test_bgm.wav
# Expected: RIFF (little-endian) data, WAVE audio
# 4. Listen
open /tmp/test_bgm.wav
```
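For a scripted check instead of `file` and listening, Python's standard `wave` module can open the output and report its channels, sample rate and length. `describe_wav` is a hypothetical helper. `wave` reads only integer PCM WAV files and raises `wave.Error` on float WAVs. This document does not say which WAV subtype mlx-audiocraft writes, so treat that as an untested assumption.

```python
import os
import wave
from typing import Dict


def describe_wav(path: str) -> Dict[str, float]:
    """Return channel count, sample rate and length (s) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,
        }


if __name__ == "__main__":
    path = "/tmp/test_bgm.wav"  # output of the quick test above
    if os.path.exists(path):
        print(describe_wav(path))
```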
Files
| File | Purpose |
|---|---|
| `SKILL.md` | This file: skill definition and reference |
| `setup.sh` | One-command installation script |
| `generate_bgm.sh` | Wrapper for easy agent-friendly generation |
| `prompts.md` | Prompt engineering guide with genre templates |