text-to-bgm

Generate background music (BGM) from text descriptions using local AI on Apple Silicon Macs. Powered by mlx-audiocraft — Meta's MusicGen ported to Apple's MLX framework.

Offline after the one-time model download: no cloud, no API keys, no internet connection needed
Private — all generation happens on-device
Fast — MLX GPU acceleration on M1/M2/M3/M4

When to Use

Use this skill when the user asks to:

Generate background music / BGM / instrumental music
Create ambient music, lo-fi beats, cinematic soundtrack
Make audio for videos, games, podcasts, or study/focus sessions
Any text-to-music request on macOS Apple Silicon

Do NOT use for:

Sound effects → use audiogen-mlx (included in mlx-audiocraft)
Vocals / singing → MusicGen is weak on vocals
Full production songs → consider ACE-Step for longer, structured music

Requirements

macOS with Apple Silicon (M1, M2, M3, M4)
Python 3.10+ (3.11 recommended)
RAM: 8 GB minimum for the small model, 16 GB+ for the large models
Disk: 1.2 GB (small) to ~6.5 GB (stereo-large) per model
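
The platform requirements can be checked with the standard library before installing anything. This is a minimal sketch, not part of mlx-audiocraft. check_requirements is a hypothetical helper, and it skips the RAM and disk checks. It also flags a Python interpreter running under Rosetta, because that interpreter reports x86_64 even on an Apple Silicon Mac.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None):
    """Return a list of problems; an empty list means the platform checks pass."""
    system = system or platform.system()      # "Darwin" on macOS
    machine = machine or platform.machine()   # "arm64" for a native Apple Silicon Python
    version = tuple(version or sys.version_info[:2])
    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The optional arguments let you test the logic on any machine. For example, `check_requirements("Darwin", "arm64", (3, 11))` returns an empty list.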

Installation

# One-line install
pip install mlx-audiocraft

# Verify
musicgen-mlx --help

Or use the included setup script:

bash setup.sh
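
An agent can run the same "Verify" check from Python before generating. The sketch below assumes only that the CLI is called musicgen-mlx and is on PATH. cli_works is a hypothetical helper name.

```python
import shutil
import subprocess


def cli_works(name="musicgen-mlx"):
    """True if `name` is on PATH and `name --help` exits with status 0."""
    path = shutil.which(name)
    if path is None:
        return False
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    if not cli_works():
        print("musicgen-mlx not found; install it with: pip install mlx-audiocraft")
```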

Usage

Basic

musicgen-mlx "lo-fi chill study beats with soft piano" -d 30 -o bgm.wav

Full CLI Reference

musicgen-mlx [-h] [-m MODEL] [-o FILE] [-d SEC] [--no-open]
             [--top-k K] [--temperature T] [--cfg-coef C]
             prompt

Arguments:
  prompt              Text description of the music to generate

Options:
  -m, --model NAME    HuggingFace model (default: facebook/musicgen-small)
  -o, --output FILE   Output WAV path (default: ./musicgen_output.wav)
  -d, --duration SEC  Duration in seconds (default: 8)
  --no-open           Don't open the file after generation
  --top-k K           Top-k sampling (default: 250)
  --temperature T     Sampling temperature (default: 1.0)
  --cfg-coef C        Classifier-free guidance coefficient (default: 3.0)
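
When you call the CLI from Python, build the argument list directly. This avoids shell-quoting problems with prompts that contain quotes or apostrophes. The sketch below uses only the flags listed in the reference above and passes an option only when it differs from its documented default. build_musicgen_cmd is a hypothetical helper, not something shipped with mlx-audiocraft.

```python
# Defaults as documented in the CLI reference; only non-default values are passed.
DEFAULTS = {
    "model": "facebook/musicgen-small",
    "top_k": 250,
    "temperature": 1.0,
    "cfg_coef": 3.0,
}


def build_musicgen_cmd(prompt, duration, output, model=DEFAULTS["model"],
                       top_k=DEFAULTS["top_k"],
                       temperature=DEFAULTS["temperature"],
                       cfg_coef=DEFAULTS["cfg_coef"], open_file=False):
    """Return an argv list for musicgen-mlx (hypothetical helper)."""
    cmd = ["musicgen-mlx", "-d", str(duration), "-o", output]
    if model != DEFAULTS["model"]:
        cmd += ["-m", model]
    if top_k != DEFAULTS["top_k"]:
        cmd += ["--top-k", str(top_k)]
    if temperature != DEFAULTS["temperature"]:
        cmd += ["--temperature", str(temperature)]
    if cfg_coef != DEFAULTS["cfg_coef"]:
        cmd += ["--cfg-coef", str(cfg_coef)]
    if not open_file:
        cmd.append("--no-open")  # agents report the path instead of opening a player
    cmd.append(prompt)  # the positional prompt goes last
    return cmd
```

To run it, call `subprocess.run(build_musicgen_cmd("lo-fi chill study beats with soft piano", 30, "bgm.wav"), check=True)`. `shlex.join(cmd)` gives a copy-pasteable shell line for logs.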

Models

Model	Size (params / disk)	Quality	Speed on M4 Max	Use case
`facebook/musicgen-small`	300M / 1.2 GB	Good	1.3x realtime	Default choice
`facebook/musicgen-medium`	1.5B / 3.4 GB	Better	~0.6x realtime	Better quality
`facebook/musicgen-large`	3.3B / 5.5 GB	Best (mono)	~0.4x realtime	High quality
`facebook/musicgen-stereo-large`	3.3B / 6.5 GB	Best (stereo)	~0.3x realtime	Final output
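
The speed column gives a rough wall-clock estimate: divide the clip length by the realtime factor. This assumes "Nx realtime" means N seconds of audio per second of compute, which matches the stereo-large estimate under Pitfalls. The factors are M4 Max figures, so slower chips will take longer.

```python
# Approximate realtime factors on an M4 Max, taken from the table above.
REALTIME_FACTOR = {
    "facebook/musicgen-small": 1.3,
    "facebook/musicgen-medium": 0.6,
    "facebook/musicgen-large": 0.4,
    "facebook/musicgen-stereo-large": 0.3,
}


def estimated_seconds(model, clip_seconds):
    """Rough generation time: seconds of audio divided by the realtime factor."""
    return clip_seconds / REALTIME_FACTOR[model]


if __name__ == "__main__":
    for name in REALTIME_FACTOR:
        minutes = estimated_seconds(name, 60) / 60
        print(f"{name}: ~{minutes:.1f} min for a 60 s clip")
```

For a 60 s clip, this gives about 46 s on small and about 200 s on stereo-large.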

Generation Parameters

Parameter	Range	Effect
`--temperature`	0.5–1.5	Higher = more creative/varied. Lower = more predictable
`--top-k`	100–500	Higher = more diverse note choices
`--cfg-coef`	1.0–7.0	Higher = stricter prompt adherence. Default 3.0 is balanced
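
Before passing user-tuned values to the CLI, an agent can flag anything outside the ranges in this table. The ranges are the document's recommendations. This document does not say musicgen-mlx enforces them. out_of_range is a hypothetical helper.

```python
# Recommended ranges from the table above (guidance, not CLI-enforced limits).
RECOMMENDED = {
    "temperature": (0.5, 1.5),
    "top_k": (100, 500),
    "cfg_coef": (1.0, 7.0),
}


def out_of_range(**params):
    """Return the names of parameters outside their recommended range."""
    flagged = []
    for name, value in params.items():
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged
```

For example, `out_of_range(temperature=2.0, top_k=250, cfg_coef=3.0)` returns `["temperature"]`.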

Using the Wrapper Script

For convenience, use generate_bgm.sh:

bash generate_bgm.sh "cinematic orchestral epic trailer music" -d 45

Options: -m <model>, -d <seconds>, -t <temperature>, -o <output>

Python API

import numpy as np
import soundfile as sf
from audiocraft_mlx.models.musicgen import MusicGen

# Load model (weights are downloaded from Hugging Face on first use)
mg = MusicGen.get_pretrained("facebook/musicgen-small")

# Set generation parameters (duration is in seconds)
mg.set_generation_params(duration=30, temperature=1.0, top_k=250, cfg_coef=3.0)

# Generate: pass a list of text descriptions
audio = mg.generate(["lo-fi jazz hop with smooth saxophone and chill drums"])

# Save: convert the first result to NumPy; soundfile expects [samples, channels], hence .T
wav = np.array(audio[0])  # [channels, samples]
sf.write("output.wav", wav.T, mg.sample_rate)
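
Because generate takes a list of descriptions, you can produce several clips in one call. The sketch below saves each clip as 16-bit PCM using the standard-library wave module. This is a swapped-in alternative to soundfile for environments where soundfile is not installed. It assumes the batched result is laid out as [batch, channels, samples] with float samples between -1 and 1. That layout is inferred from the [channels, samples] comment above, not confirmed by the library. save_wav_16bit is a hypothetical helper.

```python
import wave

import numpy as np


def save_wav_16bit(samples, sample_rate, path):
    """Write float audio shaped [channels, samples] (range -1..1) as 16-bit PCM WAV."""
    data = np.asarray(samples, dtype=np.float32)
    if data.ndim == 1:  # accept a bare mono signal too
        data = data[np.newaxis, :]
    # Clip to avoid integer wrap-around, then scale to little-endian int16.
    pcm = (np.clip(data, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as out:
        out.setnchannels(data.shape[0])
        out.setsampwidth(2)  # 2 bytes = 16 bits per sample
        out.setframerate(int(sample_rate))
        # Transpose to [samples, channels] so tobytes() interleaves channels per frame.
        out.writeframes(pcm.T.tobytes())


# Batch usage with the model from the example above (not run here):
# prompts = ["ambient synth pads for focus", "upbeat acoustic guitar for a vlog"]
# audio = mg.generate(prompts)
# for i in range(len(prompts)):
#     save_wav_16bit(np.array(audio[i]), mg.sample_rate, f"bgm_{i}.wav")
```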

Prompt Engineering

See prompts.md for a comprehensive guide with examples.

Quick tips:

Start with genre/style: "lo-fi hip hop", "cinematic orchestral", "ambient electronic"
Add instruments: "warm piano", "soft synth pads", "acoustic guitar"
Describe mood/atmosphere: "relaxing", "uplifting", "mysterious", "energetic"
Include production qualities: "vinyl crackle", "reverb", "80 BPM", "sidechain compression"
End with purpose: "perfect for study", "background for podcast", "game menu music"

⚠️ MusicGen works best with English prompts and instrumental music.

Procedure (for AI Agents)

Check installation: Run musicgen-mlx --help. If missing, install via pip install mlx-audiocraft.
Gather requirements: Ask the user about genre, mood, duration. If vague, suggest popular BGM styles.
Select model: Default to facebook/musicgen-small (fast, 300M). Use facebook/musicgen-medium for better quality or facebook/musicgen-stereo-large for best results.
Craft prompt: Build a 10–30 word English description following the prompt guide in prompts.md.
Set duration: 15–30s for loops, 30–60s for short BGM, 60–120s for longer pieces.
Generate: Run musicgen-mlx "<prompt>" -m <model> -d <sec> -o <path>.wav
Report: Tell the user the output file path. Offer to regenerate with adjusted parameters if needed.
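
The "Select model" and "Set duration" steps reduce to two lookups. The sketch below encodes the model choices and duration ranges from the list and keeps a requested duration within the suggested range. plan_generation is hypothetical. Defaulting to the top of each range is an arbitrary choice.

```python
# Duration ranges (seconds) and model choices from the procedure above.
DURATION_RANGES = {"loop": (15, 30), "short": (30, 60), "long": (60, 120)}
MODEL_BY_QUALITY = {
    "fast": "facebook/musicgen-small",
    "better": "facebook/musicgen-medium",
    "best": "facebook/musicgen-stereo-large",
}


def plan_generation(use, quality="fast", requested_seconds=None):
    """Pick a model and clamp the duration into the range for this use case."""
    low, high = DURATION_RANGES[use]
    if requested_seconds is None:
        duration = high  # arbitrary default: top of the range
    else:
        duration = min(max(requested_seconds, low), high)
    return {"model": MODEL_BY_QUALITY[quality], "duration": duration}
```

For example, `plan_generation("long", "best", requested_seconds=200)` returns `{"model": "facebook/musicgen-stereo-large", "duration": 120}`.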

Pitfalls

⚠️ First run downloads models — 1.2–6.5 GB from HuggingFace. Warn the user.
⚠️ Apple Silicon only — does NOT work on Intel Macs.
⚠️ Default 8s duration — always specify -d for BGM.
⚠️ Instrumental only — weak on vocals/lyrics.
⚠️ Non-commercial license: the MusicGen model weights are released under CC-BY-NC 4.0. Not for commercial use.
⚠️ RAM hungry — 8GB machines should use small/medium models only.
⚠️ Slow on large models: stereo-large runs at ~0.3x realtime, so a 60 s clip takes about 200 s (60 / 0.3), a little over 3 minutes.
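
The RAM guidance here and under Requirements can be applied automatically. On macOS, `sysctl -n hw.memsize` prints physical memory in bytes. The thresholds below are the ones this document gives: below 8 GB, no model; 8 GB, small or medium; 16 GB or more, any model. mac_ram_gb and models_for_ram are hypothetical helpers.

```python
import subprocess

ALL_MODELS = [
    "facebook/musicgen-small",
    "facebook/musicgen-medium",
    "facebook/musicgen-large",
    "facebook/musicgen-stereo-large",
]


def mac_ram_gb():
    """Physical memory in GiB on macOS, via `sysctl -n hw.memsize` (bytes)."""
    out = subprocess.run(["sysctl", "-n", "hw.memsize"],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip()) / 2**30


def models_for_ram(ram_gb):
    """Models this document recommends for a given amount of memory."""
    if ram_gb < 8:
        return []  # below the stated 8 GB minimum
    if ram_gb < 16:
        return ALL_MODELS[:2]  # small/medium only
    return list(ALL_MODELS)
```

On a Mac, `models_for_ram(mac_ram_gb())` lists the models this document considers safe for that machine.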

Verification

# 1. Check CLI
musicgen-mlx --help

# 2. Quick test
musicgen-mlx "piano melody" -d 5 -o /tmp/test_bgm.wav

# 3. Verify output
file /tmp/test_bgm.wav
# Expected: RIFF (little-endian) data, WAVE audio

# 4. Listen
open /tmp/test_bgm.wav
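
You can script the same checks without `file` or a player. This sketch reads the RIFF/WAVE header directly and uses the standard-library wave module to measure duration. It assumes PCM output: wave cannot parse float WAV files, so in that case only the header is checked. check_output and describe_wav are hypothetical helpers.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate, seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        return f.getnchannels(), rate, f.getnframes() / rate


def check_output(path, expected_seconds=None, tolerance=0.5):
    """True if the file has a RIFF/WAVE header and, when given, roughly the expected length."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return False  # `file` would not report this as "RIFF ... WAVE audio"
    if expected_seconds is None:
        return True
    try:
        _, _, seconds = describe_wav(path)
    except wave.Error:
        return True  # e.g. float WAV: header is valid, length not checked
    return abs(seconds - expected_seconds) <= tolerance
```

For the quick test above, `check_output("/tmp/test_bgm.wav", expected_seconds=5)` should return True.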

Files

File	Purpose
`SKILL.md`	This file — skill definition and reference
`setup.sh`	One-command installation script
`generate_bgm.sh`	Wrapper for easy agent-friendly generation
`prompts.md`	Prompt engineering guide with genre templates

haoyiyin/text-to-bgm
