Date Created: 2025-05-08
By: 16BitMiker
Setting up a high-performance text-to-speech system like F5-TTS on Apple Silicon might seem daunting, but with the right steps, it's surprisingly smooth. I recently configured F5-TTS on a newly refurbished Mac Studio with the M2 Ultra chip, and I'm here to walk you through the entire installation process: what worked, what didn't, and how to get everything running beautifully on macOS.
Let's dive in.
F5-TTS is a state-of-the-art, non-autoregressive text-to-speech system developed by SWivid. It uses flow matching and Diffusion Transformers (DiT) to synthesize high-quality speech quickly and efficiently. It supports real-time inference, multilingual output, and even voice cloning, all in an open-source package.
Here's the software and hardware context:
Machine: Mac Studio (M2 Ultra, 128GB RAM)
OS: macOS Sonoma 14.4+
Python: 3.13 (though 3.10 is recommended)
Package Manager: Homebrew
Virtual Environment: venv
Let's break it down step by step.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
eval "$(/opt/homebrew/bin/brew shellenv)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
# Install Python 3.10 (recommended) or use existing Python 3.13
brew install python@3.10
brew install git ffmpeg wget
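Before moving on, it can help to confirm that the tools installed above are actually on your PATH. The following is a minimal Python sketch, not part of F5-TTS; the helper name `missing_tools` is hypothetical, and it relies only on the standard library's `shutil.which`.

```python
import shutil

# Tools installed through Homebrew in the step above.
REQUIRED_TOOLS = ["git", "ffmpeg", "wget"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All prerequisite tools found on PATH.")
```

If anything is reported missing, re-running the matching `brew install` line and opening a fresh terminal is usually the first thing to try.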
mkdir -p $HOME/git
cd $HOME/git
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
python3 -m venv f5tts_env
source f5tts_env/bin/activate
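To double-check that the virtual environment is really the one in use, you can compare the interpreter's prefix with its base prefix. This is a small sketch using only the standard library; the function name `in_virtualenv` is an illustrative assumption.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base install.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the Python it was created from.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```

When the environment is active, the interpreter path should sit inside the `f5tts_env` folder created above.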
F5-TTS uses a pyproject.toml, so install it in editable mode:
pip install --upgrade pip
pip install -e .
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio
Then verify GPU support:
python -c "import torch; print('MPS Available:', torch.backends.mps.is_available())"
Output:
MPS Available: True
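In your own scripts, the same check can drive a device choice with a CPU fallback. The sketch below is illustrative only and is not how F5-TTS itself picks a device; `pick_device` and `detect_device` are hypothetical helper names, and the only PyTorch calls used are `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
def pick_device(mps_available, cuda_available=False):
    """Prefer Apple's MPS backend, then CUDA, then plain CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends exist; fall back to CPU if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(torch.backends.mps.is_available(),
                       torch.cuda.is_available())


print("Selected device:", detect_device())
```

On the Mac Studio described here, this should select `mps` whenever the verification command above reports `True`.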
mkdir -p ckpts/F5TTS_v1_Base
wget -P ckpts/F5TTS_v1_Base https://huggingface.co/SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/model_1250000.safetensors
Note: This is a ~1.3 GB file and may take a few minutes depending on your internet speed.
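An interrupted download can leave a truncated file behind, so a quick size check is worthwhile. This is a rough sketch: the helper `checkpoint_status` is hypothetical, and the 1 GB lower bound is an assumption derived from the ~1.3 GB figure above, not an official value.

```python
from pathlib import Path

CHECKPOINT = "ckpts/F5TTS_v1_Base/model_1250000.safetensors"


def checkpoint_status(path, min_bytes=1_000_000_000):
    """Classify a checkpoint file as 'missing', 'too small', or 'ok'.

    min_bytes is an assumed lower bound, not a published file size.
    """
    candidate = Path(path)
    if not candidate.is_file():
        return "missing"
    if candidate.stat().st_size < min_bytes:
        return "too small"
    return "ok"


print(CHECKPOINT, "->", checkpoint_status(CHECKPOINT))
```

A result of `too small` suggests deleting the file and running the `wget` command again.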
First, grab a sample reference voice:
curl -L https://github.com/SWivid/F5-TTS/raw/main/examples/reference.wav -o reference.wav
Then run the inference:
f5-tts_infer-cli \
--model F5TTS_v1_Base \
--ref_audio reference.wav \
--gen_text "This is F5-TTS running on the M2 Ultra Mac." \
--output_dir results
Result:
The output audio is saved in the results/ folder.
The voice matches the reference sample remarkably well.
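If you want to synthesize several sentences in one go, the CLI call above can be wrapped in a small Python driver. This is a sketch under stated assumptions: the function names are hypothetical, only the flags already shown above are used, and each sentence is written to its own output folder so results do not collide. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_infer_command(gen_text, ref_audio="reference.wav",
                        model="F5TTS_v1_Base", output_dir="results"):
    """Assemble the f5-tts_infer-cli call shown above as an argument list."""
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--gen_text", gen_text,
        "--output_dir", output_dir,
    ]


def synthesize_all(lines, dry_run=True):
    """Build one CLI call per line, each with its own folder (results/line_01, ...)."""
    commands = []
    for index, text in enumerate(lines, start=1):
        cmd = build_infer_command(text, output_dir=f"results/line_{index:02d}")
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # raises if the CLI exits non-zero
    return commands


synthesize_all(["This is the first sentence.", "And this is the second."])
```

Passing `dry_run=False` runs the commands for real. Each call starts a fresh process, so this favors simplicity over speed.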
Want a user-friendly way to test your voices?
f5-tts_infer-gradio --port 7860
Then open your browser and visit http://localhost:7860 (the port passed above).
You'll get a full-featured web interface with:
Reference audio upload
Text input
Model selection
Output playback
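If the interface fails to start, another process may already be using port 7860. The following sketch checks for that with the standard library's `socket` module; `port_in_use` is a hypothetical helper name, and the host `127.0.0.1` is an assumption about where the server listens.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("Port 7860 in use:", port_in_use(7860))
```

If the port is taken before you launch, pass a different number to `--port` in the command above.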
F5-TTS officially recommends Python 3.10 due to some library constraints. I used Python 3.13 without issues, but if you run into errors (especially with librosa or numpy), consider downgrading.
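To see at a glance which interpreter your environment is using, a short version check can be run inside the activated venv. This is a minimal sketch; `version_note` is a hypothetical helper, and the recommended version is taken from the paragraph above.

```python
import sys

RECOMMENDED = (3, 10)  # the version this article cites as recommended


def version_note(version=None):
    """Describe how the given (or running) Python version compares to 3.10."""
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) == RECOMMENDED:
        return f"Python {major}.{minor}: matches the recommended version."
    return (f"Python {major}.{minor}: differs from the recommended "
            f"{RECOMMENDED[0]}.{RECOMMENDED[1]}; watch for librosa or numpy errors.")


print(version_note())
```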
If inference fails with a checkpoint error, double-check your directory:
ls ckpts/F5TTS_v1_Base/
# Should show: model_1250000.safetensors
If f5-tts_infer-cli or f5-tts_infer-gradio doesn't work, try running the Python modules directly:
python -m f5_tts.infer.infer_cli
python -m f5_tts.infer.infer_gradio
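When a command is not found, it helps to know whether the package itself is missing or only the entry point is off your PATH. The sketch below separates the two cases using `shutil.which` and `importlib.util.find_spec`; the helper name `diagnose` is hypothetical.

```python
import importlib.util
import shutil


def diagnose(command="f5-tts_infer-cli", package="f5_tts"):
    """Return (command_on_path, package_importable) for the active environment."""
    command_on_path = shutil.which(command) is not None
    package_importable = importlib.util.find_spec(package) is not None
    return command_on_path, package_importable


on_path, importable = diagnose()
if on_path:
    print("CLI entry point found on PATH.")
elif importable:
    print("Package is importable but the CLI is not on PATH; try the python -m form.")
else:
    print("Package not importable; is the virtual environment active?")
```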
If you've rebooted your Mac or closed your terminal and want to relaunch F5-TTS, just follow these quick steps:
Open Terminal and navigate to your project:
cd $HOME/git/F5-TTS
Reactivate your virtual environment:
source f5tts_env/bin/activate
(Optional) Verify access to CLI tools:
f5-tts_infer-cli --help
Run inference again:
f5-tts_infer-cli \
--model F5TTS_v1_Base \
--ref_audio reference.wav \
--gen_text "Welcome back! This is F5-TTS after a reboot." \
--output_dir results
Or launch the Gradio interface:
f5-tts_infer-gradio --port 7860
Pro Tip: If you frequently reboot, consider adding an alias to your .zshrc:
echo 'alias f5tts="cd ~/git/F5-TTS && source f5tts_env/bin/activate"' >> ~/.zshrc
Then you can just type:
f5tts
And you're ready to roll.
PyTorch MPS support was plug-and-play
CLI tools launched without issue
Model and reference audio integration was seamless
The Gradio UI worked out-of-the-box
Voice cloning was fast and expressive
Setting up F5-TTS on an Apple Silicon Mac Studio was not only possible, it was enjoyable. Performance was smooth, and the installation process was relatively painless thanks to modern Python packaging and MPS support.
If you're looking to explore real-time voice synthesis, prototyping audiobooks, or building a personalized voice assistant, F5-TTS is a solid open-source starting point.