Date Created: 2025-05-11
By: 16BitMiker
Running large language models locally on your Mac might sound intimidating, but thanks to projects like LLaMA-Factory, it's more feasible than ever. This guide walks you through every step required to get LLaMA-Factory up and running on macOS, from Python installation to launching the WebUI and chatting with powerful LLMs like Mistral and TinyLlama.
But why install LLaMA-Factory in the first place?
Let's take a moment to understand the motivation behind using it.
LLaMA-Factory is a powerful training and inference framework built on top of Hugging Face Transformers. It supports:
Fine-tuning pre-trained models using LoRA (Low-Rank Adaptation)
Local inference with a friendly WebUI
Exporting models in formats compatible with Ollama, llama.cpp, or Hugging Face Hub
This makes it an ideal tool if you're looking to:
Customize open-weight models like Mistral or Llama 2 using your own datasets
Experiment with lightweight fine-tuning techniques like LoRA on consumer-grade hardware
Deploy the result to low-latency inference engines like Ollama
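In practice, all of these stages route through a single CLI. Here's a rough sketch of what that looks like; the YAML config file names are hypothetical placeholders you would write yourself:
# Sketch only: the config file names below are hypothetical
llamafactory-cli train my_lora_config.yaml       # LoRA fine-tune a base model
llamafactory-cli webui                           # chat with it locally in the browser
llamafactory-cli export my_export_config.yaml    # merge the adapter and export the weights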
Ollama is a fast, local-first runtime for running LLMs using quantized model formats like GGUF. However, Ollama doesn't currently support training or fine-tuning models directly.
That's where LLaMA-Factory comes in:
You use LLaMA-Factory to train or LoRA-adapt a base model like Mistral-7B
Then, you export the result to GGUF format
Finally, you load it into Ollama for fast, local inference
This workflow gives you the best of both worlds: flexible training and blazing-fast deployment.
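To make the hand-off concrete, here's a minimal sketch of the Ollama side, assuming you already have an exported GGUF file (the filename and model name below are hypothetical):
# Hypothetical GGUF filename from a LLaMA-Factory export
cat > Modelfile <<'EOF'
FROM ./mistral-7b-lora.gguf
EOF
# Register the model with Ollama, then chat with it
ollama create mistral-lora -f Modelfile
ollama run mistral-lora "Hello there!"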
We'll cover the full LoRA training and export process in a future post, but for now, let's get your environment ready.
macOS ships with a system Python, but we'll install Python 3.10 separately using Homebrew to avoid conflicts and ensure compatibility with LLaMA-Factory.
# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Update Homebrew
brew update
# Install Python 3.10
brew install python@3.10
# Confirm installation
python3.10 --version
# Ensure Python 3.10 is used in your shell
echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
# Create a directory for virtual environments
mkdir -p ~/venvs
# Create one for llama-factory
python3.10 -m venv ~/venvs/llama-stable
# Activate it
source ~/venvs/llama-stable/bin/activate
# Confirm version
python --version
Next, clone the LLaMA-Factory repository:
mkdir -p ~/git
cd ~/git
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
With the repo cloned and the virtual environment active, upgrade the packaging tools:
pip install --upgrade pip setuptools wheel
Then install LLaMA-Factory in editable mode from the repository root:
pip install -e .
Pin a few dependency versions to avoid compatibility issues:
pip install pydantic==2.10.6
pip install "gradio>=4.38.0,<=5.21.0"
pip uninstall -y llamafactory
pip install llamafactory==0.9.2
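To confirm the pins actually took effect, list what ended up in the environment (the version subcommand should also report the installed release):
pip list | grep -Ei "llamafactory|pydantic|gradio"
llamafactory-cli version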
Install PyTorch and the remaining runtime dependencies:
pip install torch torchvision torchaudio
pip install transformers datasets accelerate
pip install einops sentencepiece protobuf
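A quick import test catches missing or broken wheels before you launch anything:
# Should print the installed versions without errors
python -c "import torch, transformers, datasets; print(torch.__version__, transformers.__version__)"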
# CLI tool location
find ~/venvs/llama-stable/bin -name "llama*"
# Inspect package
ls -la ~/venvs/llama-stable/lib/python3.10/site-packages/llamafactory/
Now launch the WebUI:
# From repo root
llamafactory-cli webui --share --no-inbrowser
Output should include:
Running on local URL: http://0.0.0.0:7860
Running on public URL: https://xxxx.gradio.live
Open the public .gradio.live URL
Or visit http://localhost:7860
Go to the "Model" tab
Select and download a model like:
TinyLlama
Phi-2
Mistral-7B
Switch to the "Chat" tab
Load the downloaded model
Adjust parameters:
Temp: 0.7
Top P: 0.9
Max Tokens: 1024
Type a prompt and hit Enter to see your model respond.
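If you prefer the terminal to the WebUI, roughly the same settings can be passed to the chat subcommand; the model ID and template below are placeholders, so swap in whatever you downloaded:
# Placeholder model and template; adjust to your download
llamafactory-cli chat \
  --model_name_or_path TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --template default \
  --temperature 0.7 \
  --top_p 0.9 \
  --max_new_tokens 1024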
If the installation misbehaves, you can start over in a fresh virtual environment:
python3.10 -m venv ~/venvs/llama-fresh
source ~/venvs/llama-fresh/bin/activate
pip install --upgrade pip
pip install pydantic==2.10.6
pip install llamafactory==0.9.2
pip install torch
A couple of optional environment variables can help performance, depending on your hardware:
# Apple Silicon
export PYTORCH_MPS_ENABLE=1
# Intel Macs
export OMP_NUM_THREADS=4
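On Apple Silicon, you can quickly confirm that PyTorch actually sees the Metal (MPS) backend:
# Prints True if the MPS backend is available
python -c "import torch; print(torch.backends.mps.is_available())"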
To launch the WebUI with explicit server settings:
llamafactory-cli webui --server-name 0.0.0.0 --port 7860 --share --no-inbrowser
To keep downloaded models in a directory of your choosing:
mkdir -p ~/models/llama
llamafactory-cli webui --model-path ~/models/llama --share --no-inbrowser
Here's a high-level outline of what we'll cover in a follow-up post:
Prepare a dataset (JSON or CSV format; a minimal example appears below)
Use the WebUI's "Train" tab
Configure training parameters for LoRA
Monitor training progress
Export the model as GGUF or Hugging Face format
Load it into Ollama for local inference
This approach enables fast, memory-efficient tuning of models like Mistral on your Mac, even without a dedicated GPU.
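As a preview, the default instruction-tuning data format is Alpaca-style JSON; the file name and contents below are purely illustrative, and a real dataset also needs to be registered in data/dataset_info.json:
# Illustrative dataset file (hypothetical path and contents)
cat > ~/git/LLaMA-Factory/data/my_dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the following sentence.",
    "input": "LLaMA-Factory is a framework for fine-tuning open LLMs with LoRA.",
    "output": "LLaMA-Factory makes LoRA fine-tuning of open models straightforward."
  }
]
EOF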
You now have LLaMA-Factory fully installed and ready on macOS, complete with dependency fixes and a working WebUI. More importantly:
LLaMA-Factory is your launchpad for fine-tuning open LLMs
It supports LoRA, the most practical tuning method for laptops
You can export your results to formats usable with Ollama
In our next post, we'll dive deep into the full LoRA training workflow, including dataset formatting, training configs, and GGUF export. Stay tuned!
Happy tuning!