Ollama Modelfiles are a powerful tool for creating, customizing, and sharing large language models (LLMs). This guide will walk you through everything you need to know about Modelfiles, from basic concepts to advanced techniques.
A Modelfile is a blueprint that defines how to create and share models within the Ollama ecosystem. It contains the instructions and parameters Ollama uses to set up and run an LLM.
A Modelfile uses a simple, readable syntax. Each line typically starts with an instruction followed by its arguments. Comments can be added using the # symbol.
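For instance, a full-line comment documenting an instruction:

# Use Meta's Llama 2 as the base model
FROM llama2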
The FROM instruction specifies the base model to use; it is required in every Modelfile.
Example:
FROM llama2
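FROM also accepts a model:tag reference to pin a specific variant, and recent Ollama versions can build directly from a local GGUF weights file (the path below is a placeholder):

# Pin a specific model variant
FROM llama2:13b
# Or import local GGUF weights (illustrative path)
FROM ./my-model.gguf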
The PARAMETER instruction sets runtime parameters for the model.
Example:
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
The TEMPLATE instruction defines a structured format for input and output, customizing how the model processes queries and generates responses.
Basic syntax:
TEMPLATE """
Your template structure here
"""
Key features:
Placeholders: {{.System}} (system message), {{.Prompt}} (user prompt), and {{.Response}} (model output)
Special tokens: [INST] and [/INST] for instruction formatting
Support for conditional statements and loops (Go template syntax)
Example:
TEMPLATE """
[INST] {{.System}} {{.Prompt}} [/INST]
"""
The SYSTEM instruction specifies a system message that guides the behavior of the chat assistant.
Example:
SYSTEM "You are a helpful AI assistant named Claude."
The ADAPTER instruction specifies a LoRA (Low-Rank Adaptation) adapter to apply, adjusting the model for specific tasks.
Example:
ADAPTER /path/to/your/adapter.bin
The LICENSE instruction indicates the legal license under which the model is shared or distributed.
Example:
LICENSE "Apache 2.0"
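Full license text can likewise be supplied in triple quotes, for example:

LICENSE """
Apache License
Version 2.0, January 2004
"""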
The MESSAGE instruction specifies example message history, as role/message pairs (roles: system, user, assistant), for the model to consider when responding.
Example:
MESSAGE user Hello, how are you?
MESSAGE assistant I'm doing well, thank you for asking! How can I assist you today?
Here's an example of a simple Modelfile:
FROM mistral
PARAMETER temperature 1
PARAMETER num_ctx 4096
SYSTEM "You are a friendly AI assistant. Your responses should be helpful, concise, and engaging."
To use your Modelfile:
Save the Modelfile content to a file (e.g., CustomAssistant).
Create a model based on this file:
ollama create custom-assistant -f CustomAssistant
Run the model:
ollama run custom-assistant
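This opens an interactive chat session. You can also pass a one-off prompt directly on the command line:

ollama run custom-assistant "Introduce yourself in one sentence."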
Here are some key parameters you can adjust in your Modelfile:
mirostat
Purpose: Enables Mirostat sampling, which controls the creativity and coherence of the model's responses (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).
Example: PARAMETER mirostat 1

mirostat_eta
Purpose: Adjusts how quickly the model adapts to the current context (the Mirostat learning rate).
Example: PARAMETER mirostat_eta 0.1

mirostat_tau
Purpose: Balances consistency and variety in the model's outputs (the Mirostat target entropy).
Example: PARAMETER mirostat_tau 5

num_ctx
Purpose: Sets the context length, which is how much previous text the model considers.
Example: PARAMETER num_ctx 2048

num_gqa
Purpose: Sets the number of grouped-query attention (GQA) groups; required for some architectures (e.g., 8 for llama2:70b).
Example: PARAMETER num_gqa 8

num_gpu
Purpose: Sets the number of model layers to offload to the GPU(s).
Example: PARAMETER num_gpu 1

num_thread
Purpose: Sets the number of CPU threads to use.
Example: PARAMETER num_thread 4

repeat_last_n
Purpose: Controls how far back the model looks to avoid repetition.
Example: PARAMETER repeat_last_n 64

repeat_penalty
Purpose: Penalizes the model for repeating tokens or phrases.
Example: PARAMETER repeat_penalty 1.1

temperature
Purpose: Adjusts the randomness or "creativity" of the model's responses.
Example: PARAMETER temperature 0.7

seed
Purpose: Sets a random seed for reproducible text generation.
Example: PARAMETER seed 42

stop
Purpose: Specifies sequences that cause the model to stop generating text.
Example: PARAMETER stop "END OF RESPONSE"

tfs_z
Purpose: Applies tail-free sampling to reduce the probability of less likely tokens (1 disables it).
Example: PARAMETER tfs_z 1

num_predict
Purpose: Sets the maximum number of tokens to generate.
Example: PARAMETER num_predict 100

top_k
Purpose: Limits the number of tokens the model considers at each generation step.
Example: PARAMETER top_k 40

top_p
Purpose: Controls the diversity of the model's token choices via nucleus sampling.
Example: PARAMETER top_p 0.9
Example usage:
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER top_k 50
PARAMETER repeat_penalty 1.1
Combining Multiple Parameters: Fine-tune your model by adjusting several parameters together.
Using Templates: Create custom input/output formats for specific use cases.
Incorporating Adapters: Utilize LoRA adapters to specialize your model for particular tasks.
Customizing System Messages: Craft detailed system messages to guide the model's behavior and personality. A sketch combining these techniques follows below.
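Here is a sketch of a Modelfile that combines these techniques; the adapter path, system prompt, and parameter values are illustrative assumptions, not tuned recommendations:

# Hypothetical assistant combining parameters, an adapter,
# a system message, and a custom template
FROM llama2
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
PARAMETER repeat_penalty 1.1
# Placeholder path to a task-specific LoRA adapter
ADAPTER ./adapters/support-bot.bin
SYSTEM """
You are a patient technical support assistant.
Answer step by step and ask clarifying questions when needed.
"""
TEMPLATE """
{{ if .System }}{{ .System }}

{{ end }}[INST] {{ .Prompt }} [/INST]
"""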
Ollama provides a library of pre-configured Modelfiles. You can view a model's Modelfile using the ollama show command with the --modelfile flag:
ollama show --modelfile modelname
This displays the Modelfile of any local model, which can serve as inspiration or a starting point for your custom models.
Start with a Base Model: Choose an appropriate base model that aligns with your project's needs.
Iterative Testing: Create your Modelfile in stages, testing each addition or change.
Parameter Experimentation: Don't hesitate to adjust parameters and observe their effects on model behavior.
Documentation: Use comments in your Modelfile to explain your choices and configurations.
Version Control: Keep track of different versions of your Modelfile as you refine your model.
Ollama Modelfiles offer a powerful and flexible way to customize and deploy LLMs on your local machine. By mastering the creation and modification of Modelfiles, you can tailor AI models to your specific needs, whether for personal projects, research, or professional applications. Remember that the key to success with Modelfiles is experimentation and iteration. Don't be afraid to try different configurations and learn from each attempt. Happy modeling!