OLLAMA FORMAT
Why Ollama Modelfiles? Because you can bake your character into a reusable model, run it from CLI, lock in parameters, and share it with others. It's portable and powerful.
THE PLIST FORMAT
PList is bracket-and-colon syntax. Scannable, token-efficient, clear to the model:
[Name: CharacterName]
[Personality: trait1, trait2, trait3]
[Speech: style1, style2]
[Relationship: How they relate to the user]
Example:
[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
[Speech: casual, uses contractions, natural *actions*, no emoji]
[Relationship: Asks one follow-up per message. Shows genuine interest. Not pushy.]
THE ALI:CHAT FORMAT
Ali:Chat uses <START> and <END> tags with {{char}} and {{user}} variables for dialogue examples:
<START>
{{user}}: What should I do with my career?
{{char}}: *leans back thoughtfully* Do you want advice, or do you need to vent? Because
there's a difference and I can't help with the second one without knowing that first.
{{user}}: I guess advice would help.
{{char}}: Okay. What's keeping you stuck?
<END>
The variables {{char}} and {{user}} are placeholders for the character and user names. Chat frontends like SillyTavern substitute them automatically; a plain Ollama SYSTEM prompt passes them through as literal text, so either substitute the real names yourself or let the model pick up the pattern from context.
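If you'd rather bake the real names in before building the Modelfile, a one-liner with sed does the substitution. This is just a sketch; card.txt and the names Luna/Alex are stand-ins:

```shell
# Replace the placeholders with concrete names before building the Modelfile.
# card.txt, "Luna", and "Alex" are hypothetical stand-ins for this sketch.
printf '%s\n' '{{char}}: *waves* Hi, {{user}}.' > card.txt
sed -e 's/{{char}}/Luna/g' -e 's/{{user}}/Alex/g' card.txt
# prints: Luna: *waves* Hi, Alex.
```

Run it over your whole card file and paste the result into the SYSTEM prompt.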
COMBINING PLIST + ALI:CHAT
The strongest approach uses both. PList for concise traits, Ali:Chat for dialogue examples:
[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
[Speech: casual, uses contractions, natural *actions*, no emoji]
<START>
{{user}}: I had a rough day at work.
{{char}}: *sits down next to you* That bad? Want to talk about it?
{{user}}: My boss was impossible.
{{char}}: *listens* Is this temporary frustration, or has it been building?
{{user}}: Building, I think.
{{char}}: Then we need to talk about what leaving looks like. What would that feel like?
<END>
Simple, clear, powerful.
THE OLLAMA MODELFILE
A Modelfile is how you tell Ollama to create a custom model. Here's the complete template for Magidonia:
---BEGIN MODELFILE---
FROM hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0
SYSTEM """
[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
[Speech: casual, uses contractions, natural *actions*, no emoji]
[Relationship: Asks one follow-up per message. Shows genuine interest. Not pushy.]
<START>
{{user}}: I had a rough day.
{{char}}: *sits down next to you* That bad? Want to talk about it, or just vent?
{{user}}: My boss was impossible today.
{{char}}: *listens* Temporary frustration or building resentment?
{{user}}: Building, I think.
{{char}}: Then we need to talk about what leaving looks like. What would that feel like?
<END>
[Opening message: Luna is relaxed on her couch when you message. She glances up.]
Luna: Hey. What's up?
"""
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER min_p 0.02
PARAMETER repeat_penalty 1.0
PARAMETER stop "User:"
PARAMETER stop "\nUser:"
---END MODELFILE---
Let me break down each part:
THE FROM LINE
FROM hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0
This is the base model. It's the Magidonia-24B in Q8_0 quantization (high quality, larger file size). If you have less VRAM, use Q5_K_M or Q4_K_M instead.
Alternative quantizations:
Q8_0 = ~24GB VRAM, highest quality
Q6_K = ~18GB VRAM, excellent quality
Q5_K_M = ~16GB VRAM, very good quality
Q4_K_M = ~8-10GB VRAM, good quality
Q3_K_M = ~6-8GB VRAM, acceptable quality
If you're unsure, start with Q5_K_M. It's the sweet spot for most hardware.
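The table above can be folded into a tiny helper. This is purely a heuristic for this guide (not an Ollama feature), mapping free VRAM in GB to a suggested quantization:

```shell
# Map free VRAM (in GB) to a suggested quantization, per the table above.
# Heuristic helper for this guide only -- not an official Ollama tool.
pick_quant() {
  vram=$1
  if   [ "$vram" -ge 24 ]; then echo "Q8_0"
  elif [ "$vram" -ge 18 ]; then echo "Q6_K"
  elif [ "$vram" -ge 16 ]; then echo "Q5_K_M"
  elif [ "$vram" -ge 8  ]; then echo "Q4_K_M"
  else                          echo "Q3_K_M"
  fi
}
pick_quant 16   # prints Q5_K_M
```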
THE SYSTEM PROMPT
Everything between SYSTEM """ and """ is your character card and examples.
This is where your PList + Ali:Chat combo goes. Keep it concise but complete. Aim for 500-800 tokens total.
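To check you're in that range, a rough rule of thumb is ~4 characters per token for English text (exact counts depend on the tokenizer). A quick estimate from the shell, with prompt.txt as a stand-in filename:

```shell
# Rough token count: ~4 characters per token is a common heuristic.
# prompt.txt stands in for wherever you drafted your SYSTEM prompt.
printf '%s' "Some example SYSTEM prompt text to measure." > prompt.txt
chars=$(wc -c < prompt.txt)
echo $((chars / 4))   # rough token estimate (10 for this sample)
```

If the estimate lands well above 800, trim traits or cut a dialogue example.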
THE PARAMETERS
PARAMETER temperature 1.0
Temperature controls randomness. Lower = more predictable, higher = more creative. 1.0 is the recommended default for Magidonia character chat: warmer than strict (0.7), not too chaotic (1.5+).
PARAMETER top_k 40
Top-k limits the model to the top 40 most likely next tokens. This prevents completely random garbage. 40 is solid.
PARAMETER top_p 0.95
Top-p (nucleus sampling) keeps tokens until cumulative probability reaches 95%. Works together with top_k for controlled variety.
PARAMETER min_p 0.02
Minimum probability threshold. Filters out low-probability garbage tokens. 0.02 is the sweet spot for Magidonia roleplay.
PARAMETER repeat_penalty 1.0
For Magidonia, this MUST be 1.0 (disabled). See Section 05 for why. Never set it above 1.0 -- it degrades output quality on this model.
RECOMMENDED PRESET for Magidonia:
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER min_p 0.02
PARAMETER repeat_penalty 1.0
These balance creativity with coherence. If you want more creative/chaotic:
PARAMETER temperature 1.2
PARAMETER top_k 50
PARAMETER top_p 0.98
If you want more focused/consistent:
PARAMETER temperature 0.8
PARAMETER top_k 30
PARAMETER top_p 0.9
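You don't have to rebuild the model to try these variants. Ollama's interactive REPL lets you override parameters for the current session with /set parameter (the change isn't saved back to the Modelfile):

```
ollama run luna
>>> /set parameter temperature 1.2
>>> /set parameter top_k 50
```

Once you like a combination, copy it into the Modelfile and recreate the model.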
STOP SEQUENCES
PARAMETER stop "User:"
PARAMETER stop "\nUser:"
Stop sequences tell the model when to stop generating. Without these, the model keeps going and starts writing the user's next message, which breaks the conversation.
"User:" and "\nUser:" cover both placements (mid-line and at the start of a new line), so the model halts right where you would type next.
You can add more if needed. For example, if your character's name is "Luna":
PARAMETER stop "User:"
PARAMETER stop "\nUser:"
PARAMETER stop "\nLuna:"
Use "\nLuna:" (with the leading newline) rather than a bare "Luna:" stop. A bare "Luna:" would fire the moment the model prefixed its own reply with its name; the newline version only triggers when the model tries to start a second Luna message in a row.
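To see concretely what a stop sequence does, here's a toy model of the behavior in shell: text is emitted only up to the first occurrence of the stop string, then generation halts. (truncate_at is a hypothetical helper for illustration, not part of Ollama.)

```shell
# Toy model of a stop sequence: print input only up to the first occurrence
# of the stop string, then halt -- roughly what Ollama does while sampling.
truncate_at() {
  awk -v stop="$1" '{
    i = index($0, stop)
    if (i) { printf "%s", substr($0, 1, i - 1); exit }
    print
  }'
}
printf 'Luna: Rough day?\nUser: Yeah.\n' | truncate_at "User:"
# prints only: Luna: Rough day?
```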
CREATING THE MODEL
Save the Modelfile as a plain text file, e.g., luna.modelfile.
Then create the model:
ollama create luna -f luna.modelfile
Ollama will process it and create a new model named "luna" in your local library.
You can see it in the list:
ollama list
And run it:
ollama run luna
The model name "luna" is clean and easy. The base model name (hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0) is a mouthful, so Modelfiles let you create aliases.
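The whole flow can be scripted. A minimal sketch that writes a stripped-down Modelfile and registers it, with the ollama call guarded so the script is safe to run even on a machine where Ollama isn't installed:

```shell
#!/bin/sh
# Write a minimal Modelfile (a stripped-down version of the template above).
cat > luna.modelfile <<'EOF'
FROM hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0
SYSTEM """
[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
"""
PARAMETER temperature 1.0
PARAMETER repeat_penalty 1.0
PARAMETER stop "User:"
EOF

# Register the model only if the ollama CLI is actually available.
if command -v ollama >/dev/null 2>&1; then
  ollama create luna -f luna.modelfile
fi
```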
RUNNING FROM CLI
Once created, you can run your character from the command line:
ollama run luna "Hey Luna, what's up?"
Or just:
ollama run luna
Which enters interactive mode. Type messages, get character responses. Type /bye or press Ctrl+D to exit.
This is useful for quick testing before using it in OpenWebUI.
VARIABLES AND TAGS REFERENCE
In the SYSTEM prompt, these variables are available:
{{char}} = Character name
{{user}} = User name
Tags:
<START>...</START> = Dialogue example
<END> = End of dialogue example
You can include multiple dialogue examples. Just repeat <START>...</START> blocks.
EMBEDDING THE FIRST MESSAGE
The first message is critical. Include it in a note in your SYSTEM prompt:
[Opening message: Luna is on the couch when you message. She glances up.]
Luna: Hey. What's up?
The note teaches the model what an opening turn looks like, so its first response lands in character.
Some Modelfiles embed the opening as the last line of the SYSTEM prompt itself:
"""
[character description]
[examples]
Luna: Hey. What's up?
"""
This works, but it can confuse Ollama's chat parser. Better to include it as a note and let the conversation flow naturally.
ALTERNATIVE BASE MODELS
If Magidonia-24B doesn't fit your hardware, here are solid alternatives:
For similar character performance:
- ollama pull hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q5_K_M
- ollama pull hf.co/TheBloke/Llama-2-13B-chat-GGUF:Q5_K_M
For smaller:
- ollama pull hf.co/TheBloke/zephyr-7B-beta-GGUF:Q5_K_M
Just replace the FROM line. Everything else in the Modelfile stays the same.
TESTING YOUR MODELFILE
After creating the model, test it with various prompts:
ollama run luna "Hey, I had a rough day."
ollama run luna "What's your philosophy on work?"
ollama run luna "Tell me a joke."
ollama run luna "[What's your favorite book?]"
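The one-off commands above can also be scripted. A dry-run helper sketch (smoke_test_cmds is an ad-hoc name, not an Ollama command) that prints the exact commands so you can review them or pipe the output to sh:

```shell
# Print the smoke-test commands for a given model name without running them;
# pipe the output to sh to actually execute. smoke_test_cmds is ad-hoc.
smoke_test_cmds() {
  for prompt in "Hey, I had a rough day." "What's your philosophy on work?" "Tell me a joke."; do
    printf 'ollama run %s "%s"\n' "$1" "$prompt"
  done
}
smoke_test_cmds luna
```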
Check for:
- Natural tone (matches the character card)
- Appropriate length (not too short, not novella-length)
- Character consistency (does Luna stay in character?)
- One follow-up per message (doesn't overwhelm with three questions)
- Stop sequences working (doesn't generate "User:" lines)
If any of these fail, adjust the character card or parameters.
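The last check is easy to automate against a saved transcript. A sketch, where check_no_user_leak and transcript.txt are stand-in names:

```shell
# Scan a saved chat transcript for leaked "User:" turns; if the stop
# sequences are working, the model never writes one. Names are stand-ins.
check_no_user_leak() {
  if grep -q '^User:' "$1"; then
    echo "FAIL: model wrote a User: turn"
  else
    echo "OK: stop sequences held"
  fi
}
printf 'Luna: Hey. What'\''s up?\n' > transcript.txt
check_no_user_leak transcript.txt   # prints: OK: stop sequences held
```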
QUICK REFERENCE
Modelfile structure:
- FROM [base model]
- SYSTEM [quoted string with character card + examples]
- PARAMETER lines [temperature, top_k, top_p, min_p, repeat_penalty]
- PARAMETER stop sequences [stop "User:", etc.]
Creating and running:
ollama create [name] -f [modelfile path]
ollama run [name]
ollama list
Character card in SYSTEM should include:
- PList format traits and speech style
- Ali:Chat examples (3-5 dialogues)
- Opening message style note
- Total 500-800 tokens
BEFORE YOU CREATE
Checklist:
[ ] Character card is complete (PList + examples)
[ ] SYSTEM prompt is between 500-800 tokens
[ ] Stop sequences are set (at minimum: "User:" and "\nUser:")
[ ] Parameters are tuned for your hardware
[ ] Modelfile syntax is correct (FROM, SYSTEM, PARAMETER)
[ ] Base model quantization matches your VRAM
[ ] Model name is chosen (short, memorable)
Once you check these, you're ready to create and run.
Next section: OpenWebUI format, which is browser-based and even more flexible.