Techalicious Academy / 2026-03-19-chatbot


OLLAMA FORMAT

Why Ollama Modelfiles? Because you can bake your character into a reusable model, run it from CLI, lock in parameters, and share it with others. It's portable and powerful.

THE PLIST FORMAT

PList is bracket-and-colon syntax. Scannable, token-efficient, clear to the model:

[Name: CharacterName]
[Personality: trait1, trait2, trait3]
[Speech: style1, style2]
[Relationship: How they relate to the user]

Example:

[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
[Speech: casual, uses contractions, natural *actions*, no emoji]
[Relationship: Asks one follow-up per message. Shows genuine interest. Not pushy.]
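If you generate cards programmatically, a PList block is trivial to assemble. A minimal sketch; the `plist` helper is hypothetical, not part of Ollama or any library:

```python
# Hypothetical helper that assembles a PList block from a dict.
# Field names and values are up to you; nothing here is Ollama-specific.
def plist(fields: dict[str, str]) -> str:
    return "\n".join(f"[{key}: {value}]" for key, value in fields.items())

card = plist({
    "Name": "Luna",
    "Personality": "warm, curious, patient, witty, occasionally sarcastic",
    "Speech": "casual, uses contractions, natural *actions*, no emoji",
})
print(card)
```

Dicts preserve insertion order in modern Python, so the fields come out in the order you list them.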

THE ALI:CHAT FORMAT

Ali:Chat uses <START> and <END> tags with {{char}} and {{user}} variables for dialogue examples:

<START>
{{user}}: What should I do with my career?
{{char}}: *leans back thoughtfully* Do you want advice, or do you need to vent? Because
         there's a difference and I can't help with the second one without knowing that first.
{{user}}: I guess advice would help.
{{char}}: Okay. What's keeping you stuck?
<END>

The variables {{char}} and {{user}} are placeholders inherited from character-card formats. Ollama itself passes them through as literal text; roleplay-trained models treat them as standing for the character and user names, and frontends like SillyTavern substitute them automatically. In a raw Modelfile, you can also just write the real names in.
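If you'd rather bake real names in before pasting examples into a Modelfile, plain string replacement is enough. A minimal sketch; the helper name is made up:

```python
# Replace {{char}} / {{user}} placeholders with concrete names.
# Plain string replacement is all this format needs.
def fill_placeholders(text: str, char: str, user: str) -> str:
    return text.replace("{{char}}", char).replace("{{user}}", user)

example = "{{user}}: I guess advice would help.\n{{char}}: Okay. What's keeping you stuck?"
print(fill_placeholders(example, char="Luna", user="Alex"))
```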

COMBINING PLIST + ALI:CHAT

The strongest approach uses both. PList for concise traits, Ali:Chat for dialogue examples:

[Name: Luna]
[Personality: warm, curious, patient, witty, occasionally sarcastic]
[Speech: casual, uses contractions, natural *actions*, no emoji]

<START>
{{user}}: I had a rough day at work.
{{char}}: *sits down next to you* That bad? Want to talk about it?

{{user}}: My boss was impossible.
{{char}}: *listens* Is this temporary frustration, or has it been building?

{{user}}: Building, I think.
{{char}}: Then we need to talk about what leaving looks like. What would that feel like?
<END>

Simple, clear, powerful.

THE OLLAMA MODELFILE

A Modelfile is how you tell Ollama to create a custom model. Here's the complete template for Magidonia:

---BEGIN MODELFILE---

FROM hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0

SYSTEM """ [Name: Luna] [Personality: warm, curious, patient, witty, occasionally sarcastic] [Speech: casual, uses contractions, natural *actions*, no emoji] [Relationship: Asks one follow-up per message. Shows genuine interest. Not pushy.]

<START> {{user}}: I had a rough day. {{char}}: *sits down next to you* That bad? Want to talk about it, or just vent?

{{user}}: My boss was impossible today. {{char}}: *listens* Temporary frustration or building resentment?

{{user}}: Building, I think. {{char}}: Then we need to talk about what leaving looks like. What would that feel like? <END>

[Opening message: Luna is relaxed on her couch when you message. She glances up.]

Luna: Hey. What's up? """

PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER min_p 0.02
PARAMETER repeat_penalty 1.0

PARAMETER stop "User:" PARAMETER stop "\nUser:"

---END MODELFILE---

Let me break down each part:

THE FROM LINE

FROM hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0

This is the base model. It's the Magidonia-24B in Q8_0 quantization (high quality, larger file size). If you have less VRAM, use Q5_K_M or Q4_K_M instead.

Alternative quantizations:

Q8_0    = ~24GB VRAM, highest quality
Q6_K    = ~18GB VRAM, excellent quality
Q5_K_M  = ~16GB VRAM, very good quality
Q4_K_M  = ~8-10GB VRAM, good quality
Q3_K_M  = ~6-8GB VRAM, acceptable quality

If you're unsure, start with Q5_K_M. It's the sweet spot for most hardware.

THE SYSTEM PROMPT

Everything between SYSTEM """ and """ is your character card and examples.

This is where your PList + Ali:Chat combo goes. Keep it concise but complete. Aim for 500-800 tokens total.
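To check whether a card lands in that 500-800 token range, a character-count heuristic is close enough. The ~4 characters per token ratio is a common rule of thumb, not the model's actual tokenizer:

```python
# Rough token estimate: ~4 characters per token (heuristic only;
# the model's real tokenizer will differ somewhat).
def rough_token_count(text: str) -> int:
    return max(1, len(text) // 4)

card = "[Name: Luna]\n[Personality: warm, curious, patient]"
estimate = rough_token_count(card)
print(f"~{estimate} tokens")
if not 500 <= estimate <= 800:
    print("Outside the 500-800 target -- trim or expand the card.")
```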

THE PARAMETERS

PARAMETER temperature 1.0

Temperature controls randomness. Lower = more predictable, higher = more creative. 1.0 is a good default for character chat: warmer than strict (0.7), not unhinged (1.5+).

PARAMETER top_k 40

Top-k limits the model to the top 40 most likely next tokens. This prevents completely random garbage. 40 is solid.

PARAMETER top_p 0.95

Top-p (nucleus sampling) keeps tokens until cumulative probability reaches 95%. Works together with top_k for controlled variety.

PARAMETER min_p 0.02

Minimum probability threshold. Filters out low-probability garbage tokens. 0.02 is the sweet spot for Magidonia roleplay.

PARAMETER repeat_penalty 1.0

For Magidonia, this MUST be 1.0 (disabled). See Section 05 for why. Never set it above 1.0 -- it degrades output quality on this model.
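To see how top_k, top_p, and min_p interact, here's a toy illustration over a made-up next-token distribution. It follows the usual llama.cpp-style convention (min_p measured relative to the most likely token), but it's a simplification for intuition, not Ollama's actual code:

```python
# Toy sampler pipeline: top_k cut, then min_p floor, then top_p (nucleus).
# The tokens and probabilities below are invented for illustration.
def filter_candidates(probs, top_k=40, top_p=0.95, min_p=0.02):
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])[:top_k]  # top_k cut
    floor = min_p * ranked[0][1]       # min_p is relative to the best token
    kept, cumulative = [], 0.0
    for token, p in ranked:
        if p < floor:                  # min_p: drop the low-probability tail
            continue
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:        # top_p: stop at the cumulative-mass cutoff
            break
    return kept

probs = {"the": 0.50, "a": 0.30, "zebra": 0.15, "qx": 0.04, "#!": 0.01}
print(filter_candidates(probs, top_k=4, top_p=0.90, min_p=0.02))
```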

RECOMMENDED PRESET for Magidonia:

PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER min_p 0.02
PARAMETER repeat_penalty 1.0

These balance creativity with coherence. If you want more creative/chaotic:

PARAMETER temperature 1.2
PARAMETER top_k 50
PARAMETER top_p 0.98

If you want more focused/consistent:

PARAMETER temperature 0.8
PARAMETER top_k 30
PARAMETER top_p 0.9

STOP SEQUENCES

PARAMETER stop "User:"
PARAMETER stop "\nUser:"

Stop sequences tell the model when to stop generating. Without these, the model keeps going and starts writing the user's next message, which breaks the conversation.

"User:" and "\nUser:" cover both cases—the model stopping right when you would type next.

You can add more if needed. For example, if your character's name is "Luna":

PARAMETER stop "User:"
PARAMETER stop "\nUser:"
PARAMETER stop "Luna:"
PARAMETER stop "\nLuna:"

The Luna stops prevent the model from speaking twice in a row as Luna.
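What a stop sequence does is easy to demonstrate outside Ollama: cut the output at the first occurrence of any stop string. A toy re-implementation for illustration only; Ollama handles this internally:

```python
# Truncate generated text at the earliest stop sequence.
def apply_stops(text: str, stops: list[str]) -> str:
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

raw = "*waves* Hey, what's up?\nUser: (the model tries to write your reply)"
print(apply_stops(raw, ["User:", "\nUser:"]))  # everything after the newline is dropped
```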

CREATING THE MODEL

Save the Modelfile as a plain text file, e.g., luna.modelfile.

Then create the model:

ollama create luna -f luna.modelfile

Ollama will process it and create a new model named "luna" in your local library.

You can see it in the list:

ollama list

And run it:

ollama run luna

The model name "luna" is clean and easy. The base model name (hf.co/bartowski/TheDrummer_Magidonia-24B-v4.3-GGUF:Q8_0) is a mouthful, so Modelfiles let you create aliases.

RUNNING FROM CLI

Once created, you can run your character from the command line:

ollama run luna "Hey Luna, what's up?"

Or just:

ollama run luna

Which enters interactive mode. Type messages, get character responses. Press Ctrl+D to exit.

This is useful for quick testing before using it in OpenWebUI.

VARIABLES AND TAGS REFERENCE

In the SYSTEM prompt, these placeholder variables are conventional:

{{char}}      = Character name
{{user}}      = User name

Ollama passes them through as literal text; roleplay-tuned models recognize the convention, and you can always substitute the real names yourself before creating the model.

Tags:

<START>   = Start of a dialogue example
<END>     = End of a dialogue example

You can include multiple dialogue examples. Just repeat <START>...<END> blocks.

EMBEDDING THE FIRST MESSAGE

The first message is critical. Include it in a note in your SYSTEM prompt:

[Opening message: Luna is on the couch when you message. She glances up.]
Luna: Hey. What's up?

When the conversation starts, this note shows the model how to open and sets the tone for its responses.

Some Modelfiles embed the opening as the last line of the SYSTEM prompt itself:

"""
[character description]
[examples]

Luna: Hey. What's up?
"""

This works, but it can confuse Ollama's chat parser. Better to include it as a note and let the conversation flow naturally.

ALTERNATIVE BASE MODELS

If Magidonia-24B doesn't fit your hardware, here are solid alternatives:

For similar character performance:

For smaller:

Just replace the FROM line. Everything else in the Modelfile stays the same.

TESTING YOUR MODELFILE

After creating the model, test it with various prompts:

ollama run luna "Hey, I had a rough day."
ollama run luna "What's your philosophy on work?"
ollama run luna "Tell me a joke."
ollama run luna "[What's your favorite book?]"

Check for:

- Staying in character (Luna's voice, not generic assistant tone)
- Speech style matching the card (casual, contractions, *actions*, no emoji)
- One follow-up question per message, as the card specifies
- Clean stops -- the model never writes a "User:" line

If any of these fail, adjust the character card or parameters.
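These spot checks can also be scripted. A sketch assuming `ollama` is on your PATH and the "luna" model has been created; the live run is commented out so the file is safe to execute without Ollama installed:

```python
# Batch-test a character model from Python via the ollama CLI.
# Assumes `ollama` is on PATH and the "luna" model exists.
import subprocess

def build_command(model: str, prompt: str) -> list[str]:
    return ["ollama", "run", model, prompt]  # one-shot, non-interactive

def run_prompt(model: str, prompt: str) -> str:
    result = subprocess.run(build_command(model, prompt),
                            capture_output=True, text=True)
    return result.stdout.strip()

prompts = [
    "Hey, I had a rough day.",
    "What's your philosophy on work?",
    "Tell me a joke.",
]

# Uncomment to run against a live model:
# for p in prompts:
#     print(f">> {p}\n{run_prompt('luna', p)}\n")
```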

QUICK REFERENCE

Modelfile structure:

  1. FROM [base model]
  2. SYSTEM [quoted string with character card + examples]
  3. PARAMETER lines [temperature, top_k, top_p, min_p, repeat_penalty]
  4. PARAMETER stop sequences [stop "User:", etc.]

Creating and running:

ollama create [name] -f [modelfile path]
ollama run [name]
ollama list

Character card in SYSTEM should include:

- PList traits: [Name], [Personality], [Speech], [Relationship]
- One or more <START>...<END> dialogue examples
- An [Opening message: ...] note plus the first line of dialogue

BEFORE YOU CREATE

Checklist:

[ ] Character card is complete (PList + examples)
[ ] SYSTEM prompt is between 500-800 tokens
[ ] Stop sequences are set (at minimum: "User:" and "\nUser:")
[ ] Parameters are tuned for your hardware
[ ] Modelfile syntax is correct (FROM, SYSTEM, PARAMETER)
[ ] Base model quantization matches your VRAM
[ ] Model name is chosen (short, memorable)

Once you check these, you're ready to create and run.

Next section: OpenWebUI format, which is browser-based and even more flexible.