Techalicious Academy / 2026-03-19-chatbot

CONTEXT MANAGEMENT

The magic of a good chatbot isn't just in the model—it's in how you manage what the model sees. Get this right, and your character stays consistent through long conversations. Get it wrong, and by message 50, your Victorian poet starts speaking like a 2024 gym bro.

This is about stop sequences, context windows, and the art of keeping your character's memory from decaying into noise.

STOP SEQUENCES: WHY YOUR CHATBOT KEEPS TALKING

Here's a problem nobody warns you about: without stop sequences, your model generates BOTH sides of the conversation.

You send: "User: Hi, what's your favorite book?"

Without stop sequences, it generates:

AI: "Your favorite book? Well, that would have to be Moby Dick..."
User: "Oh yeah, I love that one too!"
AI: "Of course you do. Everyone with taste does..."

It keeps going until it hits max_tokens. You're reading a monologue where the model role-plays your responses too.

With stop sequences, it stops the MOMENT it sees the user's name coming. Clean. Professional. Predictable.

THE ESSENTIAL STOP SEQUENCES

At minimum, add these:

"User:"
"\nUser:"

These catch the model before it starts generating your next message.

If your character has a different name (like Mark Twain), add variants of that too:

"Mark:"
"\nMark:"
"{{user}}:"
"\n{{user}}:"

The backslash-n versions catch newline + name combinations. In practice, most modern frameworks (SillyTavern, OpenWebUI) handle this for you. But it's worth knowing what's under the hood.
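A tiny helper can generate these variants from a character name. This is a sketch, not part of any framework's API; the name `build_stop_sequences` is ours:

```python
def build_stop_sequences(character_name: str, user_label: str = "User") -> list[str]:
    """Build the stop list: the user label plus the character's own name,
    each with and without a leading newline."""
    stops = []
    for name in (user_label, character_name):
        stops.append(f"{name}:")
        stops.append(f"\n{name}:")
    return stops

print(build_stop_sequences("Mark"))
# → ['User:', '\nUser:', 'Mark:', '\nMark:']
```

For a group chat, call it once per character and concatenate the lists.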

SETTING STOP SEQUENCES ACROSS PLATFORMS

In Ollama Modelfile:

PARAMETER stop "User:"
PARAMETER stop "\nUser:"

You can add multiple PARAMETER stop lines. Ollama reads all of them.

In OpenWebUI API call:

{
  "model": "magidonia",
  "messages": [...],
  "stop": ["User:", "\nUser:"]
}

In OpenWebUI UI:

Advanced > Stop Sequences > Add "User:" and "\nUser:"

In SillyTavern:

Character Settings > Generation > Stop Sequences
Enter each on a separate line.

Pro tip: test with just "User:" first. That handles 95% of cases. Add more only if the model keeps generating your lines.
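For the API route, the whole request is just a JSON payload. Here is a sketch of one for an OpenAI-compatible chat endpoint; the model name "magidonia" comes from the examples above, and the system/user text is illustrative:

```python
import json

# Hypothetical payload for an OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "magidonia",
    "messages": [
        {"role": "system", "content": "You are Mark Twain."},
        {"role": "user", "content": "Hi, what's your favorite book?"},
    ],
    "stop": ["User:", "\nUser:"],  # stop before the model writes your next line
    "max_tokens": 300,
}

print(json.dumps(payload, indent=2))
```

Whatever client you use, the `stop` field is where these sequences live.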

MULTIPLE CHARACTERS: EXPAND THE STOP LIST

If you're building a group chat scenario, add all character names:

PARAMETER stop "User:"
PARAMETER stop "\nUser:"
PARAMETER stop "Alice:"
PARAMETER stop "\nAlice:"
PARAMETER stop "Bob:"
PARAMETER stop "\nBob:"

The model will stop when it sees any of these approaching. This keeps conversations from becoming a monologue where one character narrates everyone else's lines.

THE CHAT TEMPLATE: MISTRAL V7-TEKKEN

Magidonia uses the Mistral V7-Tekken chat format under the hood. This is good news: it's widely supported, and Ollama handles it automatically.

You don't need to craft special prompt formatting. Ollama knows how to wrap your messages in the right tokens. Just use:

system: Your character description
user: User's message
assistant: Character's response

Ollama does the rest.

(This matters because some models use different formats—ChatML, Alpaca, LLaMA-2. But Magidonia is Tekken through and through.)
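In code, that three-role structure is just a list of dicts. A sketch, with illustrative character text:

```python
messages = [
    {"role": "system", "content": "You are Mark Twain: sardonic, plainspoken, fond of river stories."},
    {"role": "user", "content": "What do you make of modern politics?"},
    {"role": "assistant", "content": "Same circus, fancier tent."},
    {"role": "user", "content": "Ha! Tell me more."},
]

# The runtime (Ollama, here) wraps each entry in the model's chat template
# (Mistral V7-Tekken for Magidonia), so you never hand-write special tokens.
roles = [m["role"] for m in messages]
print(roles)
# → ['system', 'user', 'assistant', 'user']
```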

THE CONTEXT WINDOW PROBLEM

Here's where things get philosophically weird.

Your character card (system prompt) is maybe 500-1000 tokens. Your long-term conversation history is another 4000, 8000, 16000 tokens. Add the current exchange, and suddenly you're at the edge of your context window.

When the context window is full, the model can't see the beginning of the conversation anymore. The system message (which defines who your character IS) starts getting pushed out. The character doesn't know who they're supposed to be. They drift.

You wake up at 3am and realize you're chatting with a generic advice bot instead of your Victorian poet. They're using modern slang. They forgot a key personality trait you've been discussing for the last 20 messages.

This is CHARACTER DRIFT, and it's real.

THE PARADOX OF LARGE CONTEXTS

Bigger context ≠ better character consistency.

On 96GB hardware with Q8_0 quantization (25GB model, 71GB headroom), you COULD run a 32K token context window. Technically possible.

But don't.

The sweet spot for character consistency is 8K-16K tokens.

Why?

1. The model's attention is biased toward RECENT text. Token position matters. Your
   system message is at the very beginning. By token 10,000, it's been "diluted" by
   everything that came after.

2. Conversation quality often peaks and then declines. Messages 1-50 are sharp.
   Messages 51-100 get repetitive. Messages 100+ are often filler. Carrying all of it
   forward degrades the conversation.

3. There's a sweet spot where you have enough context for consistent character traits
   but not so much that the character drowns in noise.

Yes, that 71GB of headroom would comfortably fit a 24K or even 32K context. But keep it to 16K unless you have a specific reason not to (like building a long-form narrative game where the whole history matters).

The point: bigger isn't better. Right-sized is better.
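Right-sizing can be sketched in a few lines: pin the system prompt, estimate tokens crudely (roughly 4 characters per token is a common rule of thumb for English, an assumption here, not a Magidonia-specific figure), and drop the oldest exchanges until the history fits. The helper names are ours:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English prose.
    return max(1, len(text) // 4)

def trim_to_budget(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the system message pinned; drop the oldest non-system
    messages until the estimated total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    history = [m for m in messages if m["role"] != "system"]
    total = lambda msgs: sum(estimate_tokens(m["content"]) for m in msgs)
    while history and total(system + history) > budget_tokens:
        history.pop(0)  # the oldest exchange goes first
    return system + history

# A ~100-token system prompt plus ten ~100-token messages, trimmed to ~500 tokens:
msgs = [{"role": "system", "content": "x" * 400}] + [
    {"role": "user", "content": "y" * 400} for _ in range(10)
]
print(len(trim_to_budget(msgs, budget_tokens=500)))
# → 5  (system prompt + the 4 most recent messages)
```

Real frameworks do this for you, but this is the shape of what happens when your context window fills.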

CONTEXT MANAGEMENT STRATEGIES

When conversations get long, deploy these tactics:

1. CONTEXT PRUNING

Remove messages that don't advance the conversation. If you spent 5 messages on logistics and 20 on the actual story, ditch the logistics.

Keep:

- Messages that reveal character traits or advance the story
- Emotional beats and decisions the character should remember

Remove:

- Logistics and meta-discussion about settings or formatting
- Repetitive small talk and filler exchanges

Use your judgment. You're trimming noise, not destroying context.
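Pruning is just filtering with the system prompt protected. A sketch, where the noise test is whatever heuristic you choose (the keyword list here is purely illustrative):

```python
def prune(messages: list[dict], is_noise) -> list[dict]:
    """Drop messages the caller flags as noise (logistics, filler),
    but never the system prompt."""
    return [m for m in messages if m["role"] == "system" or not is_noise(m)]

chat = [
    {"role": "system", "content": "You are Alice, a half-elf ranger."},
    {"role": "user", "content": "Which settings menu has temperature again?"},  # logistics
    {"role": "assistant", "content": "The silver dagger was my mother's."},     # story
]

# Illustrative heuristic: treat messages mentioning UI/plumbing words as noise.
LOGISTICS_WORDS = ("settings", "menu", "token", "regenerate")
pruned = prune(chat, lambda m: any(w in m["content"].lower() for w in LOGISTICS_WORDS))
print(len(pruned))
# → 2  (the logistics question is gone; the story line survives)
```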

2. SUMMARIZATION

Every 30-50 messages, pause and write a short summary.

"So far: Alice learned I'm from Vermont. We discussed the climate crisis for 10
 exchanges. She asked about my childhood. I shared the barn story. Current mood:
 reflective, slightly nostalgic, engaged."

Save this summary. Then start a fresh chat with it as the opening context. The model sees the summary immediately, so it knows what happened before, but the conversation feels fresh.

Tools like SillyTavern can automate this. Or do it manually—it takes 2 minutes.
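Mechanically, a summary reset just seeds a fresh message list with the original character card plus the summary. A sketch (the function name is ours; note that some backends prefer the summary appended to the single system message rather than a second system entry):

```python
def reset_with_summary(system_prompt: str, summary: str) -> list[dict]:
    """Start a fresh chat: the original character card, plus a compact
    summary of the previous conversation injected as system context."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"[Previously: {summary}]"},
    ]

fresh = reset_with_summary(
    "You are Alice, a half-elf ranger.",
    "Alice learned I'm from Vermont; we discussed her missing brother. Mood: reflective.",
)
print(fresh[1]["content"].startswith("[Previously:"))
# → True
```

From here you append new user/assistant turns as usual; the model sees the summary immediately.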

3. CHARACTER REFRESH

Inject reminders of who your character is at strategic points in the conversation.

The key: inject them at DEPTH 4 (about 4 exchanges before the current message).

Why depth 4? Because the model's attention is strong at the recent end of context but weak in the middle. Ten or more messages back? Diluted. Four or five messages back? Sharp focus.

Example with Mark Twain:

[Character message from 5 exchanges ago...]
[User response...]
[Character message...]
[User response...]
← Inject reminder here: "Mark Twain—sardonic, skeptical of authority, loves storytelling.
  Always speaks plainly. Never preachy."
[User asks a new question...]
[Model generates response—now with the reminder fresh in its attention]

In SillyTavern: Character's Note (or the newer system). Manual method: edit the chat log before continuing.
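The injection itself is a one-line list insert: place the note `depth` messages from the end. A sketch (the helper name is ours; SillyTavern does the equivalent internally):

```python
def inject_at_depth(messages: list[dict], note: str, depth: int = 4) -> list[dict]:
    """Insert a bracketed character reminder `depth` messages from the
    end of the chat, where the model's attention is still strong."""
    out = list(messages)
    pos = max(0, len(out) - depth)
    out.insert(pos, {"role": "system", "content": f"[{note}]"})
    return out

chat = [{"role": "user", "content": f"message {i}"} for i in range(10)]
injected = inject_at_depth(chat, "Mark Twain, sardonic, skeptical of authority, never preachy")
print(injected[6]["content"])  # prints the bracketed reminder, 4 slots from the end
```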

4. TOPIC SEGMENTATION

Some conversations naturally fit into separate chats. A deep philosophical discussion, a roleplay scene, a story session—each could be its own chat.

Don't force everything into one continuous stream. Quality over length.

When you're done with one topic, summarize it, then start fresh for the next. Your character stays consistent because each conversation is right-sized.

WHY DEPTH MATTERS: RECENCY BIAS IN LANGUAGE MODELS

Language models have "recency bias." They pay more attention to tokens near the END of the context window than tokens near the beginning.

This isn't a bug; it's how attention mechanisms work. Your character card, sitting at the top of context, gets progressively less attention as the conversation grows.

By message 50, your system prompt is effectively a whisper.

This is WHY we inject refreshers at depth 4. We're re-introducing the character definition where the model is still paying attention.

It's also WHY you shouldn't trust a 1000-message chat to keep your character consistent. You'd need refreshers CONSTANTLY.

IMPLEMENTING CHARACTER REFRESH

In SillyTavern (newer versions):

Settings > Advanced > Character's Note

Add something like:
"- Still a skeptic of authority
 - Loves telling stories
 - Speaks plainly, no flowery language
 - Occasional folksy phrases ('I reckon', 'considerable')"

SillyTavern injects this at strategic points. Automatic.

Manual method (any platform):

After 20-30 messages, pause the conversation.
Edit the chat log. Find a natural spot (end of a character message).
Add a bracketed reminder:

[Mark is a storyteller, skeptical of government, loves technology and river life.
 Speaks plainly. Often folksy. Not preachy—opinionated but friendly.]

Continue the conversation. The model sees the reminder.

This feels hacky but it WORKS. You'll see immediate improvement in consistency.

SIGNS OF DRIFT

How do you know your character is drifting?

- Generic responses. Your character sounds like every other chatbot.
- Speech pattern changes. They suddenly use vocabulary they didn't before.
- Forgotten details. You mentioned something 20 messages ago; they act like they didn't hear it.
- Breaking character. They break the fourth wall or talk about being an AI (if that's not their thing).
- Becoming a yes-man. They agree with everything instead of having opinions.
- Tone shift. From sardonic to cheerful. From quiet to verbose.

If you see these, something's wrong with context management.

FIXES FOR DRIFT

1. Check your context size.

At what message count does drift start? If it's 15 messages in, your context window is
too small or your character card is too bloated.

2. Add a character refresh.

Try the Character's Note method (SillyTavern) or manual injection. Often works immediately.

3. Summarize and prune.

Cut the chat in half. Keep recent messages and the summary. Start fresh.

4. Regenerate the response.

"Swipe" (regenerate) the last message. Sometimes the model just had a bad roll and will
course-correct.

5. Edit manually.

If the character said something out-of-character, edit it. Fix the tone, add details,
make it authentic. The edited text becomes part of the context, so the model follows
its lead from then on.

6. Reset with summary.

Sometimes you just start over. Write a summary. Begin a new chat. It's not failure—it's
respecting the natural length of a conversation.

MANAGING MEMORY EXPLICITLY

For critical information (world-state, plot details, relationship facts), don't rely on the model to remember. Maintain a KNOWN_FACTS document outside the chat.

Example for a fantasy roleplay:

KNOWN_FACTS:
- Alice is a half-elf ranger
- She's searching for her brother (missing 3 years)
- We're currently in the Shattered Isles
- Alice distrusts wizards (one killed her mentor)
- She carries a silver dagger her mother gave her

Whenever context is about to reset or drift, inject this. Don't rely on the model to carry 10 hours of roleplay in its context window.

This is especially important for anything plot-critical, emotionally significant, or mechanically important.
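Keeping the fact sheet outside the chat and re-attaching it to the character card is a few lines of string handling. A sketch (the helper name is ours):

```python
KNOWN_FACTS = [
    "Alice is a half-elf ranger",
    "She's searching for her brother (missing 3 years)",
    "Alice distrusts wizards (one killed her mentor)",
]

def with_known_facts(system_prompt: str, facts: list[str]) -> str:
    """Append the externally maintained fact sheet to the character card,
    so it survives any context reset or summary restart."""
    bullet_list = "\n".join(f"- {fact}" for fact in facts)
    return f"{system_prompt}\n\nKNOWN_FACTS:\n{bullet_list}"

prompt = with_known_facts("You are Alice.", KNOWN_FACTS)
print("KNOWN_FACTS:" in prompt)
# → True
```

Because the facts live in your own file, editing them is versioned and deliberate; you are never hoping the model "remembers."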

THE NUCLEAR OPTION: CHAT RESET WITH SUMMARY

Sometimes a conversation has gone 100 messages and it's become repetitive, off-track, or the character has drifted badly.

Start over.

But don't lose what you've built. Write a 3-5 sentence summary of what happened:

"Alice and I spent the morning discussing philosophy. She's skeptical of my atheism but
 respects my reasoning. We found common ground on the nature of meaning. By the end,
 she was laughing at my dry jokes. Mood: warm, intellectual, slightly flirtatious."

Open a new chat with the same character. Paste the summary as the opening context. Ask the character: "What do you remember about our conversation this morning?"

The model will reconstruct the emotional tenor and key points without the bloat of 100 individual messages.

You get a fresh, consistent character with continuity. Everyone wins.

QUICK REFERENCE

Context management checklist:

☐ Stop sequences set ("User:", "\nUser:")
☐ Context window at 8K-16K (not higher "just because")
☐ Character refresh at depth 4 after 30+ messages
☐ Summarization every 50 messages or at natural breakpoints
☐ Manual edit of out-of-character responses
☐ KNOWN_FACTS maintained externally for plot/world details
☐ Topic segmentation (don't force everything into one chat)
☐ Reset with summary when drift becomes obvious

Get these right, and your character will stay sharp through long, rich conversations.

Get them wrong, and you're chatting with a generic bot by message 40.