STOP SEQUENCES AND CONTEXT MANAGEMENT
Two technical mechanisms prevent common problems: stop sequences keep the model from generating too much, and context management keeps long conversations coherent.
What Are Stop Sequences?
Stop sequences are strings that tell the model "stop generating here." The moment the model's output contains one of these strings, generation halts.
Without stop sequences:
User: How are you?
Luna: "I'm doing well, thanks for asking!"
User: That's great to hear!
Luna: "Yes, it really is a nice day. How about you?"
User: I'm good too.
The model generated an entire conversation! It played both parts.
With stop sequences:
User: How are you?
Luna: "I'm doing well, thanks for asking!"
Generation stops when it tries to write "User:". Much better.
Essential Stop Sequences
For a companion chatbot, you need at minimum:
"User:"
"\nUser:"
"{{user}}:"
"\n{{user}}:"
This covers variations in how the user's name might appear.
Setting Stop Sequences
Ollama Modelfile:
PARAMETER stop "User:"
PARAMETER stop "\nUser:"
Ollama API request:
{
  "model": "rpmax",
  "prompt": "...",
  "options": {
    "stop": ["User:", "\nUser:"]
  }
}
(Note: in Ollama's generate API, stop sequences go inside the "options" object, not at the top level.)
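If you're scripting against the API, it helps to build the request body in one place and keep a client-side trim as a safety net (some backends silently ignore stops). A minimal sketch in Python; the model name "rpmax" is carried over from the example above:

```python
STOPS = ["User:", "\nUser:", "{{user}}:", "\n{{user}}:"]

def build_request(model, prompt, stops=STOPS):
    """Build an Ollama /api/generate request body with stop sequences."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"stop": stops},  # Ollama expects stop under "options"
    }

def truncate_at_stops(text, stops=STOPS):
    """Client-side safety net: cut the reply at the first stop sequence."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()

# Trim a reply where the model kept going and played the user's part
reply = "I'm doing well, thanks for asking!\nUser: That's great!"
clean = truncate_at_stops(reply)  # everything after "\nUser:" is dropped
```

POST the dict from build_request to your Ollama endpoint with any HTTP client; run truncate_at_stops on whatever comes back.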
OpenWebUI:
Settings > Model > Advanced > Stop Sequences
SillyTavern:
Handled automatically in Chat/Instruct modes
Bolt AI:
Preferences > Generation > Stop Sequences
Multiple Characters
If your setup involves multiple AI characters (like a group chat or our techalicious.forum), add ALL character names as stop sequences:
PARAMETER stop "User:"
PARAMETER stop "Luna:"
PARAMETER stop "Alex:"
PARAMETER stop "Marcus:"
Generate one character's response at a time. Call the model separately for each character. The stop sequences prevent bleed-over.
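The one-call-per-character loop can be sketched like this. Here `generate` is a placeholder for whatever function actually calls your backend (an assumption, not a real API); every character name is passed as a stop so no reply runs into the next speaker's turn:

```python
CHARACTERS = ["Luna", "Alex", "Marcus"]

def group_turn(history, characters, generate):
    """Ask each character for one reply in turn, appending as we go.

    `generate(prompt, stops)` stands in for your backend call.
    """
    stops = ["User:"] + [f"{name}:" for name in characters]
    for name in characters:
        # End the prompt with "Name:" so the model speaks as that character
        prompt = "\n".join(history) + f"\n{name}:"
        reply = generate(prompt, stops).strip()
        history.append(f"{name}: {reply}")
    return history

# Usage with a fake backend, just to show the flow
fake = lambda prompt, stops: " (in-character reply)"
log = group_turn(["User: Hi everyone!"], CHARACTERS, fake)
```

Each character sees the others' replies from earlier in the same turn, which keeps the group exchange coherent.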
Chat Templates
Different models expect different formatting. RPMax uses Mistral's format:
<s>[INST] {system} [/INST]
{assistant}</s>
[INST] {user} [/INST]
The tokens in this template can also be stop sequences:
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
Most interfaces handle this automatically. But if you're calling the API directly, get the template right or the model gets confused.
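For raw API calls, the template above can be assembled programmatically. A sketch that generalizes it to multi-turn history (exactly where an opening greeting sits varies by deployment, so treat the layout as an assumption to check against your model's template):

```python
def build_prompt(system, history, user_msg):
    """Assemble a raw Mistral-style prompt.

    history: list of (user_text, assistant_text) completed exchanges.
    The system prompt fills the first [INST] block; the final user
    message is left open so the model writes the next reply.
    """
    blocks = [f"<s>[INST] {system} [/INST]"]
    for user, assistant in history:
        blocks.append(f"[INST] {user} [/INST]")
        blocks.append(f"{assistant}</s>")
    blocks.append(f"[INST] {user_msg} [/INST]")
    return "\n".join(blocks)

prompt = build_prompt("You are Luna.", [("Hi!", "Hello there!")],
                      "How are you?")
```

Getting this wrong is a common cause of "confused" output, which is why letting the interface apply the template is usually the safer choice.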
CONTEXT MANAGEMENT
Now let's talk about keeping long conversations coherent.
The Context Window Problem
Your model has a context window: the maximum text it can "see" at once. For RPMax, that's 32K tokens (about 24,000 words).
Sounds like a lot. But here's what fills it:
System prompt: 500 tokens
Character card: 800 tokens
Chat history: ??? tokens
A 50-message conversation easily hits 10-15K tokens. Long conversations fill the window.
When the window is full, old messages get truncated. The model forgets what happened earlier. This causes character drift.
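A rough budget check lets you see truncation coming before it happens. A sketch using the common ~4 characters per token rule of thumb (an approximation; real tokenizers vary):

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token for English text."""
    return len(text) // 4

def context_report(system_prompt, character_card, messages, window=32_000):
    """Tally estimated token usage against the context window."""
    used = (estimate_tokens(system_prompt)
            + estimate_tokens(character_card)
            + sum(estimate_tokens(m) for m in messages))
    return {"used": used, "window": window, "free": window - used}

report = context_report(
    "You are Luna, a warm companion.",
    "Luna is warm, curious, asks questions, keeps it brief.",
    ["User: hi there", "Luna: hey! how's your day going?"] * 25,
)
```

When "free" starts shrinking toward zero, it's time to prune or summarize (covered next).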
The Paradox of Large Contexts
You'd think bigger context = better memory. Actually, research shows character consistency DROPS with very large contexts.
Why? The model has more to pay attention to. Important traits get diluted in the noise. The character card at the top matters less when there are 30K tokens between it and the current message.
Sweet spot for character consistency:
8K - 16K tokens
Beyond that, you get more "memory" but less consistent character voice.
Strategies for Long Conversations
1. CONTEXT PRUNING
Remove irrelevant messages from history. Keeping every "How are you?" / "I'm fine" exchange wastes tokens. Prune routine exchanges.
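Mechanical pruning can catch the most routine filler automatically. A sketch that drops short small-talk messages while always keeping the recent tail verbatim (the small-talk patterns here are assumptions; tune them to your own chats):

```python
import re

# Hypothetical filler patterns; extend for your own conversations
SMALL_TALK = re.compile(
    r"^(hi|hey|hello|how are you\??|i'm (fine|good)( too)?\.?|thanks!?)$",
    re.IGNORECASE,
)

def prune(messages, keep_recent=10):
    """Drop routine filler, but never touch the last `keep_recent` messages."""
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    kept = [m for m in head
            if not SMALL_TALK.match(m.split(":", 1)[-1].strip())]
    return kept + tail
```

The recent tail is protected because the newest messages are exactly what the model needs verbatim for a coherent next reply.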
2. SUMMARIZATION
Every 30-50 messages, create a summary:
"Previous conversation: User discussed work stress, mentioned a
conflict with their boss named Sarah, expressed frustration about
lack of recognition. Luna offered support, asked clarifying questions."
Put this summary in context, remove the old messages.
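The swap itself is mechanical; only writing the summary needs the model. A sketch where `summarize` is a placeholder for that model call (the thresholds are assumptions matching the 30-50 message guideline above):

```python
def compact(messages, summarize, threshold=50, keep_recent=20):
    """Once history passes `threshold`, replace the oldest messages
    with one summary message and keep the recent tail verbatim.

    `summarize(text)` stands in for a model call that condenses text.
    """
    if len(messages) <= threshold:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(old))
    return [f"[Previous conversation: {summary}]"] + recent

# Fake summarizer for illustration
fake_summary = lambda text: "user discussed work stress; Luna offered support"
history = [f"msg {i}" for i in range(60)]
history = compact(history, fake_summary)
```

Run compact before each request; it's a no-op until the threshold is crossed, so it's cheap to leave in place.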
3. CHARACTER REFRESH
Inject character reminders mid-context. SillyTavern calls this
"Author's Note" or "Character's Note." Place at depth 4 (4 messages
from the end).
[Remember: Luna is warm, curious, asks questions, keeps it brief]
This refreshes the model's "memory" of who it's playing.
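Outside SillyTavern you can do the same injection by hand before each request. A sketch that inserts a reminder `depth` messages from the end of the history list:

```python
def inject_note(messages, note, depth=4):
    """Insert `note` so it sits `depth` messages from the end.

    depth=0 places it last (right before the model's next reply);
    depth=4 matches the SillyTavern default discussed above.
    """
    pos = max(0, len(messages) - depth)
    return messages[:pos] + [note] + messages[pos:]

msgs = [f"message {i}" for i in range(10)]
out = inject_note(msgs, "[Remember: Luna is warm, curious, keeps it brief]")
```

Inject into a copy of the history at request time rather than saving the note into the log, so reminders don't pile up over the conversation.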
Why Depth Matters (Recency Bias)
Here's something crucial: AI models pay MORE attention to text near the END of context than text at the beginning.
This is called recency bias. The model's attention mechanism weighs recent tokens more heavily. Information at the top of a 10K token context gets diluted. Information near the current message stays sharp.
What this means for character cards:
Your character card sits at the TOP of context (system prompt).
As conversation grows, it gets farther from the action.
The model "forgets" traits because they're too far back.
What this means for character refresh:
Injecting traits at depth 4 puts them NEAR the current message.
The model pays attention to them because they're recent.
Same information, different position, much stronger effect.
Think of it like this: if you read a 50-page document and someone asks you a question, you remember the last few pages better than page 3. LLMs work similarly.
This is why the SillyTavern community developed Author's Notes in the first place. They discovered through experimentation that trait positioning matters as much as trait content.
4. TOPIC SEGMENTATION
Start new chat sessions for new topics. Don't have one infinite
conversation about everything. Keep conversations focused.
Implementing Character Refresh
SillyTavern has this built in:
Character > Advanced > Character's Note
Set depth to 4
Content: "[Luna: warm, curious, asks follow-ups, keeps it brief]"
Manual method (in system prompt):
Add after every N messages:
[Reminder: Luna stays in character, maintains her warm but casual
voice, asks questions instead of giving advice]
The Depth Parameter
Where information appears matters. "Depth" means how far from the current message.
Depth 0 = Right before the current message (strongest influence)
Depth 4 = 4 messages back (moderate influence)
Top of context = First thing in window (weaker influence)
Character cards are at the top (weakest position). Character refreshers at depth 4 help maintain voice.
When Characters Start Drifting
Signs of drift:
- Generic responses ("I understand your feelings...")
- Changing speech patterns (formal when the character should be casual)
- Forgetting established details
- Breaking fourth wall ("As an AI...")
- Becoming a yes-man (agreeing with everything)
Fixes:
- Check context size (is it getting too big?)
- Add character refresh at depth 4
- Summarize and prune old messages
- Regenerate the drifted response
- Edit the response manually to correct course
Managing Memory Explicitly
Since context is limited, you can maintain important facts externally:
KNOWN_FACTS:
- User's name: Alex
- User's job: Software developer
- User mentioned: Recent breakup, stress at work
- User likes: Science fiction, coffee, hiking
Include relevant facts in the system prompt. Update as you learn things. This way important details don't get lost to truncation.
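Folding tracked facts into the system prompt on every request is a one-liner. A sketch where the fact store is just a dict (persisting it to disk between sessions is up to you):

```python
# Hypothetical fact store, updated as you learn things
facts = {
    "name": "Alex",
    "job": "Software developer",
    "likes": "science fiction, coffee, hiking",
}

def build_system_prompt(base_prompt, facts):
    """Append a KNOWN_FACTS block to the base system prompt."""
    lines = [f"- {key}: {value}" for key, value in facts.items()]
    return base_prompt + "\n\nKNOWN_FACTS:\n" + "\n".join(lines)

prompt = build_system_prompt("You are Luna, a warm companion.", facts)
```

Because the system prompt is rebuilt every request, updated facts take effect immediately and survive any truncation of the chat history.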
The Nuclear Option: Chat Reset
Sometimes drift gets too bad. Don't fight it. Reset the chat.
Save a summary of important details, start a new conversation, paste the summary as context. Fresh start with memory preserved.
Quick Reference
STOP SEQUENCES:
- "User:" and "\nUser:" minimum
- Add all character names in multi-char scenarios
- Add chat template tokens (</s>, [INST]) if using raw API
CONTEXT MANAGEMENT:
- Keep context at 8K-16K for best consistency
- Use character refresh at depth 4
- Prune/summarize every 30-50 messages
- Start new chats for new topics
- Save important facts externally
WHEN DRIFT HAPPENS:
- Don't let bad messages stand
- Regenerate or edit immediately
- Add character refresh
- Consider chat reset if severe