PARAMETERS - THE SETTINGS THAT MATTER
This is where most people mess up. The default settings for general chat are WRONG for roleplay. Worse, some common "fixes" actively hurt RPMax performance.
The Counterintuitive Truth
You know how everyone says "use repetition penalty to avoid repetitive text"? With RPMax, that's wrong.
RPMax was TRAINED to avoid repetition. It has anti-repetition built into its weights. Adding repetition penalty on top makes it worse, not better. The model starts avoiding words it should use naturally.
This is the single most important thing to remember:
+---------------------------------------------------+
|       DISABLE REPETITION PENALTY FOR RPMAX        |
|    Set repeat_penalty = 1.0 (which means OFF)     |
+---------------------------------------------------+
Also disable related samplers like DRY and XTC if your interface has them. Let the model's native training handle it.
Recommended Settings
Here's the configuration that works:
Temperature: 1.0
Top K: 40
Top P: 0.95
Min P: 0.02
Repeat Penalty: 1.0 (disabled)
Max Tokens: 2048
Context Window: 16384
Let's break down what each does.
Temperature (1.0)
Controls randomness. Higher = more creative, lower = more predictable.
0.1 - Very focused, almost deterministic
0.5 - Coherent but safe
0.7 - Good balance (common default)
1.0 - Natural variation, creative
1.5 - Spicy, sometimes weird
2.0 - Chaos mode
For roleplay, we want 1.0. Characters should have natural variation in how they express things. Too low and they feel robotic. Too high and they get random.
Start at 1.0. If responses feel too wild, drop to 0.8. If too boring, try 1.2.
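To see what temperature actually does, here is a minimal Python sketch (the logits are toy values, not from any real model) that scales logits before the softmax:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                 # toy next-token scores

sharp = softmax_with_temperature(logits, 0.5)    # low temp: peaked
natural = softmax_with_temperature(logits, 1.0)  # the recommended default
flat = softmax_with_temperature(logits, 1.5)     # high temp: flatter

# The top token's share shrinks as temperature rises.
assert sharp[0] > natural[0] > flat[0]
```

Lower temperatures concentrate probability on the most likely token (robotic), higher ones flatten the distribution (random); 1.0 leaves the model's learned distribution untouched.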
Top K (40)
Limits how many tokens the model considers for each word. At each generation step, it looks at the K most likely next tokens.
Top K = 40 means: consider only the 40 most probable next words
Higher K = more variety, possibly incoherent
Lower K = more focused, possibly repetitive
40 is a good starting point. Increase to 60-100 if you want more creative language. Decrease to 20-30 for more focused output.
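A rough sketch of the filtering step, using made-up probabilities over a tiny vocabulary:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

# Hypothetical next-token probabilities:
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zyx": 0.05}
kept = top_k_filter(probs, 2)            # only "the" and "a" survive
```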
Top P (0.95)
Also called "nucleus sampling." Instead of taking the top K tokens, it takes tokens until their cumulative probability reaches P.
Top P = 0.95 means: take tokens until you have 95% probability mass
This adapts to the situation. When the model is confident, fewer tokens qualify. When uncertain, more tokens are considered.
0.95 is standard. Lower (0.9) for more focus, higher (0.98) for more variety. Works together with Top K, not instead of it.
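The adaptive behavior is easy to see in a sketch (toy distributions, hypothetical tokens):

```python
def top_p_filter(probs, p):
    """Keep tokens in descending order until cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}

# Confident model: almost all mass on one token -> few tokens survive.
confident = top_p_filter({"yes": 0.90, "no": 0.06, "maybe": 0.04}, 0.95)
# Uncertain model: mass spread evenly -> every token survives.
uncertain = top_p_filter({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}, 0.95)
assert len(confident) == 2 and len(uncertain) == 4
```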
Min P (0.02)
A newer sampler that filters out low-probability garbage. Any token with probability below Min P (relative to the best token) is excluded.
Min P = 0.02 means: ignore tokens less than 2% as likely as the best
This prevents the model from occasionally picking weird tokens that technically passed Top K/Top P but are still unlikely.
Tune in 0.005 increments:
- Responses too boring? Lower to 0.015
- Responses too weird? Raise to 0.025
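A sketch of the filter, with a fabricated distribution to show where the cutoff lands:

```python
def min_p_filter(probs, min_p):
    """Drop tokens below min_p times the top token's probability, renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Fabricated distribution; threshold = 0.02 * 0.60 = 0.012
probs = {"hello": 0.60, "hi": 0.30, "hey": 0.095, "xqz": 0.005}
filtered = min_p_filter(probs, 0.02)     # "xqz" (0.005 < 0.012) is dropped
```

Note the threshold is relative: when the model is uncertain and the best token only has, say, 10% probability, the cutoff drops to 0.2%, so plenty of options stay in play.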
Repeat Penalty (1.0 = OFF)
This penalizes the model for using words it already used. The theory is it prevents repetitive loops.
1.0 = No penalty (disabled)
1.1 = Slight penalty
1.2 = Moderate penalty
1.5 = Heavy penalty
For MOST models, 1.1-1.15 helps. For RPMax, keep it at 1.0. The model already handles this. Adding penalty makes it awkwardly avoid normal words like "the" and character names.
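To see why the penalty hurts, here is a sketch of one common implementation scheme (divide positive logits of already-seen tokens by the penalty factor; the scores are illustrative):

```python
def apply_repeat_penalty(logits, seen, penalty):
    """Common scheme: divide positive logits of already-seen tokens by the
    penalty (multiply negative ones), which lowers their probability."""
    out = {}
    for tok, logit in logits.items():
        if tok in seen and penalty != 1.0:
            logit = logit / penalty if logit > 0 else logit * penalty
        out[tok] = logit
    return out

logits = {"the": 3.0, "dragon": 2.0}     # illustrative next-token scores
seen = {"the"}                           # "the" already appeared, as it always does

assert apply_repeat_penalty(logits, seen, 1.0) == logits  # 1.0 = true no-op
penalized = apply_repeat_penalty(logits, seen, 1.5)
# "the" falls from 3.0 to 2.0, now tied with "dragon", purely for being common.
```

The penalty can't tell a repetitive loop apart from ordinary English: articles, pronouns, and character names get punished just for appearing.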
Max Tokens (2048)
Maximum length of the response in tokens. Set high and let the model decide how much to write based on context.
512 = Short responses (might cut off)
1024 = Medium responses
2048 = Long responses allowed
4096 = Very long (rarely needed)
For companions, 2048 gives the model room to breathe. Short exchanges will still be short. Long explanations have space.
Context Window (16384)
How much conversation history the model can see. Bigger = longer memory.
But here's the thing: bigger isn't always better.
In practice, character consistency tends to DROP with very long contexts: the model gets confused by too much information. The sweet spot is 8K-16K tokens.
8192 = ~6,000 words - good for most chats
16384 = ~12,000 words - recommended
32768 = ~24,000 words - maximum, but may hurt consistency
Start with 16384. Only go higher if you have specific long-context needs and are willing to manage character drift.
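If you manage conversation history yourself, a simple trimming sketch keeps the newest messages inside the budget (the token count here is a crude 4-characters-per-token heuristic, not a real tokenizer):

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

# Crude estimate: roughly 4 characters per token for English text.
def rough_count(text):
    return max(1, len(text) // 4)
```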
Setting Parameters in Different Tools
Ollama CLI (in Modelfile):
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.0
PARAMETER num_ctx 16384
Ollama API (in request):
{
  "model": "rpmax",
  "prompt": "...",
  "options": {
    "temperature": 1.0,
    "top_k": 40,
    "top_p": 0.95,
    "repeat_penalty": 1.0,
    "num_ctx": 16384
  }
}
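For scripting, the same request can be sent from Python's standard library. This is a sketch assuming a local Ollama server on its default port; the prompt text is made up:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default endpoint

payload = {
    "model": "rpmax",
    "prompt": "Describe the tavern as the party walks in.",
    "stream": False,                     # one complete JSON response
    "options": {
        "temperature": 1.0,
        "top_k": 40,
        "top_p": 0.95,
        "repeat_penalty": 1.0,
        "num_ctx": 16384,
    },
}

def generate(url=OLLAMA_URL):
    """POST the request to a running Ollama server and return its reply."""
    import urllib.request
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```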
OpenWebUI:
Settings > Models > [Select Model] > Advanced Parameters
Bolt AI:
Preferences > Models > [Select Model] > Parameters
Sampler Order
If your interface lets you set sampler order (like SillyTavern), use:
1. Min P (truncation first)
2. Top K
3. Top P
4. Temperature (last)
Apply truncation samplers first to remove garbage tokens, then temperature to control randomness of the remaining good options.
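Putting the whole pipeline together, here is a sketch of one sampling step in that order (the probabilities are illustrative; real samplers operate on logits over a full vocabulary):

```python
import random

def sample(probs, temperature=1.0, top_k=40, top_p=0.95, min_p=0.02, rng=random):
    """Truncate (Min P -> Top K -> Top P), then apply temperature and sample."""
    # 1. Min P: drop tokens far below the best one
    threshold = min_p * max(probs.values())
    probs = {t: p for t, p in probs.items() if p >= threshold}
    # 2. Top K: keep the k most probable
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 3. Top P: keep tokens until cumulative (renormalized) mass reaches p
    total = sum(p for _, p in ranked)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p / total
        if cum >= top_p:
            break
    # 4. Temperature: p**(1/T), equivalent to scaling logits before softmax
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([t for t, _ in kept], weights=weights, k=1)[0]

# With one dominant token, truncation leaves a single candidate:
assert sample({"a": 0.97, "b": 0.02, "c": 0.01}) == "a"
```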
Quick Reference
Parameter      | Value | Why
---------------|-------|----------------------------------
Temperature    | 1.0   | Natural variation
Top K          | 40    | Balanced token selection
Top P          | 0.95  | Adaptive probability cutoff
Min P          | 0.02  | Filter low-quality tokens
Repeat Penalty | 1.0   | DISABLED - model handles it
Max Tokens     | 2048  | Room for long responses
Context Window | 16384 | Memory without drift
If responses feel off, adjust in this order:
1. Temperature (up = more creative, down = more focused)
2. Min P (down = more creative, up = more focused)
3. Top K (up = more variety)
Never touch repeat penalty for RPMax. Just leave it off.