QUICK REFERENCE - CHEAT SHEET
Everything you need on one page. Print this. Bookmark this.
MODEL
RPMax 22B:
ollama pull hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
RPMax 12B (lighter):
ollama pull hf.co/bartowski/Mistral-Nemo-12B-ArliAI-RPMax-v1.1-GGUF:Q6_K
PARAMETERS
Temperature: 1.0
Top K: 40
Top P: 0.95
Min P: 0.02
Repeat Penalty: 1.0 (DISABLED!)
Max Tokens: 2048
Context Window: 16384
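If you talk to Ollama over its REST API instead of a Modelfile, these same settings go in the request's "options" object. A minimal sketch of building that request body (the helper name `build_chat_payload` is illustrative, not part of Ollama; option names follow Ollama's API):

```python
# Sketch: build a request body for Ollama's /api/chat endpoint
# using the recommended RPMax sampling parameters.
# (The helper itself is illustrative, not part of Ollama.)

def build_chat_payload(model: str, messages: list) -> dict:
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {
            "temperature": 1.0,
            "top_k": 40,
            "top_p": 0.95,
            "min_p": 0.02,
            "repeat_penalty": 1.0,   # 1.0 = disabled
            "num_predict": 2048,     # max tokens to generate
            "num_ctx": 16384,        # context window
            "stop": ["User:", "\nUser:"],
        },
    }

payload = build_chat_payload("mychar", [{"role": "user", "content": "Hi"}])
```

POST this as JSON to `http://localhost:11434/api/chat` on a running Ollama server.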
MODELFILE TEMPLATE
FROM hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.0
PARAMETER num_ctx 16384
PARAMETER stop "User:"
PARAMETER stop "\nUser:"
SYSTEM """
[Your character card here]
"""
Create: ollama create mychar -f mychar.modelfile
Run:    ollama run mychar
CHARACTER CARD FORMAT
[Name: CharacterName]
[Personality= trait1, trait2, trait3, trait4, trait5]
[Speech= style1, style2, style3]
Brief background sentence if needed.
<START>
{{user}}: Example user message
{{char}}: Example character response showing personality
<END>
<START>
{{user}}: Different scenario
{{char}}: Character handling it in their voice
<END>
<START>
{{user}}: Third scenario
{{char}}: Third example response
<END>
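The card skeleton above is regular enough to generate programmatically, which helps when you maintain several characters. A sketch (the function name and signature are illustrative):

```python
# Sketch: assemble a character card in the bracket format above.
# Function name and signature are illustrative.

def build_card(name, personality, speech, background="", examples=()):
    lines = [
        f"[Name: {name}]",
        f"[Personality= {', '.join(personality)}]",
        f"[Speech= {', '.join(speech)}]",
    ]
    if background:
        lines.append(background)
    for user_msg, char_msg in examples:
        lines += [
            "<START>",
            f"{{{{user}}}}: {user_msg}",
            f"{{{{char}}}}: {char_msg}",
            "<END>",
        ]
    return "\n".join(lines)

card = build_card(
    "Mira",
    ["curious", "dry-witted", "stubborn"],
    ["terse", "sardonic"],
    examples=[("Hello?", '*glances up* "You again."')],
)
```

Paste the resulting text into the SYSTEM block of your Modelfile.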
SCENE PROMPT STRUCTURE
[Character card at top]
Scene context (brief).
User: What the user said
CharacterName:
STOP SEQUENCES
Essential: "User:", "\nUser:"
Multi-char: Add all character names with colons
Chat template: "</s>", "[INST]" (if using raw API)
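If your frontend doesn't apply stop sequences for you (e.g., when streaming from a raw API), the same cut can be made client-side. A minimal sketch:

```python
# Sketch: truncate generated text at the first stop sequence,
# mimicking what stop-sequence handling does server-side.

def apply_stops(text: str, stops: list) -> str:
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

out = apply_stops("Sure thing.\nUser: what next?", ["User:", "\nUser:"])
# out == "Sure thing."
```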
CHARACTER REFRESH
Place at depth 4 (4 messages from end):
[Remember: CharacterName is trait, trait, trait]
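In a message list, "depth 4" means the note sits four messages from the end. A sketch of inserting it (the helper name is illustrative; history is a list of role/content dicts):

```python
# Sketch: insert a character-refresh note N messages from the end
# of a chat history (list of {"role", "content"} dicts).

def insert_refresh(messages, note, depth=4):
    pos = max(0, len(messages) - depth)
    refresh = {"role": "system", "content": note}
    return messages[:pos] + [refresh] + messages[pos:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(6)]
history = insert_refresh(
    history, "[Remember: Mira is curious, dry-witted, stubborn]"
)
```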
WHEN THINGS GO WRONG
Generic responses? Raise temp, improve examples
Too long? Shorter first message, lower max tokens
Too short? Longer examples, raise temp
Breaking character? Check model, add refresh, regenerate
Repetitive? Disable repeat penalty (set to 1.0)
Forgetting things? Prune context, save facts externally
Yes-man behavior? Add disagreement examples
POSITIVE FRAMING
BAD: "NEVER break character"
GOOD: "Stay fully in character"
BAD: "Don't be verbose"
GOOD: "Keep responses brief"
BAD: "Never give advice"
GOOD: "Ask questions instead of advising"
TOKEN ESTIMATES
1 token ≈ 4 characters or 0.75 words
100 tokens ≈ 75 words
1000 tokens ≈ 750 words
8K context ≈ 6,000 words
16K context ≈ 12,000 words
32K context ≈ 24,000 words
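These rules of thumb are easy to script when budgeting context. A sketch using the estimates above:

```python
# Sketch: rough token/word estimates from the rules of thumb above.

def est_tokens_from_chars(text: str) -> int:
    return round(len(text) / 4)      # 1 token ~ 4 characters

def est_words_from_tokens(tokens: int) -> int:
    return round(tokens * 0.75)      # 1 token ~ 0.75 words

# 16K context ~ 12,000 words
assert est_words_from_tokens(16_000) == 12_000
```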
TEMPERATURE GUIDE
0.5 = Focused, consistent, less creative
0.7 = Balanced (common default)
1.0 = Natural variation (recommended)
1.2 = More creative, occasional oddness
1.5+ = Wild, unpredictable
OOC COMMANDS
[Shorter responses please]
[Stay more in character]
[Remember detail X]
[Let's shift topics]
ACTION FORMAT
*asterisks for actions* "quotes for dialogue"
Example:
*tilts head* "That's interesting. Tell me more."
INTERFACES
OpenWebUI: docker run -d -p 3000:8080 ...
Ollama CLI: ollama run modelname
Bolt AI: Mac App Store
COMMANDS
ollama serve Start Ollama server
ollama list List installed models
ollama pull <model> Download a model
ollama run <model> Interactive chat
ollama create &lt;name&gt; -f &lt;Modelfile&gt; Create from Modelfile
ollama rm <model> Delete a model
ollama ps List running models
ollama stop <model> Stop a model
MAINTENANCE SCHEDULE
Every 30-50 messages:
- Prune irrelevant exchanges
- Update external fact list
- Add character refresh if drifting
Every few sessions:
- Review character card
- Improve weak examples
- Check for repetitive patterns
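Pruning can be as simple as keeping the system prompt plus the most recent exchanges. A sketch (helper name and keep-count are illustrative):

```python
# Sketch: keep system messages plus the last N chat messages,
# dropping older exchanges to free context space.

def prune(messages, keep_last=30):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [{"role": "system", "content": "card"}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(50)
]
history = prune(history, keep_last=30)
```

Save any facts from the dropped messages to your external list first.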
DIAGNOSTIC CHECKLIST
[ ] Right model? (RPMax, not Instruct)
[ ] Repeat penalty off? (1.0)
[ ] Temperature good? (1.0)
[ ] Stop sequences set?
[ ] Context not overflowed?
[ ] Character card clear?
[ ] First message sets tone?
LINKS
Ollama: https://ollama.com
OpenWebUI: https://openwebui.com
RPMax Model: https://huggingface.co/ArliAI
Quantizations: https://huggingface.co/bartowski
Techalicious: https://techalicious.forum
REMEMBER
- Use roleplay models, not instruct models
- Disable repetition penalty for RPMax
- Show, don't tell - examples > rules
- Positive framing > negative rules
- First message sets the template
- Context management prevents drift
- Regenerate/edit bad responses immediately