TERMINOLOGY - THE WORDS YOU NEED TO KNOW
Before we dive in, let's get the vocabulary sorted. AI has a lot of jargon, and understanding these terms will make everything else click.
Large Language Model (LLM)
The AI brain. A neural network trained on massive amounts of text that can predict what words come next. When you "chat" with AI, you're really asking it to complete your text in a helpful way.
Examples: GPT-4, Claude, Llama, Mistral
Inference
The act of running an LLM to generate text. When the AI "thinks" and produces a response, that's inference. It's computationally expensive, which is why AI services cost money or require good hardware.
Local vs Cloud
Cloud: The AI runs on someone else's servers (OpenAI, Anthropic, etc.)
You send your prompts over the internet. They see everything.
Local: The AI runs on YOUR computer. Nothing leaves your machine.
Private, free after setup, but requires decent hardware.
Tonight we're going 100% local.
Parameters (Samplers)
Settings that control HOW the model generates text. Think of them as personality dials:
Temperature: How creative/random vs focused/predictable
Top P: How much cumulative probability mass to keep (nucleus sampling)
Top K: Maximum number of candidate tokens to sample from
Min P: Minimum probability threshold for tokens
Rep Penalty: How much to avoid repeating words/phrases
We'll cover the optimal settings for roleplay later.
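The dials above can be pictured as a filter applied to the model's next-token probabilities before it picks one. Here's an illustrative pure-Python sketch — the function name, defaults, and dictionary interface are made up for clarity, not any backend's real API:

```python
import math

def sample_filter(logits, temperature=0.8, top_k=40, top_p=0.9, min_p=0.05):
    """Illustrative sketch of the sampler dials. `logits` maps candidate
    tokens to raw scores (hypothetical interface, for teaching only)."""
    # Temperature: divide scores before softmax; <1 sharpens, >1 flattens
    scaled = {t: s / temperature for t, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / z for t, s in scaled.items()}

    # Top K: keep only the K most likely tokens
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top P (nucleus): keep the smallest prefix whose probabilities sum to P
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Min P: drop tokens far less likely than the single best token
    floor = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= floor]

    # Renormalize the survivors; the model samples its next token from these
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}
```

Higher temperature flattens the distribution (more surprising words survive); tighter Top K / Top P / Min P shrink the pool of candidates.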
Tokens
The units LLMs work with. Not quite words, not quite letters. Roughly:
1 token = about 4 characters or 0.75 words
"Hello, how are you today?" = about 7 tokens
Why it matters: Models have token limits. A 32,000 token context means about 24,000 words of conversation history.
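The rules of thumb above are easy to turn into a back-of-envelope estimator. This is only the rough chars/4 heuristic from this section — real tokenizers (tiktoken, SentencePiece) give exact counts:

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token
    return max(1, round(len(text) / 4))

def context_to_words(context_tokens):
    # ~0.75 words per token
    return int(context_tokens * 0.75)
```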
Context Window
How much the model can "remember" at once. Everything in the context window influences the response. Anything outside it is forgotten.
8K context = ~6,000 words
32K context = ~24,000 words
128K context = ~96,000 words
Bigger isn't always better. We'll discuss why later.
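"Anything outside it is forgotten" is literal: a simple backend just drops the oldest messages once the window fills. A minimal sketch of that sliding window, assuming a generic message list and any token counter you like:

```python
def trim_to_context(messages, limit_tokens, count_tokens):
    """Keep the most recent messages that fit the window; older ones
    fall out and are 'forgotten'. `count_tokens` is any token counter
    (e.g. the rough chars/4 estimate)."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = count_tokens(msg)
        if used + cost > limit_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order
```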
System Prompt
Instructions given to the model before the conversation starts. Sets the stage for how the AI should behave. Usually hidden from the user.
Example: "You are a helpful assistant who speaks like a pirate."
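In chat-style APIs (OpenAI-compatible endpoints, Ollama's chat API, and similar), the system prompt is simply the first message in the list, with a special role — the example text here is from this section, the message shape is the common convention:

```python
messages = [
    # System message: hidden instructions, sent before the user's turn
    {"role": "system",
     "content": "You are a helpful assistant who speaks like a pirate."},
    {"role": "user", "content": "How do I boil an egg?"},
]
```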
Character Card
A structured description of an AI character including personality, background, speech patterns, and example dialogues. The blueprint for your companion.
Roleplay Model vs Instruct Model
Instruct Model: Trained to follow instructions. "Do X." Good for tasks.
Examples: Mistral Instruct, Llama Instruct
Roleplay Model: Trained on collaborative fiction. "Continue this scene."
Examples: RPMax, Fimbulvetr, Noromaid
Different training = different behavior. We want roleplay models.
Fine-Tuning
Taking a base model and training it further on specific data. RPMax is Mistral that's been fine-tuned on roleplay conversations.
Quantization
Compressing a model to use less memory. Tradeoff between size and quality:
Q8 = Highest quality, largest size
Q6 = Great quality, good size (sweet spot)
Q4 = Good quality, smaller size
Q2 = Lower quality, smallest size
We'll use Q6_K_L for the best balance.
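Why quantization shrinks files follows from simple arithmetic: file size is roughly parameters times bits per weight. A sketch — the bits-per-weight figures for real quant formats vary slightly (e.g. Q6_K is a bit over 6 bits), and this ignores GGUF metadata overhead:

```python
def model_size_gb(params_billion, bits_per_weight):
    """Back-of-envelope file size: parameters x bits / 8 bytes,
    ignoring per-format overhead."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9
```

So a 12B model is roughly 12 GB at Q8 and half that at Q4 — the memory you need scales the same way.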
GGUF
A file format for quantized models. When you download a model for local use, it's usually a .gguf file. Pronounced "goof" by some, "G-G-U-F" by others. Nobody really knows.
Prompt Template / Chat Template
The formatting wrapper around your messages. Different models expect different formats:
ChatML: <|im_start|>user\nHello<|im_end|>
Llama 2 / Mistral: [INST] Hello [/INST]
Alpaca: ### Instruction:\nHello\n### Response:
Use the wrong template and the model gets confused. We'll set this up properly.
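To make the "wrapper" idea concrete, here is what two of those templates look like as formatting functions. These are simplified single-turn versions (real templates also wrap system prompts and multi-turn history):

```python
def chatml(user_text):
    # ChatML wrapper: the model continues after the assistant header
    return (f"<|im_start|>user\n{user_text}<|im_end|>\n"
            f"<|im_start|>assistant\n")

def llama2_inst(user_text):
    # Llama 2 / Mistral instruct wrapper
    return f"[INST] {user_text} [/INST]"
```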
Stop Sequences
Strings that tell the model "stop generating here." Prevents the model from continuing as other characters or wandering off on tangents.
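Under the hood this is just truncation: the backend cuts the generated text at the first stop sequence it finds. A minimal sketch of how a backend's `stop` parameter behaves:

```python
def apply_stops(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop
    sequence; everything after it is discarded."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

With a stop sequence like "\nUser:", the model can't write your lines for you.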
Character Drift
When an AI gradually loses its character voice over a long conversation. Starts strong, becomes generic. One of the main problems we're solving.
Breaking Character
When the AI suddenly stops being the character and becomes a generic assistant. "As an AI language model, I..." That's a character break.
OOC (Out of Character)
Communication between you and the AI that's NOT part of the roleplay. Usually wrapped in brackets: [Can you make responses shorter?]
THE ROLEPLAY AI COMMUNITY
The techniques in this tutorial didn't come from corporate AI labs. They come from hobbyist communities who've spent years figuring out how to make AI characters that don't break. Let's meet them.
SillyTavern
SillyTavern is a free, open-source chat interface for AI models. Think of it as a power-user alternative to ChatGPT's interface, specifically designed for character roleplay.
It started as a fork of TavernAI (an earlier project) and has become the go-to tool for serious AI roleplay hobbyists. It runs locally on your computer and connects to various AI backends (local models via Ollama, or cloud APIs).
Why it matters to us: SillyTavern users developed many of the character card formats and techniques we'll be using. When you see terms like "PList" or "Ali:Chat" - those came from this community.
Features SillyTavern has that we won't cover tonight:
- Lorebooks (keyword-triggered world info)
- Author's Notes (mid-context injections)
- Group chats (multiple AI characters talking)
- Regex scripts (post-processing responses)
- Advanced memory systems
We're taking their formatting wisdom and applying it to simpler setups.
Chub.ai
Chub.ai is a website where people share character cards. Think of it like a library of pre-made AI personalities you can download and use.
The community there has created thousands of characters - everything from anime waifus to historical figures to original creations. Users rate characters, leave reviews, and share tips on what works.
Why it matters to us: Chub.ai is where character creators test what actually works across thousands of users. The best practices we teach come from patterns that emerged there.
What dominates Chub.ai that we're skipping:
- Anime/manga characters (Gojo Satoru, Raiden Shogun, etc.)
- Romantic companions and dating scenarios
- Fandom characters from games and shows
- NSFW content (a significant portion of the site)
We're using their techniques for different purposes.
TavernAI Card
A character card format that stores everything in a PNG image file. The character's name, personality, examples - all embedded invisibly in the image metadata.
You can share a single image file and someone else can load your complete character. Clever, portable, widely supported.
OpenWebUI can import these (basic support). SillyTavern uses them natively. We won't create them tonight, but you might encounter them.
What They Create vs What We Create
The roleplay community skews heavily toward:
- Anime characters (the top 10 most-used characters are almost all from anime - Jujutsu Kaisen, Genshin Impact, etc.)
- Romantic/companion scenarios
- Fantasy and fiction roleplay
- Characters from existing media (fanfic-style)
Tonight we're applying the same techniques to:
- Historical figures (Mark Twain, Marcus Aurelius)
- Practical companions (study buddies, journaling partners)
- Supportive but non-romantic characters
- Original personalities for specific purposes
Same methods, different application. The techniques work regardless of what kind of character you're building.
Lorebook / World Info
A feature in SillyTavern (and some other tools) that auto-injects information based on keywords.
Example: You set up an entry for "Rivendell" with lore about the elven city. Whenever "Rivendell" appears in conversation, that lore gets added to context automatically. The AI suddenly "knows" the details without you putting everything in the system prompt.
Useful for complex worlds with lots of details that would bloat your character card.
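The mechanism itself is simple to illustrate. This is a hypothetical sketch of keyword-triggered injection, not SillyTavern's actual implementation — real lorebooks add scan depth, regex keys, and insertion ordering:

```python
def inject_lore(user_message, lorebook):
    """Scan the message for keywords and return matching lore entries
    to prepend to context. `lorebook` maps keyword -> lore text."""
    return [lore for keyword, lore in lorebook.items()
            if keyword.lower() in user_message.lower()]

lore = {"Rivendell": "Rivendell is a hidden elven refuge."}
```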
OpenWebUI does NOT have this feature. If you need world info, put it directly in your system prompt.
Author's Note / Character's Note
A SillyTavern feature that injects text at a specific "depth" in the conversation - meaning a certain number of messages from the current one.
Why depth matters: AI models pay more attention to text near the END of context (recency bias). Information at the top of a long conversation gets diluted. Author's Notes let you inject reminders closer to where the model is currently generating.
Example: "[Remember: Luna asks questions, doesn't give advice]" injected at depth 4 (4 messages back) keeps the character on track.
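Depth-based injection is also easy to picture as a list operation: count back from the newest message and splice the note in there, so it sits close to where the model is generating. A sketch assuming a flat list of messages:

```python
def inject_at_depth(messages, note, depth):
    """Insert an author's note `depth` messages back from the end,
    where recency bias gives it the most weight."""
    pos = max(0, len(messages) - depth)
    return messages[:pos] + [note] + messages[pos:]
```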
We'll cover this concept more in context management, where we call it "character refresh."
Now that you speak the language, let's look at the software.