BASE64 ENCODING EXPLAINED
Here's a question: how do you send a picture to a text-based system?
The Problem
Computers store images as binary data: ones and zeros representing pixel colors. But web requests and JSON (the format Ollama uses) are text-based. You can't just paste binary data into a text message.
The solution: Base64 encoding.
What is Base64?
Base64 is a way to represent binary data using only text characters. It takes any file (image, audio, PDF, whatever) and converts it into a long string of letters, numbers, and a few symbols.
Example: A tiny 1x1 pixel image looks like this when Base64 encoded:
iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNg
YPj/HwADAgH/a7PLIQAAAABJRU5ErkJggg==
That gibberish IS the image, just represented as text. The receiving system decodes it back into binary to display or process it.
Base64 increases file size by about 33% (because you're using text to represent binary), but it lets you embed images directly in JSON requests.
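Here's the round trip and the ~33% growth in miniature (a sketch using base64 reading from stdin, which behaves the same on macOS and GNU/Linux; tr -d '\n' strips the line wrapping some versions add):

```shell
# Encode 9 bytes of text into Base64.
printf 'hello web' | base64 | tr -d '\n'     # aGVsbG8gd2Vi

# Count the encoded characters: 9 bytes become 12 characters, a 4/3 ratio.
printf 'hello web' | base64 | tr -d '\n' | wc -c

# Decode to recover the original bytes exactly.
printf 'aGVsbG8gd2Vi' | base64 --decode     # hello web
```

Nothing is lost in either direction; the only cost is the extra length.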
The Name
Why "Base64"? Because it uses 64 different characters to encode data:
A-Z (26) + a-z (26) + 0-9 (10) + two symbols (+ and /)
That gives you 64 characters total. Each character represents 6 bits of binary data. (The = signs you sometimes see at the end, as in the example above, are padding, not part of the 64.) But you don't need to understand the math to use it.
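If you're curious anyway, here's the 6-bit regrouping worked out on a two-byte input (a sketch; two bytes are only 16 bits, so Base64 pads the last group and appends an =):

```shell
# 'H' = 01001000, 'i' = 01101001
# Regrouped into 6-bit chunks: 010010 000110 1001(00)
# -> indices 18, 6, 36 -> characters 'S', 'G', 'k', plus '=' padding.
printf 'Hi' | base64     # SGk=
```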
Converting an Image to Base64
On macOS, use the base64 command:
base64 -i image.png
This prints the encoded data to your terminal. The -i flag names the input file on macOS. On Linux, GNU base64 takes the filename as a plain argument instead (base64 image.png); its -i flag means something else (ignore non-alphabet characters when decoding), so the macOS form happens to work there, but for the wrong reason.
For use in scripts, we want the output on a single line (no whitespace):
base64 -i image.png | perl -pe's~\s~~g'
The Perl one-liner removes ALL whitespace characters (spaces, tabs, newlines), giving us one continuous string. Strictly speaking, base64 output only wraps with newlines, but stripping every kind of whitespace costs nothing and guards against surprises.
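If you'd rather avoid Perl, tr does the same job (an equivalent sketch; tr -d deletes every character in the set you give it, and sample.bin is just a stand-in file created for the demo):

```shell
# Make a small stand-in file, then encode it onto one whitespace-free line.
printf '%s' 'stand-in image bytes' > sample.bin
base64 -i sample.bin | tr -d ' \t\n'
```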
Terminal Character Limits (Important!)
Here's something that bit us during testing: the operating system caps how long a single command line can be (technically, the combined size of a command's arguments and environment). On macOS, check the limit with:
getconf ARG_MAX
You'll see something like 1048576 (about 1MB). Sounds like a lot, but Base64 encoding inflates file size by 33%. A 750KB image becomes a 1MB string, which can exceed the limit.
If you get weird errors when sending large images, this is probably why.
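You can predict the problem before encoding anything: Base64 output is ceil(n/3) * 4 characters for n input bytes (a sketch using shell arithmetic; photo.png is a placeholder name for your image):

```shell
# Measure the file, then compute the exact Base64 length it will produce.
bytes=$(wc -c < photo.png | tr -d ' ')
encoded=$(( (bytes + 2) / 3 * 4 ))   # ceil(bytes/3) * 4
limit=$(getconf ARG_MAX)

echo "Encoded length will be $encoded of $limit allowed"
[ "$encoded" -lt "$limit" ] || echo "Too big for one command line: resize first"
```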
The fix: resize images before encoding. Use sips (built into macOS):
sips --resampleHeightWidthMax 768 image.png --out resized.png
This shrinks the image so its longest side is 768 pixels. That's plenty for vision model analysis and keeps the Base64 string manageable.
Let's Try It
Using any PNG image you have:
base64 -i yourimage.png | head -c 100
This shows the first 100 characters of the encoded image. You'll see something like:
iVBORw0KGgoAAAANSUhEUgAABAAAAAQACAIAAADwf7zUAAE...
That's the beginning of the image data as text.
Checking Character Count
Before sending to the API, verify your image isn't too large:
base64 -i yourimage.png | perl -pe's~\s~~g' | wc -c
This shows the character count. If it's over 800,000 characters, resize the image first using sips.
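That check wraps naturally into a small guard script (a sketch; the 800,000 threshold is the rule of thumb above, and base64 < file is used instead of -i so it behaves identically on macOS and Linux):

```shell
LIMIT=800000
count=$(base64 < yourimage.png | tr -d '\n' | wc -c | tr -d ' ')

if [ "$count" -gt "$LIMIT" ]; then
  echo "Too large ($count chars): resize with sips before sending"
else
  echo "OK to send ($count chars)"
fi
```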
Storing in a Variable
For scripting, we store the encoded image in a shell variable:
IMAGE_B64=$(base64 -i photo.png | perl -pe's~\s~~g')
The $(...) syntax runs the command and captures the output. Now the variable IMAGE_B64 contains the entire encoded image as a string.
To verify it worked:
echo ${#IMAGE_B64}
This prints the length. You should see a large number (hundreds of thousands for a typical PNG).
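For extra confidence, decode the variable back out and compare it byte-for-byte with the original file (a sketch; GNU and recent macOS base64 accept --decode, while older macOS spells it -D):

```shell
IMAGE_B64=$(base64 < photo.png | tr -d '\n')

# Decode the variable into a new file and compare with the source.
printf '%s' "$IMAGE_B64" | base64 --decode > roundtrip.png
cmp photo.png roundtrip.png && echo "Round trip OK"
```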
Why This Matters
When we send an image to Ollama, we'll put this Base64 string inside our JSON request. The vision model receives it, decodes it back to binary, and analyzes the actual image.
It's just a transport mechanism. The AI sees the original image, not the encoded gibberish.
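Here's roughly what that request looks like end to end (a sketch: llava is a placeholder for whichever vision model you've pulled, and the payload is written to a file so the huge string never becomes a command-line argument; printf is a shell builtin, so the ARG_MAX limit doesn't apply to it):

```shell
IMAGE_B64=$(base64 -i photo.png | perl -pe's~\s~~g')

# Ollama's /api/generate endpoint takes Base64 images in an "images" array.
printf '{"model":"llava","prompt":"Describe this image.","images":["%s"]}' \
  "$IMAGE_B64" > payload.json

# -d @file makes curl read the request body from disk, sidestepping ARG_MAX.
curl http://localhost:11434/api/generate -d @payload.json
```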