BUILDING YOUR OWN AUTOMATION
What we've shown is the foundation. You now have all the pieces:
- How to encode an image for the API
- How to send a request to Ollama
- How to craft prompts that give structured output
- How to parse the response with regex
- How to make a pass/fail decision
From here, building a full automation pipeline is up to you.
Batch Processing
Loop through a folder of images and check each one:
for img in *.png; do
  echo "Checking $img..."

  # Base64-encode the image and strip whitespace so the JSON stays valid.
  # (-i names the input file on macOS; on Linux use: base64 -w 0 "$img")
  IMAGE_B64=$(base64 -i "$img" | perl -pe 's~\s~~g')

  RESPONSE=$(curl -s http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{
      "model": "ministral",
      "prompt": "Answer YES or NO only.\n\nNUDE_CHEST: YES or NO",
      "images": ["'"$IMAGE_B64"'"],
      "stream": false
    }' | jq -r '.response')

  # Pass/fail decision: look for the label in the model's response.
  if echo "$RESPONSE" | grep -qi "NUDE_CHEST: YES"; then
    echo "  REJECT"
  else
    echo "  PASS"
  fi
done
Moving Rejected Files
Don't delete rejects. Move them to a separate folder for review:
mkdir -p ./rejects
for img in *.png; do
  # ... run your check ...
  if echo "$RESPONSE" | grep -qi "NUDE_CHEST: YES"; then
    mv "$img" ./rejects/
    echo "Moved to rejects: $img"
  fi
done
Logging
Keep a record of what was checked and why:
LOGFILE="moderation.log"
for img in images/*.png; do
  # ... run your check ...
  TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

  if echo "$RESPONSE" | grep -qi "NUDE_CHEST: YES"; then
    RESULT="REJECT"
    REASON="Nudity detected"
  else
    RESULT="PASS"
    REASON=""
  fi

  echo "$TIMESTAMP,$img,$RESULT,$REASON" >> "$LOGFILE"
done
This creates a CSV-style log you can review later.
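Because the log is plain CSV, standard tools can summarize it. A small sketch, assuming the `timestamp,file,result,reason` format above (the sample entries are made up so the commands are runnable on their own):

```shell
LOGFILE="moderation.log"

# Sample log entries in the format the loop above produces.
cat > "$LOGFILE" <<'EOF'
2025-01-01 10:00:00,a.png,PASS,
2025-01-01 10:00:05,b.png,REJECT,Nudity detected
2025-01-01 10:00:11,c.png,PASS,
EOF

# Count total checks and rejections.
TOTAL=$(wc -l < "$LOGFILE")
REJECTS=$(grep -c ',REJECT,' "$LOGFILE")
echo "Checked: $TOTAL  Rejected: $REJECTS"

# List only the rejected files (second CSV column).
grep ',REJECT,' "$LOGFILE" | cut -d',' -f2
```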
Multiple Checks
You could run different prompts for different concerns:
- Content moderation (what we showed)
- Quality assessment (check for artifacts, extra fingers)
- Style matching (does it match the requested style?)
- Text detection (is there readable text in the image?)
Each check is just a different prompt with different parsing logic.
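One way to structure that is a helper that looks up the prompt for a named check, plus a shared parser for the YES/NO label. A sketch with hypothetical prompts and label names (only the moderation prompt comes from earlier; the others are illustrative), shown parsing a canned response rather than calling the API:

```shell
# Return the prompt for a named check. The quality/text prompts are made up.
prompt_for_check() {
  case "$1" in
    moderation) printf '%s' "Answer YES or NO only.\n\nNUDE_CHEST: YES or NO" ;;
    quality)    printf '%s' "Answer YES or NO only.\n\nARTIFACTS: YES or NO" ;;
    text)       printf '%s' "Answer YES or NO only.\n\nREADABLE_TEXT: YES or NO" ;;
    *)          return 1 ;;
  esac
}

# Shared parsing logic: did the given label come back YES?
label_is_yes() {
  # $1 = label (e.g. ARTIFACTS), $2 = raw model response
  echo "$2" | grep -qi "$1: YES"
}

# Example with a canned response instead of a live API call.
RESPONSE="ARTIFACTS: YES"
if label_is_yes "ARTIFACTS" "$RESPONSE"; then
  echo "quality check failed"
fi
```

Each new check then only needs a new case branch and a label, while the encode/send/decide plumbing stays shared.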
Model Settings
You can add an "options" block to your JSON request to control behavior. The temperature setting controls randomness:
"options": {
  "temperature": 0.1
}
Temperature values:
- 0.1: more predictable, consistent
- 0.7: more creative, varied
- 1.0: most random
For moderation, lower is better. We want consistent yes/no answers, not creative interpretation.
Other useful options:
- "num_ctx": 4096 (context window size)
- "num_predict": 100 (maximum tokens to generate)
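Putting it together, the options block nests alongside the other fields in the request body. A sketch that only builds and prints the payload, with a placeholder standing in for the real base64 data (no API call is made):

```shell
IMAGE_B64="PLACEHOLDER"  # would come from the base64 step shown earlier

# Request body with an options block added for deterministic answers.
PAYLOAD='{
  "model": "ministral",
  "prompt": "Answer YES or NO only.\n\nNUDE_CHEST: YES or NO",
  "images": ["'"$IMAGE_B64"'"],
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 100
  }
}'

echo "$PAYLOAD"
```

This is the same payload as before; only the "options" object is new, and it can be passed to curl with `-d "$PAYLOAD"` exactly as in the batch loop.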
Performance Considerations
Vision models are slower than text-only models. A single image analysis might take 5-30 seconds depending on your hardware and model size.
For batch processing hundreds of images:
- Process during off-hours
- Consider a smaller/faster model for initial screening
- Use a more thorough model for borderline cases
- Cache results (don't re-check unchanged images)
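The caching idea can be as simple as recording a checksum per file and skipping any file whose checksum is already on record. A minimal sketch (the cache format and file names are my own, and it assumes `sha256sum` is available, as on Linux; macOS would need `shasum -a 256`):

```shell
CACHE="checked.cache"
touch "$CACHE"

check_image() {
  # Stand-in for the real moderation call shown earlier.
  echo "checking $1"
}

maybe_check() {
  local img="$1"
  local hash
  hash=$(sha256sum "$img" | cut -d' ' -f1)
  if grep -q "^$hash$" "$CACHE"; then
    echo "cached, skipping $img"
  else
    check_image "$img"
    echo "$hash" >> "$CACHE"
  fi
}

# Demo: the second call on an unchanged file hits the cache.
echo "demo" > sample.png
maybe_check sample.png
maybe_check sample.png
```

Hashing the content rather than the file name means a regenerated image with the same name is still re-checked.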
Other Use Cases
The pattern (encode, prompt, parse, act) works for many tasks:
Accessibility: Generate alt-text descriptions for images
Organization: Auto-tag photos by content
Quality Control: Detect blurry or corrupted images
Document Processing: Extract text from screenshots
Security: Detect sensitive information in images
The Key Insight
Once you understand the pattern, you can adapt it to almost any image analysis task. The prompt is what you change. The plumbing stays the same.
Build something. See what works. Iterate.