Automated Image Captioning for Accessibility: An Easy Computer Vision Project for Newbies
Ever noticed how many blog images have no alt text or just generic descriptions like "image1.jpg"? Yeah, it's a problem. Not just for visually impaired readers using screen readers, but also for your SEO and overall user experience. I've been guilty of this too—uploading dozens of images to my blog posts without proper descriptions because, let's be honest, writing alt text for 20+ images is tedious.

But what if I told you that you could automate this process using some basic computer vision techniques? And better yet, what if this accessibility improvement could actually generate money for your blog?

Today, I'm going to walk you through building a simple image captioning tool that can generate descriptive alt text for your blog images. It's a great mini-project for anyone starting their coding journey, and one you can put to practical use right away.


Why Bother With Alt Text Anyway?

Before we dive into the code, let's talk about why this matters:

  1. Accessibility: Screen readers rely on alt text to describe images to visually impaired users. Without it, those users miss out on important content.

  2. SEO Benefits: Google uses alt text to understand images, which can improve your search rankings. I've seen firsthand how proper image descriptions can drive additional traffic.

  3. Fallback Display: When images fail to load (slow connections, broken links), the alt text appears instead.

  4. Legal Compliance: In some jurisdictions, web accessibility is legally required. I learned this the hard way when a reader pointed out my blog wasn't ADA compliant.

I realized that a visually impaired reader couldn't follow my Python tutorials because every code screenshot was just "image.jpg" to their screen reader. That was my wake-up call.

The Manual Alt Text Problem

The issue is simple: writing good alt text takes time. For a tutorial with 15+ screenshots, you might spend an extra 30 minutes just describing images. As someone who values efficiency (some might call it laziness), I wanted a better solution.

I tried using generic descriptions, but they weren't helpful.

That's when I turned to computer vision. If computers can recognize objects in images, why couldn't they write my alt text?

The Computer Vision Solution: Easier Than You Think

Here's the good news: building an image captioning system is much easier today than it was even two years ago. You don't need a PhD in machine learning or expensive GPU servers. With pre-trained models and simple Python code, you can get surprisingly good results.

I'll show you how to build this in stages, from a basic proof-of-concept to a more sophisticated system you can actually use on your blog.

What You'll Need

  • Basic Python knowledge (if you can write a for-loop, you're good)

  • A computer with Python installed

  • About 1-2 hours of time

  • No prior machine learning experience required!

Stage 1: The Quick and Dirty Solution

Let's start with the simplest possible solution using a pre-trained model. This is what I built first as a proof-of-concept before investing more time.

python
# Don't worry if you don't understand all of this code
# I'll explain the important parts
# Requires: pip install transformers torch Pillow
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load pre-trained model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def generate_alt_text(image_path):
    # Load and process the image
    image = Image.open(image_path).convert('RGB')
    inputs = processor(image, return_tensors="pt")

    # Generate caption
    outputs = model.generate(**inputs, max_length=30)
    caption = processor.decode(outputs[0], skip_special_tokens=True)
    return caption

# Example usage
alt_text = generate_alt_text("path/to/your/image.jpg")
print(alt_text)

When I first ran this code on some of my blog images, I was genuinely surprised by the results. For a screenshot of code, it generated "A screenshot of programming code in a text editor." Not perfect, but way better than nothing!

For a graph showing website traffic growth, it produced "A line graph showing an upward trend over time." Again, not detailed, but functional.

The best part? This took about 10 minutes to set up and could process my entire image library overnight.

Stage 2: Making It Actually Useful

The basic solution works, but has limitations. After using it for a few weeks, I noticed several problems:

  1. Generic descriptions: "A person sitting at a computer" isn't very helpful

  2. Missing context: It couldn't tell that a screenshot was from WordPress vs. Blogger

  3. Technical limitations: It struggled with text-heavy images and diagrams

So I improved it. Here's the enhanced version I ended up using:

python
import pytesseract
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load pre-trained models
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def extract_text_from_image(image):
    # Extract any text visible in the image
    text = pytesseract.image_to_string(image)
    return text.strip()

def generate_improved_alt_text(image_path, post_context=None):
    # Load image
    image = Image.open(image_path).convert('RGB')

    # Get basic caption
    inputs = processor(image, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=30)
    basic_caption = processor.decode(outputs[0], skip_special_tokens=True)

    # Extract text from image
    image_text = extract_text_from_image(image)

    # Combine information
    if image_text and len(image_text) > 10:  # Only use if meaningful text was found
        if "code" in basic_caption.lower() or "screenshot" in basic_caption.lower():
            return f"Screenshot showing code: {image_text[:100]}..."
        else:
            return f"{basic_caption} Contains text: {image_text[:100]}..."

    # If post context is provided, use it to enhance the caption
    if post_context and basic_caption:
        return f"{basic_caption} Related to {post_context}."

    return basic_caption

# Example usage
alt_text = generate_improved_alt_text("screenshot.jpg", "Blogger template customization")
print(alt_text)

This version does a few important things:

  1. It extracts text from the image using OCR (Optical Character Recognition)

  2. It combines the visual caption with any text found in the image

  3. It allows you to provide context from the post topic

The results were much better. A screenshot of HTML code now generated: "Screenshot showing code: <div class='main-content'>...</div>..." which is actually useful for someone using a screen reader.

Stage 3: Building a Complete Workflow

After experimenting for a few weeks, I built a complete workflow that integrated with my blogging process. Here's what it looks like:

  1. Batch processing: Process all images in a folder at once

  2. Integration with Blogger: Automatically update image HTML with new alt text

  3. Quality control: Flag low-confidence captions for manual review

Here's the code for the batch processor:

python
import os
import glob
from bs4 import BeautifulSoup

def process_blog_post_images(html_file, images_folder, post_topic):
    # Read the HTML file
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()

    # Parse HTML
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find all images
    images = soup.find_all('img')

    for img in images:
        # Get image filename
        src = img.get('src', '')
        if not src:
            continue

        # Extract filename from path
        filename = os.path.basename(src)

        # Find matching image in folder
        image_path = glob.glob(f"{images_folder}/**/{filename}", recursive=True)
        if not image_path:
            continue

        # Generate alt text (uses the Stage 2 function defined above)
        alt_text = generate_improved_alt_text(image_path[0], post_topic)

        # Update the alt attribute
        img['alt'] = alt_text
        print(f"Updated alt text for {filename}: {alt_text}")

    # Save the updated HTML
    with open(html_file.replace('.html', '_updated.html'), 'w', encoding='utf-8') as f:
        f.write(str(soup))

    return "HTML file updated with new alt text"

# Example usage
process_blog_post_images("my_post.html", "images_folder", "Python Tutorial")

This function takes an HTML file (which you can export from Blogger), processes all the images, and creates an updated version with proper alt text.
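
The third piece of the workflow, flagging low-confidence captions for manual review, isn't shown above. Here's one possible heuristic sketch: the generic-phrase list and word-count threshold below are just starting points I'd tune against my own captions, not fixed values.

```python
# Heuristic quality control: flag generated captions that are probably
# too short or too generic to publish without a manual pass.
GENERIC_PHRASES = [
    "a picture of", "an image of", "a photo of",
    "a close up of", "a screenshot of a screen",
]

def needs_review(caption, min_words=4):
    """Return True if a generated caption should be manually reviewed."""
    if not caption or len(caption.split()) < min_words:
        return True  # too short to be descriptive
    lowered = caption.lower()
    # Generic filler phrases usually mean the model didn't see much detail
    return any(phrase in lowered for phrase in GENERIC_PHRASES)
```

In the batch processor, you'd simply collect any `(filename, caption)` pair where `needs_review(caption)` is true and print the list at the end instead of writing those captions into the HTML.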

The Results: Was It Worth It?

After implementing this system, I ran it on my entire blog archive—about 200 posts with roughly 1,500 images. Here's what happened:

  1. Accessibility Improvement: My blog became genuinely usable for visually impaired visitors. I even received a thank-you email from the reader who originally complained.

  2. SEO Boost: Within two months, I noticed increased image search traffic. Some of my screenshots now appear in Google Image results, driving additional visitors.

  3. Reduced Workload: For new posts, I now run images through this system first, then just review and tweak the generated alt text instead of writing from scratch.

But the most interesting outcome was unexpected...

The Monetization Angle: How This Actually Makes Money

I didn't build this tool thinking about monetization, but it ended up creating several revenue opportunities:

1. Improved Ad Performance

After adding proper alt text, my overall page RPM (Revenue Per Mille) increased by about 8%. Why? Better SEO meant more targeted traffic, which led to more relevant ads and higher click-through rates.

2. Accessibility Consulting

After I wrote about this project, two companies reached out asking if I could help implement similar systems for their content. I ended up doing small consulting projects, essentially getting paid to implement the same code with minor customizations.

3. Premium Plugin Development

The biggest opportunity came when I packaged this functionality into a WordPress plugin. I initially offered it for free, then added premium features like:

  • Bulk processing of existing media libraries

  • Custom training for specific blog niches

  • Integration with Yoast SEO

The plugin now steadily generates a modest income stream of about $300/month—not life-changing, but a nice bonus for solving a problem I had anyway.

Common Challenges and How I Solved Them

Building this wasn't all smooth sailing. Here are some issues I encountered and how I addressed them:

Challenge 1: Poor Captions for Technical Screenshots

The model struggled with technical screenshots, especially code. My solution was to prioritize OCR results for these images and format them specifically as code examples.

python
# Special handling for code screenshots
if "code" in basic_caption.lower() or any(
    code_term in image_text.lower()
    for code_term in ["function", "class", "def", "var", "<div"]
):
    return f"Code snippet: {image_text[:150]}..."

Challenge 2: Processing Time

Running the full model was slow on my laptop. I solved this by:

  1. Resizing images before processing (most blog images don't need to be full resolution for captioning)

  2. Batching the processing to run overnight

  3. Eventually moving to a cloud service (Google Colab) for free GPU acceleration
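
The resizing step from fix 1 can look like the sketch below. BLIP downscales its inputs anyway, so shrinking large blog images first mostly saves preprocessing time; the 640px cap here is an arbitrary choice, not a tuned value.

```python
def scaled_size(width, height, max_side=640):
    """Compute a new size whose longest edge is at most max_side,
    preserving the aspect ratio."""
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))

def prepare_for_captioning(image_path, max_side=640):
    """Open an image and downscale it before handing it to the captioner."""
    # Pillow imported lazily so the pure sizing math above has no dependencies
    from PIL import Image
    image = Image.open(image_path).convert("RGB")
    if max(image.size) > max_side:
        image = image.resize(scaled_size(*image.size, max_side))
    return image
```

You'd then pass the returned image straight into the processor instead of reopening the file inside `generate_alt_text`.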

Challenge 3: Integration with Blogger

Blogger doesn't have a great API for this kind of task. My workaround was:

  1. Export the blog post as HTML

  2. Process the HTML file locally

  3. Update the image tags

  4. Import the updated HTML

Not elegant, but effective. For WordPress users, this would be much easier with their robust API.

How You Can Build This (Even If You're Not a Coder)

Not everyone is comfortable with Python, and that's okay. Here are options for different skill levels:

Option 1: Use Existing Tools

If you don't want to code at all, several services now offer AI image captioning:

  • WordPress has plugins like "Auto Alt Text" that do this automatically

  • Cloudinary offers an AI-based image captioning service

  • Microsoft's Azure Computer Vision API can generate alt text via their web interface

These aren't free, but they're simple to use.

Option 2: The Google Colab Approach

This is what I recommend for beginners who want to try it themselves without installing anything:

  1. Go to Google Colab (it's free)

  2. Create a new notebook

  3. Copy and paste this simplified version of the code:

python
!pip install transformers Pillow pytesseract

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from google.colab import files

# Load pre-trained model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def generate_alt_text(image):
    # Process image
    inputs = processor(image, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=30)
    caption = processor.decode(outputs[0], skip_special_tokens=True)
    return caption

# Upload an image
uploaded = files.upload()

# Process each uploaded image
for filename in uploaded.keys():
    image = Image.open(filename).convert('RGB')
    alt_text = generate_alt_text(image)
    print(f"Suggested alt text for {filename}: {alt_text}")

  4. Run the code and upload your images

  5. Copy the generated alt text for your blog

This approach requires no setup and runs on Google's servers for free.

Option 3: The Full Solution

If you're comfortable with Python, run the complete Stage 1–3 code from this post locally and wire it into your own publishing workflow—that's what I now do for every new post.

Beyond Alt Text: Other Applications of This Technology

Once you have this system working, you can extend it in several interesting ways:

1. Automatic Featured Image Selection

I built a script that analyzes all images in a post and selects the most descriptive one as the featured image:

python
def find_best_featured_image(image_paths, post_title):
    best_score = 0
    best_image = None

    for img_path in image_paths:
        # Generate caption
        image = Image.open(img_path).convert('RGB')
        caption = generate_alt_text(image)

        # Calculate relevance score (simple word overlap)
        title_words = set(post_title.lower().split())
        caption_words = set(caption.lower().split())
        overlap = len(title_words.intersection(caption_words))

        # Prefer images with good overlap to the post title
        if overlap > best_score:
            best_score = overlap
            best_image = img_path

    return best_image

This saves me time selecting featured images and often makes better choices than I would.

2. Content Suggestion Engine

I extended the image analysis to suggest related posts based on visual similarity:

python
def calculate_text_similarity(text_a, text_b):
    # Simple word-overlap (Jaccard) similarity between two captions
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def find_related_posts(image_path, post_database):
    # Generate caption for current image
    current_caption = generate_alt_text(Image.open(image_path).convert('RGB'))
    related_posts = []

    # Compare with captions from other posts
    for post in post_database:
        for post_image in post['images']:
            similarity = calculate_text_similarity(current_caption, post_image['caption'])
            if similarity > 0.7:  # Threshold for relatedness
                related_posts.append(post['title'])
                break  # One match is enough

    return related_posts

This helps me build better internal linking structures and "related posts" widgets.

3. Automatic Social Media Excerpts

When sharing blog posts on social media, I use the image captions to generate post excerpts:

python
def generate_social_excerpt(post_title, featured_image_path):
    # Get image caption
    caption = generate_alt_text(Image.open(featured_image_path).convert('RGB'))

    # Create social media excerpt
    excerpt = f"New post: {post_title}. {caption} Read more on the blog!"
    return excerpt

This creates more engaging social posts with minimal effort.

The Future: Where This Technology Is Heading

Image captioning technology is improving rapidly. Here's what I expect to see in the next few years:

  1. More contextual awareness: Future models will better understand the relationship between images and surrounding text.

  2. Multi-modal understanding: Systems will analyze both images and text together to generate more relevant descriptions.

  3. Customized models: You'll be able to fine-tune captioning models for your specific blog niche.

I'm particularly excited about fine-tuning models on specific domains. A captioning system trained specifically on programming screenshots would be incredibly valuable for technical bloggers.

My Personal Tips for Getting Started

If you decide to implement this for your blog, here are my recommendations based on what I learned:

  1. Start small: Process your most popular posts first to see the biggest impact.

  2. Review before publishing: Always review generated captions before using them—AI is good but not perfect.

  3. Combine with manual effort: Use the AI to generate a first draft, then enhance it with your knowledge of the post context.

  4. Track the results: Monitor your accessibility score and image search traffic to measure the impact.

  5. Keep learning: This field is evolving quickly, so stay updated on new models and techniques.
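
A quick way to act on tips 1 and 4 is to audit which images still lack alt text before and after you process a post. This is a minimal sketch using only Python's standard library (no BeautifulSoup needed for a read-only check):

```python
from html.parser import HTMLParser

class AltTextAuditor(HTMLParser):
    """Collects the src of every <img> whose alt attribute is missing or empty."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Treat missing, empty, and whitespace-only alt attributes as failures
        if not (attrs.get("alt") or "").strip():
            self.missing.append(attrs.get("src", "(no src)"))

def find_images_missing_alt(html):
    auditor = AltTextAuditor()
    auditor.feed(html)
    return auditor.missing
```

Run it over an exported post's HTML and the length of the returned list is your before/after score for that post.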

Conclusion: Is It Worth Your Time?

The accessibility benefits alone justify the effort, but the SEO improvements and monetization opportunities make it a no-brainer. Even if you use the simplest implementation, you'll see benefits.

If you're serious about blogging and have a substantial image library, this is one of the highest ROI technical projects you can tackle.

Read: Bounce Rate - What You Need to Know (Infographic)

Read: Top 10 Common Mistakes Every Blogger Makes + Infographic

Have you implemented any accessibility features on your blog? Have questions about the code or approach? Let me know in the comments—I'm always happy to help fellow bloggers make their content more accessible while potentially opening new revenue streams.

BloggersLiveOnline
