Automated Image Captioning for Accessibility: An Easy Computer Vision Project for Newbies
Ever noticed how many blog images have no alt text or just generic descriptions like "image1.jpg"? Yeah, it's a problem. Not just for visually impaired readers using screen readers, but also for your SEO and overall user experience. I've been guilty of this too—uploading dozens of images to my blog posts without proper descriptions because, let's be honest, writing alt text for 20+ images is tedious.

But what if I told you that you could automate this process using some basic computer vision techniques? And better yet, what if this accessibility improvement could actually generate money for your blog?

Today, I'm going to walk you through building a simple image captioning tool that can generate descriptive alt text for your blog images. It's a great mini-project for anyone starting their coding journey, and one you can put to practical use right away.


Why Bother With Alt Text Anyway?

Before we dive into the code, let's talk about why this matters:

  1. Accessibility: Screen readers rely on alt text to describe images to visually impaired users. Without it, those users miss out on important content.

  2. SEO Benefits: Google uses alt text to understand images, which can improve your search rankings. I've seen firsthand how proper image descriptions can drive additional traffic.

  3. Fallback Display: When images fail to load (slow connections, broken links), the alt text appears instead.

  4. Legal Compliance: In some jurisdictions, web accessibility is legally required. I learned this the hard way when a reader pointed out my blog wasn't ADA compliant.

I realized that a visually impaired reader couldn't follow my Python tutorials because every code screenshot was just "image.jpg" to their screen reader. That was my wake-up call.

The Manual Alt Text Problem

The issue is simple: writing good alt text takes time. For a tutorial with 15+ screenshots, you might spend an extra 30 minutes just describing images. As someone who values efficiency (some might call it laziness), I wanted a better solution.

I tried using generic descriptions, but they weren't helpful.

That's when I turned to computer vision. If computers can recognize objects in images, why couldn't they write my alt text?

The Computer Vision Solution: Easier Than You Think

Here's the good news: building an image captioning system is much easier today than it was even two years ago. You don't need a PhD in machine learning or expensive GPU servers. With pre-trained models and simple Python code, you can get surprisingly good results.

I'll show you how to build this in stages, from a basic proof-of-concept to a more sophisticated system you can actually use on your blog.

What You'll Need

  • Basic Python knowledge (if you can write a for-loop, you're good)

  • A computer with Python installed

  • About 1-2 hours of time

  • No prior machine learning experience required!

Stage 1: The Quick and Dirty Solution

Let's start with the simplest possible solution using a pre-trained model. This is what I built first as a proof-of-concept before investing more time.

python
# Don't worry if you don't understand all of this code
# I'll explain the important parts
# Requires: pip install transformers torch Pillow
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load pre-trained model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def generate_alt_text(image_path):
    # Load and process the image
    image = Image.open(image_path).convert('RGB')
    inputs = processor(image, return_tensors="pt")

    # Generate caption
    outputs = model.generate(**inputs, max_length=30)
    caption = processor.decode(outputs[0], skip_special_tokens=True)
    return caption

# Example usage
alt_text = generate_alt_text("path/to/your/image.jpg")
print(alt_text)

When I first ran this code on some of my blog images, I was genuinely surprised by the results. For a screenshot of code, it generated "A screenshot of programming code in a text editor." Not perfect, but way better than nothing!

For a graph showing website traffic growth, it produced "A line graph showing an upward trend over time." Again, not detailed, but functional.

The best part? This took about 10 minutes to set up and could process my entire image library overnight.

Stage 2: Making It Actually Useful

The basic solution works, but has limitations. After using it for a few weeks, I noticed several problems:

  1. Generic descriptions: "A person sitting at a computer" isn't very helpful

  2. Missing context: It couldn't tell that a screenshot was from WordPress vs. Blogger

  3. Technical limitations: It struggled with text-heavy images and diagrams

So I improved it. Here's the enhanced version I ended up using:

python
import pytesseract
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load pre-trained models
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def extract_text_from_image(image):
    # Extract any text visible in the image
    text = pytesseract.image_to_string(image)
    return text.strip()

def generate_improved_alt_text(image_path, post_context=None):
    # Load image
    image = Image.open(image_path).convert('RGB')

    # Get basic caption
    inputs = processor(image, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=30)
    basic_caption = processor.decode(outputs[0], skip_special_tokens=True)

    # Extract text from image
    image_text = extract_text_from_image(image)

    # Combine information
    if image_text and len(image_text) > 10:  # Only use if meaningful text was found
        if "code" in basic_caption.lower() or "screenshot" in basic_caption.lower():
            return f"Screenshot showing code: {image_text[:100]}..."
        else:
            return f"{basic_caption} Contains text: {image_text[:100]}..."

    # If post context is provided, use it to enhance the caption
    if post_context and basic_caption:
        return f"{basic_caption} Related to {post_context}."

    return basic_caption

# Example usage
alt_text = generate_improved_alt_text("screenshot.jpg", "Blogger template customization")
print(alt_text)

This version does a few important things:

  1. It extracts text from the image using OCR (Optical Character Recognition)

  2. It combines the visual caption with any text found in the image

  3. It allows you to provide context from the post topic

The results were much better. A screenshot of HTML code now generated: "Screenshot showing code: <div class='main-content'>...</div>..." which is actually useful for someone using a screen reader.

Stage 3: Building a Complete Workflow

After experimenting for a few weeks, I built a complete workflow that integrated with my blogging process. Here's what it looks like:

  1. Batch processing: Process all images in a folder at once

  2. Integration with Blogger: Automatically update image HTML with new alt text

  3. Quality control: Flag low-confidence captions for manual review

Here's the code for the batch processor:

python
import os
import glob
from bs4 import BeautifulSoup

def process_blog_post_images(html_file, images_folder, post_topic):
    # Read the HTML file
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()

    # Parse HTML
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find all images
    images = soup.find_all('img')

    for img in images:
        # Get image filename
        src = img.get('src', '')
        if not src:
            continue

        # Extract filename from path
        filename = os.path.basename(src)

        # Find matching image in folder
        image_path = glob.glob(f"{images_folder}/**/{filename}", recursive=True)
        if not image_path:
            continue

        # Generate alt text (uses the Stage 2 function defined above)
        alt_text = generate_improved_alt_text(image_path[0], post_topic)

        # Update the alt attribute
        img['alt'] = alt_text
        print(f"Updated alt text for {filename}: {alt_text}")

    # Save the updated HTML
    with open(html_file.replace('.html', '_updated.html'), 'w', encoding='utf-8') as f:
        f.write(str(soup))

    return "HTML file updated with new alt text"

# Example usage
process_blog_post_images("my_post.html", "images_folder", "Python Tutorial")

This function takes an HTML file (which you can export from Blogger), processes all the images, and creates an updated version with proper alt text.
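
The third piece of the workflow, flagging low-confidence captions for manual review, isn't shown above. Here's one possible heuristic sketch: the generic-phrase list and word-count threshold below are just starting points I'd tune against my own captions, not fixed values.

```python
# Heuristic quality control: flag generated captions that are probably
# too short or too generic to publish without a manual pass.
GENERIC_PHRASES = [
    "a picture of", "an image of", "a photo of",
    "a close up of", "a screenshot of a screen",
]

def needs_review(caption, min_words=4):
    """Return True if a generated caption should be manually reviewed."""
    if not caption or len(caption.split()) < min_words:
        return True  # too short to be descriptive
    lowered = caption.lower()
    # Generic filler phrases usually mean the model didn't see much detail
    return any(phrase in lowered for phrase in GENERIC_PHRASES)
```

In the batch processor, you'd simply collect any `(filename, caption)` pair where `needs_review(caption)` is true and print the list at the end instead of writing those captions into the HTML.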

The Results: Was It Worth It?

After implementing this system, I ran it on my entire blog archive—about 200 posts with roughly 1,500 images. Here's what happened:

  1. Accessibility Improvement: My blog became genuinely usable for visually impaired visitors. I even received a thank-you email from the reader who originally complained.

  2. SEO Boost: Within two months, I noticed increased image search traffic. Some of my screenshots now appear in Google Image results, driving additional visitors.

  3. Reduced Workload: For new posts, I now run images through this system first, then just review and tweak the generated alt text instead of writing from scratch.

But the most interesting outcome was unexpected...

The Monetization Angle: How This Actually Makes Money

I didn't build this tool thinking about monetization, but it ended up creating several revenue opportunities:

1. Improved Ad Performance

After adding proper alt text, my overall page RPM (Revenue Per Mille) increased by about 8%. Why? Better SEO meant more targeted traffic, which led to more relevant ads and higher click-through rates.

2. Accessibility Consulting

After I wrote about this project, two companies reached out asking if I could help implement similar systems for their content. I ended up doing small consulting projects, essentially getting paid to implement the same code with minor customizations.

3. Premium Plugin Development

The biggest opportunity came when I packaged this functionality into a WordPress plugin. I initially offered it for free, then added premium features like:

  • Bulk processing of existing media libraries

  • Custom training for specific blog niches

  • Integration with Yoast SEO

The plugin now steadily generates a modest income stream of about $300/month—not life-changing, but a nice bonus for solving a problem I had anyway.

Common Challenges and How I Solved Them

Building this wasn't all smooth sailing. Here are some issues I encountered and how I addressed them:

Challenge 1: Poor Captions for Technical Screenshots

The model struggled with technical screenshots, especially code. My solution was to prioritize OCR results for these images and format them specifically as code examples.

python
# Special handling for code screenshots
if "code" in basic_caption.lower() or any(
    code_term in image_text.lower()
    for code_term in ["function", "class", "def", "var", "<div"]
):
    return f"Code snippet: {image_text[:150]}..."

Challenge 2: Processing Time

Running the full model was slow on my laptop. I solved this by:

  1. Resizing images before processing (most blog images don't need to be full resolution for captioning)

  2. Batching the processing to run overnight

  3. Eventually moving to a cloud service (Google Colab) for free GPU acceleration
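
The resizing step from fix 1 can look like the sketch below. BLIP downscales its inputs anyway, so shrinking large blog images first mostly saves preprocessing time; the 640px cap here is an arbitrary choice, not a tuned value.

```python
def scaled_size(width, height, max_side=640):
    """Compute a new size whose longest edge is at most max_side,
    preserving the aspect ratio."""
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))

def prepare_for_captioning(image_path, max_side=640):
    """Open an image and downscale it before handing it to the captioner."""
    # Pillow imported lazily so the pure sizing math above has no dependencies
    from PIL import Image
    image = Image.open(image_path).convert("RGB")
    if max(image.size) > max_side:
        image = image.resize(scaled_size(*image.size, max_side))
    return image
```

You'd then pass the returned image straight into the processor instead of reopening the file inside `generate_alt_text`.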

Challenge 3: Integration with Blogger

Blogger doesn't have a great API for this kind of task. My workaround was:

  1. Export the blog post as HTML

  2. Process the HTML file locally

  3. Update the image tags

  4. Import the updated HTML

Not elegant, but effective. For WordPress users, this would be much easier with their robust API.

How You Can Build This (Even If You're Not a Coder)

Not everyone is comfortable with Python, and that's okay. Here are options for different skill levels:

Option 1: Use Existing Tools

If you don't want to code at all, several services now offer AI image captioning:

  • WordPress has plugins like "Auto Alt Text" that do this automatically

  • Cloudinary offers an AI-based image captioning service

  • Microsoft's Azure Computer Vision API can generate alt text via their web interface

These aren't free, but they're simple to use.

Option 2: The Google Colab Approach

This is what I recommend for beginners who want to try it themselves without installing anything:

  1. Go to Google Colab (it's free)

  2. Create a new notebook

  3. Copy and paste this simplified version of the code:

python
!pip install transformers Pillow pytesseract

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from google.colab import files

# Load pre-trained model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def generate_alt_text(image):
    # Process image
    inputs = processor(image, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=30)
    caption = processor.decode(outputs[0], skip_special_tokens=True)
    return caption

# Upload an image
uploaded = files.upload()

# Process each uploaded image
for filename in uploaded.keys():
    image = Image.open(filename).convert('RGB')
    alt_text = generate_alt_text(image)
    print(f"Suggested alt text for {filename}: {alt_text}")

  4. Run the code and upload your images

  5. Copy the generated alt text for your blog

This approach requires no setup and runs on Google's servers for free.

Option 3: The Full Solution

If you're comfortable with Python, run the complete Stage 1–3 code from this post locally and wire it into your own publishing workflow—that's what I now do for every new post.

Beyond Alt Text: Other Applications of This Technology

Once you have this system working, you can extend it in several interesting ways:

1. Automatic Featured Image Selection

I built a script that analyzes all images in a post and selects the most descriptive one as the featured image:

python
def find_best_featured_image(image_paths, post_title):
    best_score = 0
    best_image = None

    for img_path in image_paths:
        # Generate caption
        image = Image.open(img_path).convert('RGB')
        caption = generate_alt_text(image)

        # Calculate relevance score (simple word overlap)
        title_words = set(post_title.lower().split())
        caption_words = set(caption.lower().split())
        overlap = len(title_words.intersection(caption_words))

        # Prefer images with good overlap to the post title
        if overlap > best_score:
            best_score = overlap
            best_image = img_path

    return best_image

This saves me time selecting featured images and often makes better choices than I would.

2. Content Suggestion Engine

I extended the image analysis to suggest related posts based on visual similarity:

python
def calculate_text_similarity(text_a, text_b):
    # Simple word-overlap (Jaccard) similarity between two captions
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def find_related_posts(image_path, post_database):
    # Generate caption for current image
    current_caption = generate_alt_text(Image.open(image_path).convert('RGB'))
    related_posts = []

    # Compare with captions from other posts
    for post in post_database:
        for post_image in post['images']:
            similarity = calculate_text_similarity(current_caption, post_image['caption'])
            if similarity > 0.7:  # Threshold for relatedness
                related_posts.append(post['title'])
                break  # One match is enough

    return related_posts

This helps me build better internal linking structures and "related posts" widgets.

3. Automatic Social Media Excerpts

When sharing blog posts on social media, I use the image captions to generate post excerpts:

python
def generate_social_excerpt(post_title, featured_image_path):
    # Get image caption
    caption = generate_alt_text(Image.open(featured_image_path).convert('RGB'))

    # Create social media excerpt
    excerpt = f"New post: {post_title}. {caption} Read more on the blog!"
    return excerpt

This creates more engaging social posts with minimal effort.

The Future: Where This Technology Is Heading

Image captioning technology is improving rapidly. Here's what I expect to see in the next few years:

  1. More contextual awareness: Future models will better understand the relationship between images and surrounding text.

  2. Multi-modal understanding: Systems will analyze both images and text together to generate more relevant descriptions.

  3. Customized models: You'll be able to fine-tune captioning models for your specific blog niche.

I'm particularly excited about fine-tuning models on specific domains. A captioning system trained specifically on programming screenshots would be incredibly valuable for technical bloggers.

My Personal Tips for Getting Started

If you decide to implement this for your blog, here are my recommendations based on what I learned:

  1. Start small: Process your most popular posts first to see the biggest impact.

  2. Review before publishing: Always review generated captions before using them—AI is good but not perfect.

  3. Combine with manual effort: Use the AI to generate a first draft, then enhance it with your knowledge of the post context.

  4. Track the results: Monitor your accessibility score and image search traffic to measure the impact.

  5. Keep learning: This field is evolving quickly, so stay updated on new models and techniques.
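
A quick way to act on tips 1 and 4 is to audit which images still lack alt text before and after you process a post. This is a minimal sketch using only Python's standard library (no BeautifulSoup needed for a read-only check):

```python
from html.parser import HTMLParser

class AltTextAuditor(HTMLParser):
    """Collects the src of every <img> whose alt attribute is missing or empty."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Treat missing, empty, and whitespace-only alt attributes as failures
        if not (attrs.get("alt") or "").strip():
            self.missing.append(attrs.get("src", "(no src)"))

def find_images_missing_alt(html):
    auditor = AltTextAuditor()
    auditor.feed(html)
    return auditor.missing
```

Run it over an exported post's HTML and the length of the returned list is your before/after score for that post.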

Conclusion: Is It Worth Your Time?

The accessibility benefits alone justify the effort, but the SEO improvements and monetization opportunities make it a no-brainer. Even if you use the simplest implementation, you'll see benefits.

If you're serious about blogging and have a substantial image library, this is one of the highest ROI technical projects you can tackle.

Read: Bounce Rate - What You Need to Know (Infographic)

Read: Top 10 Common Mistakes Every Blogger Makes + Infographic

Have you implemented any accessibility features on your blog? Have questions about the code or approach? Let me know in the comments—I'm always happy to help fellow bloggers make their content more accessible while potentially opening new revenue streams.

BloggersLiveOnline
