Building with LLMs

Image Generation

Create and edit images with diffusion models and understand how image generation works.

How Image Generation Works

Modern image generation uses diffusion models:

  1. Forward process: progressively add Gaussian noise to training images over many steps
  2. Training: the model learns to predict the noise that was added at each step
  3. Generation: start from pure noise and iteratively denoise it into an image
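The forward process has a convenient closed form: you can jump straight to any noise level without simulating every step. This toy NumPy sketch (illustrative only; the schedule values and the 8×8 "image" are arbitrary stand-ins, not from any real model) shows that closed form:

```python
import numpy as np

# A linear beta schedule controls how much noise each step adds.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retained after t steps

def forward_noise(x0, t, rng):
    """Jump straight to step t via the closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
image = rng.random((8, 8))              # stand-in for a real image
slightly_noisy = forward_noise(image, 10, rng)    # mostly signal
nearly_noise = forward_noise(image, T - 1, rng)   # almost pure noise
```

During training, the model sees `slightly_noisy`-style inputs at random `t` and learns to predict `eps`; generation runs the schedule in reverse, subtracting the predicted noise step by step.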

Key Models

  • DALL-E 3 (OpenAI): Best for following complex prompts
  • Stable Diffusion: Open-source, highly customizable
  • Midjourney: Best for artistic quality
  • Imagen (Google): High fidelity

Prompt Engineering for Images

Image prompts work differently from text prompts: they reward concrete visual descriptors over instructions:

  • Describe style: "photorealistic", "oil painting", "digital art"
  • Lighting: "golden hour", "studio lighting", "dramatic shadows"
  • Composition: "close-up", "wide shot", "bird's eye view"
  • Quality: "highly detailed", "8K", "professional photography"
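In practice it helps to compose prompts from these descriptor categories rather than free-writing them. A minimal sketch (the `build_image_prompt` helper is hypothetical, not part of any SDK):

```python
def build_image_prompt(subject, style=None, lighting=None,
                       composition=None, quality=None):
    """Assemble a comma-separated image prompt from optional descriptors."""
    parts = [subject] + [p for p in (style, lighting, composition, quality) if p]
    return ", ".join(parts)

prompt = build_image_prompt(
    "a lighthouse on a rocky coast",
    style="oil painting",
    lighting="golden hour",
    composition="wide shot",
    quality="highly detailed",
)
# → "a lighthouse on a rocky coast, oil painting, golden hour, wide shot, highly detailed"
```

Keeping the categories separate also makes it easy to vary one dimension (say, lighting) while holding the rest constant when comparing outputs.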

Multimodal Models

Multimodal models such as Claude and GPT-4V can also analyze images: describe them, extract text from them, and answer questions about them.

Example

python
import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

# Vision: Analyze an image with Claude
def analyze_image(image_path):
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    # Detect image type
    suffix = Path(image_path).suffix.lower()
    media_type_map = {'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
                      '.png': 'image/png', '.gif': 'image/gif', '.webp': 'image/webp'}
    media_type = media_type_map.get(suffix, 'image/jpeg')

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": "Describe this image in detail. What do you see?"
                    }
                ],
            }
        ],
    )
    return message.content[0].text

# Image from URL
def analyze_image_url(url):
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "url", "url": url},
                    },
                    {"type": "text", "text": "What is in this image?"}
                ],
            }
        ],
    )
    return message.content[0].text

# Using OpenAI DALL-E for image generation
# from openai import OpenAI
# client = OpenAI()
# response = client.images.generate(
#     model="dall-e-3",
#     prompt="A serene mountain lake at sunrise with pine trees reflected in the water",
#     size="1024x1024",
#     quality="standard",
#     n=1,
# )
# image_url = response.data[0].url