Building with LLMs
Image Generation
Create and edit images with diffusion models and understand how image generation works.
How Image Generation Works
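Most modern image generators are diffusion models, which work in three stages:
- Forward process: Gradually add Gaussian noise to training images over many steps until only noise remains
- Training: The model learns to predict the noise added at a given step, so that it can be removed
- Generation: Start from pure noise and denoise iteratively, guided by the text prompt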
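The stages above can be illustrated with a toy sketch in plain Python. It assumes a DDPM-style formulation with a linear noise schedule; the function names, the schedule values, and the three-value "image" are illustrative assumptions, not part of any particular model. A real system would use a trained neural network to predict the noise, whereas this sketch reuses the true noise to show why an accurate noise prediction is enough to recover the image.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: jump to step t by mixing the signal with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps_pred, t, alpha_bars):
    """Invert the mixing step, given a prediction of the noise that was added."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
alpha_bars = make_alpha_bars()
pixels = [0.1, 0.5, 0.9]  # stand-in for image pixel values

noisy, true_noise = add_noise(pixels, 500, alpha_bars, rng)
# A trained model would supply the noise estimate; here we use the true noise.
recovered = remove_noise(noisy, true_noise, 500, alpha_bars)

print("signal kept at final step:", alpha_bars[-1])
print("noisy pixels:", noisy)
print("recovered pixels:", recovered)
```

The share of the original signal that survives shrinks toward zero by the final step, which is why generation can start from pure noise.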
Key Models
- DALL-E 3 (OpenAI): Strong at following long, complex prompts
- Stable Diffusion: Open-source, highly customizable
- Midjourney: Known for a polished, artistic look
- Imagen (Google): High-fidelity output
Prompt Engineering for Images
Image prompts work differently from text prompts: instead of giving instructions, describe the picture you want. Useful elements to specify:
- Style: "photorealistic", "oil painting", "digital art"
- Lighting: "golden hour", "studio lighting", "dramatic shadows"
- Composition: "close-up", "wide shot", "bird's eye view"
- Quality: "highly detailed", "8K", "professional photography"
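One way to apply the elements above consistently is to assemble prompts from named parts. The helper below is a minimal sketch; `build_image_prompt` and its parameters are hypothetical names introduced for illustration, and joining the parts with commas is just one common convention.

```python
def build_image_prompt(subject, style=None, lighting=None, composition=None, quality=None):
    """Join the subject with any optional descriptors into one comma-separated prompt."""
    parts = [subject, style, lighting, composition, quality]
    # Skip descriptors that were not provided or are blank.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_image_prompt(
    "a lighthouse on a rocky coast",
    style="oil painting",
    lighting="golden hour",
    composition="wide shot",
    quality="highly detailed",
)
print(prompt)
# a lighthouse on a rocky coast, oil painting, golden hour, wide shot, highly detailed
```

Keeping the parts separate makes it easy to vary one element, such as the lighting, while holding the rest of the prompt fixed.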
Multimodal Models
Multimodal models such as Claude and GPT-4V take images as input: they can describe them, extract text, and answer questions about them.
Example
python
import anthropic
import base64
from pathlib import Path
# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Vision: analyze a local image with Claude
def analyze_image(image_path):
    # The API expects the image bytes as a base64-encoded string
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    # Detect the media type from the file extension (falls back to JPEG)
    suffix = Path(image_path).suffix.lower()
    media_type_map = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.webp': 'image/webp',
    }
    media_type = media_type_map.get(suffix, 'image/jpeg')

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": "Describe this image in detail. What do you see?"
                    }
                ],
            }
        ],
    )
    return message.content[0].text

# Vision: analyze an image referenced by URL
def analyze_image_url(url):
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "url", "url": url},
                    },
                    {"type": "text", "text": "What is in this image?"}
                ],
            }
        ],
    )
    return message.content[0].text

# Using OpenAI DALL-E for image generation
# (left commented out: requires the openai package and an OPENAI_API_KEY)
# from openai import OpenAI
# openai_client = OpenAI()  # separate name so it does not shadow the Anthropic client
# response = openai_client.images.generate(
#     model="dall-e-3",
#     prompt="A serene mountain lake at sunrise with pine trees reflected in the water",
#     size="1024x1024",
#     quality="standard",
#     n=1,
# )
# image_url = response.data[0].url