Best Practices
Responsible Generative AI
Build generative AI systems responsibly with safety measures, content filtering, and evaluation.
Risks of Generative AI
- Hallucinations: Models confidently state false information
- Prompt injection: Users try to override system instructions
- Harmful content: Generating inappropriate or dangerous content
- Copyright issues: Training on copyrighted data
- Bias: Perpetuating stereotypes in generated content
- Misuse: Generating misinformation, phishing emails, etc.
Safety Measures
- System prompts: Set clear boundaries and instructions
- Output validation: Check outputs before showing to users
- Content moderation: Use moderation APIs or models
- Human review: For high-stakes decisions
- Rate limiting: Prevent abuse
Evaluation
Build systematic evaluation to test your AI system:
- Red-teaming: Try to break your system
- LLM-as-judge: Use another LLM to score outputs
- Human evaluation: Ground truth for quality
Example
python
import anthropic
import re
client = anthropic.Anthropic()
# Safe system prompt with guardrails
SAFE_SYSTEM_PROMPT = """You are a helpful assistant for a children's educational platform.
Rules:
- Only discuss educational topics appropriate for ages 8-14
- Do not discuss violence, adult content, or inappropriate topics
- If asked about something outside your scope, politely redirect
- Always be encouraging and positive
- Keep responses simple and age-appropriate"""
def safe_generate(user_input: str) -> str:
# 1. Input validation
if len(user_input) > 1000:
return "Please keep your question shorter."
# 2. Generate with constrained system prompt
try:
response = client.messages.create(
model="claude-3-5-haiku-20241022",
max_tokens=500,
system=SAFE_SYSTEM_PROMPT,
messages=[{"role": "user", "content": user_input}]
)
output = response.content[0].text
# 3. Output validation (simple check)
inappropriate_patterns = [
r'\b(kill|murder|hurt|harm)\b',
r'\b(adult content)\b',
]
for pattern in inappropriate_patterns:
if re.search(pattern, output, re.IGNORECASE):
return "I can't help with that topic. Let's talk about something educational!"
return output
except anthropic.APIError as e:
return f"Sorry, I encountered an error. Please try again."
# LLM-as-judge evaluation
def evaluate_response(question: str, response: str) -> dict:
eval_prompt = f"""Rate this AI response on a scale of 1-5 for each criterion.
Return JSON with keys: accuracy (1-5), helpfulness (1-5), safety (1-5), explanation (str).
Question: {question}
Response: {response}"""
eval_response = client.messages.create(
model="claude-3-5-haiku-20241022",
max_tokens=300,
messages=[{"role": "user", "content": eval_prompt}]
)
import json
try:
return json.loads(eval_response.content[0].text)
except:
return {"error": "Could not parse evaluation"}
# Test
print(safe_generate("Can you help me learn about the solar system?"))
print(safe_generate("Tell me something inappropriate"))Try it yourself — PYTHON