# Production Patterns

## Error Handling and Rate Limits
Handle Gemini API errors gracefully with retry logic, rate-limit management, and production-ready error-handling patterns.
## Common Error Types
| Error | HTTP Status | Cause | Fix |
|---|---|---|---|
| RESOURCE_EXHAUSTED | 429 | Rate limit exceeded | Exponential backoff + retry |
| INVALID_ARGUMENT | 400 | Bad request format | Validate input before sending |
| PERMISSION_DENIED | 403 | Invalid API key | Check key and billing |
| UNAVAILABLE | 503 | Temporary overload | Retry with backoff |
| DEADLINE_EXCEEDED | 504 | Request timeout | Reduce prompt size or use streaming |
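The decision logic in the table can be sketched as a small dispatch helper. This is an illustrative sketch, not part of the SDK; `classifyStatus` and the `ErrorAction` labels are hypothetical names:

```typescript
// Map an HTTP status from the Gemini API to the action in the table above.
type ErrorAction = "retry" | "fix-request" | "check-credentials" | "fail";

function classifyStatus(status: number): ErrorAction {
  switch (status) {
    case 429: // RESOURCE_EXHAUSTED: back off and retry
    case 503: // UNAVAILABLE: temporary overload, retry
      return "retry";
    case 400: // INVALID_ARGUMENT: fix the request before resending
    case 504: // DEADLINE_EXCEEDED: shrink the prompt or stream, then resend
      return "fix-request";
    case 403: // PERMISSION_DENIED: check API key and billing
      return "check-credentials";
    default:
      return "fail";
  }
}
```

Centralizing this decision keeps retry loops simple: they only need to ask "retry or not?" instead of re-checking status codes in several places.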
## Rate Limits (Free Tier)
| Model | RPM | TPM |
|---|---|---|
| gemini-1.5-flash | 15 | 1,000,000 |
| gemini-1.5-pro | 2 | 32,000 |
Paid tier limits are much higher. Check Google AI Studio for current quotas.
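Rather than waiting for 429s, you can throttle on the client side. A minimal sketch, assuming a sliding one-minute window is accurate enough (`RpmLimiter` is a hypothetical helper, not part of the SDK):

```typescript
// Client-side RPM limiter: tracks request timestamps in a sliding
// one-minute window and reports how long to wait before the next call.
class RpmLimiter {
  private timestamps: number[] = [];
  constructor(private rpm: number) {}

  // Milliseconds to wait before a request is allowed (0 = go now).
  delayMs(now = Date.now()): number {
    // Drop timestamps older than one minute.
    this.timestamps = this.timestamps.filter(t => now - t < 60_000);
    if (this.timestamps.length < this.rpm) return 0;
    // Wait until the oldest request in the window expires.
    return 60_000 - (now - this.timestamps[0]);
  }

  record(now = Date.now()): void {
    this.timestamps.push(now);
  }
}
```

For `gemini-1.5-pro` on the free tier you would construct `new RpmLimiter(2)`, call `delayMs()` before each request, sleep that long if it is nonzero, then `record()` after sending.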
## Exponential Backoff
The standard retry pattern for 429 and 503 errors. Each retry waits progressively longer, with jitter to avoid thundering herd:
```text
Attempt 1: wait 1s
Attempt 2: wait 2s
Attempt 3: wait 4s
Attempt 4: wait 8s
Maximum: ~60s
```

## Token Counting
Before sending large prompts, estimate token count with countTokens() to avoid hitting context limits or unexpected costs.
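`countTokens()` requires a network round trip, so a cheap offline estimate can be useful as a first-pass guard. This heuristic assumes roughly 4 characters per token for English text, which is only an approximation; always confirm with `countTokens()` before sending:

```typescript
// Rough offline token estimate (~4 chars/token for English text).
// Use model.countTokens() for the authoritative count.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

// First-pass guard before paying for a countTokens() round trip.
function fitsInContext(text: string, maxTokens = 1_000_000): boolean {
  return roughTokenEstimate(text) <= maxTokens;
}
```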
## Validation Checklist
Before production deployment:
- [ ] API key stored in environment variables
- [ ] Retry logic implemented for 429/503 errors
- [ ] Safety filter responses handled gracefully
- [ ] Token counting for large inputs
- [ ] Streaming used for outputs > 500 tokens
- [ ] Timeouts configured (avoid infinite waits)
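For the timeout item on the checklist, one approach is a generic wrapper that rejects a promise that takes too long. A minimal sketch (`withTimeout` is a hypothetical helper; the SDK also accepts an `AbortSignal` via request options, which cancels the underlying request rather than just abandoning it):

```typescript
// Reject if the wrapped promise takes longer than timeoutMs,
// so a hung request cannot stall the caller forever.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); },
    );
  });
}
```

Usage would look like `await withTimeout(model.generateContent(prompt), 30_000)`. Note that the underlying HTTP request keeps running after the timeout fires; use an abort signal if you need true cancellation.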
## Example
```typescript
import {
  GoogleGenerativeAI,
  GoogleGenerativeAIFetchError,
} from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Exponential backoff retry for transient errors (429/503)
async function generateWithRetry(
  prompt: string,
  maxRetries = 5,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (error) {
      // Fetch errors from the SDK carry the HTTP status code.
      if (error instanceof GoogleGenerativeAIFetchError) {
        const status = error.status;
        if (status === 429 || status === 503) {
          // Exponential backoff with jitter, capped at 60s
          const waitMs = Math.min(1000 * 2 ** attempt + Math.random() * 1000, 60_000);
          console.warn(`Attempt ${attempt + 1} failed (${status}), retrying in ${waitMs}ms...`);
          await new Promise(resolve => setTimeout(resolve, waitMs));
          continue;
        }
        if (status === 400) {
          console.error("Invalid request:", error.message);
          return null; // Don't retry bad requests
        }
      }
      throw error; // Re-throw unknown errors
    }
  }
  console.error("Max retries exceeded");
  return null;
}

// Token counting before large requests
async function safeGenerateWithLargeContext(documents: string[], query: string) {
  const fullPrompt = documents.join("\n\n") + "\n\nQuestion: " + query;
  const { totalTokens } = await model.countTokens(fullPrompt);
  console.log(`Estimated tokens: ${totalTokens}`);
  if (totalTokens > 900_000) {
    throw new Error(`Context too large: ${totalTokens} tokens (max 1,000,000)`);
  }
  return generateWithRetry(fullPrompt);
}

// Usage
const result = await generateWithRetry("What is the Gemini context window size?");
console.log(result);
```