Production Patterns

Error Handling and Rate Limits

Handle Gemini API errors gracefully with proper retry logic, rate limit management, and production-ready error patterns.

Common Error Types

| Error | HTTP Status | Cause | Fix |
| --- | --- | --- | --- |
| `RESOURCE_EXHAUSTED` | 429 | Rate limit exceeded | Exponential backoff + retry |
| `INVALID_ARGUMENT` | 400 | Bad request format | Validate input before sending |
| `PERMISSION_DENIED` | 403 | Invalid API key | Check key and billing |
| `UNAVAILABLE` | 503 | Temporary overload | Retry with backoff |
| `DEADLINE_EXCEEDED` | 504 | Request timeout | Reduce prompt size or use streaming |

Rate Limits (Free Tier)

| Model | RPM | TPM |
| --- | --- | --- |
| gemini-1.5-flash | 15 | 1,000,000 |
| gemini-1.5-pro | 2 | 32,000 |

Paid tier limits are much higher. Check Google AI Studio for current quotas.
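Rather than reacting to 429s after the fact, you can also stay under an RPM cap proactively with a client-side sliding-window limiter. The sketch below is illustrative (the `RequestLimiter` name and 60-second window are assumptions, not part of the SDK); set `rpm` to your tier's quota:

```typescript
// Sliding-window rate limiter: allows at most `rpm` calls per 60s window.
class RequestLimiter {
  private timestamps: number[] = [];
  constructor(private rpm: number) {}

  // Returns true if a request may be sent now, and records it.
  tryAcquire(now = Date.now()): boolean {
    const windowStart = now - 60_000;
    // Drop timestamps that have aged out of the window
    this.timestamps = this.timestamps.filter(t => t > windowStart);
    if (this.timestamps.length >= this.rpm) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Check `tryAcquire()` before each call, and sleep or queue the request when it returns `false`.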

Exponential Backoff

The standard retry pattern for 429 and 503 errors. Each retry waits progressively longer, with random jitter added to avoid the thundering-herd problem:

```text
Attempt 1: wait 1s
Attempt 2: wait 2s
Attempt 3: wait 4s
Attempt 4: wait 8s
Maximum: ~60s
```
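The schedule above can be expressed as a small pure helper (the base delay, 1s jitter range, and 60s cap mirror the schedule; they are tuning choices, not API requirements):

```typescript
// Wait before retry `attempt` (0-based): base * 2^attempt,
// plus up to 1s of random jitter, capped at 60s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exponential = baseMs * 2 ** attempt;
  const jitter = Math.random() * 1000;
  return Math.min(exponential + jitter, capMs);
}
```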

Token Counting

Before sending large prompts, estimate token count with countTokens() to avoid hitting context limits or unexpected costs.
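When an extra network round-trip for `countTokens()` is undesirable, a character-based heuristic can serve as a cheap pre-check. The ~4 characters per token ratio below is a common rule of thumb for English text, not an official figure; always confirm with `countTokens()` before sending:

```typescript
// Rough token estimate: ~4 characters per token (heuristic only;
// use countTokens() for the authoritative count).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```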

Validation Checklist

Before production deployment:

- [ ] API key stored in environment variables
- [ ] Retry logic implemented for 429/503 errors
- [ ] Safety filter responses handled gracefully
- [ ] Token counting for large inputs
- [ ] Streaming used for outputs > 500 tokens
- [ ] Timeouts configured (avoid infinite waits)
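For the timeout item, a generic `Promise.race`-style wrapper works with any SDK call. This is a sketch; the `withTimeout` name and 30s default are illustrative choices:

```typescript
// Reject if `promise` does not settle within `timeoutMs`.
function withTimeout<T>(promise: Promise<T>, timeoutMs = 30_000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Usage: `const text = await withTimeout(model.generateContent(prompt), 30_000);`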

Example

```typescript
import { GoogleGenerativeAI, GoogleGenerativeAIError } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Exponential backoff retry for transient errors (429/503)
async function generateWithRetry(
  prompt: string,
  maxRetries = 5,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (error) {
      if (error instanceof GoogleGenerativeAIError) {
        // Fetch-level SDK errors carry an HTTP status code
        const status = (error as any).status;

        if (status === 429 || status === 503) {
          // Exponential backoff with jitter, capped at 60s
          const waitMs = Math.min(1000 * 2 ** attempt + Math.random() * 1000, 60_000);
          console.warn(`Attempt ${attempt + 1} failed (${status}), retrying in ${waitMs}ms...`);
          await new Promise(resolve => setTimeout(resolve, waitMs));
          continue;
        }

        if (status === 400) {
          console.error("Invalid request:", error.message);
          return null; // Don't retry bad requests
        }

        throw error; // Re-throw unknown API errors
      }
      throw error; // Non-SDK errors (network failures, programming bugs)
    }
  }
  console.error("Max retries exceeded");
  return null;
}

// Token counting before large requests
async function safeGenerateWithLargeContext(documents: string[], query: string) {
  const fullPrompt = documents.join("\n\n") + "\n\nQuestion: " + query;
  const { totalTokens } = await model.countTokens(fullPrompt);

  console.log(`Estimated tokens: ${totalTokens}`);

  if (totalTokens > 900_000) {
    throw new Error(`Context too large: ${totalTokens} tokens (max 1,000,000)`);
  }

  return generateWithRetry(fullPrompt);
}

// Usage
const result = await generateWithRetry("What is the Gemini context window size?");
console.log(result);
```