Production Patterns

Error Handling and Rate Limits

Handle Gemini API errors gracefully with proper retry logic, rate limit management, and production-ready error patterns.

Common Error Types

| Error | HTTP Status | Cause | Fix |
| --- | --- | --- | --- |
| `RESOURCE_EXHAUSTED` | 429 | Rate limit exceeded | Exponential backoff + retry |
| `INVALID_ARGUMENT` | 400 | Bad request format | Validate input before sending |
| `PERMISSION_DENIED` | 403 | Invalid API key | Check key and billing |
| `UNAVAILABLE` | 503 | Temporary overload | Retry with backoff |
| `DEADLINE_EXCEEDED` | 504 | Request timeout | Reduce prompt size or use streaming |

Rate Limits (Free Tier)

| Model | RPM | TPM |
| --- | --- | --- |
| gemini-1.5-flash | 15 | 1,000,000 |
| gemini-1.5-pro | 2 | 32,000 |

Paid tier limits are much higher. Check Google AI Studio for current quotas.
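Rather than reacting to 429s after the fact, you can also stay under an RPM cap proactively with a client-side sliding-window limiter. The sketch below is illustrative (the `RequestLimiter` name and 60-second window are assumptions, not part of the SDK); set `rpm` to your tier's quota:

```typescript
// Sliding-window rate limiter: allows at most `rpm` calls per 60s window.
class RequestLimiter {
  private timestamps: number[] = [];
  constructor(private rpm: number) {}

  // Returns true if a request may be sent now, and records it.
  tryAcquire(now = Date.now()): boolean {
    const windowStart = now - 60_000;
    // Drop timestamps that have aged out of the window
    this.timestamps = this.timestamps.filter(t => t > windowStart);
    if (this.timestamps.length >= this.rpm) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Check `tryAcquire()` before each call, and sleep or queue the request when it returns `false`.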

Exponential Backoff

The standard retry pattern for 429 and 503 errors. Each retry waits progressively longer, with random jitter added to avoid the thundering-herd problem:

```text
Attempt 1: wait 1s
Attempt 2: wait 2s
Attempt 3: wait 4s
Attempt 4: wait 8s
Maximum: ~60s
```
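The schedule above can be expressed as a small pure helper (the base delay, 1s jitter range, and 60s cap mirror the schedule; they are tuning choices, not API requirements):

```typescript
// Wait before retry `attempt` (0-based): base * 2^attempt,
// plus up to 1s of random jitter, capped at 60s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exponential = baseMs * 2 ** attempt;
  const jitter = Math.random() * 1000;
  return Math.min(exponential + jitter, capMs);
}
```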

Token Counting

Before sending large prompts, estimate token count with countTokens() to avoid hitting context limits or unexpected costs.
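When an extra network round-trip for `countTokens()` is undesirable, a character-based heuristic can serve as a cheap pre-check. The ~4 characters per token ratio below is a common rule of thumb for English text, not an official figure; always confirm with `countTokens()` before sending:

```typescript
// Rough token estimate: ~4 characters per token (heuristic only;
// use countTokens() for the authoritative count).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```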

Validation Checklist

Before production deployment:

- [ ] API key stored in environment variables
- [ ] Retry logic implemented for 429/503 errors
- [ ] Safety filter responses handled gracefully
- [ ] Token counting for large inputs
- [ ] Streaming used for outputs > 500 tokens
- [ ] Timeouts configured (avoid infinite waits)
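For the timeout item, a generic `Promise.race`-style wrapper works with any SDK call. This is a sketch; the `withTimeout` name and 30s default are illustrative choices:

```typescript
// Reject if `promise` does not settle within `timeoutMs`.
function withTimeout<T>(promise: Promise<T>, timeoutMs = 30_000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Usage: `const text = await withTimeout(model.generateContent(prompt), 30_000);`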

Example

```typescript
import { GoogleGenerativeAI, GoogleGenerativeAIError } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Exponential backoff retry for transient errors (429/503)
async function generateWithRetry(
  prompt: string,
  maxRetries = 5,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (error) {
      if (error instanceof GoogleGenerativeAIError) {
        // Fetch-level SDK errors carry an HTTP status code
        const status = (error as any).status;

        if (status === 429 || status === 503) {
          // Exponential backoff with jitter, capped at 60s
          const waitMs = Math.min(1000 * 2 ** attempt + Math.random() * 1000, 60_000);
          console.warn(`Attempt ${attempt + 1} failed (${status}), retrying in ${waitMs}ms...`);
          await new Promise(resolve => setTimeout(resolve, waitMs));
          continue;
        }

        if (status === 400) {
          console.error("Invalid request:", error.message);
          return null; // Don't retry bad requests
        }

        throw error; // Re-throw unknown API errors
      }
      throw error; // Non-SDK errors (network failures, programming bugs)
    }
  }
  console.error("Max retries exceeded");
  return null;
}

// Token counting before large requests
async function safeGenerateWithLargeContext(documents: string[], query: string) {
  const fullPrompt = documents.join("\n\n") + "\n\nQuestion: " + query;
  const { totalTokens } = await model.countTokens(fullPrompt);

  console.log(`Estimated tokens: ${totalTokens}`);

  if (totalTokens > 900_000) {
    throw new Error(`Context too large: ${totalTokens} tokens (max 1,000,000)`);
  }

  return generateWithRetry(fullPrompt);
}

// Usage
const result = await generateWithRetry("What is the Gemini context window size?");
console.log(result);
```