Production Patterns
Vertex AI: Production Deployment
Deploy Gemini on Google Cloud Vertex AI for enterprise security, global scale, and tight Google Cloud integration.
Google AI Studio vs Vertex AI
When building production applications with Gemini, choose the right access path:
| Feature | Google AI SDK | Vertex AI |
|---|---|---|
| Authentication | API key | Google Cloud IAM |
| VPC / Private network | No | Yes |
| Data residency | Limited | Full control |
| SLA | No SLA | Enterprise SLA |
| Usage | Prototyping, small apps | Production, enterprise |
| Pricing | Per-token | Per-token + Google Cloud |
| Context caching | Yes | Yes |
| Batch prediction | No | Yes |
When to Use Vertex AI
- You have compliance requirements (HIPAA, SOC2, GDPR data residency)
- Your application runs on Google Cloud
- You need VPC-SC for network isolation
- You require enterprise SLAs and support
- You need batch prediction for offline processing of large datasets
- You want tight integration with BigQuery, Cloud Storage, or Cloud Run
Authentication
Vertex AI uses Google Cloud Application Default Credentials (ADC) — no API key needed:
```bash
gcloud auth application-default login
```

Or use a service account key in production:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```

Context Caching on Vertex AI
Context caching is especially valuable on Vertex AI for repeated queries against large system prompts or document corpora. Cache the content once and reuse the cache ID across requests.
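As a concrete sketch, a cached content is created by sending the fully qualified model name, the content to cache, and a TTL to the Vertex AI cachedContents REST API. The request body below is illustrative: the system instruction text, corpus placeholder, and TTL are assumptions, not values from this document.

```typescript
// Sketch: request body for creating a cached content via the Vertex AI
// cachedContents REST API. Field names follow that resource; the actual
// prompt text, corpus, and TTL here are illustrative placeholders.
const project = process.env.GOOGLE_CLOUD_PROJECT ?? "my-project";
const location = "us-central1";

const cachedContentRequest = {
  // Fully qualified model resource name
  model: `projects/${project}/locations/${location}/publishers/google/models/gemini-1.5-pro-001`,
  // The large system prompt to cache once and reuse across requests
  systemInstruction: {
    role: "system",
    parts: [{ text: "You are a contract-analysis assistant." }],
  },
  // The document corpus to cache alongside the system prompt
  contents: [
    { role: "user", parts: [{ text: "<large document corpus goes here>" }] },
  ],
  ttl: "3600s", // cache lifetime; cached tokens are billed per token-hour of storage
};

console.log(JSON.stringify(cachedContentRequest, null, 2));
```

The response to this request includes a cache resource name, which subsequent `generateContent` calls reference instead of resending the cached tokens.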
Model Garden
Vertex AI's Model Garden provides access not just to Gemini, but also to open-source models (Llama 3, Mistral, Gemma), specialized models, and fine-tuning capabilities — all through the same infrastructure.
Example
```typescript
// Install: npm install @google-cloud/vertexai
import { VertexAI } from "@google-cloud/vertexai";

// Initialize the Vertex AI client (authenticates via ADC, no API key)
const vertexAI = new VertexAI({
  project: process.env.GOOGLE_CLOUD_PROJECT!,
  location: "us-central1", // or "europe-west4" for EU data residency
});

const model = vertexAI.getGenerativeModel({
  model: "gemini-1.5-pro-001",
  systemInstruction: "You are a production AI assistant. Be accurate and concise.",
  generationConfig: {
    temperature: 0.2,
    maxOutputTokens: 2048,
  },
});

// Multi-turn chat
const chat = model.startChat();
const result = await chat.sendMessage("Analyze the architecture of a microservices system.");
console.log(result.response.candidates?.[0]?.content?.parts?.[0]?.text);

// Batch prediction (Vertex AI exclusive)
// For offline processing of thousands of prompts. Batch jobs are created
// via the Vertex AI Console, the REST API, or the @google-cloud/aiplatform
// JobServiceClient, not the generative SDK imported above.
// The job reads input JSONL from Cloud Storage and writes output to Cloud Storage.
const batchJob = {
  displayName: "nightly-content-analysis",
  model: "publishers/google/models/gemini-1.5-pro-001",
  inputConfig: {
    instancesFormat: "jsonl",
    gcsSource: { uris: ["gs://my-bucket/inputs/prompts.jsonl"] },
  },
  outputConfig: {
    predictionsFormat: "jsonl",
    gcsDestination: { outputUriPrefix: "gs://my-bucket/outputs/" },
  },
};

console.log("Batch job config:", JSON.stringify(batchJob, null, 2));
```
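Once a batch job completes, each line of the output JSONL pairs a request with its response. A minimal sketch of extracting the generated text from those lines follows; the `request`/`response` field shape is assumed from the Gemini batch output format, and `extractPredictions` is a hypothetical helper, not part of any SDK.

```typescript
// Parse batch prediction output JSONL. Each line is assumed to hold the
// original request and the model's response (field names assumed from the
// Gemini batch output shape).
interface BatchOutputLine {
  request: { contents: Array<{ role: string; parts: Array<{ text?: string }> }> };
  response?: {
    candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>;
  };
}

function extractPredictions(jsonl: string): string[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const parsed = JSON.parse(line) as BatchOutputLine;
      // Same candidate/parts path as the online API response
      return parsed.response?.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
    });
}

// Example with a synthetic output line
const sample = JSON.stringify({
  request: { contents: [{ role: "user", parts: [{ text: "Hi" }] }] },
  response: { candidates: [{ content: { parts: [{ text: "Hello!" }] } }] },
});
console.log(extractPredictions(sample));
```

In practice you would stream the output objects from the `gs://.../outputs/` prefix with the Cloud Storage client and feed each file through a parser like this.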