Production Patterns

Vertex AI: Production Deployment

Deploy Gemini on Google Cloud Vertex AI for enterprise security, global scale, and tight Google Cloud integration.

Google AI Studio vs Vertex AI

When building production applications with Gemini, choose the right access path:

| Feature | Google AI SDK | Vertex AI |
|---|---|---|
| Authentication | API key | Google Cloud IAM |
| VPC / Private network | No | Yes |
| Data residency | Limited | Full control |
| SLA | No SLA | Enterprise SLA |
| Usage | Prototyping, small apps | Production, enterprise |
| Pricing | Per-token | Per-token + Google Cloud |
| Context caching | Yes | Yes |
| Batch prediction | No | Yes |

When to Use Vertex AI

  • You have compliance requirements (HIPAA, SOC2, GDPR data residency)
  • Your application runs on Google Cloud
  • You need VPC-SC for network isolation
  • You require enterprise SLAs and support
  • You need batch prediction for offline processing of large datasets
  • You want tight integration with BigQuery, Cloud Storage, or Cloud Run

Authentication

Vertex AI uses Google Cloud Application Default Credentials (ADC) — no API key needed:

```bash
gcloud auth application-default login
```

Or use a service account key in production:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```
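ADC resolves credentials in a fixed order: the `GOOGLE_APPLICATION_CREDENTIALS` key file first, then the ADC file written by `gcloud auth application-default login`, then the metadata server when running on Google Cloud. A simplified sketch of that precedence (the helper below is illustrative, not part of any Google SDK; the real resolution lives in `google-auth-library` and also handles impersonation and workload identity):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper sketching ADC resolution order (not a Google SDK API).
function resolveCredentialSource(env: NodeJS.ProcessEnv = process.env): string {
  // 1. An explicit service account key file wins.
  if (env.GOOGLE_APPLICATION_CREDENTIALS) {
    return `service account key: ${env.GOOGLE_APPLICATION_CREDENTIALS}`;
  }
  // 2. The well-known file written by `gcloud auth application-default login`.
  const adcPath = path.join(
    os.homedir(),
    ".config",
    "gcloud",
    "application_default_credentials.json"
  );
  if (fs.existsSync(adcPath)) {
    return `gcloud ADC file: ${adcPath}`;
  }
  // 3. On GCE, GKE, or Cloud Run, fall back to the metadata server.
  return "metadata server (only available on Google Cloud)";
}

console.log(resolveCredentialSource());
console.log(resolveCredentialSource({ GOOGLE_APPLICATION_CREDENTIALS: "/secrets/sa.json" }));
```

This is why the Vertex AI client in the example below takes no API key: the library picks up whichever source resolves first.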

Context Caching on Vertex AI

Context caching is especially valuable on Vertex AI for repeated queries against large system prompts or document corpora. Cache the content once and reuse the cache ID across requests.
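As a sketch, caches can be created through the Vertex AI REST `cachedContents` endpoint (`POST https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/cachedContents`). The helper below only builds the request body; sending it requires an OAuth token (for example via `google-auth-library`). The helper name and the exact field shapes are illustrative, so verify them against the current REST reference:

```typescript
// Illustrative request-body builder for the Vertex AI cachedContents endpoint.
// Field names follow the public REST reference; verify against the current
// API version before relying on them.
interface CachedContentRequest {
  model: string;
  systemInstruction?: { role: string; parts: { text: string }[] };
  contents: { role: string; parts: { text: string }[] }[];
  ttl: string; // e.g. "3600s"
}

function buildCachedContentRequest(
  project: string,
  location: string,
  model: string,
  systemText: string,
  documentText: string,
  ttlSeconds: number
): CachedContentRequest {
  return {
    // Fully qualified publisher model resource name.
    model: `projects/${project}/locations/${location}/publishers/google/models/${model}`,
    systemInstruction: { role: "system", parts: [{ text: systemText }] },
    contents: [{ role: "user", parts: [{ text: documentText }] }],
    ttl: `${ttlSeconds}s`,
  };
}

const body = buildCachedContentRequest(
  "my-project",
  "us-central1",
  "gemini-1.5-pro-001",
  "You answer questions about the attached corpus.",
  "…large document corpus…",
  3600
);
console.log(JSON.stringify(body, null, 2));
// The create call returns a resource name like
// projects/…/locations/…/cachedContents/1234567890; pass that name as
// cachedContent on later generateContent requests to reuse the cache.
```

The returned resource name is the "cache ID" mentioned above: store it and reference it on every request that shares the same prefix.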

Model Garden

Vertex AI's Model Garden provides access not just to Gemini, but also to open-source models (Llama 3, Mistral, Gemma), specialized models, and fine-tuning capabilities — all through the same infrastructure.

Example

```typescript
// Install: npm install @google-cloud/vertexai

import { VertexAI } from "@google-cloud/vertexai";

// Initialize the Vertex AI client (authentication comes from ADC, not an API key)
const vertexAI = new VertexAI({
  project: process.env.GOOGLE_CLOUD_PROJECT!,
  location: "us-central1", // or "europe-west4" for EU data residency
});

const model = vertexAI.getGenerativeModel({
  model: "gemini-1.5-pro-001",
  systemInstruction: "You are a production AI assistant. Be accurate and concise.",
  generationConfig: {
    temperature: 0.2,
    maxOutputTokens: 2048,
  },
});

// Multi-turn chat
const chat = model.startChat();
const result = await chat.sendMessage("Analyze the architecture of a microservices system.");
console.log(result.response.candidates?.[0]?.content?.parts?.[0]?.text);

// Batch prediction (Vertex AI exclusive), for offline processing of thousands
// of prompts. Jobs are submitted via the Vertex AI Console, gcloud, or the
// JobServiceClient in @google-cloud/aiplatform; the @google-cloud/vertexai
// package itself does not expose a batch API. A job reads input JSONL from
// Cloud Storage and writes output back to Cloud Storage.
const batchJob = {
  displayName: "nightly-content-analysis",
  model: "publishers/google/models/gemini-1.5-pro-001",
  inputConfig: {
    instancesFormat: "jsonl",
    gcsSource: { uris: ["gs://my-bucket/inputs/prompts.jsonl"] },
  },
  outputConfig: {
    predictionsFormat: "jsonl",
    gcsDestination: { outputUriPrefix: "gs://my-bucket/outputs/" },
  },
};

console.log("Batch job config:", JSON.stringify(batchJob, null, 2));
```
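The input file referenced by `gcsSource` is JSONL, one request per line. For Gemini batch prediction each line wraps a standard `generateContent` request under a `request` key; the exact schema can vary by model version, so check the batch prediction docs for your model. A sketch of generating those lines (the bucket path is the example value from the job config above):

```typescript
// Build batch-prediction input lines: each JSONL line wraps a
// generateContent-style request under a "request" key.
const prompts = [
  "Summarize this article in one sentence.",
  "List three risks in this architecture.",
];

const jsonlLines = prompts.map((text) =>
  JSON.stringify({
    request: { contents: [{ role: "user", parts: [{ text }] }] },
  })
);

// Write jsonlLines.join("\n") to gs://my-bucket/inputs/prompts.jsonl
// (for example with @google-cloud/storage) before submitting the job.
console.log(jsonlLines.join("\n"));
```

Each output line in `gs://my-bucket/outputs/` then pairs the original request with the model's response, so results can be joined back to their prompts.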