Production Patterns
Vertex AI: Production Deployment
Deploy Gemini on Google Cloud Vertex AI for enterprise security, global scale, and tight Google Cloud integration.
Google AI Studio vs Vertex AI
When building production applications with Gemini, choose the right access path:
| Feature | Google AI SDK | Vertex AI |
|---|---|---|
| Authentication | API key | Google Cloud IAM |
| VPC / Private network | No | Yes |
| Data residency | Limited | Full control |
| SLA | No SLA | Enterprise SLA |
| Usage | Prototyping, small apps | Production, enterprise |
| Pricing | Per-token | Per-token + Google Cloud |
| Context caching | Yes | Yes |
| Batch prediction | No | Yes |
When to Use Vertex AI
- You have compliance requirements (HIPAA, SOC2, GDPR data residency)
- Your application runs on Google Cloud
- You need VPC-SC for network isolation
- You require enterprise SLAs and support
- You need batch prediction for offline processing of large datasets
- You want tight integration with BigQuery, Cloud Storage, or Cloud Run
Authentication
Vertex AI uses Google Cloud Application Default Credentials (ADC) — no API key needed:
```bash
gcloud auth application-default login
```

Or use a service account key in production:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```

Context Caching on Vertex AI
Context caching is especially valuable on Vertex AI for repeated queries against large system prompts or document corpora. Cache the content once and reuse the cache ID across requests.
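As a concrete sketch, a cached content is created by sending the fully qualified model name, the content to cache, and a TTL to the Vertex AI cachedContents REST API. The request body below is illustrative: the system instruction text, corpus placeholder, and TTL are assumptions, not values from this document.

```typescript
// Sketch: request body for creating a cached content via the Vertex AI
// cachedContents REST API. Field names follow that resource; the actual
// prompt text, corpus, and TTL here are illustrative placeholders.
const project = process.env.GOOGLE_CLOUD_PROJECT ?? "my-project";
const location = "us-central1";

const cachedContentRequest = {
  // Fully qualified model resource name
  model: `projects/${project}/locations/${location}/publishers/google/models/gemini-1.5-pro-001`,
  // The large system prompt to cache once and reuse across requests
  systemInstruction: {
    role: "system",
    parts: [{ text: "You are a contract-analysis assistant." }],
  },
  // The document corpus to cache alongside the system prompt
  contents: [
    { role: "user", parts: [{ text: "<large document corpus goes here>" }] },
  ],
  ttl: "3600s", // cache lifetime; cached tokens are billed per token-hour of storage
};

console.log(JSON.stringify(cachedContentRequest, null, 2));
```

The response to this request includes a cache resource name, which subsequent `generateContent` calls reference instead of resending the cached tokens.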
Model Garden
Vertex AI's Model Garden provides access not just to Gemini, but also to open-source models (Llama 3, Mistral, Gemma), specialized models, and fine-tuning capabilities — all through the same infrastructure.
Example
```typescript
// Install: npm install @google-cloud/vertexai
import { VertexAI } from "@google-cloud/vertexai";

// Initialize the Vertex AI client (authenticates via ADC, no API key)
const vertexAI = new VertexAI({
  project: process.env.GOOGLE_CLOUD_PROJECT!,
  location: "us-central1", // or "europe-west4" for EU data residency
});

const model = vertexAI.getGenerativeModel({
  model: "gemini-1.5-pro-001",
  systemInstruction: "You are a production AI assistant. Be accurate and concise.",
  generationConfig: {
    temperature: 0.2,
    maxOutputTokens: 2048,
  },
});

// Multi-turn chat
const chat = model.startChat();
const result = await chat.sendMessage("Analyze the architecture of a microservices system.");
console.log(result.response.candidates?.[0]?.content?.parts?.[0]?.text);

// Batch prediction (Vertex AI exclusive)
// For offline processing of thousands of prompts. Batch jobs are created
// via the Vertex AI Console, the REST API, or the @google-cloud/aiplatform
// JobServiceClient, not the generative SDK imported above.
// The job reads input JSONL from Cloud Storage and writes output to Cloud Storage.
const batchJob = {
  displayName: "nightly-content-analysis",
  model: "publishers/google/models/gemini-1.5-pro-001",
  inputConfig: {
    instancesFormat: "jsonl",
    gcsSource: { uris: ["gs://my-bucket/inputs/prompts.jsonl"] },
  },
  outputConfig: {
    predictionsFormat: "jsonl",
    gcsDestination: { outputUriPrefix: "gs://my-bucket/outputs/" },
  },
};

console.log("Batch job config:", JSON.stringify(batchJob, null, 2));
```
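Once a batch job completes, each line of the output JSONL pairs a request with its response. A minimal sketch of extracting the generated text from those lines follows; the `request`/`response` field shape is assumed from the Gemini batch output format, and `extractPredictions` is a hypothetical helper, not part of any SDK.

```typescript
// Parse batch prediction output JSONL. Each line is assumed to hold the
// original request and the model's response (field names assumed from the
// Gemini batch output shape).
interface BatchOutputLine {
  request: { contents: Array<{ role: string; parts: Array<{ text?: string }> }> };
  response?: {
    candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>;
  };
}

function extractPredictions(jsonl: string): string[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const parsed = JSON.parse(line) as BatchOutputLine;
      // Same candidate/parts path as the online API response
      return parsed.response?.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
    });
}

// Example with a synthetic output line
const sample = JSON.stringify({
  request: { contents: [{ role: "user", parts: [{ text: "Hi" }] }] },
  response: { candidates: [{ content: { parts: [{ text: "Hello!" }] } }] },
});
console.log(extractPredictions(sample));
```

In practice you would stream the output objects from the `gs://.../outputs/` prefix with the Cloud Storage client and feed each file through a parser like this.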