Monitoring, Logging & Incident Response
You cannot fix what you cannot see — implement the observability stack every production application needs.
You Cannot Fix What You Cannot See
Launching an application without monitoring is like flying blind. You won't know about errors until users complain. You won't know the cause until you dig through server logs. You won't know the impact until the damage is done.
Monitoring and observability are not optional — they are part of the definition of "production-ready."
Monitoring Categories
Uptime Monitoring — Is the application responding? Tools: UptimeRobot (free tier), Better Uptime, Pingdom. These services request your URL at a fixed interval (from every 30 seconds to every few minutes, depending on the provider and plan) and alert you when it stops responding.
Error Monitoring — When errors occur, who gets notified? Sentry is a widely used choice. It captures exceptions, groups repeated occurrences of the same error into a single issue, and sends alerts.
Performance Monitoring — Is the application fast? Tools: Vercel Analytics, New Relic, Datadog. These track Core Web Vitals, API response times, and database query performance.
Security Monitoring — Is the application being attacked? Watch for rate limit violations, authentication failures, and unusual traffic patterns.
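To make the security category more concrete, here is a minimal sketch of one such signal: counting authentication failures per IP address in a sliding time window and flagging an address once it crosses a threshold. The class name, the default of 5 failures in 60 seconds, and the in-memory storage are assumptions for illustration; a deployment with several instances would keep these counters in a shared store such as Redis.

```typescript
// Hypothetical detector: flags an IP after too many auth failures in a sliding window.
// State lives in memory, so counts are per process, reset on restart, and entries
// for idle IPs are never evicted (a real store would expire them).
class AuthFailureMonitor {
  private readonly failures = new Map<string, number[]>();
  private readonly maxFailures: number;
  private readonly windowMs: number;

  constructor(maxFailures = 5, windowMs = 60_000) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  /** Records one failed attempt; returns true once the IP reaches the threshold. */
  recordFailure(ip: string, now: number = Date.now()): boolean {
    // Drop timestamps that have slid out of the window, then add this one
    const recent = (this.failures.get(ip) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.failures.set(ip, recent);
    return recent.length >= this.maxFailures;
  }
}

// Usage: call recordFailure from the login handler on each failed attempt
const monitor = new AuthFailureMonitor();
for (let i = 0; i < 5; i++) {
  if (monitor.recordFailure('203.0.113.7', 1_000 + i)) {
    // Emit a structured warning that an alerting rule can match on
    console.warn(JSON.stringify({ level: 'warn', message: 'Possible brute-force attempt', ip: '203.0.113.7' }));
  }
}
```

A sliding window (rather than a fixed per-minute bucket) avoids missing bursts that straddle a bucket boundary, at the cost of storing one timestamp per failure.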
Setting Up Sentry
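```typescript
// lib/sentry.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // 1.0 traces every request: useful while testing, but usually lowered
  // in production to control event volume and cost
  tracesSampleRate: 1.0,
  // Session Replay runs in the browser, so this integration belongs in client-side config
  integrations: [Sentry.replayIntegration()],
  // Replay records nothing unless these are set: a sample of all sessions,
  // plus every session in which an error occurs
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0,
});
```

In API routes, capture errors together with the context needed to debug them:

```typescript
import * as Sentry from '@sentry/nextjs';

export async function POST(req: Request) {
  // Assign this once the request is authenticated so it is reported with any error
  let userId: string | undefined;
  try {
    // ... your code
  } catch (error) {
    Sentry.captureException(error, {
      extra: {
        userId,
        endpoint: req.url,
      },
    });
    return Response.json({ error: 'Internal server error' }, { status: 500 });
  }
}
```

Health Check Endpoints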
Every production application should expose a health check endpoint:
```typescript
// app/api/health/route.ts
// Assumes `db` and `redis` clients exported from your own modules (hypothetical paths)
import { db } from '@/lib/db';
import { redis } from '@/lib/redis';

// Make sure Next.js never caches this route, so the checks run on every request
export const dynamic = 'force-dynamic';

export async function GET() {
  const checks: Record<string, 'healthy' | 'unhealthy'> = {};
  const startTime = Date.now();

  // Check database connectivity
  try {
    await db.query('SELECT 1');
    checks.database = 'healthy';
  } catch {
    checks.database = 'unhealthy';
  }

  // Check Redis
  try {
    await redis.ping();
    checks.cache = 'healthy';
  } catch {
    checks.cache = 'unhealthy';
  }

  const allHealthy = Object.values(checks).every((s) => s === 'healthy');
  return Response.json(
    {
      status: allHealthy ? 'healthy' : 'degraded',
      checks,
      latencyMs: Date.now() - startTime,
      timestamp: new Date().toISOString(),
    },
    // 503 tells monitors and load balancers the instance is not fully usable
    { status: allHealthy ? 200 : 503 },
  );
}
```

Uptime monitors can check /api/health and alert if it returns a non-200 status.
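Hosted services handle this polling for you, but the core logic is small enough to sketch. The code below is a minimal illustration, not how any particular service works: it requests the health endpoint with a timeout and alerts only after several consecutive failures, so a single dropped request does not page anyone. The function names, the 5-second timeout, the threshold of 3, and the alert callback are all assumptions; it needs Node 18+ for the global fetch and AbortSignal.timeout.

```typescript
type CheckResult = { ok: boolean; status?: number; error?: string };

// One probe of the health endpoint; anything other than a 200 counts as a failure
async function checkOnce(url: string, timeoutMs = 5_000): Promise<CheckResult> {
  try {
    // Abort the request if the endpoint hangs instead of failing fast
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    // Discard the body so the underlying connection can be reused
    await res.body?.cancel();
    return { ok: res.status === 200, status: res.status };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical helper: counts consecutive failures and alerts once per outage
function createUptimeMonitor(threshold: number, alert: (msg: string) => void) {
  let consecutiveFailures = 0;
  return (result: CheckResult): void => {
    if (result.ok) {
      consecutiveFailures = 0; // recovered, so the next outage can alert again
      return;
    }
    consecutiveFailures += 1;
    // Fire exactly when the threshold is reached, not on every later failure
    if (consecutiveFailures === threshold) {
      alert(`Health check failed ${threshold} times in a row (last: ${result.status ?? result.error})`);
    }
  };
}

// Polls `url` every `intervalMs`; returns the timer so the caller can stop it
function startMonitoring(url: string, intervalMs = 30_000): ReturnType<typeof setInterval> {
  const onResult = createUptimeMonitor(3, (msg) => console.error(msg));
  return setInterval(() => {
    void checkOnce(url).then(onResult);
  }, intervalMs);
}
```

Calling startMonitoring('https://example.com/api/health') (placeholder URL) would probe every 30 seconds. Requiring consecutive failures trades a slightly slower alert for far fewer false alarms.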
Structured Logging
Plain console.log strings are hard to search and filter once an application is in production. Use structured JSON logs instead, so every field can be queried:
```typescript
// lib/logger.ts
type LogLevel = 'info' | 'warn' | 'error';

export function log(level: LogLevel, message: string, context?: Record<string, unknown>) {
  const entry = {
    ...context,
    // Core fields come last so a context key cannot overwrite them
    timestamp: new Date().toISOString(),
    level,
    message,
    environment: process.env.NODE_ENV,
  };
  // One JSON object per line; console.info/warn/error match the log level
  console[level](JSON.stringify(entry));
}

// Usage (assumes a `user` object is in scope)
log('error', 'Payment failed', {
  userId: user.id,
  amount: 2999,
  errorCode: 'card_declined',
});
```

Structured logs can be sent to Axiom, Logtail, or Datadog for aggregation and querying.
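Two refinements often come up once logs are aggregated: attaching request-wide fields (such as a request ID) to every entry, and serializing Error objects, which JSON.stringify turns into {} because their message and stack properties are not enumerable. The sketch below handles both; createLogger, its child method, and the injectable sink are hypothetical names chosen for illustration, not an existing library's API.

```typescript
type LogLevel = 'info' | 'warn' | 'error';
type Context = Record<string, unknown>;

interface Logger {
  info(message: string, context?: Context): void;
  warn(message: string, context?: Context): void;
  error(message: string, context?: Context): void;
  // Returns a logger that adds `extra` on top of the current bound fields
  child(extra: Context): Logger;
}

// Convert Error instances into plain objects so JSON.stringify keeps their details
function serialize(value: unknown): unknown {
  if (value instanceof Error) {
    return { name: value.name, message: value.message, stack: value.stack };
  }
  return value;
}

// Hypothetical helper: every entry carries the `bound` fields; output goes to `sink`
function createLogger(
  bound: Context = {},
  sink: (level: LogLevel, line: string) => void = (level, line) => console[level](line),
): Logger {
  const write = (level: LogLevel, message: string, context: Context = {}): void => {
    const fields: Context = {};
    for (const [key, value] of Object.entries({ ...bound, ...context })) {
      fields[key] = serialize(value);
    }
    // Core fields come last so context cannot overwrite them
    sink(level, JSON.stringify({ ...fields, timestamp: new Date().toISOString(), level, message }));
  };
  return {
    info: (message, context) => write('info', message, context),
    warn: (message, context) => write('warn', message, context),
    error: (message, context) => write('error', message, context),
    child: (extra) => createLogger({ ...bound, ...extra }, sink),
  };
}

// Usage: bind a (hypothetical) request ID once, then log as usual
const requestLog = createLogger({ service: 'api' }).child({ requestId: 'req_123' });
requestLog.error('Payment failed', { error: new Error('card_declined') });
```

Binding fields once per request means every line for that request can be retrieved with a single query on requestId, without repeating it at each call site.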
Incident Response Basics
When something breaks in production, follow this process:
- Detect — Monitoring alert fires, or user reports an issue
- Triage — How many users are affected? Is it a total or partial outage? What's the error?
- Communicate — Notify stakeholders: "We are aware of an issue affecting X. Investigating."
- Fix — Implement a fix or roll back to the previous deployment
- Verify — Confirm the fix resolves the issue. Monitor error rates.
- Postmortem — Document what happened, why, how it was fixed, and how to prevent recurrence
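Recording when each phase starts makes the postmortem easier to write, because the timeline already exists. Here is a minimal sketch that enforces the order of the six phases above and reports the time from detection to verification. The Incident type, its field names, and the helper functions are illustrative assumptions, not a standard format.

```typescript
// The six phases, in the order the process above runs them
const PHASES = ['detect', 'triage', 'communicate', 'fix', 'verify', 'postmortem'] as const;
type Phase = (typeof PHASES)[number];

// Hypothetical incident record; timestamps are milliseconds since the epoch
interface Incident {
  title: string;
  timeline: { phase: Phase; at: number; note?: string }[];
}

function openIncident(title: string, at: number = Date.now()): Incident {
  return { title, timeline: [{ phase: 'detect', at }] };
}

// Moves to the next phase, refusing to skip or repeat one
function advance(incident: Incident, phase: Phase, note?: string, at: number = Date.now()): void {
  const current = incident.timeline[incident.timeline.length - 1].phase;
  const expected = PHASES[PHASES.indexOf(current) + 1]; // undefined after 'postmortem'
  if (phase !== expected) {
    throw new Error(`Expected phase "${expected ?? 'none'}" after "${current}", got "${phase}"`);
  }
  incident.timeline.push({ phase, at, note });
}

// Time from detection until the fix was verified, or undefined if not verified yet
function timeToResolveMs(incident: Incident): number | undefined {
  const detected = incident.timeline.find((e) => e.phase === 'detect')?.at;
  const verified = incident.timeline.find((e) => e.phase === 'verify')?.at;
  return detected !== undefined && verified !== undefined ? verified - detected : undefined;
}

// Usage with illustrative timestamps (minutes converted to milliseconds)
const incident = openIncident('API returning 503s', 0);
advance(incident, 'triage', undefined, 1 * 60_000);
advance(incident, 'communicate', undefined, 2 * 60_000);
advance(incident, 'fix', 'Rolled back to previous deployment', 10 * 60_000);
advance(incident, 'verify', undefined, 15 * 60_000);
console.log(timeToResolveMs(incident)! / 60_000); // prints 15
```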
Key Takeaways
- Uptime monitoring, error monitoring, and performance monitoring are the foundation of production observability
- Sentry captures, groups, and alerts on exceptions; integrate it before you launch, not after the first incident
- Health check endpoints such as /api/health let uptime monitors detect deep failures (a database that is down), not just HTTP errors
- Structured JSON logs can be queried and aggregated; plain console.log strings cannot
- Incident response is a process: detect → triage → communicate → fix → verify → postmortem
Example
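```typescript
// Complete health check endpoint (app/api/health/route.ts)
// Assumes `db` and `redis` clients exported from your own modules (hypothetical paths)
import { db } from '@/lib/db';
import { redis } from '@/lib/redis';

// Make sure Next.js never caches this route, so the checks run on every request
export const dynamic = 'force-dynamic';

export async function GET() {
  const checks: Record<string, { status: 'ok' | 'error'; latencyMs?: number }> = {};

  // Database check: any cheap round-trip query works; the method depends on your client
  const dbStart = Date.now();
  try {
    await db.execute('SELECT 1');
    checks.database = { status: 'ok', latencyMs: Date.now() - dbStart };
  } catch (error) {
    checks.database = { status: 'error' };
    console.error('Health check: database failure', error);
  }

  // Redis check
  const redisStart = Date.now();
  try {
    await redis.ping();
    checks.redis = { status: 'ok', latencyMs: Date.now() - redisStart };
  } catch (error) {
    checks.redis = { status: 'error' };
    console.error('Health check: redis failure', error);
  }

  const allOk = Object.values(checks).every((c) => c.status === 'ok');
  return Response.json(
    { status: allOk ? 'healthy' : 'degraded', checks },
    { status: allOk ? 200 : 503 },
  );
}
```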
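One gap in both health checks above: the checks run one after another, and a dependency that hangs (rather than failing) makes the whole endpoint hang until the uptime monitor gives up. A minimal sketch of a hypothetical runChecks helper that runs every check in parallel and treats a check that exceeds its time budget as failed; the helper names and the 2-second default are assumptions.

```typescript
type CheckStatus = { status: 'ok' | 'error'; latencyMs: number; error?: string };

// Rejects if `promise` has not settled within `ms`; clears the timer either way.
// This only stops waiting: it does not cancel the underlying query.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: runs all checks concurrently, each with its own timeout
async function runChecks(
  checks: Record<string, () => Promise<unknown>>,
  timeoutMs = 2_000,
): Promise<Record<string, CheckStatus>> {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, CheckStatus]> => {
      const start = Date.now();
      try {
        await withTimeout(check(), timeoutMs);
        return [name, { status: 'ok', latencyMs: Date.now() - start }];
      } catch (err) {
        // Consider logging this instead of returning it from a public endpoint
        const message = err instanceof Error ? err.message : String(err);
        return [name, { status: 'error', latencyMs: Date.now() - start, error: message }];
      }
    }),
  );
  return Object.fromEntries(entries);
}

// Usage inside the GET handler (db and redis as in the example above):
// const checks = await runChecks({
//   database: () => db.execute('SELECT 1'),
//   redis: () => redis.ping(),
// });
```

With parallel checks, the endpoint's worst-case latency is roughly the single timeout rather than the sum of every check's latency.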