
Monitoring, Logging & Incident Response

You cannot fix what you cannot see — implement the observability stack every production application needs.

You Cannot Fix What You Cannot See

Launching an application without monitoring is like flying blind. You won't know about errors until users complain. You won't know the cause until you dig through server logs. You won't know the impact until the damage is done.

Monitoring and observability are not optional — they are part of the definition of "production-ready."

Monitoring Categories

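Uptime Monitoring: Is the application responding? Tools: UptimeRobot (free tier available), Better Stack (formerly Better Uptime), Pingdom. These services request your URL on a fixed schedule (every few minutes on typical free tiers, down to 30 seconds on some paid plans) and send alerts when it stops responding.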

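Error Monitoring: When an exception is thrown in production, does anyone find out? Sentry is the most widely used tool here. It captures exceptions with their stack traces and context, groups repeated occurrences of the same error into a single issue, and sends alerts.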
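Grouping is what stops one bug from becoming thousands of separate alerts. The sketch below illustrates the idea with a deliberately simplified fingerprint: the error type plus the top stack frame, with line and column numbers stripped. The function names and the fingerprint rule are assumptions for illustration only; Sentry's real grouping algorithm is more sophisticated and is not reproduced here.

```typescript
// Simplified error fingerprinting: an illustration of grouping, NOT Sentry's algorithm.
// Assumption: errors with the same type and the same top stack frame
// (ignoring line/column numbers) share a root cause.

export function fingerprint(error: Error): string {
  // V8-style stack traces list call frames on lines starting with "at "
  const frames = (error.stack ?? '')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('at '));
  // Strip ":line:col" so small code shifts do not split one group into many
  const topFrame = (frames[0] ?? 'unknown').replace(/:\d+:\d+(\)?)$/, '$1');
  return `${error.name}|${topFrame}`;
}

export function groupErrors(errors: Error[]): Map<string, Error[]> {
  const groups = new Map<string, Error[]>();
  for (const error of errors) {
    const key = fingerprint(error);
    const bucket = groups.get(key) ?? [];
    bucket.push(error);
    groups.set(key, bucket);
  }
  return groups;
}
```

Note that the error message is left out of the fingerprint on purpose: messages often embed IDs or values that differ on every occurrence of the same bug.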

Performance Monitoring: Is the application fast? Tools: Vercel Speed Insights, New Relic, Datadog. Track Core Web Vitals, API response times, and database query performance.
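Tracking API response times usually means timing every request and reporting percentiles (p95, p99) rather than averages, because a handful of very slow requests disappears in a mean. A minimal sketch, assuming you record durations yourself; the names `timed`, `durationsMs`, and `percentile` are hypothetical, and an APM agent such as New Relic or Datadog would normally collect this for you.

```typescript
// Hypothetical helpers for recording request durations and summarizing them.
// In-memory only; an APM agent would normally collect and ship these numbers.

export const durationsMs: number[] = [];

// Wrap an async handler and record how long each call takes, success or failure
export function timed<A extends unknown[], R>(
  handler: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const start = performance.now();
    try {
      return await handler(...args);
    } finally {
      durationsMs.push(performance.now() - start);
    }
  };
}

// Nearest-rank percentile, with p in (0, 100]
export function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

For example, `percentile(durationsMs, 95)` gives the response time that 95% of recorded requests stayed at or under.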

Security Monitoring: Is the application being attacked? Watch for rate-limit violations, spikes in authentication failures, and unusual traffic patterns.
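Authentication failures are one of the simplest attack signals to watch. As a sketch, assuming you call a hook on every failed login, a sliding-window counter per client IP can flag likely brute-force attempts. The class name, the default of 10 failures in 5 minutes, and keying by IP are illustrative assumptions; with more than one server instance, this state would need to live in a shared store such as Redis.

```typescript
// Hypothetical sliding-window detector for repeated authentication failures.
// In-memory only: correct for a single instance, not shared across servers.

export class AuthFailureMonitor {
  private readonly failures = new Map<string, number[]>(); // ip -> failure timestamps (ms)
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 10, windowMs = 5 * 60_000) {
    this.threshold = threshold; // failures tolerated per window
    this.windowMs = windowMs;   // window length in milliseconds
  }

  // Record one failure; returns true once this IP has reached the threshold within the window
  recordFailure(ip: string, now: number = Date.now()): boolean {
    // Keep only failures that are still inside the window
    const recent = (this.failures.get(ip) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.failures.set(ip, recent);
    return recent.length >= this.threshold;
  }
}
```

When `recordFailure` returns true, you might log a security event, alert, or tighten rate limits for that IP.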

Setting Up Sentry

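```typescript
// lib/sentry.ts (client-side init: Session Replay only runs in the browser)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // 1.0 traces every request: fine in development, costly at production traffic
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  // Replay needs these sample rates; without them no sessions are recorded
  replaysSessionSampleRate: 0.1, // 10% of all sessions
  replaysOnErrorSampleRate: 1.0, // every session in which an error occurs
  integrations: [
    Sentry.replayIntegration(),
  ],
});
```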
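
In API routes, capture errors together with the context you need to debug them:

```typescript
// In an API route
import * as Sentry from '@sentry/nextjs';

export async function POST(req: Request) {
  // Set this once the request is authenticated (how depends on your auth library)
  let userId: string | undefined;
  try {
    // ... your code
  } catch (error) {
    Sentry.captureException(error, {
      extra: {
        userId,
        endpoint: req.url,
      },
    });
    return Response.json({ error: 'Internal server error' }, { status: 500 });
  }
}
```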
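Repeating the same try/catch in every route is easy to get wrong. One option is a small higher-order function that wraps a handler, reports any thrown error, and returns a generic 500. In the sketch below the reporter is passed in as a parameter so the wrapper can be tested without Sentry; in a real route you would pass `Sentry.captureException`. The name `withErrorReporting` is hypothetical.

```typescript
// Hypothetical wrapper that centralizes error capture for route handlers.
type Reporter = (error: unknown, context: { extra: Record<string, unknown> }) => void;
type Handler = (req: Request) => Promise<Response>;

export function withErrorReporting(handler: Handler, report: Reporter): Handler {
  return async (req) => {
    try {
      return await handler(req);
    } catch (error) {
      // Attach request context so the captured error is actionable
      report(error, { extra: { method: req.method, endpoint: req.url } });
      // Never leak internal error details to the client
      return Response.json({ error: 'Internal server error' }, { status: 500 });
    }
  };
}

// Usage in a route file (assumes @sentry/nextjs is installed):
// export const POST = withErrorReporting(async (req) => { /* ... */ }, Sentry.captureException);
```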

Health Check Endpoints

Every production application should expose a health check endpoint:

```typescript
// app/api/health/route.ts
// Assumes `db` and `redis` clients exported from your own modules (paths are illustrative)
import { db } from '@/lib/db';
import { redis } from '@/lib/redis';

// Run the checks on every request; some Next.js versions cache GET route handlers by default
export const dynamic = 'force-dynamic';

export async function GET() {
  const checks: Record<string, 'healthy' | 'unhealthy'> = {};
  const startTime = Date.now();

  // Check database connectivity
  try {
    await db.query('SELECT 1');
    checks.database = 'healthy';
  } catch {
    checks.database = 'unhealthy';
  }

  // Check Redis
  try {
    await redis.ping();
    checks.cache = 'healthy';
  } catch {
    checks.cache = 'unhealthy';
  }

  const allHealthy = Object.values(checks).every(s => s === 'healthy');
  const status = allHealthy ? 200 : 503;

  return Response.json({
    status: allHealthy ? 'healthy' : 'degraded',
    checks,
    latencyMs: Date.now() - startTime,
    timestamp: new Date().toISOString(),
  }, { status });
}
```
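One weakness of the endpoint above: if the database hangs instead of failing, `await db.query('SELECT 1')` can hang too, and the uptime monitor sees a timeout rather than a clear 503. A small timeout wrapper keeps each check bounded. This is a sketch; the helper names `withTimeout` and `runCheck` and the 2-second default are assumptions, not part of Next.js or any client library.

```typescript
// Hypothetical helpers for bounding each dependency check in a health endpoint.

// Reject if `promise` does not settle within `ms` milliseconds
export function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run one check and map the outcome to the status values used in the endpoint
export async function runCheck(
  check: () => Promise<unknown>,
  ms = 2_000,
): Promise<'healthy' | 'unhealthy'> {
  try {
    await withTimeout(check(), ms);
    return 'healthy';
  } catch {
    return 'unhealthy'; // failed or too slow
  }
}

// Usage inside GET():
// checks.database = await runCheck(() => db.query('SELECT 1'));
// checks.cache = await runCheck(() => redis.ping());
```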

Uptime monitors can check /api/health and alert if it returns a non-200 status.
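Conceptually, an uptime monitor is a loop that requests a URL on a schedule and alerts when responses fail. A minimal sketch of one probe plus a consecutive-failure rule (alerting only after several failures in a row avoids paging someone over a single network blip). The names `probe` and `FailureTracker`, the 5-second timeout, and the threshold of 3 are illustrative assumptions; hosted services such as UptimeRobot also handle scheduling, multi-region checks, and notifications.

```typescript
// Hypothetical uptime probe: one request to a health URL, bounded by a timeout.
export async function probe(url: string, timeoutMs = 5_000): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.status === 200; // 503 "degraded" and any other status count as down
  } catch {
    return false; // network error or timeout
  }
}

// Tracks consecutive failures and decides when to alert and when to report recovery
export class FailureTracker {
  private consecutive = 0;
  private alerting = false;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Returns 'alert' once when failures reach the threshold,
  // 'recovered' once on the first success after an alert, otherwise null
  record(ok: boolean): 'alert' | 'recovered' | null {
    if (ok) {
      this.consecutive = 0;
      if (this.alerting) {
        this.alerting = false;
        return 'recovered';
      }
      return null;
    }
    this.consecutive += 1;
    if (!this.alerting && this.consecutive >= this.threshold) {
      this.alerting = true;
      return 'alert';
    }
    return null;
  }
}

// Usage (e.g. once a minute):
// const tracker = new FailureTracker();
// setInterval(async () => {
//   const event = tracker.record(await probe('https://example.com/api/health'));
//   if (event) console.log(event); // forward to Slack, email, a pager, ...
// }, 60_000);
```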

Structured Logging

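console.log with free-form strings is not enough for production logging, because you cannot reliably filter or aggregate it. Emit structured JSON logs instead, one object per line, so fields such as userId or errorCode can be queried: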

```typescript
// lib/logger.ts
type LogLevel = 'info' | 'warn' | 'error';

export function log(level: LogLevel, message: string, context?: Record<string, unknown>) {
  const entry = {
    ...context, // spread first so context fields cannot overwrite the core fields below
    timestamp: new Date().toISOString(),
    level,
    message,
    environment: process.env.NODE_ENV,
  };
  // console.info / console.warn / console.error match the level names
  console[level](JSON.stringify(entry));
}

// Usage (elsewhere in your code; assumes a `user` object is in scope)
log('error', 'Payment failed', {
  userId: user.id,
  amount: 2999,
  errorCode: 'card_declined',
});
```
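One pitfall with a JSON logger: passing a caught `Error` in the context logs `{}`, because an Error's `message` and `stack` are not enumerable properties and `JSON.stringify` skips them. A minimal fix, sketched under the assumption that you normalize context before building the entry; `serializeError` and `normalizeContext` are hypothetical names.

```typescript
// Hypothetical helpers: convert Error objects into plain, JSON-friendly objects.

export function serializeError(value: unknown): unknown {
  if (value instanceof Error) {
    // `cause` (ES2022) often carries the underlying failure; read it defensively
    const cause = (value as Error & { cause?: unknown }).cause;
    return {
      name: value.name,
      message: value.message,
      stack: value.stack,
      ...(cause !== undefined ? { cause: serializeError(cause) } : {}),
    };
  }
  return value;
}

// Apply to every context field before building the log entry
export function normalizeContext(context: Record<string, unknown> = {}): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(context).map(([key, value]) => [key, serializeError(value)]),
  );
}
```

Inside `log()`, you would then spread `...normalizeContext(context)` instead of `...context`.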

Structured logs can be sent to Axiom, Logtail, or Datadog for aggregation and querying.

Incident Response Basics

When something breaks in production, follow this process:

  1. Detect: A monitoring alert fires, or a user reports an issue
  2. Triage: How many users are affected? Is it a total outage or a partial one? What is the error?
  3. Communicate: Tell stakeholders what you know, e.g. "We are aware of an issue affecting X and are investigating."
  4. Fix: Implement a fix or roll back to the previous deployment
  5. Verify: Confirm the fix resolves the issue, and keep watching error rates
  6. Postmortem: Document what happened, why, how it was fixed, and how to prevent a recurrence
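Recording a timestamp at each step makes the postmortem concrete: it shows how long detection and resolution took, and whether later monitoring changes shortened them. A minimal sketch, assuming you note these timestamps while the incident unfolds; the field names are illustrative, and "time to detect" and "time to mitigate" are common metrics rather than something the process above requires.

```typescript
// Hypothetical incident record, filled in as the response progresses.
export interface IncidentTimeline {
  startedAt: Date;   // when the problem actually began (often established during the postmortem)
  detectedAt: Date;  // Detect: alert fired or user report received
  mitigatedAt: Date; // Fix: fix deployed or rollback completed
  verifiedAt: Date;  // Verify: error rates confirmed back to normal
}

// Whole minutes between two timestamps
const minutesBetween = (from: Date, to: Date): number =>
  Math.round((to.getTime() - from.getTime()) / 60_000);

export function incidentDurations(t: IncidentTimeline) {
  return {
    timeToDetectMin: minutesBetween(t.startedAt, t.detectedAt),
    timeToMitigateMin: minutesBetween(t.detectedAt, t.mitigatedAt),
    totalImpactMin: minutesBetween(t.startedAt, t.verifiedAt),
  };
}
```

For an incident that began at 10:00, was detected at 10:12, mitigated at 10:40, and verified at 10:55, this returns 12, 28, and 55 minutes respectively.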

Key Takeaways

  • Uptime, error, and performance monitoring are the baseline for production observability; security monitoring adds visibility into attacks
  • Sentry captures, groups, and alerts on exceptions — integrate it before you launch, not after the first incident
  • Health check endpoints at /api/health allow uptime monitors to detect deep failures (database down, not just HTTP errors)
  • Structured JSON logs are queryable and aggregatable — console.log strings are not
  • Incident response is a process: detect → triage → communicate → fix → verify → postmortem

Example

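```typescript
// Complete health check endpoint (app/api/health/route.ts)
// Assumes `db` and `redis` clients exported from your own modules (paths are illustrative)
import { db } from '@/lib/db';
import { redis } from '@/lib/redis';

// Run the checks on every request instead of serving a cached response
export const dynamic = 'force-dynamic';

export async function GET() {
  const checks: Record<string, { status: 'ok' | 'error'; latencyMs?: number }> = {};

  // Database check
  const dbStart = Date.now();
  try {
    await db.query('SELECT 1');
    checks.database = { status: 'ok', latencyMs: Date.now() - dbStart };
  } catch (error) {
    checks.database = { status: 'error' };
    console.error('Health check: database failure', error);
  }

  // Redis check
  const redisStart = Date.now();
  try {
    await redis.ping();
    checks.redis = { status: 'ok', latencyMs: Date.now() - redisStart };
  } catch (error) {
    checks.redis = { status: 'error' };
    console.error('Health check: redis failure', error);
  }

  const allOk = Object.values(checks).every(c => c.status === 'ok');

  // Respond with statuses and latencies only; error details stay in server logs
  return Response.json(
    { status: allOk ? 'healthy' : 'degraded', checks, timestamp: new Date().toISOString() },
    { status: allOk ? 200 : 503 }
  );
}
```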

Docker, AWS, Vercel, Netlify, GitHub, GitHub Actions are trademarks of Docker, Inc., Amazon.com, Inc., Vercel, Inc., Netlify, Inc., Microsoft Corporation. DevForge Academy is not affiliated with, endorsed by, or sponsored by these companies. Referenced for educational purposes only.