Production Deployment

Docker, PM2, systemd, structured logging, health checks, monitoring, and horizontal scaling.

Key Concepts

  • Docker images optimized for MCP servers
  • PM2 and systemd for process management
  • Structured logging that avoids stdout pollution
  • Health check endpoints for monitoring
  • Horizontal scaling patterns for HTTP transport
  • Production deployment checklist

Your MCP server works locally. Tests pass. Time to run it in production. This module covers the operational concerns that separate a development prototype from a production service: packaging, process management, logging, monitoring, and scaling.

Docker Packaging

Docker is the standard way to package and distribute MCP servers, especially for HTTP-based transports (SSE, Streamable HTTP).

# Dockerfile for an MCP server
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# MCP servers should NOT run as root
RUN addgroup --system mcp && adduser --system --ingroup mcp mcp
USER mcp

# Environment variables for configuration
ENV NODE_ENV=production
ENV PORT=3000

# Health check (node:20-slim ships without curl, so use Node's built-in fetch)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"

EXPOSE 3000
CMD ["node", "dist/index.js"]

Key Docker considerations for MCP:

  • Multi-stage build to keep the image small (no dev dependencies)
  • Non-root user for security
  • Health check endpoint for container orchestration
  • For stdio servers: the container is invoked per-connection, not long-running. Use docker run --rm in the client config.

// Claude Desktop config using Docker for stdio
{
  "mcpServers": {
    "my-server": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--env", "API_KEY=...",
        "my-mcp-server:latest"
      ]
    }
  }
}

Process Management

For HTTP-based MCP servers running on a VM or bare metal, you need a process manager to handle restarts, log rotation, and graceful shutdowns.

// PM2 ecosystem config
// ecosystem.config.js
module.exports = {
  apps: [{
    name: "mcp-server",
    script: "dist/index.js",
    instances: 1,  // Or "max" for cluster mode (HTTP only)
    exec_mode: "fork",
    env: {
      NODE_ENV: "production",
      PORT: 3000,
    },
    // Restart on failure
    max_restarts: 10,
    min_uptime: "10s",
    // Graceful shutdown
    kill_timeout: 5000,
    listen_timeout: 10000,
    // Log management
    error_file: "/var/log/mcp-server/error.log",
    out_file: "/var/log/mcp-server/out.log",
    log_date_format: "YYYY-MM-DD HH:mm:ss",
    merge_logs: true,
  }],
};

# systemd service (alternative to PM2)
# /etc/systemd/system/mcp-server.service
[Unit]
Description=MCP Server
After=network.target

[Service]
Type=simple
User=mcp
WorkingDirectory=/opt/mcp-server
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
Environment=NODE_ENV=production
Environment=PORT=3000

[Install]
WantedBy=multi-user.target
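
Both PM2's kill_timeout and systemd's stop sequence send SIGTERM first and only escalate to SIGKILL after a deadline; the server itself has to act on that signal. A minimal sketch, assuming a plain node:http server (an MCP SDK server wrapping an HTTP listener would follow the same pattern):

```typescript
// Hypothetical sketch: graceful shutdown to pair with PM2's kill_timeout
// (or systemd's TimeoutStopSec). Stop accepting new connections, let
// in-flight requests drain, then exit before the supervisor's deadline.
import * as http from "node:http";

const server = http.createServer((_req, res) => res.end("ok"));

function shutdown(): Promise<void> {
  return new Promise((resolve, reject) => {
    // close() stops new connections; already-open ones are allowed to finish
    server.close(err => (err ? reject(err) : resolve()));
  });
}

process.on("SIGTERM", async () => {
  await shutdown(); // must complete within kill_timeout (5s in the PM2 config)
  process.exit(0);
});

server.listen(Number(process.env.PORT) || 3000);
```

If the drain can take longer than the supervisor's deadline, raise kill_timeout (PM2) or TimeoutStopSec (systemd) to match.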

Structured Logging

Remember: stdout is reserved for MCP protocol messages. All application logging must go to stderr, files, or a logging service.

// Structured logging with pino (writes to stderr by default)
import pino from "pino";

const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  // Write to stderr, never stdout
  transport: {
    target: "pino/file",
    options: { destination: 2 },  // fd 2 = stderr
  },
  // Structured fields for every log line
  base: {
    service: "mcp-weather-server",
    version: "1.2.0",
  },
});

// In tool handlers:
server.tool("get_weather", { city: z.string() }, async ({ city }) => {
  const startTime = Date.now();

  logger.info({ tool: "get_weather", city }, "Tool called");

  try {
    const result = await fetchWeather(city);

    logger.info({
      tool: "get_weather",
      city,
      duration_ms: Date.now() - startTime,
      success: true,
    }, "Tool completed");

    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  } catch (e) {
    logger.error({
      tool: "get_weather",
      city,
      duration_ms: Date.now() - startTime,
      error: e instanceof Error ? e.message : String(e),
    }, "Tool failed");

    throw e;
  }
});

Structured logging lets you query logs by tool name, duration, success/failure, and any other field. This is critical for debugging production issues.
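
For example, because each pino line is a single JSON object, an ad-hoc query needs nothing more than a parse and a filter. The log lines below are made up for illustration, in the shape the handler above would emit:

```typescript
// Made-up NDJSON log lines, shaped like the pino output above
const lines = [
  '{"tool":"get_weather","city":"Oslo","duration_ms":1240,"success":true}',
  '{"tool":"get_weather","city":"Lima","duration_ms":85,"success":true}',
  '{"tool":"get_forecast","city":"Lima","duration_ms":40,"success":true}',
];

// Find slow get_weather calls -- the kind of query structured logs make trivial
const slow = lines
  .map(l => JSON.parse(l))
  .filter(e => e.tool === "get_weather" && e.duration_ms > 1000);

console.log(slow.map(e => e.city)); // → [ 'Oslo' ]
```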

Health Checks & Monitoring

// Health check endpoint for HTTP MCP servers
app.get("/health", async (req, res) => {
  const checks = {
    server: "ok",
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    dependencies: {} as Record<string, string>,
  };

  // Check external dependencies
  try {
    await db.query("SELECT 1");
    checks.dependencies.database = "ok";
  } catch {
    checks.dependencies.database = "error";
  }

  try {
    // fetch only rejects on network errors, so check the HTTP status too
    const res = await fetch("https://api.weather.com/health");
    checks.dependencies.weather_api = res.ok ? "ok" : "error";
  } catch {
    checks.dependencies.weather_api = "error";
  }

  const allHealthy = Object.values(checks.dependencies)
    .every(v => v === "ok");

  res.status(allHealthy ? 200 : 503).json(checks);
});

// Metrics endpoint (Prometheus-compatible)
app.get("/metrics", (req, res) => {
  res.set("Content-Type", "text/plain");
  res.send([
    `mcp_tool_calls_total{tool="get_weather"} ${metrics.getWeatherCount}`,
    `mcp_tool_errors_total{tool="get_weather"} ${metrics.getWeatherErrors}`,
    `mcp_tool_duration_seconds{tool="get_weather",quantile="0.99"} ${metrics.getWeatherP99}`,
    `mcp_active_connections ${metrics.activeConnections}`,
  ].join("\n"));
});

Horizontal Scaling

Stdio MCP servers are inherently single-connection: one process per client. HTTP-based servers can scale horizontally.

// Scaling patterns:

// 1. STDIO: One process per client (managed by the client)
//    - Scale by letting clients spawn their own processes
//    - No shared state between instances
//    - Simplest model, no load balancer needed

// 2. SSE / Streamable HTTP: Load-balanced
//    - Multiple server instances behind a load balancer
//    - Sticky sessions required (SSE connections are stateful)
//    - Shared state via Redis or database

// nginx config for SSE load balancing
// upstream mcp_servers {
//   ip_hash;  # Sticky sessions based on client IP
//   server 127.0.0.1:3001;
//   server 127.0.0.1:3002;
//   server 127.0.0.1:3003;
// }
//
// server {
//   location /mcp {
//     proxy_pass http://mcp_servers;
//     proxy_http_version 1.1;
//     proxy_set_header Connection "";
//     proxy_buffering off;  # Required for SSE
//     proxy_read_timeout 86400;  # Keep SSE connections alive
//   }
// }

// 3. Serverless (Lambda, Cloud Functions)
//    - Each request is a new invocation
//    - Stateless by design
//    - Works for Streamable HTTP (stateless request/response)
//    - Does NOT work for SSE (requires persistent connection)
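
The "shared state via Redis or database" point can be sketched as a store interface that every instance reads through, so any replica can resume any session. The names below (SessionStore, MemoryStore) are hypothetical, and the in-memory implementation is illustrative only; production would back the interface with Redis:

```typescript
// Hypothetical session-store abstraction for load-balanced MCP instances.
// Any replica can resume a session because state lives outside the process.
interface SessionStore {
  get(id: string): Promise<string | undefined>;
  set(id: string, data: string): Promise<void>;
}

// In-memory stand-in; swap for a Redis-backed implementation in production
class MemoryStore implements SessionStore {
  private sessions = new Map<string, string>();
  async get(id: string) { return this.sessions.get(id); }
  async set(id: string, data: string) { this.sessions.set(id, data); }
}

const store: SessionStore = new MemoryStore();
```

With this shape, the choice of backing store becomes a deployment detail rather than a code change.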

Exercise: Production-Ready Deployment

Take your MCP server and make it production-ready:

  1. Create a Dockerfile with multi-stage build, non-root user, and health check
  2. Add structured logging with pino (stderr only, structured fields)
  3. Add a /health endpoint that checks all dependencies
  4. Create a PM2 ecosystem config with restart policies and log rotation
  5. Write a docker-compose.yml that runs the server with environment variables
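
As a starting point for step 5, one possible skeleton (image name and values are placeholders to adapt):

```yaml
# Hypothetical docker-compose.yml skeleton for the exercise
services:
  mcp-server:
    image: my-mcp-server:latest
    build: .
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      PORT: "3000"
      LOG_LEVEL: info
    restart: unless-stopped
```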

Check Your Understanding

  1. Why should Docker containers for MCP servers run as non-root?
  2. What is the difference between PM2 fork mode and cluster mode for MCP servers?
  3. Why must all logging go to stderr in an MCP server?
  4. What should a health check endpoint verify beyond “the server is running”?
  5. Why can't SSE-based MCP servers run on serverless platforms?

Key Takeaway

Production MCP servers need the same operational rigor as any other service: Docker for packaging, process managers for reliability, structured logging for debugging, health checks for monitoring, and a clear scaling strategy. The unique constraint is stdout — it belongs to the protocol, so everything else goes to stderr. Get these fundamentals right and your MCP server will run reliably at any scale.