Every MCP interaction we have covered so far flows in one direction: the AI model calls tools on the server. Sampling is the reverse — the server asks the AI model to do something. This is one of the most powerful (and least understood) features of MCP.
Think of it this way: tools give the AI hands. Sampling gives the server a brain.
What is Sampling?
Sampling allows an MCP server to send a prompt to the AI model (through the client) and get a response back. The server creates a message, the client forwards it to the model, and the model's response comes back to the server.
```typescript
// The flow:
// 1. Server calls client.createMessage() with a prompt
// 2. Client shows the request to the user for approval (human-in-the-loop)
// 3. Client forwards the prompt to the AI model
// 4. AI model generates a response
// 5. Response is returned to the server
// 6. Server uses the response in its tool logic
```

```typescript
// Server-side sampling request:
const response = await server.createMessage({
  messages: [
    {
      role: "user",
      content: {
        type: "text",
        text: "Classify the following error log entry as one of: " +
          "CRITICAL, WARNING, INFO, DEBUG\n\n" +
          `Log: ${logEntry}`,
      },
    },
  ],
  maxTokens: 50,
  // Optional: hint about which model to use
  modelPreferences: {
    hints: [{ name: "claude-3-haiku-20240307" }],
    speedPriority: 0.8,
    costPriority: 0.9,
  },
});

const classification = response.content.text;
```

The server doesn't need its own AI API key. It uses the client's existing connection to the model. This means the server can leverage AI capabilities without managing API credentials or billing.
When to Use Sampling
Sampling is powerful but adds latency and cost. Use it when the alternative is worse:
- Classification — categorize data as part of a tool's logic (error severity, content type, intent detection)
- Extraction — pull structured data from unstructured text (names from emails, dates from documents)
- Summarization — condense large data before returning it to the model (summarize 100 log entries into key findings)
- Decision support — ask the model to evaluate options before the server acts (should this alert be escalated?)
Do NOT use sampling for:
- Simple transformations you can do with code (string manipulation, math)
- Lookups that should be database queries
- Operations where latency is critical (each sampling round trip adds 1-5 seconds)
- Recursive tool calls (server calls model, model calls tool, tool calls model — infinite loop risk)
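Because a sampling request can be rejected, rate-limited, or simply slow, it's worth wrapping every call so a failure degrades to a non-AI answer instead of failing the whole tool. A minimal sketch — `Sampler` and `sampleWithFallback` are hypothetical helpers, not part of the MCP SDK:

```typescript
type Sampler = (prompt: string) => Promise<string>;

// Hypothetical wrapper: run a sampling call, but degrade to `fallback`
// when the client rejects the request, it errors, or it times out.
async function sampleWithFallback(
  sample: Sampler,
  prompt: string,
  fallback: string,
  timeoutMs = 10_000,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<string>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    // Rejections (e.g. the client refusing the request) become the
    // fallback, so one failed sampling call never sinks the tool call.
    return await Promise.race([sample(prompt).catch(() => fallback), timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the event loop open
  }
}
```

A tool handler would then pass a closure over `server.createMessage` as the `Sampler` and a safe default (say, `"INFO"` for the log classifier above) as the fallback.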
Implementation
```typescript
// Practical example: a tool that summarizes git changes
server.tool("summarize_changes", {
  since: z.string().describe("Git ref or date, e.g., 'HEAD~10' or '2024-01-01'"),
}, async ({ since }) => {
  // Step 1: Get raw data (server-side, no AI needed)
  const changes = await getGitLog(since);

  // Step 2: Use sampling to summarize (AI needed)
  const summary = await server.createMessage({
    messages: [{
      role: "user",
      content: {
        type: "text",
        text: [
          "Summarize these git changes into 3-5 bullet points.",
          "Focus on features added, bugs fixed, and breaking changes.",
          "Be concise — one line per bullet.",
          "",
          changes.map(c => `${c.hash} ${c.message}`).join("\n"),
        ].join("\n"),
      },
    }],
    maxTokens: 500,
    modelPreferences: {
      costPriority: 0.8, // Prefer a cheaper model for summaries
    },
  });

  // Step 3: Return the enriched result
  return {
    content: [{
      type: "text",
      text: JSON.stringify({
        period: since,
        total_commits: changes.length,
        summary: summary.content.text,
        raw_commits: changes.slice(0, 5), // Include a few raw entries for reference
      }, null, 2),
    }],
  };
});
```

Human-in-the-Loop
The MCP spec mandates that sampling requests go through the client, not directly to the AI provider. This is intentional — it keeps the human in control.
The client can:
- Show the sampling request to the user for approval before sending it
- Modify the prompt (e.g., add safety instructions)
- Choose which model to use (ignoring the server's hint)
- Reject the request entirely
- Rate limit sampling requests
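That last point — rate limiting — might be implemented client-side with a simple token bucket. This is a hypothetical sketch of client policy; the `SamplingGate` class is invented for illustration and is not part of any MCP SDK:

```typescript
// Hypothetical client-side gate: allow at most `capacity` sampling
// requests per `refillMs` window (a coarse token bucket).
class SamplingGate {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillMs: number) {
    this.tokens = capacity;
  }

  allow(): boolean {
    const now = Date.now();
    if (now - this.lastRefill >= this.refillMs) {
      this.tokens = this.capacity; // refill the bucket each window
      this.lastRefill = now;
    }
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true;  // forward the request to the model
    }
    return false;   // reject: too many sampling requests this window
  }
}
```

A client would check `gate.allow()` before forwarding each `createMessage` request, returning an error response to the server when the budget is exhausted — which is exactly why servers should handle rejected sampling calls gracefully.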
```typescript
// Model preferences are HINTS, not requirements.
// The client makes the final decision.
modelPreferences: {
  hints: [
    { name: "claude-3-haiku-20240307" },  // Preferred
    { name: "claude-3-sonnet-20240229" }, // Fallback
  ],
  // Priority weights (0-1)
  costPriority: 0.9,         // I care about cost
  speedPriority: 0.7,        // Speed matters too
  intelligencePriority: 0.3, // Don't need the smartest model
}
```

Design your sampling requests to work with any model. Don't rely on model-specific features or expect exact output formats.
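To make "hints, not requirements" concrete, here is one way a client might resolve a server's preferences against its own model catalog. The catalog, its 0-1 ratings, and the selection logic are all hypothetical — the spec leaves this mapping entirely up to the client:

```typescript
// Hypothetical client-side resolution of a server's model preferences.
interface ModelInfo {
  name: string;
  cost: number;         // higher = cheaper to run
  speed: number;        // higher = faster
  intelligence: number; // higher = more capable
}

interface ModelPreferences {
  hints?: { name?: string }[];
  costPriority?: number;
  speedPriority?: number;
  intelligencePriority?: number;
}

function pickModel(catalog: ModelInfo[], prefs: ModelPreferences): ModelInfo {
  // 1. Honor the first hint that matches a model the client actually has
  //    (hints are advisory name fragments, not hard requirements).
  for (const hint of prefs.hints ?? []) {
    const hit = catalog.find((m) => hint.name && m.name.includes(hint.name));
    if (hit) return hit;
  }
  // 2. Otherwise, score the catalog by the server's priority weights.
  const score = (m: ModelInfo) =>
    (prefs.costPriority ?? 0) * m.cost +
    (prefs.speedPriority ?? 0) * m.speed +
    (prefs.intelligencePriority ?? 0) * m.intelligence;
  return catalog.reduce((best, m) => (score(m) > score(best) ? m : best));
}
```

Note that a hint for a model the client doesn't have is simply skipped — the server still gets *a* model, just not the one it asked for, which is why prompts must stay model-agnostic.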
Practical Patterns
Pattern 1: Classify-then-route
```typescript
// Use sampling to classify, then route to the right handler
server.tool("process_support_ticket", {
  ticket_text: z.string(),
}, async ({ ticket_text }) => {
  // Classify with sampling
  const classification = await server.createMessage({
    messages: [{
      role: "user",
      content: {
        type: "text",
        text: `Classify this support ticket into exactly one category:
billing, technical, account, feedback
Ticket: ${ticket_text}
Reply with ONLY the category name.`,
      },
    }],
    maxTokens: 20,
  });

  const category = classification.content.text.trim().toLowerCase();

  // Route based on classification
  switch (category) {
    case "billing": return await handleBilling(ticket_text);
    case "technical": return await handleTechnical(ticket_text);
    case "account": return await handleAccount(ticket_text);
    default: return await handleGeneral(ticket_text);
  }
});
```

Pattern 2: Reduce-then-return
```typescript
// Use sampling to reduce large data before returning
server.tool("analyze_logs", {
  service: z.string(),
  hours: z.number().default(24),
}, async ({ service, hours }) => {
  const logs = await fetchLogs(service, hours); // Could be thousands of lines

  if (logs.length > 100) {
    // Too many to return raw — summarize with sampling
    const analysis = await server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze these ${logs.length} log entries and report:
1. Error count and top error types
2. Any patterns or anomalies
3. Recommended actions
Logs (showing first 200):
${logs.slice(0, 200).join("\n")}`,
        },
      }],
      maxTokens: 1000,
    });

    return {
      content: [{ type: "text", text: analysis.content.text }],
    };
  }

  // Small enough to return raw
  return {
    content: [{ type: "text", text: logs.join("\n") }],
  };
});
```

Exercise: Build a Sampling-Powered Tool
Build an MCP tool called smart_search that:
- Takes a natural language query from the user
- Uses sampling to extract search keywords from the query
- Runs the keywords against a database full-text search
- Uses sampling to rank and summarize the top results
- Returns the ranked results with AI-generated snippets
Consider: what modelPreferences would you set? How would you handle the case where the client rejects the sampling request?
Check Your Understanding
- What is sampling in MCP, and how does the data flow differ from tool calls?
- Name three appropriate use cases for sampling and three inappropriate ones.
- Why do sampling requests go through the client instead of directly to the AI provider?
- What are model preference hints, and why are they “hints” not “requirements”?
- How would you handle a sampling request being rejected by the client?
Key Takeaway
Sampling gives your MCP server access to AI intelligence without managing its own model connection. Use it for classification, extraction, summarization, and decision support within your tools. But use it judiciously — each sampling call adds latency and cost. The client controls the final model choice, and the human can review every request. Design your sampling prompts to be model-agnostic and failure-tolerant.