An "agent" in 2026 is just a loop: the model proposes a tool call, your code executes it, you feed the result back, repeat until the model emits a final answer. The interesting questions are which loop (ReAct, planning, multi-agent), which tools (function-calling vs MCP servers vs LangGraph nodes), and where the state lives.
This page covers the agent loop itself, how the Model Context Protocol (MCP) is changing how tools are wired in, and a working example of an MCP server plus an Anthropic-API agent that uses it.
Strip away the framework noise and an agent is this:
# Pseudocode for the agent loop common to every framework.
messages = [{"role": "user", "content": user_request}]
while True:
    response = llm.create(model=MODEL, tools=TOOLS, messages=messages)
    messages.append(response.message)
    if response.stop_reason == "end_turn":
        return response.message.content  # final answer
    # otherwise the model emitted one or more tool_use blocks
    for tool_use in response.tool_uses:
        result = TOOL_REGISTRY[tool_use.name](**tool_use.input)
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result",
                         "tool_use_id": tool_use.id,
                         "content": str(result)}],
        })
Everything else — ReAct prompting, planners, multi-agent supervisors, MCP — is a way to make this loop more reliable, more observable, or more reusable.
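Concretely, the TOOL_REGISTRY in the loop is just a dict from tool name to callable. A minimal sketch with two stand-in tools (both hypothetical; `web_search` here is a stub, not a real client):

```python
# Hypothetical tools; any callable whose keyword arguments match the
# tool's declared input schema will work.
def web_search(query: str) -> str:
    return f"results for {query!r}"  # stub — swap in a real search client

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
}

# Dispatch exactly as in the loop: look up by name, splat the model's arguments.
result = TOOL_REGISTRY["web_search"](**{"query": "MCP spec"})
```

The only contract is that each function's signature matches the JSON schema you advertised for that tool.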
ReAct (Yao et al., 2022) interleaves a free-text "thought" with each action so the model commits to a rationale before calling a tool. With native tool-use APIs, the "thought" is now usually rolled into the model's hidden reasoning (Claude's extended_thinking, OpenAI's o-series internal CoT). On models without thinking, you still get a measurable lift by asking for it explicitly.
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You are a research agent. For every step, output a block
with: (1) what you know, (2) what's missing, (3) which tool you'll call and why.
Then emit the tool call. After the tool returns, repeat until you can answer."""

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    system=SYSTEM,
    tools=[
        {"name": "web_search",
         "description": "Search the public web. Returns top 5 snippets.",
         "input_schema": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]}},
    ],
    messages=[{"role": "user", "content": "Who won the 2025 Turing Award and for what?"}],
)
Two control patterns dominate: reactive (the plain loop above) and plan-and-execute, which decomposes the task up front. The plan-and-execute skeleton:
import json

# Plan-and-execute skeleton.
plan = llm.create(
    model=MODEL,
    system="Decompose the user task into a numbered list of atomic subtasks. "
           "Each subtask must be completable with a single tool call.",
    messages=[{"role": "user", "content": user_request}],
).text
subtasks = parse_numbered_list(plan)

results = []
for step in subtasks:
    out = run_react_loop(step, tools=TOOLS)  # short reactive loop per subtask
    results.append(out)

final = llm.create(
    model=MODEL,
    system="Synthesize the subtask results into a final answer.",
    messages=[{"role": "user", "content": json.dumps({"task": user_request,
                                                      "results": results})}],
).text
Rule of thumb: reactive for tasks under ~5 steps, planning for anything longer or where step ordering matters (e.g., refactoring across files).
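The skeleton leans on `parse_numbered_list`, which is left undefined above. A minimal sketch, assuming the model numbers steps as `1.` or `1)` and may emit preamble lines:

```python
import re

def parse_numbered_list(text: str) -> list[str]:
    """Extract '1. foo' / '2) bar' items; tolerate blank lines and preamble."""
    items = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if m:
            items.append(m.group(1))
    return items

plan = """Here is the plan:
1. Search for the 2025 Turing Award announcement.
2) Fetch the official citation.
3. Summarize the contribution."""
print(parse_numbered_list(plan))
```

If the model ignores the numbering instruction, this returns an empty list, which is a useful signal to retry the planning call rather than silently running zero subtasks.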
Multi-agent systems split work across specialized agents (researcher, coder, critic) coordinated by a supervisor. Useful when subtasks need different system prompts, different tool sets, or different models (e.g., a small fast model for retrieval, a larger model for synthesis).
import json

# Minimal supervisor pattern. Each "agent" is a function that takes a task
# and returns a result; the supervisor LLM decides who to dispatch to.
AGENTS = {
    "researcher": lambda q: run_react_loop(q, tools=[WEB_SEARCH, FETCH_URL]),
    "coder": lambda q: run_react_loop(q, tools=[READ_FILE, WRITE_FILE, RUN_TESTS]),
    "critic": lambda q: llm.create(system="Find flaws in the answer.",
                                   messages=[{"role": "user", "content": q}]).text,
}

def supervise(user_request):
    state = {"task": user_request, "history": []}
    for _ in range(MAX_TURNS):
        decision = llm.create(
            model="claude-opus-4-7",
            system="You route work to: researcher, coder, critic, or DONE. "
                   "Reply with JSON: {agent, instructions} or {agent: 'DONE', answer}.",
            messages=[{"role": "user", "content": json.dumps(state)}],
        )
        d = json.loads(decision.text)
        if d["agent"] == "DONE":
            return d["answer"]
        result = AGENTS[d["agent"]](d["instructions"])
        state["history"].append({"agent": d["agent"], "result": result})
    raise RuntimeError("supervisor exhausted MAX_TURNS without finishing")
Multi-agent setups consume tokens fast. Always set MAX_TURNS and a per-agent budget. The supervisor is the most failure-prone component — keep its prompt short and its decision space small.
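A per-agent budget can be enforced with a small wrapper. The sketch below caps call count and total output size per agent; the limits and the character count as a stand-in for tokens are illustrative:

```python
import functools

class BudgetExceeded(RuntimeError):
    pass

def budgeted(max_calls: int = 5, max_chars: int = 40_000):
    """Cap how many times an agent runs and how much text it emits in total."""
    def wrap(agent_fn):
        state = {"calls": 0, "chars": 0}
        @functools.wraps(agent_fn)
        def inner(task: str) -> str:
            if state["calls"] >= max_calls:
                raise BudgetExceeded(f"{agent_fn.__name__}: call budget spent")
            state["calls"] += 1
            out = agent_fn(task)
            state["chars"] += len(out)
            if state["chars"] > max_chars:
                raise BudgetExceeded(f"{agent_fn.__name__}: output budget spent")
            return out
        return inner
    return wrap

@budgeted(max_calls=2)
def researcher(task: str) -> str:  # hypothetical agent stub
    return f"findings for {task}"
```

Raising instead of silently truncating matters: the supervisor should see the budget failure in `history` and route around it, not receive a clipped result it will treat as complete.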
MCP (introduced by Anthropic in late 2024, broadly adopted across providers in 2025) is an open JSON-RPC protocol that standardizes how an LLM application discovers and invokes external capabilities. It defines three primitives: tools (functions the model can invoke), resources (read-only data the client can load into context), and prompts (reusable prompt templates the server exposes).
The split that matters in practice: an MCP server wraps a tool surface (Slack, Postgres, your internal API) once, and any MCP client (Claude Desktop, Cursor, Zed, your own agent) can use it without per-app integration code. You stop hand-wiring function definitions into every agent — you point the agent at the server.
Why this matters in production: one team can own the tool surface (auth, rate limits, schema changes) while every agent in the organization consumes it through the same protocol, instead of each app maintaining its own integration.
fastmcp is the high-level Python SDK. The decorators turn ordinary functions into MCP tools, complete with JSON-Schema generated from type hints.
pip install fastmcp anthropic
# orders_server.py — run with: python orders_server.py
from fastmcp import FastMCP

mcp = FastMCP("orders")

# Pretend in-memory order DB.
ORDERS = {
    "A-482": {"status": "in_transit", "eta": "2026-04-29", "carrier": "UPS"},
    "A-501": {"status": "delivered", "eta": "2026-04-22", "carrier": "FedEx"},
}

@mcp.tool()
def get_order_status(order_id: str) -> dict:
    """Look up the shipping status of a customer order by ID."""
    if order_id not in ORDERS:
        return {"error": f"order {order_id} not found"}
    return ORDERS[order_id]

@mcp.tool()
def list_orders(status: str | None = None) -> list[dict]:
    """List orders, optionally filtered by status (in_transit, delivered, returned)."""
    items = [{"id": oid, **o} for oid, o in ORDERS.items()]
    if status:
        items = [o for o in items if o["status"] == status]
    return items

@mcp.resource("orders://schema")
def schema() -> str:
    """Return the order record schema as Markdown — readable as an MCP resource."""
    return ("# Order schema\n- id: str\n- status: in_transit | delivered | returned\n"
            "- eta: ISO date\n- carrier: str")

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
Run it with the MCP inspector to verify the tools list:
npx @modelcontextprotocol/inspector python orders_server.py
# opens a browser UI showing tools, resources, and a tester
The Anthropic Messages API has a hosted MCP connector (an mcp_servers parameter) for remote servers, but for a local stdio server like the one above you can bridge the tools yourself with the official mcp Python SDK, which is also the clearer way to see what the protocol is doing:
# agent.py — runs the orders_server.py above as a subprocess and lets Claude use it.
import asyncio

import anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = anthropic.Anthropic()

async def main():
    server_params = StdioServerParameters(command="python", args=["orders_server.py"])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_resp = await session.list_tools()
            # Translate MCP tools into Anthropic tool schema.
            tools = [{
                "name": t.name,
                "description": t.description or "",
                "input_schema": t.inputSchema,
            } for t in tools_resp.tools]

            messages = [{"role": "user",
                         "content": "Where is order A-482 and what's its carrier?"}]
            while True:
                resp = client.messages.create(
                    model="claude-opus-4-7",
                    max_tokens=1024,
                    tools=tools,
                    messages=messages,
                )
                messages.append({"role": "assistant", "content": resp.content})
                if resp.stop_reason == "end_turn":
                    print(resp.content[-1].text)
                    return
                tool_blocks = [b for b in resp.content if b.type == "tool_use"]
                tool_results = []
                for tb in tool_blocks:
                    result = await session.call_tool(tb.name, tb.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tb.id,
                        "content": result.content[0].text if result.content else "",
                    })
                messages.append({"role": "user", "content": tool_results})

asyncio.run(main())
The same server works unchanged inside Claude Desktop — drop a config block into claude_desktop_config.json:
{
  "mcpServers": {
    "orders": {
      "command": "python",
      "args": ["/abs/path/to/orders_server.py"]
    }
  }
}
These three are not really competitors — they sit at different layers.
| Aspect | Native Function Calling | MCP | LangGraph Nodes |
|---|---|---|---|
| Layer | Model API | Cross-app protocol | Application-level orchestration |
| Where tools live | In the agent process | In a separate server process (any language) | In the agent process as Python functions |
| Reuse across apps | Copy/paste schema and impl | Configure server URL/command once per app | None (LangGraph-specific) |
| State and branching | You write the loop | You write the loop | Built-in graph state, conditional edges, checkpointing |
| Best for | Simple agents with a fixed in-process tool set | Tools owned by a different team / used by multiple agents | Complex agents with branching, retries, human-in-the-loop |
| Composability | Low | High (point any client at any server) | Medium (within LangGraph) |
In a real system you often use all three: function calling is the underlying transport, MCP servers expose your shared tool surface, and LangGraph (or your own loop) orchestrates the higher-level agent flow.
ReAct interleaves Reasoning ("thought") and Acting ("tool call") steps in a single agent loop. The model emits a thought explaining what it wants to do next, calls a tool, receives the observation, and loops. It works because the visible chain-of-thought lets the model self-correct on observations, and the tool results stay grounded in real data instead of hallucinated reasoning. The downside is verbosity — you pay for the thoughts in tokens and latency — so modern function-calling models often run an implicit ReAct without exposing the thoughts.
Function calling is the model-side API: the model picks a tool from a list you provided and emits a JSON arguments object. MCP (Model Context Protocol) is the server-side standard for how those tools are discovered, authenticated, and invoked. With function calling alone, every integration is bespoke code in your app. With MCP, a tool server publishes its capabilities once and any MCP-aware client (Claude Desktop, Cursor, your own loop) can use it. Think of function calling as the model's mouth and MCP as the protocol the tools speak.
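On the wire, MCP discovery is a plain JSON-RPC exchange. A trimmed sketch of a tools/list round trip, using the orders server's get_order_status tool as the example (field names follow the MCP spec; exact payloads will vary by server):

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

and the server's reply:

```json
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "get_order_status",
   "description": "Look up the shipping status of a customer order by ID.",
   "inputSchema": {"type": "object",
                   "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]}}
]}}
```

Note that `inputSchema` here is the same JSON Schema you would otherwise hand-write into a function-calling `tools` array, which is why the bridge in agent.py above is a three-line translation.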
Three guardrails in production: a hard step counter (max 10–20 iterations), a token budget on the cumulative conversation, and a wall-clock timeout. You also need a "no-progress" detector — if the same tool is called with the same arguments twice in a row, abort. The model itself terminates by emitting a final assistant message with no tool calls, but you can't rely on that alone because failing tool calls or confused models will spin.
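The no-progress detector is only a few lines. A sketch that keys each call on (tool name, canonical JSON of the arguments) and aborts on an immediate repeat or a blown step cap:

```python
import json

class LoopStall(RuntimeError):
    pass

def make_progress_guard(max_steps: int = 15):
    """Abort on a hard step cap, or when the same call repeats back-to-back."""
    seen = {"last": None, "steps": 0}
    def check(tool_name: str, tool_args: dict) -> None:
        seen["steps"] += 1
        if seen["steps"] > max_steps:
            raise LoopStall(f"step budget ({max_steps}) exhausted")
        # sort_keys makes {"a":1,"b":2} and {"b":2,"a":1} compare equal
        key = (tool_name, json.dumps(tool_args, sort_keys=True))
        if key == seen["last"]:
            raise LoopStall(f"no progress: {tool_name} repeated with same args")
        seen["last"] = key
    return check

guard = make_progress_guard(max_steps=10)
guard("web_search", {"query": "turing award 2025"})   # fine
guard("web_search", {"query": "turing award 2026"})   # fine — different args
```

Call the guard once per tool invocation inside the agent loop; catching LoopStall is the place to inject a "you appear stuck, summarize what you have" message instead of burning the rest of the budget.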
Multi-agent (orchestrator + specialists, or supervisor + workers) helps when the sub-tasks have genuinely different toolsets and prompts — e.g., a "researcher" with web search and a "coder" with a sandbox. The cost is more LLM calls, harder debugging, and context-passing overhead. For most problems a single agent with a well-curated tool list and clear system prompt outperforms a multi-agent setup, because every additional agent is another place a hallucination can compound. I default to single-agent until I can point at a specific failure mode it can't handle.
An oversized tool result will eat your context window in three calls. The fix is at the tool boundary, not in the model: paginate (return page 1 of N with a continuation token), summarize (run a small extraction pass server-side and return the structured result the agent actually needs), or filter (let the agent pass a JSONPath/jq expression as an argument). For search tools I cap result count at ~10 with snippets and a fetch_full(id) tool for drill-down. The principle: tool authors are responsible for token discipline, not the agent.
Log every (system_prompt, messages, tools, tool_choice, response) tuple to durable storage, plus the model name and version. Add a request_id that flows through every tool call so the trace stitches together. For deterministic replay set temperature=0 and top_p=1, but accept that even then frontier models drift slightly across versions. The killer feature is being able to re-run a customer-reported bad answer against the exact prompt+tools state — without that you're guessing. I treat the trace store like an APM system: structured, indexed, and queryable.
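A minimal JSONL trace sink as a sketch (the record schema and filename are illustrative; production storage should be durable, indexed, and queryable):

```python
import json
import time
import uuid

def log_trace(path: str, *, request_id: str, model: str,
              messages, tools, response) -> None:
    """Append one agent turn as a single JSON line, keyed by request_id."""
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "model": model,
        "messages": messages,
        "tools": [t["name"] for t in tools],  # names only; schemas live in VCS
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

rid = str(uuid.uuid4())
log_trace("traces.jsonl", request_id=rid, model="claude-opus-4-7",
          messages=[{"role": "user", "content": "where is A-482?"}],
          tools=[{"name": "get_order_status"}],
          response={"stop_reason": "end_turn", "text": "in transit"})
```

One line per LLM call, sharing a request_id across all calls in a run, is enough to reconstruct an entire agent trace with a single grep.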