An "agent" in 2026 is just a loop: the model proposes a tool call, your code executes it, you feed the result back, repeat until the model emits a final answer. The interesting questions are which loop (ReAct, planning, multi-agent), which tools (function-calling vs MCP servers vs LangGraph nodes), and where the state lives.
This page covers the agent loop itself, how the Model Context Protocol (MCP) is changing how tools are wired in, and a working example of an MCP server plus an Anthropic-API agent that uses it.
Strip away the framework noise and an agent is this:
# Pseudocode for the agent loop common to every framework.
messages = [{"role": "user", "content": user_request}]
while True:
    response = llm.create(model=MODEL, tools=TOOLS, messages=messages)
    messages.append(response.message)
    if response.stop_reason == "end_turn":
        return response.message.content  # final answer
    # otherwise the model emitted one or more tool_use blocks
    for tool_use in response.tool_uses:
        result = TOOL_REGISTRY[tool_use.name](**tool_use.input)
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result",
                         "tool_use_id": tool_use.id,
                         "content": str(result)}],
        })
Everything else — ReAct prompting, planners, multi-agent supervisors, MCP — is a way to make this loop more reliable, more observable, or more reusable.
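Concretely, the TOOL_REGISTRY in the loop is just a dict from tool name to callable. A minimal sketch with two stand-in tools (both hypothetical; `web_search` here is a stub, not a real client):

```python
# Hypothetical tools; any callable whose keyword arguments match the
# tool's declared input schema will work.
def web_search(query: str) -> str:
    return f"results for {query!r}"  # stub — swap in a real search client

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
}

# Dispatch exactly as in the loop: look up by name, splat the model's arguments.
result = TOOL_REGISTRY["web_search"](**{"query": "MCP spec"})
```

The only contract is that each function's signature matches the JSON schema you advertised for that tool.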
ReAct (Yao et al., 2022) interleaves a free-text "thought" with each action so the model commits to a rationale before calling a tool. With native tool-use APIs, the "thought" is now usually rolled into the model's hidden reasoning (Claude's extended_thinking, OpenAI's o-series internal CoT). On models without thinking, you still get a measurable lift by asking for it explicitly.
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You are a research agent. For every step, output a block
with: (1) what you know, (2) what's missing, (3) which tool you'll call and why.
Then emit the tool call. After the tool returns, repeat until you can answer."""

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    system=SYSTEM,
    tools=[
        {"name": "web_search",
         "description": "Search the public web. Returns top 5 snippets.",
         "input_schema": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]}},
    ],
    messages=[{"role": "user", "content": "Who won the 2025 Turing Award and for what?"}],
)
Two control patterns dominate: reactive (the plain loop above) and plan-and-execute, which decomposes the task up front. The plan-and-execute skeleton:
import json

# Plan-and-execute skeleton.
plan = llm.create(
    model=MODEL,
    system="Decompose the user task into a numbered list of atomic subtasks. "
           "Each subtask must be completable with a single tool call.",
    messages=[{"role": "user", "content": user_request}],
).text
subtasks = parse_numbered_list(plan)

results = []
for step in subtasks:
    out = run_react_loop(step, tools=TOOLS)  # short reactive loop per subtask
    results.append(out)

final = llm.create(
    model=MODEL,
    system="Synthesize the subtask results into a final answer.",
    messages=[{"role": "user", "content": json.dumps({"task": user_request,
                                                      "results": results})}],
).text
Rule of thumb: reactive for tasks under ~5 steps, planning for anything longer or where step ordering matters (e.g., refactoring across files).
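The skeleton leans on `parse_numbered_list`, which is left undefined above. A minimal sketch, assuming the model numbers steps as `1.` or `1)` and may emit preamble lines:

```python
import re

def parse_numbered_list(text: str) -> list[str]:
    """Extract '1. foo' / '2) bar' items; tolerate blank lines and preamble."""
    items = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if m:
            items.append(m.group(1))
    return items

plan = """Here is the plan:
1. Search for the 2025 Turing Award announcement.
2) Fetch the official citation.
3. Summarize the contribution."""
print(parse_numbered_list(plan))
```

If the model ignores the numbering instruction, this returns an empty list, which is a useful signal to retry the planning call rather than silently running zero subtasks.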
Multi-agent systems split work across specialized agents (researcher, coder, critic) coordinated by a supervisor. Useful when subtasks need different system prompts, different tool sets, or different models (e.g., a small fast model for retrieval, a larger model for synthesis).
import json

# Minimal supervisor pattern. Each "agent" is a function that takes a task
# and returns a result; the supervisor LLM decides who to dispatch to.
AGENTS = {
    "researcher": lambda q: run_react_loop(q, tools=[WEB_SEARCH, FETCH_URL]),
    "coder": lambda q: run_react_loop(q, tools=[READ_FILE, WRITE_FILE, RUN_TESTS]),
    "critic": lambda q: llm.create(system="Find flaws in the answer.",
                                   messages=[{"role": "user", "content": q}]).text,
}

def supervise(user_request):
    state = {"task": user_request, "history": []}
    for _ in range(MAX_TURNS):
        decision = llm.create(
            model="claude-opus-4-7",
            system="You route work to: researcher, coder, critic, or DONE. "
                   "Reply with JSON: {agent, instructions} or {agent: 'DONE', answer}.",
            messages=[{"role": "user", "content": json.dumps(state)}],
        )
        d = json.loads(decision.text)
        if d["agent"] == "DONE":
            return d["answer"]
        result = AGENTS[d["agent"]](d["instructions"])
        state["history"].append({"agent": d["agent"], "result": result})
    raise RuntimeError("supervisor exhausted MAX_TURNS without finishing")
Multi-agent setups consume tokens fast. Always set MAX_TURNS and a per-agent budget. The supervisor is the most failure-prone component — keep its prompt short and its decision space small.
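A per-agent budget can be enforced with a small wrapper. The sketch below caps call count and total output size per agent; the limits and the character count as a stand-in for tokens are illustrative:

```python
import functools

class BudgetExceeded(RuntimeError):
    pass

def budgeted(max_calls: int = 5, max_chars: int = 40_000):
    """Cap how many times an agent runs and how much text it emits in total."""
    def wrap(agent_fn):
        state = {"calls": 0, "chars": 0}
        @functools.wraps(agent_fn)
        def inner(task: str) -> str:
            if state["calls"] >= max_calls:
                raise BudgetExceeded(f"{agent_fn.__name__}: call budget spent")
            state["calls"] += 1
            out = agent_fn(task)
            state["chars"] += len(out)
            if state["chars"] > max_chars:
                raise BudgetExceeded(f"{agent_fn.__name__}: output budget spent")
            return out
        return inner
    return wrap

@budgeted(max_calls=2)
def researcher(task: str) -> str:  # hypothetical agent stub
    return f"findings for {task}"
```

Raising instead of silently truncating matters: the supervisor should see the budget failure in `history` and route around it, not receive a clipped result it will treat as complete.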
MCP (introduced by Anthropic in late 2024, broadly adopted across providers in 2025) is an open JSON-RPC protocol that standardizes how an LLM application discovers and invokes external capabilities. It defines three primitives: tools (functions the model can invoke), resources (read-only data the client can load into context), and prompts (reusable prompt templates the server exposes).
The split that matters in practice: an MCP server wraps a tool surface (Slack, Postgres, your internal API) once, and any MCP client (Claude Desktop, Cursor, Zed, your own agent) can use it without per-app integration code. You stop hand-wiring function definitions into every agent — you point the agent at the server.
Why this matters in production: one team can own the tool surface (auth, rate limits, schema changes) while every agent in the organization consumes it through the same protocol, instead of each app maintaining its own integration.
fastmcp is the high-level Python SDK. The decorators turn ordinary functions into MCP tools, complete with JSON-Schema generated from type hints.
pip install fastmcp anthropic
# orders_server.py — run with: python orders_server.py
from fastmcp import FastMCP

mcp = FastMCP("orders")

# Pretend in-memory order DB.
ORDERS = {
    "A-482": {"status": "in_transit", "eta": "2026-04-29", "carrier": "UPS"},
    "A-501": {"status": "delivered", "eta": "2026-04-22", "carrier": "FedEx"},
}

@mcp.tool()
def get_order_status(order_id: str) -> dict:
    """Look up the shipping status of a customer order by ID."""
    if order_id not in ORDERS:
        return {"error": f"order {order_id} not found"}
    return ORDERS[order_id]

@mcp.tool()
def list_orders(status: str | None = None) -> list[dict]:
    """List orders, optionally filtered by status (in_transit, delivered, returned)."""
    items = [{"id": oid, **o} for oid, o in ORDERS.items()]
    if status:
        items = [o for o in items if o["status"] == status]
    return items

@mcp.resource("orders://schema")
def schema() -> str:
    """Return the order record schema as Markdown — readable as an MCP resource."""
    return ("# Order schema\n- id: str\n- status: in_transit | delivered | returned\n"
            "- eta: ISO date\n- carrier: str")

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
Run it with the MCP inspector to verify the tools list:
npx @modelcontextprotocol/inspector python orders_server.py
# opens a browser UI showing tools, resources, and a tester
The Anthropic Messages API has a hosted MCP connector (an mcp_servers parameter) for remote servers, but for a local stdio server like the one above you can bridge the tools yourself with the official mcp Python SDK, which is also the clearer way to see what the protocol is doing:
# agent.py — runs the orders_server.py above as a subprocess and lets Claude use it.
import asyncio

import anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = anthropic.Anthropic()

async def main():
    server_params = StdioServerParameters(command="python", args=["orders_server.py"])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_resp = await session.list_tools()
            # Translate MCP tools into Anthropic tool schema.
            tools = [{
                "name": t.name,
                "description": t.description or "",
                "input_schema": t.inputSchema,
            } for t in tools_resp.tools]

            messages = [{"role": "user",
                         "content": "Where is order A-482 and what's its carrier?"}]
            while True:
                resp = client.messages.create(
                    model="claude-opus-4-7",
                    max_tokens=1024,
                    tools=tools,
                    messages=messages,
                )
                messages.append({"role": "assistant", "content": resp.content})
                if resp.stop_reason == "end_turn":
                    print(resp.content[-1].text)
                    return
                tool_blocks = [b for b in resp.content if b.type == "tool_use"]
                tool_results = []
                for tb in tool_blocks:
                    result = await session.call_tool(tb.name, tb.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tb.id,
                        "content": result.content[0].text if result.content else "",
                    })
                messages.append({"role": "user", "content": tool_results})

asyncio.run(main())
The same server works unchanged inside Claude Desktop — drop a config block into claude_desktop_config.json:
{
  "mcpServers": {
    "orders": {
      "command": "python",
      "args": ["/abs/path/to/orders_server.py"]
    }
  }
}
These three are not really competitors — they sit at different layers.
| Aspect | Native Function Calling | MCP | LangGraph Nodes |
|---|---|---|---|
| Layer | Model API | Cross-app protocol | Application-level orchestration |
| Where tools live | In the agent process | In a separate server process (any language) | In the agent process as Python functions |
| Reuse across apps | Copy/paste schema and impl | Configure server URL/command once per app | None (LangGraph-specific) |
| State and branching | You write the loop | You write the loop | Built-in graph state, conditional edges, checkpointing |
| Best for | Simple agents with a fixed in-process tool set | Tools owned by a different team / used by multiple agents | Complex agents with branching, retries, human-in-the-loop |
| Composability | Low | High (point any client at any server) | Medium (within LangGraph) |
In a real system you often use all three: function calling is the underlying transport, MCP servers expose your shared tool surface, and LangGraph (or your own loop) orchestrates the higher-level agent flow.
ReAct interleaves Reasoning ("thought") and Acting ("tool call") steps in a single agent loop. The model emits a thought explaining what it wants to do next, calls a tool, receives the observation, and loops. It works because the visible chain-of-thought lets the model self-correct on observations, and the tool results stay grounded in real data instead of hallucinated reasoning. The downside is verbosity — you pay for the thoughts in tokens and latency — so modern function-calling models often run an implicit ReAct without exposing the thoughts.
Function calling is the model-side API: the model picks a tool from a list you provided and emits a JSON arguments object. MCP (Model Context Protocol) is the server-side standard for how those tools are discovered, authenticated, and invoked. With function calling alone, every integration is bespoke code in your app. With MCP, a tool server publishes its capabilities once and any MCP-aware client (Claude Desktop, Cursor, your own loop) can use it. Think of function calling as the model's mouth and MCP as the protocol the tools speak.
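On the wire, MCP discovery is a plain JSON-RPC exchange. A trimmed sketch of a tools/list round trip, using the orders server's get_order_status tool as the example (field names follow the MCP spec; exact payloads will vary by server):

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

and the server's reply:

```json
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "get_order_status",
   "description": "Look up the shipping status of a customer order by ID.",
   "inputSchema": {"type": "object",
                   "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]}}
]}}
```

Note that `inputSchema` here is the same JSON Schema you would otherwise hand-write into a function-calling `tools` array, which is why the bridge in agent.py above is a three-line translation.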
Three guardrails in production: a hard step counter (max 10–20 iterations), a token budget on the cumulative conversation, and a wall-clock timeout. You also need a "no-progress" detector — if the same tool is called with the same arguments twice in a row, abort. The model itself terminates by emitting a final assistant message with no tool calls, but you can't rely on that alone because failing tool calls or confused models will spin.
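The no-progress detector is only a few lines. A sketch that keys each call on (tool name, canonical JSON of the arguments) and aborts on an immediate repeat or a blown step cap:

```python
import json

class LoopStall(RuntimeError):
    pass

def make_progress_guard(max_steps: int = 15):
    """Abort on a hard step cap, or when the same call repeats back-to-back."""
    seen = {"last": None, "steps": 0}
    def check(tool_name: str, tool_args: dict) -> None:
        seen["steps"] += 1
        if seen["steps"] > max_steps:
            raise LoopStall(f"step budget ({max_steps}) exhausted")
        # sort_keys makes {"a":1,"b":2} and {"b":2,"a":1} compare equal
        key = (tool_name, json.dumps(tool_args, sort_keys=True))
        if key == seen["last"]:
            raise LoopStall(f"no progress: {tool_name} repeated with same args")
        seen["last"] = key
    return check

guard = make_progress_guard(max_steps=10)
guard("web_search", {"query": "turing award 2025"})   # fine
guard("web_search", {"query": "turing award 2026"})   # fine — different args
```

Call the guard once per tool invocation inside the agent loop; catching LoopStall is the place to inject a "you appear stuck, summarize what you have" message instead of burning the rest of the budget.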
Multi-agent (orchestrator + specialists, or supervisor + workers) helps when the sub-tasks have genuinely different toolsets and prompts — e.g., a "researcher" with web search and a "coder" with a sandbox. The cost is more LLM calls, harder debugging, and context-passing overhead. For most problems a single agent with a well-curated tool list and clear system prompt outperforms a multi-agent setup, because every additional agent is another place a hallucination can compound. I default to single-agent until I can point at a specific failure mode it can't handle.
An oversized tool result will eat your context window in three calls. The fix is at the tool boundary, not in the model: paginate (return page 1 of N with a continuation token), summarize (run a small extraction pass server-side and return the structured result the agent actually needs), or filter (let the agent pass a JSONPath/jq expression as an argument). For search tools I cap result count at ~10 with snippets and a fetch_full(id) tool for drill-down. The principle: tool authors are responsible for token discipline, not the agent.
Log every (system_prompt, messages, tools, tool_choice, response) tuple to durable storage, plus the model name and version. Add a request_id that flows through every tool call so the trace stitches together. For deterministic replay set temperature=0 and top_p=1, but accept that even then frontier models drift slightly across versions. The killer feature is being able to re-run a customer-reported bad answer against the exact prompt+tools state — without that you're guessing. I treat the trace store like an APM system: structured, indexed, and queryable.
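A minimal JSONL trace sink as a sketch (the record schema and filename are illustrative; production storage should be durable, indexed, and queryable):

```python
import json
import time
import uuid

def log_trace(path: str, *, request_id: str, model: str,
              messages, tools, response) -> None:
    """Append one agent turn as a single JSON line, keyed by request_id."""
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "model": model,
        "messages": messages,
        "tools": [t["name"] for t in tools],  # names only; schemas live in VCS
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

rid = str(uuid.uuid4())
log_trace("traces.jsonl", request_id=rid, model="claude-opus-4-7",
          messages=[{"role": "user", "content": "where is A-482?"}],
          tools=[{"name": "get_order_status"}],
          response={"stop_reason": "end_turn", "text": "in transit"})
```

One line per LLM call, sharing a request_id across all calls in a run, is enough to reconstruct an entire agent trace with a single grep.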