MCP Server
This blog post explains MCP Server and how to set it up using FastMCP. Additionally, it demonstrates how to create a simple AI Agent client that connects to the MCP server and utilizes the exposed tools.
What is MCP Server?
In simple terms, an MCP Server is like a tool API server for AI agents.
Instead of keeping tools (functions like "search", "read file", "add numbers") inside your agent code, you host them on a server. Any MCP-compatible client can connect to that server and call the tools.
The Basic Idea
- Client/Agent: The app that chats with the model (Claude Desktop, Cursor, Copilot, or your own Python app).
- Model: The LLM that decides what to do (answer directly vs call a tool).
- MCP Server: The server that actually runs tools and returns results.
Request/Response Flow
- The client constructs the tool call request (tool name + JSON parameters).
- The client sends it to the MCP Server (typically as an HTTP POST request).
- The server executes the tool.
- The server returns the tool output back to the client.
The Model Context Protocol (MCP) standardizes the request and response format so tools can be reused across clients.
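As a rough illustration of that standardized format, a tool call travels as a JSON-RPC request similar to the one below (shown here as a Python dict; the tool name and arguments are illustrative, and the exact fields are defined by the MCP specification):

```python
# A "tools/call" request as a client would send it to the MCP server.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "add",
        "arguments": {"x": 5, "y": 3},
    },
}
```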
Why Use MCP Server?
In general, you can write agentic workflows without worrying about MCP Server, but each framework (OpenAI, Gemini, Anthropic) has its own way of defining and calling tools. If you have a tool, you have to write separate code or customize it for each framework.
MCP Server abstracts all of that away and provides a unified interface that works with any framework, so your tools become plug and play for whichever framework you want to use.
How Does MCP Server Work?
Now here is where people usually get confused: what exactly is the client?
What Counts as a Client?
A client is anything that can connect to an MCP Server and ask it to run tools.
- Existing MCP clients: Claude Desktop, Cursor, Copilot (these already know how to talk MCP).
- Your own client: A custom app built using an agent framework (like the Python client below).
Switching (Swapping) Between Clients
This is the main benefit of MCP: the server/tools stay the same, and you can swap the client.
To swap clients, you usually only need:
- The MCP Server URL (for streamable-http, e.g., http://localhost:8000/mcp).
- The MCP transport type (here: streamable-http).
- The same tool schema exposed by the server (your @mcp.tool functions).
Then:
- If you're moving from the Python client → Claude Desktop/Cursor/Copilot: Configure that app to connect to your MCP Server URL.
- If you're moving from Claude Desktop/Cursor/Copilot → your Python client: Keep the server running and point your Python client to the same URL.
The tool list will be discovered from the server, so you don't rewrite tools for each client.
Setting Up MCP Server
I am using FastMCP here instead of the official MCP Server implementation because it's easier to set up and use.
Install FastMCP
You can install FastMCP using pip:
```
pip install fastmcp
```
Create a Simple MCP Server
File: mcp_server.py
Understanding Transport Modes
- streamable-http: Deploy the MCP server as a separate service over the network. Multiple clients can access the same server over HTTP. Use this for production deployments.
- stdio: For local clients running in the same environment. The client starts the MCP server as a subprocess. Use this for development and testing.
In this example, we use streamable-http for network-based communication.
Server Initialization
Initialize the FastMCP server instance with a name.
```python
from fastmcp import FastMCP

# Create the server instance; the name below is just an illustrative label.
mcp = FastMCP("Demo Server")
```
Tool Definitions
Define the tools that will be exposed to clients. Each tool is a Python function decorated with @mcp.tool and includes type hints and docstrings for clarity.
Each function here represents a tool that the MCP server exposes. Clients can call these tools by name, passing the required parameters. The description in the docstring helps clients understand what each tool does.
Example: If the user asks something like "add 5 and 3", the model can decide to call the add tool with { "x": 5, "y": 3 }, and then use the returned result to respond.
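Putting the pieces together, here is a minimal sketch of what mcp_server.py could look like. The specific tools (add, multiply, subtract), their float parameters, the server name, and port 8000 are assumptions chosen to match the examples and the URL used later in this post:

```python
from fastmcp import FastMCP

# Same instance as in the initialization block above; the name is illustrative.
mcp = FastMCP("Demo Server")


@mcp.tool
def add(x: float, y: float) -> float:
    """Add two numbers and return the sum."""
    return x + y


@mcp.tool
def multiply(x: float, y: float) -> float:
    """Multiply two numbers and return the product."""
    return x * y


@mcp.tool
def subtract(x: float, y: float) -> float:
    """Subtract y from x and return the difference."""
    return x - y


if __name__ == "__main__":
    # Serve the tools over the network; clients reach them at
    # http://localhost:8000/mcp (the default path for streamable HTTP).
    mcp.run(transport="streamable-http", host="127.0.0.1", port=8000)
```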
To run the server:
```
python mcp_server.py
```
You can also inspect the available tools by running fastmcp dev mcp_server.py, which starts an MCP Inspector at http://127.0.0.1:6274; the terminal prints the URL with an auth token attached. From there you can test the tools directly in the browser by supplying their input parameters.
MCP Client Example
This is a standalone AI Agent client that connects to an MCP server. The client uses OpenAI (or Ollama) as the language model and the MCP server provides the tools.
Prerequisites
Install required libraries:
```
pip install openai openai-agents rich
```
Using Free Models with Ollama (Optional)
Instead of using the paid OpenAI API, you can use free local models with Ollama for development and cost-free experimentation.
Install Ollama
- Download and install Ollama from ollama.ai.
- Start the Ollama service.
- Pull a model: ollama pull llama3.2.
Visit the Ollama Library to see all available models.
Important Note on Function Calling
Not all models support function calling. Choose a model that supports function calling for best results. This example uses llama3.2.
If the agent fails to use tools, switch to larger models available in Ollama, or use the OpenAI API.
Client Code
File: mcp_client.py
Configuration & Imports
Configure connection settings and import required libraries.
- OpenAI (Paid): Better performance, faster responses.
- Ollama (Free): Local models, no API key required.
This example uses Ollama. To use OpenAI, uncomment Option 1 and comment out Option 2.
Note: If you want to swap this custom Python client with another MCP client (Claude Desktop/Cursor/Copilot), keep the MCP Server running and reuse the same MCP_SERVER_URL. Only the client changes.
I am using gpt-5-nano with OpenAI because of its better function calling support. You can experiment with other models via Ollama as well; I tried llama3.2, but multiple tool calls were not working well, so I switched to OpenAI's gpt-5-nano.
```python
import asyncio
```
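A sketch of how this configuration block might look, assuming the openai-agents SDK's module layout for the classes referenced below (Agent, Runner, OpenAIChatCompletionsModel, ModelSettings, MCPServerStreamableHttp); the URLs and the Ollama api_key placeholder are illustrative:

```python
import asyncio

from agents import Agent, ModelSettings, OpenAIChatCompletionsModel, Runner
from agents.mcp import MCPServerStreamableHttp
from openai import AsyncOpenAI
from rich.console import Console

# URL of the running FastMCP server (streamable-http transport).
MCP_SERVER_URL = "http://localhost:8000/mcp"

# Option 1: OpenAI (paid) -- uncomment to use.
# MODEL_NAME = "gpt-5-nano"
# client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Option 2: Ollama (free, local, exposes an OpenAI-compatible endpoint).
MODEL_NAME = "llama3.2"
client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

console = Console()
```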
Resource Initialization
Initialize the MCP server connection and the language model client.
- MCP Server Connection: Uses Streamable HTTP transport to connect to the server.
- OpenAI Client: Works with both OpenAI and Ollama APIs (OpenAI-compatible).
- Model: Initialized using OpenAIChatCompletionsModel.
```python
async def initialize_agent() -> tuple[MCPServerStreamableHttp, OpenAIChatCompletionsModel]:
```
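A sketch of how this function might be implemented, assuming the configuration values defined above; the server name label is illustrative:

```python
async def initialize_agent() -> tuple[MCPServerStreamableHttp, OpenAIChatCompletionsModel]:
    """Connect to the MCP server and build the language model."""
    # Streamable HTTP transport: connect to the already-running MCP server.
    mcp_server = MCPServerStreamableHttp(
        name="demo-mcp-server",            # illustrative label
        params={"url": MCP_SERVER_URL},
    )
    await mcp_server.connect()

    # Works for both OpenAI and Ollama, since both expose an
    # OpenAI-compatible chat completions API.
    model = OpenAIChatCompletionsModel(model=MODEL_NAME, openai_client=client)
    return mcp_server, model
```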
Chat Turn Handler
Manages a single conversation turn with the agent.
- Runner: Orchestrator that manages agent execution, including tool calls and response generation. It handles complexity like single or multiple tool calls and handoffs between tools; it's a wrapper that manages the agent's execution flow.
- Result: Once the runner completes, extract the final output and return it.
```python
async def chat_turn(
```
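A sketch of the chat turn handler; the parameter list is an assumption, since only the function name is shown above:

```python
async def chat_turn(agent: Agent, user_input: str) -> str:
    """Run a single conversation turn and return the agent's final answer."""
    # Runner orchestrates the turn: the model may answer directly or issue
    # one or more MCP tool calls before producing the final output.
    result = await Runner.run(agent, user_input)
    return result.final_output
```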
Main Function & Agent Creation
Create the agent with the model, MCP server connection, and instructions.
- Agent Class: Defines the agent's behavior including name, model, connected MCP servers, model settings, and instructions.
- MCP Server Connection: Passed to the agent, allowing it to utilize the tools registered on the MCP server.
- Internal Conversion: All MCP tools are converted to the function calling format that the model understands.
```python
async def main() -> None:
```
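A sketch of the main function under the same assumptions; the agent name, instructions, and the simple chat loop are illustrative:

```python
async def main() -> None:
    mcp_server, model = await initialize_agent()
    try:
        # The agent gets the model, the MCP server connection, and instructions.
        # Tools registered on the MCP server are discovered automatically and
        # exposed to the model in function calling format.
        agent = Agent(
            name="Math Assistant",
            instructions="Use the available tools to answer the user's questions.",
            model=model,
            mcp_servers=[mcp_server],
            model_settings=ModelSettings(tool_choice="auto"),
        )

        # Simple REPL-style chat loop.
        while True:
            user_input = console.input("[bold green]You:[/bold green] ")
            if user_input.strip().lower() in {"exit", "quit"}:
                break
            answer = await chat_turn(agent, user_input)
            console.print(f"[bold cyan]Agent:[/bold cyan] {answer}")
    finally:
        await mcp_server.cleanup()


if __name__ == "__main__":
    asyncio.run(main())
```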
Summary
- You may think tools are just small functions the LLM could handle on its own, but in reality you can design complex tools that encapsulate business logic, access databases, call external APIs, and more. MCP Server allows you to centralize these tools and make them accessible to any MCP-compatible client.
- For example, if you ask the agent to add 2 and 4, it can call the add tool on the MCP server, get the result 6, and respond. You can also perform more complex operations by chaining tool calls, and these steps are handled by the Runner internally, as in the session below:
```
You: add 56 and 3546.9 then multiply by 3.2 and subtract 10
```
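With the tools sketched above, the agent would chain add → multiply → subtract: 56 + 3546.9 = 3602.9, then 3602.9 × 3.2 = 11529.28, then 11529.28 − 10 = 11519.28, so the final answer should work out to 11519.28.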
Inspecting Tool Calls
- If you check the MCP server logs, you can see the order of the tool calls made by the agent. The order in which you asked the agent to perform the operations is preserved in the tool calls.
- You also get much more control over the result, since you can control details like floating-point precision inside the tool implementations.