How to Create Model Context Protocol Server That Serves API

Last updated: March 16, 2026

Introduction
Prerequisites
Advanced Features
Troubleshooting

Introduction

As AI coding assistants become more sophisticated, they need better access to your API documentation to generate accurate code and provide intelligent responses. The Model Context Protocol (MCP) provides a standardized way for AI tools to interact with external services and data sources. By creating a MCP server that serves your API documentation, you enable AI assistants to understand your API’s structure, endpoints, authentication requirements, and response formats without manual context injection.

This guide walks through building a MCP server in Python that exposes your API documentation to AI tools. You’ll learn how to structure your server, parse documentation formats, and expose tools that AI assistants can query dynamically.

Prerequisites

Before building your MCP documentation server, ensure you have:

Python 3.10 or higher installed
An API with documentation (OpenAPI/Swagger, Postman collection, or custom format)
Basic familiarity with FastMCP or similar MCP frameworks

Step 1: Set Up Your Project

Start by creating a new Python project and installing the necessary dependencies:

mkdir api-docs-mcp-server
cd api-docs-mcp-server
uv venv
uv pip install fastmcp pydantic pyyaml httpx

Initialize your project structure:

api-docs-mcp-server/
├── main.py
├── docs/
│   └── openapi.yaml
└── requirements.txt

Step 2: Create the MCP Server

The core of your documentation server involves parsing your API documentation and exposing it through MCP tools. Here’s a complete implementation using FastMCP:

from fastmcp import FastMCP
from pydantic import BaseModel
from typing import Optional, List
import yaml
import httpx

mcp = FastMCP("API Documentation Server")

# Store parsed documentation in memory
api_docs = {}

class EndpointInfo(BaseModel):
    path: str
    method: str
    summary: str
    description: Optional[str] = None
    parameters: List[dict] = []
    request_body: Optional[dict] = None
    responses: dict = {}

def load_openapi_spec(path: str) -> dict:
    """Load and parse OpenAPI specification file."""
    with open(path, 'r') as f:
        return yaml.safe_load(f)

def parse_endpoints(spec: dict) -> List[EndpointInfo]:
    """Extract endpoint information from OpenAPI spec."""
    endpoints = []
    paths = spec.get('paths', {})

    for path, methods in paths.items():
        for method, details in methods.items():
            if method in ['get', 'post', 'put', 'delete', 'patch']:
                endpoint = EndpointInfo(
                    path=path,
                    method=method.upper(),
                    summary=details.get('summary', ''),
                    description=details.get('description'),
                    parameters=details.get('parameters', []),
                    request_body=details.get('requestBody'),
                    responses=details.get('responses', {})
                )
                endpoints.append(endpoint)

    return endpoints

@mcp.tool()
async def load_documentation(spec_path: str = "docs/openapi.yaml") -> str:
    """Load API documentation from OpenAPI specification file."""
    global api_docs
    spec = load_openapi_spec(spec_path)
    api_docs['spec'] = spec
    api_docs['endpoints'] = parse_endpoints(spec)
    api_docs['title'] = spec.get('info', {}).get('title', 'API')
    api_docs['version'] = spec.get('info', {}).get('version', '1.0.0')

    return f"Loaded documentation for {api_docs['title']} v{api_docs['version']} with {len(api_docs['endpoints'])} endpoints"

@mcp.tool()
async def get_endpoint(path: str, method: str) -> dict:
    """Get detailed information about a specific endpoint."""
    if not api_docs.get('endpoints'):
        return {"error": "No documentation loaded. Call load_documentation first."}

    for endpoint in api_docs['endpoints']:
        if endpoint.path == path and endpoint.method.upper() == method.upper():
            return {
                "path": endpoint.path,
                "method": endpoint.method,
                "summary": endpoint.summary,
                "description": endpoint.description,
                "parameters": endpoint.parameters,
                "request_body": endpoint.request_body,
                "responses": endpoint.responses
            }

    return {"error": f"Endpoint {method} {path} not found"}

@mcp.tool()
async def search_endpoints(query: str) -> List[dict]:
    """Search for endpoints matching a query string."""
    if not api_docs.get('endpoints'):
        return [{"error": "No documentation loaded. Call load_documentation first."}]

    query_lower = query.lower()
    results = []

    for endpoint in api_docs['endpoints']:
        if (query_lower in endpoint.path.lower() or
            query_lower in endpoint.summary.lower() or
            (endpoint.description and query_lower in endpoint.description.lower())):
            results.append({
                "path": endpoint.path,
                "method": endpoint.method,
                "summary": endpoint.summary
            })

    return results

@mcp.tool()
async def list_endpoints() -> List[dict]:
    """List all available endpoints."""
    if not api_docs.get('endpoints'):
        return [{"error": "No documentation loaded. Call load_documentation first."}]

    return [
        {"path": e.path, "method": e.method, "summary": e.summary}
        for e in api_docs['endpoints']
    ]

if __name__ == "__main__":
    mcp.run()

Step 3: Create Sample Documentation

Create a sample OpenAPI specification to test your server:

# docs/openapi.yaml
openapi: 3.0.0
info:
  title: Task Management API
  version: 1.0.0
  description: API for managing tasks and projects

paths:
  /tasks:
    get:
      summary: List all tasks
      description: Retrieve a paginated list of all tasks
      parameters:
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  tasks:
                    type: array
                    items:
                      type: object
                      properties:
                        id:
                          type: string
                        title:
                          type: string
                        status:
                          type: string
                          enum: [pending, in_progress, completed]
    post:
      summary: Create a new task
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - title
              properties:
                title:
                  type: string
                description:
                  type: string
                priority:
                  type: string
                  enum: [low, medium, high]
      responses:
        '201':
          description: Task created successfully

  /tasks/{task_id}:
    get:
      summary: Get a specific task
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Successful response
        '404':
          description: Task not found
    delete:
      summary: Delete a task
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '204':
          description: Task deleted successfully

Step 4: Run and Testing the Server

Start your MCP server:

python main.py

The server will start and listen for connections from AI tools. To test it, you can use a MCP-compatible client or manually invoke the tools:

import asyncio
from main import load_documentation, list_endpoints, get_endpoint, search_endpoints

async def test_server():
    # Load documentation
    result = await load_documentation("docs/openapi.yaml")
    print(result)

    # List all endpoints
    endpoints = await list_endpoints()
    print(f"Found {len(endpoints)} endpoints:")
    for ep in endpoints:
        print(f"  {ep['method']} {ep['path']} - {ep['summary']}")

    # Search for endpoints
    results = await search_endpoints("task")
    print(f"\nSearch results: {results}")

    # Get specific endpoint details
    details = await get_endpoint("/tasks", "GET")
    print(f"\nGET /tasks details: {details}")

asyncio.run(test_server())

Step 5: Integrate with AI Tools

Once your MCP server is running, configure your AI assistant to connect to it. Most AI coding tools support MCP through their configuration files:

{
  "mcpServers": {
    "api-docs": {
      "command": "python",
      "args": ["/path/to/api-docs-mcp-server/main.py"]
    }
  }
}

After configuration, your AI assistant can query your API documentation directly. For example:

“Show me all endpoints related to user authentication”
“What parameters does the POST /tasks endpoint accept?”
“How do I authenticate with the API?”

Advanced Features

To enhance your documentation server further, consider adding these capabilities:

Response Examples: Extract and expose example responses from your OpenAPI spec so AI tools can understand data structures.

Authentication Documentation: Parse security schemes and expose authentication requirements clearly.

Rate Limiting Info: Include rate limit headers and retry-after values in endpoint documentation.

Version Comparison: Support multiple API versions and allow querying differences between versions.

Step 6: Caching Documentation in Memory vs. Reloading on Change

Loading the OpenAPI spec on every tool call adds latency. But serving stale docs when the spec changes during development produces confusing AI suggestions. The right approach is to cache with optional hot-reload:

import asyncio
import hashlib
from pathlib import Path

class DocumentationCache:
    def __init__(self, spec_path: str):
        self.spec_path = Path(spec_path)
        self._cache: dict = {}
        self._last_hash: str = ""

    def _file_hash(self) -> str:
        return hashlib.md5(self.spec_path.read_bytes()).hexdigest()

    def get_or_reload(self) -> dict:
        current_hash = self._file_hash()
        if current_hash != self._last_hash:
            spec = load_openapi_spec(str(self.spec_path))
            self._cache = {
                "spec": spec,
                "endpoints": parse_endpoints(spec),
                "title": spec.get("info", {}).get("title", "API"),
                "version": spec.get("info", {}).get("version", "1.0.0"),
            }
            self._last_hash = current_hash
        return self._cache

This approach reloads automatically when the file changes without polling. When the MD5 matches the cached hash, it returns the cached result immediately. Development workflows benefit from the hot-reload; production deployments benefit from the cache hit performance.

Step 7: Exposing Authentication Schemes

AI assistants generate better API client code when they understand the authentication model. Parse the security schemes from your OpenAPI spec and expose them as a dedicated tool:

@mcp.tool()
async def get_auth_schemes() -> dict:
    """Get authentication requirements for the API."""
    if not api_docs.get("spec"):
        return {"error": "No documentation loaded."}

    spec = api_docs["spec"]
    security_schemes = spec.get("components", {}).get("securitySchemes", {})
    global_security = spec.get("security", [])

    result = {"global_requirements": global_security, "schemes": {}}

    for name, scheme in security_schemes.items():
        result["schemes"][name] = {
            "type": scheme.get("type"),
            "description": scheme.get("description", ""),
        }
        if scheme.get("type") == "apiKey":
            result["schemes"][name]["in"] = scheme.get("in")
            result["schemes"][name]["name"] = scheme.get("name")
        elif scheme.get("type") == "http":
            result["schemes"][name]["scheme"] = scheme.get("scheme")  # bearer, basic
        elif scheme.get("type") == "oauth2":
            result["schemes"][name]["flows"] = list(scheme.get("flows", {}).keys())

    return result

When an AI assistant calls get_auth_schemes() before generating code for a protected endpoint, it automatically includes correct Authorization headers or API key parameters rather than requiring you to specify authentication in every prompt.

Step 8: Supporting Multiple API Versions

APIs evolve. An MCP documentation server that only knows the current version can’t help when a client is pinned to an older version. Add multi-version support with a version registry:

# Store multiple spec versions
api_versions: dict[str, dict] = {}

@mcp.tool()
async def load_version(spec_path: str, version_label: str = "current") -> str:
    """Load a specific API version's documentation."""
    spec = load_openapi_spec(spec_path)
    api_versions[version_label] = {
        "spec": spec,
        "endpoints": parse_endpoints(spec),
        "title": spec.get("info", {}).get("title", "API"),
        "version": spec.get("info", {}).get("version"),
    }
    return f"Loaded {version_label}: {api_versions[version_label]['title']} v{api_versions[version_label]['version']}"

@mcp.tool()
async def list_versions() -> list[dict]:
    """List all loaded API versions."""
    return [
        {"label": label, "api_version": data["version"], "title": data["title"]}
        for label, data in api_versions.items()
    ]

@mcp.tool()
async def search_endpoints_by_version(query: str, version_label: str = "current") -> list[dict]:
    """Search endpoints in a specific API version."""
    if version_label not in api_versions:
        return [{"error": f"Version '{version_label}' not loaded."}]
    endpoints = api_versions[version_label]["endpoints"]
    query_lower = query.lower()
    return [
        {"path": e.path, "method": e.method, "summary": e.summary}
        for e in endpoints
        if query_lower in e.path.lower() or query_lower in e.summary.lower()
    ]

Load both versions at startup and AI assistants can generate code targeting either, or compare endpoints across versions to identify breaking changes.

Troubleshooting

Configuration changes not taking effect

Restart the relevant service or application after making changes. Some settings require a full system reboot. Verify the configuration file path is correct and the syntax is valid.

Permission denied errors

Run the command with sudo for system-level operations, or check that your user account has the necessary permissions. On macOS, you may need to grant terminal access in System Settings > Privacy & Security.

Connection or network-related failures

Check your internet connection and firewall settings. If using a VPN, try disconnecting temporarily to isolate the issue. Verify that the target server or service is accessible from your network.

Frequently Asked Questions

How long does it take to create model context protocol server that serves api?

For a straightforward setup, expect 30 minutes to 2 hours depending on your familiarity with the tools involved. Complex configurations with custom requirements may take longer. Having your credentials and environment ready before starting saves significant time.

What are the most common mistakes to avoid?

The most frequent issues are skipping prerequisite steps, using outdated package versions, and not reading error messages carefully. Follow the steps in order, verify each one works before moving on, and check the official documentation if something behaves unexpectedly.

Do I need prior experience to follow this guide?

Basic familiarity with the relevant tools and command line is helpful but not strictly required. Each step is explained with context. If you get stuck, the official documentation for each tool covers fundamentals that may fill in knowledge gaps.

Is this approach secure enough for production?

The patterns shown here follow standard practices, but production deployments need additional hardening. Add rate limiting, input validation, proper secret management, and monitoring before going live. Consider a security review if your application handles sensitive user data.

Where can I get help if I run into issues?

Start with the official documentation for each tool mentioned. Stack Overflow and GitHub Issues are good next steps for specific error messages. Community forums and Discord servers for the relevant tools often have active members who can help with setup problems.

Built by theluckystrike — More at zovo.one

Table of Contents