Gemini OCR MCP Server

This project provides an OCR (Optical Character Recognition) service through a FastMCP server backed by the Google Gemini API. It extracts text from an image supplied either as a file path or as a base64-encoded string.

Objective

Extract the text from an image, such as a CAPTCHA, and convert it to plain text, e.g., fbVk

Features

File-based OCR: Extract text directly from an image file on your local system.
Base64 OCR: Extract text from a base64-encoded image string.
Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.

Prerequisites

Python 3.8 or higher
A Google Gemini API Key. You can obtain one from Google AI Studio.
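
The two prerequisites above can be checked before starting the server. The following is a minimal sketch, not part of the project: the helper name `check_prerequisites` is hypothetical, and it assumes the key is supplied through the `GEMINI_API_KEY` environment variable, as in the configuration examples in this README.

```python
import os
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: `env` and `version` can be injected for testing and
    default to the current process environment and interpreter version.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python %d.%d or higher is required" % MIN_PYTHON)
    # Assumption: the server reads the key from GEMINI_API_KEY.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

If the key is only set in the `env` section of your MCP configuration, the check will report it as missing when run from a plain shell; that is expected.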

Setup and Installation

Clone the repository:

git clone https://github.com/WindoC/gemini-ocr-mcp
cd gemini-ocr-mcp

Install uv, which creates and manages the virtual environment:

# Install uv standalone if needed

## On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh

## On Windows.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install the required dependencies:
```
uv sync
```

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Linux/macOS Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Note: Replace the placeholder path with the absolute path to your project directory, and YOUR_GEMINI_API_KEY with your own API key.
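
To avoid hand-editing paths and escaping backslashes on Windows, the entry can be generated. This is a minimal sketch under stated assumptions: `build_mcp_config` is a hypothetical helper, not something the project ships, and the default model name is simply the one used in the examples above.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir, api_key, model="gemini-2.5-flash-preview-05-20"):
    """Build the mcpServers entry with an absolute project path (hypothetical helper)."""
    # resolve() turns a relative path such as "." into an absolute one.
    project_path = Path(project_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", str(project_path), "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # json.dumps escapes Windows backslashes, so the output can be pasted as-is.
    print(json.dumps(build_mcp_config(".", "YOUR_GEMINI_API_KEY"), indent=2))
```

Run it from inside the cloned repository and merge the printed object into your existing config.json.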

Tools Provided

`ocr_image_file`

Performs OCR on a local image file.

Parameter: image_file (string): The absolute or relative path to the image file.
Returns: (string) The extracted text from the image.
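
A relative path is presumably resolved against the server's working directory, which may differ from the client's; this README does not say so explicitly. Passing an absolute path sidesteps the question. A minimal sketch, with `to_absolute_image_path` as a hypothetical client-side helper:

```python
from pathlib import Path


def to_absolute_image_path(image_file):
    """Resolve image_file against the caller's working directory (hypothetical helper)."""
    path = Path(image_file).expanduser().resolve()
    # Fail early on the client side instead of sending a bad path to the server.
    if not path.is_file():
        raise FileNotFoundError(f"No image file at {path}")
    return str(path)
```

The returned string can then be passed as the `image_file` argument.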

`ocr_image_base64`

Performs OCR on a base64-encoded image.

Parameter: base64_image (string): The base64-encoded string of the image.
Returns: (string) The extracted text from the image.
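
The `base64_image` value can be produced with Python's standard library. This is a minimal sketch; the helper names are hypothetical, and it assumes the tool expects the plain base64 string rather than a `data:` URI, which this README does not specify.

```python
import base64


def encode_image_base64(image_bytes):
    """Return the standard base64 encoding of raw image bytes as an ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")


def load_image_base64(path):
    """Read an image file in binary mode and return its base64 string."""
    with open(path, "rb") as f:
        return encode_image_base64(f.read())
```

For example, `load_image_base64("captcha.png")` (a hypothetical file name) yields a string suitable for the `base64_image` parameter.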
