CGM MCP Server
A Model Context Protocol (MCP) server implementation of CodeFuse-CGM (Code Graph Model), providing graph-integrated large language model capabilities for repository-level software engineering tasks.
🚀 Features
🎯 Two Deployment Modes
1. Full CGM Pipeline (with LLM integration)
- Repository-level Code Analysis: Analyze entire codebases using graph-based representations
- Issue Resolution: Automatically generate code patches to fix bugs and implement features
- Four-Stage Pipeline: Rewriter → Retriever → Reranker → Reader architecture
- Multi-LLM Support: Works with OpenAI, Anthropic, Ollama, LM Studio
2. Model-agnostic Tools (pure analysis, no LLM required) ⭐
- Pure Code Analysis: Extract code structure without LLM dependencies
- Universal Integration: Works with ANY AI model or IDE
- No API Keys Required: Zero external dependencies
- High Performance: Cached analysis results for speed
⚡ Performance & GPU Acceleration
Multi-Platform GPU Support 🎯
- Apple Silicon (M1/M2/M3): Native MPS acceleration (42x cache speedup!)
- NVIDIA GPU: Full CUDA support with cuPy integration
- AMD GPU: ROCm (Linux) and DirectML (Windows) support
- CPU Fallback: Automatic fallback ensures universal compatibility
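Device selection follows this preference order; a minimal sketch of the idea with PyTorch (illustrative, not the server's exact detection code):

import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():              # NVIDIA (ROCm builds also expose this API)
        return torch.device("cuda")
    if torch.backends.mps.is_available():      # Apple Silicon Metal backend
        return torch.device("mps")
    return torch.device("cpu")                 # universal fallback

print(f"Acceleration device: {pick_device()}")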
Advanced Caching System 🗄️
- Multi-level Caching: TTL cache (1hr) + LRU cache (500 entries) + AST cache
- Smart Cache Keys: MD5-based cache keys for efficient lookups
- Memory Management: Real-time monitoring with automatic cleanup
- Performance Stats: Detailed hit/miss ratios and timing metrics
Concurrent Processing 🔄
- Async File I/O: Non-blocking file operations with aiofiles
- Batch Processing: Concurrent analysis of multiple files
- GPU-Accelerated: Entity matching and text processing on GPU
- Intelligent Scheduling: Semaphore-controlled concurrency limits
🔧 Common Features
- MCP Integration: Compatible with Claude Desktop, VS Code, Cursor, and other MCP clients
- Graph-based Context: Leverages code structure and relationships for better understanding
- Multiple Output Formats: Structured JSON, Markdown, and Prompt formats
- Real-time Monitoring: GPU usage, memory consumption, and performance metrics
📋 Table of Contents
- Installation
- Quick Start
- Performance & GPU Setup
- Configuration
- Usage
- Architecture
- API Reference
- Examples
- Performance Monitoring
- Testing
- Contributing
- License
🛠 Installation
Prerequisites
- Python 3.8+
- pip or conda
Install from Source
# Clone the repository
git clone https://github.com/your-org/cgm-mcp.git
cd cgm-mcp
# Install dependencies
pip install -r requirements.txt
# Or install in development mode
pip install -e .
GPU Acceleration Setup (Optional)
🍎 Apple Silicon (M1/M2/M3) - Automatic
# No additional setup needed!
# MPS (Metal Performance Shaders) is automatically detected and enabled
pip install torch torchvision torchaudio # Usually already installed
🟢 NVIDIA GPU
# Install CUDA-enabled PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu118
# Optional: Enhanced GPU features
pip install cupy-cuda11x # For CUDA 11.x
# or
pip install cupy-cuda12x # For CUDA 12.x
🔴 AMD GPU
# Linux (ROCm)
pip install torch --index-url https://download.pytorch.org/whl/rocm5.6
# Windows (DirectML)
pip install torch-directml
Install from PyPI (Coming Soon)
pip install cgm-mcp
⚡ Quick Start
1. Setup Environment
# Run setup script
./scripts/setup.sh
# Copy example environment file
cp .env.example .env
2. Choose Your Model Provider
Option A: Use Cloud Models (OpenAI/Anthropic)
# Edit .env with your API keys
export CGM_LLM_PROVIDER=openai
export CGM_LLM_API_KEY=your-openai-api-key
export CGM_LLM_MODEL=gpt-4
Option B: Use Local Models (Recommended)
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
# Download recommended model
ollama pull deepseek-coder:6.7b
# Start with local model
./scripts/start_local.sh --provider ollama --model deepseek-coder:6.7b
Option C: Use LM Studio
# Download and start LM Studio
# Load deepseek-coder-6.7b-instruct model
# Start local server
# Start CGM with LM Studio
./scripts/start_local.sh --provider lmstudio
3. Start the Server
# Start MCP server (cloud models)
python main.py
# Start with local models
./scripts/start_local.sh
# Or with custom config
python main.py --config config.local.json --log-level DEBUG
4. Test with Example
# Run example usage
python examples/example_usage.py
# Check GPU acceleration status
python check_gpu_dependencies.py
⚡ Performance & GPU Setup
🔍 Check Your GPU Status
# Run the GPU dependency checker
python check_gpu_dependencies.py
Expected output for Apple Silicon:
🎉 OPTIMAL: Apple Silicon GPU acceleration is active!
• MPS backend enabled
• No additional dependencies needed
• CuPy warnings can be ignored
📊 Performance Benchmarks
| Platform | Entity Matching | Text Processing | Cache Hit Rate |
|---|---|---|---|
| Apple Silicon (MPS) | 42x speedup (cached) | ~0.001s (200 files) | 95%+ |
| NVIDIA CUDA | 5-10x speedup | 3-5x speedup | 90%+ |
| AMD ROCm | 3-8x speedup | 2-4x speedup | 90%+ |
| CPU Fallback | Baseline | Baseline | 85%+ |
🛠️ Performance Features
Smart Caching System
- TTL Cache: 1-hour expiration for analysis results
- LRU Cache: 500 most recent files kept in memory
- AST Cache: 200 parsed syntax trees cached
- Embedding Cache: GPU-accelerated similarity vectors
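These layers map naturally onto the cachetools library; a minimal sketch assuming the sizes and TTL quoted above (the server's internal cache classes may differ):

import hashlib
from cachetools import TTLCache, LRUCache

analysis_cache = TTLCache(maxsize=100, ttl=3600)  # analysis results, 1-hour expiry
file_cache = LRUCache(maxsize=500)                # 500 most recent files
ast_cache = LRUCache(maxsize=200)                 # 200 parsed syntax trees

def cache_key(path: str, query: str = "") -> str:
    """MD5-based cache key, as described above."""
    return hashlib.md5(f"{path}:{query}".encode()).hexdigest()

def analyze_file(path: str) -> dict:
    key = cache_key(path)
    if key in analysis_cache:                     # cache hit: skip re-analysis
        return analysis_cache[key]
    result = {"path": path, "entities": []}       # stand-in for the real analysis
    analysis_cache[key] = result
    return result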
Memory Management
- Real-time Monitoring: Track GPU and system memory usage
- Automatic Cleanup: Clear caches when memory usage > 80%
- Unified Memory: Apple Silicon's shared CPU/GPU memory
- Memory Pools: Efficient GPU memory allocation
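The cleanup trigger is easy to approximate with psutil; a short sketch of the 80% threshold check (the server's monitor also tracks GPU memory):

import psutil

CLEANUP_THRESHOLD = 80  # percent, mirrors CGM_MEMORY_CLEANUP_THRESHOLD

def maybe_cleanup(caches) -> bool:
    """Clear caches once system memory usage crosses the threshold."""
    if psutil.virtual_memory().percent > CLEANUP_THRESHOLD:
        for cache in caches:
            cache.clear()
        return True
    return False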
Concurrent Processing
- Async File I/O: Non-blocking file operations
- Batch Processing: Process multiple files simultaneously
- Semaphore Control: Limit concurrent operations (default: 10)
- GPU Queuing: Intelligent GPU task scheduling
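The pattern is standard asyncio; a minimal sketch of semaphore-bounded batch reads with aiofiles (illustrative, not the server's actual scheduler):

import asyncio
from typing import List

import aiofiles

async def read_batch(paths: List[str]) -> List[str]:
    """Read many files concurrently, at most 10 open at a time."""
    sem = asyncio.Semaphore(10)  # default concurrency limit

    async def read_one(path: str) -> str:
        async with sem:          # bound the number of files open at once
            async with aiofiles.open(path, mode="r", encoding="utf-8") as f:
                return await f.read()

    return await asyncio.gather(*(read_one(p) for p in paths))

# contents = asyncio.run(read_batch(["src/a.py", "src/b.py"]))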
🔧 Performance Tuning
Environment Variables
# GPU Configuration
export CGM_USE_GPU=true # Enable GPU acceleration
export CGM_GPU_BATCH_SIZE=1024 # Batch size for GPU operations
export CGM_SIMILARITY_THRESHOLD=0.1 # Entity similarity threshold
export CGM_CACHE_EMBEDDINGS=true # Cache embedding vectors
# Memory Management
export CGM_MAX_CACHE_SIZE=500 # Maximum cached files
export CGM_MEMORY_CLEANUP_THRESHOLD=80 # Memory cleanup trigger (%)
export CGM_GPU_MEMORY_FRACTION=0.8 # GPU memory usage limit
Configuration File
{
  "gpu": {
    "use_gpu": true,
    "batch_size": 1024,
    "max_sequence_length": 512,
    "similarity_threshold": 0.1,
    "cache_embeddings": true,
    "gpu_memory_fraction": 0.8
  },
  "performance": {
    "max_concurrent_files": 10,
    "cache_ttl_seconds": 3600,
    "max_file_cache_size": 500,
    "memory_cleanup_threshold": 80
  }
}
⚙️ Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| CGM_LLM_PROVIDER | LLM provider (openai, anthropic, ollama, lmstudio, mock) | openai |
| CGM_LLM_API_KEY | API key for the LLM provider (not needed for local models) | Required for cloud |
| CGM_LLM_MODEL | Model name | gpt-4 |
| CGM_LLM_API_BASE | Custom API base URL (for local models) | Provider default |
| CGM_LLM_TEMPERATURE | Generation temperature | 0.1 |
| CGM_LOG_LEVEL | Logging level | INFO |
Configuration File
Create a config.json file:
Cloud Models Configuration
{
  "llm": {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.1,
    "max_tokens": 4000
  }
}
Local Models Configuration
{
  "llm": {
    "provider": "ollama",
    "model": "deepseek-coder:6.7b",
    "api_base": "http://localhost:11434",
    "temperature": 0.1,
    "max_tokens": 4000
  },
  "graph": {
    "max_nodes": 5000,
    "max_edges": 25000,
    "cache_enabled": true
  },
  "server": {
    "log_level": "INFO",
    "max_concurrent_tasks": 3
  }
}
📖 Usage
MCP Tools
The server provides the following MCP tools:
Analysis Tools
cgm_analyze_repository
Analyze repository structure and extract code entities with GPU acceleration.
Parameters:
- repository_path: Path to the repository
- query: Search query for relevant code
- analysis_scope: Scope of analysis (full, focused, minimal)
- max_files: Maximum number of files to analyze
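Example (illustrative values):
{
  "repository_path": "/path/to/repository",
  "query": "authentication logic",
  "analysis_scope": "focused",
  "max_files": 50
}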
cgm_get_file_content
Get detailed file content and analysis with concurrent processing.
Parameters:
- repository_path: Path to the repository
- file_paths: List of file paths to analyze
cgm_find_related_code
Find code entities related to a specific entity using GPU-accelerated similarity matching.
Parameters:
- repository_path: Path to the repository
- entity_name: Name of the entity to find relations for
- relation_types: Types of relations to include (optional)
cgm_extract_context
Extract structured context for external model consumption.
Parameters:
- repository_path: Path to the repository
- query: Query for context extraction
- format: Output format (structured, markdown, prompt)
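Example (illustrative values):
{
  "repository_path": "/path/to/repository",
  "query": "login flow",
  "format": "markdown"
}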
Performance Tools
clear_gpu_cache
Clear GPU caches to free memory.
Parameters: None
Legacy Tools
cgm_process_issue
Process a repository issue using the CGM framework.
Parameters:
- task_type: Type of task (issue_resolution, code_analysis, bug_fixing, feature_implementation)
- repository_name: Name of the repository
- issue_description: Description of the issue
- repository_context: Optional repository context
Example:
{
  "task_type": "issue_resolution",
  "repository_name": "my-project",
  "issue_description": "Authentication fails with special characters in password",
  "repository_context": {
    "path": "/path/to/repository",
    "language": "Python",
    "framework": "Django"
  }
}
cgm_get_task_status
Get the status of a running task.
Parameters:
- task_id: Task ID to check
cgm_health_check
Check server health status.
MCP Resources
System Resources
- cgm://health: Server health information
- cgm://tasks: List of active tasks
Performance Resources
- cgm://cache: Cache statistics and hit/miss ratios
- cgm://performance: Server performance and memory usage metrics
- cgm://gpu: GPU acceleration status and memory usage
Example Resource Access
Resources use the cgm:// URI scheme, so they are read through an MCP client rather than fetched with curl over HTTP. For example, reading cgm://gpu returns the GPU status, cgm://cache the cache statistics, and cgm://performance the server metrics.
🏗 Architecture
CGM follows a four-stage pipeline:
graph LR
A[Issue] --> B[Rewriter]
B --> C[Retriever]
C --> D[Reranker]
D --> E[Reader]
E --> F[Code Patches]
G[Code Graph] --> C
G --> D
G --> E
Components
- Rewriter: Analyzes issues and extracts relevant entities and keywords
- Retriever: Locates relevant code subgraphs based on extracted information
- Reranker: Ranks files by relevance to focus analysis
- Reader: Generates specific code patches to resolve issues
Graph Builder
Constructs repository-level code graphs by analyzing:
- File structure and dependencies
- Class and function definitions
- Import relationships
- Code semantics and documentation
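To make the idea concrete, here is a toy graph builder for a single Python file using ast and networkx (illustrative only, much simpler than the project's GraphBuilder):

import ast
import networkx as nx

def build_file_graph(path: str) -> nx.DiGraph:
    """Toy code graph: file -> definitions, plus import edges."""
    graph = nx.DiGraph()
    graph.add_node(path, kind="file")
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            graph.add_node(node.name, kind=type(node).__name__)
            graph.add_edge(path, node.name, relation="defines")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                graph.add_edge(path, alias.name, relation="imports")
    return graph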
📚 API Reference
Core Models
CGMRequest
class CGMRequest(BaseModel):
    task_type: TaskType
    repository_name: str
    issue_description: str
    repository_context: Optional[Dict[str, Any]] = None
CGMResponse
class CGMResponse(BaseModel):
    task_id: str
    status: str
    rewriter_result: Optional[RewriterResponse]
    retriever_result: Optional[RetrieverResponse]
    reranker_result: Optional[RerankerResponse]
    reader_result: Optional[ReaderResponse]
    processing_time: float
CodePatch
class CodePatch(BaseModel):
    file_path: str
    original_code: str
    modified_code: str
    line_start: int
    line_end: int
    explanation: str
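A patch maps onto a line-range replacement; a hypothetical helper showing how one could be applied, assuming line_start and line_end are 1-indexed and inclusive (not part of the shipped API):

from cgm_mcp.models import CodePatch

def apply_patch(patch: CodePatch) -> None:
    """Replace lines line_start..line_end of file_path with modified_code."""
    with open(patch.file_path, encoding="utf-8") as f:
        lines = f.readlines()
    replacement = patch.modified_code.splitlines(keepends=True)
    lines[patch.line_start - 1 : patch.line_end] = replacement
    with open(patch.file_path, "w", encoding="utf-8") as f:
        f.writelines(lines)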
💡 Examples
Basic Issue Resolution
import asyncio

from cgm_mcp.server import CGMServer
from cgm_mcp.models import CGMRequest, TaskType

async def resolve_issue():
    server = CGMServer(config)  # config: your loaded server configuration (see Configuration)
    request = CGMRequest(
        task_type=TaskType.ISSUE_RESOLUTION,
        repository_name="my-app",
        issue_description="Login fails with special characters",
        repository_context={"path": "./my-app"}
    )
    response = await server._process_issue(request.dict())
    for patch in response.reader_result.patches:
        print(f"File: {patch.file_path}")
        print(f"Changes: {patch.explanation}")

asyncio.run(resolve_issue())
Integration with Claude Desktop
Add to your Claude Desktop MCP configuration:
{
  "mcpServers": {
    "cgm": {
      "command": "python",
      "args": ["/path/to/cgm-mcp/main.py"],
      "env": {
        "CGM_LLM_API_KEY": "your-api-key"
      }
    }
  }
}
📊 Performance Monitoring
Real-time Metrics
GPU Statistics
{
  "memory": {
    "gpu_available": true,
    "platform": "Apple Silicon",
    "backend": "Metal Performance Shaders",
    "gpu_memory_allocated": 0.6
  },
  "performance": {
    "gpu_entity_matches": 15,
    "cache_hit_rate": 94.2
  }
}
Cache Performance
{
  "analysis_cache": {"size": 45, "maxsize": 100},
  "file_cache": {"size": 234, "maxsize": 500},
  "stats": {"hits": 156, "misses": 23, "hit_rate": 87.2}
}
Performance Testing
# Check GPU acceleration status
python check_gpu_dependencies.py
# Run performance tests
python gpu_verification.py
python test_multiplatform_gpu.py
🧪 Testing
# Run tests
pytest tests/
# Run with coverage
pytest tests/ --cov=cgm_mcp
# Run specific test
pytest tests/test_components.py::TestRewriterComponent
# Performance tests
python test_gpu_acceleration.py
python gpu_verification.py
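New tests can stay offline with the mock provider (CGM_LLM_PROVIDER=mock); a minimal sketch that exercises only the request model from the API Reference:

from cgm_mcp.models import CGMRequest, TaskType

def test_issue_request_fields():
    request = CGMRequest(
        task_type=TaskType.ISSUE_RESOLUTION,
        repository_name="demo",
        issue_description="Example bug",
    )
    assert request.task_type == TaskType.ISSUE_RESOLUTION
    assert request.repository_context is None  # optional field defaults to None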
🚀 Performance Features Summary
⚡ GPU Acceleration
- Apple Silicon: Native MPS support with 42x cache speedup
- NVIDIA GPU: Full CUDA support with cuPy integration
- AMD GPU: ROCm (Linux) and DirectML (Windows) support
- Auto-detection: Intelligent platform detection and fallback
🗄️ Smart Caching
- Multi-level: TTL (1hr) + LRU (500 files) + AST (200 trees)
- Intelligent: MD5-based cache keys with hit rate monitoring
- Memory-aware: Automatic cleanup at 80% memory usage
- Performance: 85-95% cache hit rates in production
🔄 Concurrent Processing
- Async I/O: Non-blocking file operations with aiofiles
- Batch Processing: Concurrent analysis of multiple files
- Semaphore Control: Configurable concurrency limits (default: 10)
- GPU Queuing: Intelligent GPU task scheduling
📊 Real-time Monitoring
- GPU Stats: Memory usage, platform detection, performance metrics
- Cache Analytics: Hit/miss ratios, size monitoring, cleanup events
- System Metrics: Memory usage, CPU utilization, processing times
- MCP Resources: cgm://gpu, cgm://cache, cgm://performance
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- CodeFuse-CGM - Original CGM implementation
- PocketFlow - Framework inspiration
- Model Context Protocol - MCP specification
📞 Support
- 📧 Email: cgm-mcp@example.com
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
CGM MCP Server - Bringing graph-integrated code intelligence to your development workflow! 🚀