1. Project/Product Overview
| Property | Value |
|---|---|
| Project Name | Codebase Memory MCP |
| GitHub | DeusData/codebase-memory-mcp |
| Organization | DeusData (Germany) |
| License | MIT |
| Current Version | v0.8.1 (released 2026-06-12) |
| Implementation language | Pure C (rewritten from Go to C in v0.5.0) |
| Architecture | Single static binary acting as an MCP protocol server |
| Supported programming languages | 158 (tree-sitter parsing); 9 of them also have Hybrid LSP semantic type resolution |
| Hybrid LSP languages | Python, TypeScript/JavaScript/JSX/TSX, Go, C/C++, C#, PHP, Java, Kotlin, Rust |
| Number of MCP tools | 14 |
| Supported agents | Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, Kiro (11 in total) |
| Platforms | macOS (arm64/amd64), Linux (arm64/amd64), Windows (amd64) |
| Dependencies | Zero runtime dependencies (SQLite is compiled into the binary; tree-sitter grammars are vendored) |
| Security assurance | OpenSSF Scorecard, SLSA Level 3, scans by 70+ antivirus engines with zero detections, SHA-256 signature |
| Academic paper | arXiv:2603.27277 - Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP |
| Tests | 5604 test cases; end-to-end index verification on 16 large open-source repositories |
| Installation methods | curl one-line script, Homebrew, npm, PyPI, Scoop, Winget, Chocolatey, AUR, Nix |
| Release package size | About 35-37 MB (compressed); about 70 MB with the UI |
| Data storage | SQLite (WAL mode), persisted to `~/.cache/codebase-memory-mcp/` |
Project Positioning
Codebase Memory MCP is neither another code search tool nor a code assistant with an embedded LLM. It is a purely structural code intelligence backend: it only builds and queries the code knowledge graph, and leaves the task of "understanding the code" to the AI Agent (the MCP client). The division of labor is architecturally clean:
Developer: "Who calls the ProcessOrder function?"
→ AI Agent calls an MCP tool: trace_path(function_name="ProcessOrder", direction="inbound")
→ Codebase Memory MCP: runs the graph query and returns structured results
→ AI Agent: presents the call chain in natural language
This design avoids the extra API key, extra cost, and extra model configuration that come with the "built-in LLM translates natural language into queries" approach common in similar tools.
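To make the exchange above concrete, here is a minimal sketch of the request an MCP client might send. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the helper name `build_tool_call` is hypothetical, and the argument names are simply the ones used in the example above rather than a verified tool schema.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are taken from the example in the text, not from a verified schema.
request = build_tool_call(
    1, "trace_path", {"function_name": "ProcessOrder", "direction": "inbound"}
)

# On the stdio transport, each message travels as a single line of JSON.
wire_line = json.dumps(request)
print(wire_line)
```

The server's reply is structured data, which the Agent then turns into the natural-language answer the developer sees.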
2. What Does It Mainly Do?
Codebase Memory MCP builds the entire code base into a multi-relational knowledge graph, then exposes 14 MCP tools that AI Agents use for structural and semantic queries:
2.1 Graph Analysis and Structural Understanding
| Capability | MCP Tool | Description |
|---|---|---|
| Architecture overview | `get_architecture` | One call returns language distribution, package structure, entry points, routes, hotspot code, boundaries, layering, and clustering (Leiden community detection) |
| Call chain tracing | `trace_path` / `trace_call_path` | Inbound and outbound tracing, depth control, cross-file and cross-package resolution |
| Impact analysis | `detect_changes` | Maps uncommitted Git changes to affected symbols and classifies the risk level |
| Dead code detection | Via a Cypher query | Finds functions with zero callers (excluding entry points) |
| ADR management | `manage_adr` | Architecture decision records persisted across sessions |
| Leiden community detection | Integrated into `get_architecture` | Clusters functional modules based on the call graph; runs in seconds on Linux-kernel-scale graphs |
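Two of the capabilities in the table, inbound call tracing with a depth limit and zero-caller dead code detection, can be illustrated on a toy call graph. This is a sketch under stated assumptions: the edge list, function names, and the helpers `trace_inbound` and `dead_functions` are all hypothetical, and the project's own implementation (in C on top of SQLite) is not shown here.

```python
from collections import deque

# Hypothetical CALLS edges as (caller, callee) pairs.
CALLS = [
    ("main", "handle_request"),
    ("handle_request", "process_order"),
    ("process_order", "charge_card"),
    ("cron_job", "process_order"),
    ("unused_helper", "charge_card"),
]


def trace_inbound(edges, target, max_depth):
    """Breadth-first search over callers; returns {caller: depth}."""
    callers_of = {}
    for caller, callee in edges:
        callers_of.setdefault(callee, []).append(caller)
    depth_of = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth control: do not expand beyond the limit
        for caller in callers_of.get(node, []):
            if caller not in seen:
                seen.add(caller)
                depth_of[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depth_of


def dead_functions(edges, entry_points):
    """Functions nobody calls, excluding declared entry points."""
    functions = {name for edge in edges for name in edge}
    called = {callee for _, callee in edges}
    return sorted(functions - called - set(entry_points))


print(trace_inbound(CALLS, "process_order", max_depth=2))
# {'handle_request': 1, 'cron_job': 1, 'main': 2}
print(dead_functions(CALLS, entry_points={"main", "cron_job"}))
# ['unused_helper']
```

Excluding entry points matters: `main` and `cron_job` also have no callers, but they are invoked from outside the graph and are not dead code.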
2.2 Search Capabilities
| Search Type | MCP Tool | Technical Implementation |
|---|---|---|
| Semantic search | `semantic_query` | Nomic nomic-embed-code embeddings (768-dimensional, int8) plus a built-in composite score over 11 signals (TF-IDF, RRI, AST profile, data flow, Halstead-lite, MinHash, graph diffusion, etc.) |
| BM25 full-text search | Integrated into the search tools | SQLite FTS5 with the `cbm_camel_split` tokenizer (camelCase and underscore aware) |
| Structured search | `search_graph` | Regex name patterns, label filter, in-degree/out-degree range, file scope limit |
| Code search | `search_code` | Graph-enhanced grep; searches indexed files only |
| Cypher query | `query_graph` | Cypher-like syntax: `MATCH (f:Function)-[:CALLS]->(g) WHERE f.name = 'main' RETURN g.name` |
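The value of a camelCase- and underscore-aware tokenizer for full-text search is easiest to see in a small example. The sketch below is an illustrative approximation in Python; it is not the project's `cbm_camel_split` tokenizer, and the function name `split_identifier` and its exact splitting rules are assumptions.

```python
import re

# Acronym runs (HTTP in HTTPRequest), capitalized or lowercase words, other
# uppercase runs, and digit runs, tried in that order.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Split an identifier into lowercase search tokens."""
    tokens = []
    # First split on underscores and any other non-alphanumeric separators.
    for part in re.split(r"[^A-Za-z0-9]+", identifier):
        tokens.extend(match.group(0).lower() for match in _WORD.finditer(part))
    return tokens


print(split_identifier("ProcessOrder"))      # ['process', 'order']
print(split_identifier("parseHTTPRequest"))  # ['parse', 'http', 'request']
print(split_identifier("user_id"))           # ['user', 'id']
```

With tokens like these in the index, a query for "order" can match `ProcessOrder` even though a plain whitespace tokenizer would treat the identifier as one opaque word.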
2.3 Cross-Service Links
- HTTP route matching: routes are matched to call sites, each with a confidence score
- gRPC / GraphQL / tRPC: service detection plus protobuf route extraction
- Channel detection: Socket.IO, EventEmitter, and general pub-sub patterns are detected as `EMITS` / `LISTENS_ON` edges (covering 8 languages)
2.4 Cross-Repository Intelligence
- `CROSS_*` edges link nodes across multiple repositories (within the same SQLite store)
- Multi-galaxy 3D UI layout for visualizing cross-repository architecture
- Cross-repository architecture summary that consolidates services, routes, and dependencies across all indexed repositories
2.5 Infrastructure as Code (IaC) Indexing
- Dockerfiles → graph nodes
- Kubernetes manifests → `Resource` nodes
- Kustomize overlays → `Module` nodes plus `IMPORTS` edges
- Helm charts → templates plus `Chart.yaml` dependency edges
- HCL (Terraform) → block labels merged into node names
2.6 14 Edge Types (Relationships)
The edge types include `CALLS`, `IMPORTS`, `DEFINES`, `IMPLEMENTS`, `INHERITS`, `HTTP_CALLS`, `ASYNC_CALLS`, `EMITS`, `LISTENS_ON`, `DATA_FLOWS` (including parameter mapping and field access chains), `SIMILAR_TO` (near-duplicate code detection via MinHash LSH), and `SEMANTICALLY_RELATED` (lexically different but semantically related code in the same language, score ≥ 0.80).
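The `SIMILAR_TO` edges rest on MinHash, which estimates the Jaccard similarity of two token sets from short signatures. The following is a generic textbook sketch, not the project's implementation: the shingle size, the number of hash functions, the use of BLAKE2b, and all function names are assumptions, and the LSH banding step that avoids comparing every pair is left out.

```python
import hashlib


def shingles(tokens, k=3):
    """Overlapping k-token windows of a token sequence."""
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """One minimum per salted hash function; `items` must be non-empty."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + item.encode(), digest_size=8).digest(), "big"
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where both signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


original = shingles("def send email to user with retry".split())
near_copy = shingles("def send email to admin with retry".split())
unrelated = shingles("class parse config from yaml file".split())

sig = minhash_signature(original)
print(estimated_jaccard(sig, minhash_signature(near_copy)))
print(estimated_jaccard(sig, minhash_signature(unrelated)))
```

The estimate for the near copy depends on the hash values, so no exact number is claimed here; identical sets always score 1.0, and disjoint sets score close to 0.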
2.7 Optional Graph UI
- Built-in interactive 3D visualization (Three.js frontend served by a self-built HTTP server)
- Available at `localhost:9749`
- Runs as a background thread alongside the MCP server
2.8 Team-Shared Graph Artifact
- A single file, `.codebase-memory/graph.db.zst`, is committed to the repository
- Format: SQLite → drop indexes → `VACUUM INTO` compaction → zstd compression (compression ratio 8-13:1)
- Two quality tiers: Best (`zstd -9`, manual indexing) and Fast (`zstd -3`, incremental watcher updates)
- After cloning, team members decompress and import the artifact directly, skip the full reindex, and run only incremental indexing
- `.gitattributes` automatically gets a `merge=ours` entry, which eliminates merge conflicts on the binary artifact
- Optional: if the artifact is not committed, each person builds their own full index
3. Applicable Scenarios
3.1 Core Scenario: Understanding Large Code Bases in AI-Assisted Programming
Problem: when an AI Agent (such as Claude Code) faces a large code base, every question requires reading many files and running grep repeatedly. Token consumption is very high and the Agent has no view of the global structure. Solution: spend 3-6 seconds (about 3 minutes for very large repositories) building a knowledge graph once; after that, structural questions are answered by graph queries in under 1 ms, cutting token consumption by 99.2%.
3.2 Typical Applicable Scenarios
| Scenario | Description | Typical Customer Profile |
|---|---|---|
| Large monorepo | Multi-language, multi-service, complex call chains | Large internet companies, fintech platforms |
| Legacy system maintenance and migration | Missing documentation, high staff turnover, need to understand the code structure quickly | Banking / insurance / government IT systems |
| Code review and security audit | Impact analysis, dead code detection, call chain security audit | Security teams, compliance departments |
| Onboarding and knowledge transfer | New members quickly understand the project architecture | Medium and large teams |
| Microservice architecture governance | Cross-service route matching, dependency analysis, channel detection | Cloud-native teams |
| Multi-repository architecture understanding | Global view built from cross-repository `CROSS_*` edges | Platform engineering teams |
| CI/CD integration | `detect_changes` provides precise change impact analysis | DevOps / platform engineering |
| AI coding agent infrastructure | Code understanding foundation for all coding agents | AI platform teams |
| Code refactoring | Dead code detection, call chain tracing, near-duplicate code detection | Architects / senior developers |
3.3 Language Coverage
The tree-sitter parsing layer covers 158 languages, spanning nearly all mainstream and many niche programming languages. Hybrid LSP (9 languages) adds deeper type-aware analysis for the languages that dominate enterprise code: Go, Python, TypeScript, Java, C/C++, C#, PHP, Kotlin, and Rust.
According to the benchmark data, Tier 1 (≥ 90%) languages include Lua, Kotlin, C++, Perl, Objective-C, Groovy, C, Bash, Zig, and Swift; Tier 2 (75-89%) languages include Python, TypeScript, TSX, Go, Rust, Java, R, Dart, JavaScript, Erlang, Elixir, Scala, Ruby, PHP, and C#.
4. Less Suitable Scenarios
| Scenario | Reason | Suggested Alternative |
|---|---|---|
| Very small projects (< 10 files) | Building a knowledge graph is not worth the overhead; the Agent can simply read the files | Use the Agent's built-in search/read tools |
| Plain-text tasks that do not involve code structure | For example, document translation or README generation | The Agent's native capabilities |
| Workflows that do not use an AI Agent | For example, pure in-IDE code navigation; a traditional IDE already provides jump-to-definition and reference lookup through LSP | The IDE's built-in LSP (VS Code / IntelliJ) |
| Poorly structured code bases (messy configuration, large amounts of dynamically generated code) | Index quality depends on AST parsing quality; files with syntax errors are parsed in degraded form | Clean up the code base first |
| Offline environments with extreme security requirements | All processing is local, but installation needs access to GitHub Releases, so enterprises need an intranet distribution channel | Distribute the binary through an internal artifact repository |
| Use cases that require a 100% accurate call graph | Call graphs for dynamic languages (Python/JS/PHP) depend on Hybrid LSP type inference and lose some precision (in benchmark Q10, attribute queries return null for some languages) | Combine with runtime profiling tools |
5. Core Capabilities
5.1 Performance Indicators
| Indicator | Value | Description |
|---|---|---|
| Linux kernel full index | 3 minutes | 28M LOC, 75K files → 4.81M nodes, 7.72M edges |
| Linux kernel fast index | 1 min 12 sec | 1.88M nodes |
| Django full index | ~6 seconds | 49K nodes, 196K edges |
| Cypher query response | < 1 ms | Relationship traversal |
| Dead code detection | ~150 ms | Full graph scan plus degree filtering |
| Call path tracing (depth = 5) | < 10 ms | BFS traversal |
| Token efficiency | 99.2% reduction | 5 structured queries consume ~3,400 tokens vs ~412,000 tokens for file-by-file search |
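The token efficiency figure in the last row follows directly from the two token counts. A short check of the arithmetic, with a hypothetical helper name:

```python
def token_reduction(baseline_tokens, graph_tokens):
    """Return (percent saved, size ratio) for two token counts."""
    saved_percent = (1 - graph_tokens / baseline_tokens) * 100
    ratio = baseline_tokens / graph_tokens
    return saved_percent, ratio


percent, ratio = token_reduction(412_000, 3_400)
print(f"{percent:.1f}% fewer tokens, about {ratio:.0f}x smaller")
# 99.2% fewer tokens, about 121x smaller
```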
5.2 Paper Evaluation Results
The arXiv paper reports the following results on 31 real-world repositories:
- Answer quality: 83% (compared with 92% for file-by-file search)
- Token consumption: 10x lower
- Tool calls: 2.1x fewer
- Graph-native queries (hub detection, caller ranking): match or beat file search in 19 of 31 languages
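5.3 Indexing Pipeline
File discovery (gitignore-aware)
→ Tree-sitter AST parsing (grammars for 158 languages)
→ Multi-stage extraction pipeline (definitions, calls, imports, usages, HTTP routes, etc.)
→ Package/module resolution (package.json, go.mod, Cargo.toml, and other manifest files, 10+ in total)
→ Hybrid LSP semantic resolution (9 languages, type-aware call resolution)
→ RAM-first pipeline (LZ4 compression → in-memory SQLite → single dump → memory released)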
5.4 Operations and Maintenance Features
- Automatic sync: a background file watcher (git-based) re-indexes changes automatically
- Auto index: after `config set auto_index true`, indexing is triggered automatically on the first MCP connection
- Self-update: `codebase-memory-mcp update` updates in one step
- Clean uninstall: `codebase-memory-mcp uninstall` removes all Agent configurations, skills, hooks, and commands
- CLI mode: tools can be called directly from the command line, for example `codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'`
- cgroup awareness: the `CBM_WORKERS` environment variable controls the degree of parallelism, which suits containerized deployments
5.5 Security and Trust
- 100% local processing: the code never leaves the local machine
- No network calls: the embedding model is built in; no API key, no Ollama, no Docker
- Every release: SHA-256 signature plus a scan by 70+ antivirus engines (zero detections)
- SLSA Level 3 build provenance
- Self-built HTTP server (since v0.8.1): binds only to 127.0.0.1, strict HTTP/1.1 parsing, hard request limits
6. Architecture, Deployment, and Integration
6.1 Project Source Code Structure
src/
├── foundation/ Arena allocator, hash table, string utilities, platform compatibility
├── store/ SQLite graph store (WAL mode, FTS5)
├── cypher/ Cypher query → SQL translation
├── mcp/ MCP server (JSON-RPC 2.0 over stdio, 14 tools)
├── pipeline/ Multi-stage indexing pipeline
├── httplink.c HTTP route extraction (Go/Express/Laravel/Ktor/Python and other frameworks)
├── discover/ File discovery (gitignore-aware)
├── watcher/ Git-based background auto-sync
├── cli/ CLI subcommands (install, update, uninstall, config)
├── ui/ Graph visualization HTTP server (self-built httpd.c)
internal/cbm/ Tree-sitter AST extraction (158 languages, vendored C grammars)
vendored/ sqlite3, yyjson, mimalloc, xxhash, tree-sitter
graph-ui/ React/Three.js visualization frontend
tests/ All C test files (5604 tests)
6.2 Deployment Architecture
┌─────────────────────────────────────────────┐
│ Developer machine                           │
│ ┌─────────────┐    ┌──────────────────────┐ │
│ │ AI Agent    │◄──►│ codebase-memory-mcp  │ │
│ │ (MCP Client)│    │ (MCP Server)         │ │
│ │             │    │ - stdio transport    │ │
│ │ Claude Code │    │ - 14 tools           │ │
│ │ Codex CLI   │    │ - SQLite graph store │ │
│ │ Gemini CLI  │    │ - file watcher       │ │
│ │ VS Code     │    │ - 3D Graph UI        │ │
│ │ ...         │    │   (:9749, optional)  │ │
│ └─────────────┘    └──────────┬───────────┘ │
│                               │             │
│                    ~/.cache/                │
│                    codebase-memory-mcp/     │
│                    (SQLite database)        │
└─────────────────────────────────────────────┘
6.3 Installation and Integration
One-line installation (macOS / Linux):
# Standard edition
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash
# With the 3D visualization UI
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash -s -- --ui
Windows (PowerShell):
Invoke-WebRequest -Uri https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.ps1 -OutFile install.ps1
.\install.ps1
Package Manager Installation:
# Homebrew
brew install codebase-memory-mcp
# npm
npx codebase-memory-mcp install
# PyPI
uvx codebase-memory-mcp install
# Arch Linux (AUR)
yay -S codebase-memory-mcp-bin
Automatic setup: the installer detects every installed coding agent (Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, Kiro) and configures MCP server entries, instruction files, skills, and pre-tool hooks for each of them.
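6.4 Key Configuration
# Enable auto-indexing (new projects are indexed automatically on first connection)
codebase-memory-mcp config set auto_index true
# File limit for auto-indexing (default 50000)
codebase-memory-mcp config set auto_index_limit 100000
# Log level
export CBM_LOG_LEVEL=debug
# Number of parallel workers (for containerized environments)
export CBM_WORKERS=4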
6.5 Building from Source
git clone https://github.com/DeusData/codebase-memory-mcp.git
cd codebase-memory-mcp
git config core.hooksPath scripts/hooks # Enable the pre-commit security checks
scripts/build.sh
# Output: build/c/codebase-memory-mcp
Prerequisites: a C compiler (gcc/clang), make, and zlib.
7. How to Use
7.1 Daily Developer Workflow
Step 1: Install and start
After installation, restart your AI coding agent (such as Claude Code). The Agent connects to the MCP server automatically.
Step 2: Index the project
Tell the Agent directly: "Index this project".
Or call the tool explicitly:
codebase-memory-mcp cli index_repository '{"path": "/path/to/project"}'
Step 3: Start asking questions (examples)
- "What is the overall architecture of this project?" → Agent calls `get_architecture`
- "Who calls the ProcessOrder function? What is the call chain?" → Agent calls `trace_path` / `trace_call_path`
- "Are there functions that are never called?" → Agent runs dead code detection with a Cypher query
- "Which modules are affected by modifying `auth.go`?" → Agent calls `detect_changes`
- "Find functions similar to `sendNotification`" → Agent uses `semantic_query`
- "List all HTTP REST endpoints" → Agent uses `search_graph(label="Route")`
7.2 Standalone CLI Use
You can also run queries directly in the terminal using CLI mode:
# Search for functions
codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'
# Cypher query
codebase-memory-mcp cli query_graph '{"query": "MATCH (f:Function)-[:CALLS]->(g) WHERE f.name = \"main\" RETURN g.name"}'
# Get the architecture overview
codebase-memory-mcp cli get_architecture '{}'
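When scripting the CLI, the fiddly part is quoting the JSON argument, as the escaped quotes in the Cypher example show. A minimal sketch of building the same commands from Python follows; the helper name `cli_command` is hypothetical, and it assumes only the `cli <tool> '<json>'` form shown above.

```python
import json
import shlex


def cli_command(tool, arguments, binary="codebase-memory-mcp"):
    """Build the argv list for `<binary> cli <tool> '<json>'` (hypothetical helper)."""
    return [binary, "cli", tool, json.dumps(arguments)]


cmd = cli_command(
    "query_graph",
    {"query": 'MATCH (f:Function)-[:CALLS]->(g) WHERE f.name = "main" RETURN g.name'},
)

# shlex.join renders a copy-pasteable shell line with correct quoting.
print(shlex.join(cmd))
```

Passing `cmd` as a list to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting altogether, since `json.dumps` already escapes the inner double quotes.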
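7.3 Team Sharing Workflow
# Developer A: export the team-shared artifact after indexing
# (.codebase-memory/graph.db.zst is generated automatically during indexing)
# Commit it to the repository
git add .codebase-memory/graph.db.zst .gitattributes
git commit -m "Update code knowledge graph artifact"
# Developer B: use it directly after cloning
# codebase-memory-mcp detects graph.db.zst → decompresses and imports it → runs only incremental indexing
8. Pre-Sales Talking Points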
8.1 Core Value Proposition (Elevator Pitch)
"When your development team uses AI Agents for programming, do you find that the Agent often gets lost in large code bases? It reads files constantly and greps repeatedly, token consumption is enormous, and answer quality is still unstable. Codebase Memory MCP takes about 3 minutes to build a knowledge graph of your code base; after that, every code structure query completes in milliseconds and token consumption drops by 99%. It is a 35 MB static binary: no Docker, no API key, and your code never leaves your machine. It works as soon as it is installed."
8.2 Talking Points by Role
For the CTO / VP of Engineering:
- "This is an infrastructure investment, not just another tool. It takes your AI coding Agent from 'blind men feeling the elephant' to a bird's-eye view of the whole code base, which directly improves R&D efficiency."
- "MIT open source, zero vendor lock-in. Maintained by a German team, with an academic paper and SLSA Level 3 build provenance."
- "It has been validated on the Linux kernel (28 million lines of code); your code base is unlikely to be larger."
For architects:
- "The Leiden community detection algorithm automatically discovers functional module boundaries in the code base, helping you verify or correct your existing architectural partitioning."
- "Automatic cross-service HTTP/gRPC route matching and cross-repository dependency analysis make it a strong tool for microservice governance."
- "ADR (Architecture Decision Record) management persists architectural knowledge across AI sessions."
For security / compliance officers:
- "100% local processing; code never leaves your machine. No external API is required."
- "Every release is scanned by 70+ antivirus engines and comes with SHA-256 signature verification."
- "There is a dedicated SECURITY.md and a responsible disclosure process."
For development team leads:
- "Onboarding time for large projects can shrink from weeks to days: the AI Agent 'understands' the entire code structure before answering questions."
- "Git changes are mapped automatically to affected symbols, so change impact analysis can run in pre-commit hooks or CI."
- "11 mainstream AI Agents are supported, so your team can keep using the one it already has."
8.3 Comparison with Alternatives
| Dimension | Codebase Memory MCP | Traditional grep/ripgrep | IDE LSP (single file) | Other code graph tools (e.g. Sourcegraph) |
|---|---|---|---|---|
| Cross-file call graph | ✅ Built automatically | ❌ Must be pieced together manually | ⚠️ Single language | ✅ |
| Query speed | < 1 ms (graph query) | Seconds (full-repository search) | Instant (limited to open files) | Depends on the index |
| Token efficiency | Excellent (structured output) | Poor (full text) | N/A | Medium |
| Multi-language support | 158 | Plain text (language-agnostic) | One language per LSP | Limited |
| Offline / local | ✅ | ✅ | ✅ | Mostly SaaS |
| Deployment complexity | Single binary, zero dependencies | Single binary | Complex (one LSP installation per language) | Server required |
| Integration with AI Agents | Native MCP | Agent must assemble results itself | MCP bridge required | Custom integration required |
| Open source | MIT | Varies by tool | Varies by tool | Mostly commercial |
8.4 ROI Calculation Approach
Consider a 10-person development team:
- Each person asks the AI Agent 20 code-related questions per day (200 questions per day for the team)
- With the grep/file-read approach, each question consumes 4,000 tokens on average → 800,000 tokens per day
- With Codebase Memory MCP, each question averages 170 tokens → 34,000 tokens per day
- Token savings are 95.8%. At Claude Sonnet pricing of about $3 per million input tokens, the team saves about $2.3 per day, or roughly $580 per year for the whole team (assuming about 250 working days), which is about $58 per person per year
- More important are the gains in answer quality and developer time: fewer wrong answers and fewer follow-up questions caused by the Agent lacking context
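The arithmetic above can be reproduced with a few lines of code, which also makes it easy to plug in a customer's own numbers. The function name `token_roi` is hypothetical, and the figure of 250 working days per year is an assumption that the text does not state.

```python
def token_roi(team_size, questions_per_person, baseline_tokens, mcp_tokens,
              usd_per_million_tokens, working_days=250):
    """Daily and yearly input-token savings for a whole team."""
    questions = team_size * questions_per_person
    baseline_daily = questions * baseline_tokens
    mcp_daily = questions * mcp_tokens
    saved_daily = baseline_daily - mcp_daily
    daily_usd = saved_daily / 1_000_000 * usd_per_million_tokens
    return {
        "baseline_daily_tokens": baseline_daily,
        "mcp_daily_tokens": mcp_daily,
        "savings_percent": saved_daily / baseline_daily * 100,
        "daily_usd": daily_usd,
        "yearly_usd": daily_usd * working_days,  # assumes 250 working days
    }


result = token_roi(10, 20, 4_000, 170, usd_per_million_tokens=3.0)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

With the inputs from the list, this gives 800,000 versus 34,000 tokens per day, 95.75% savings, about $2.30 per day, and about $574.50 per year for the team as a whole.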
9. Frequently Asked Customer Questions
Q1: Is the data safe? Is any code uploaded externally?
A: No code is uploaded. All processing is done 100% locally, and the code never leaves your machine. The built-in semantic search uses a Nomic embedding model compiled into the binary and needs no external API. Every release is scanned by 70+ antivirus engines and SHA-256 signed.
Q2: How long does indexing take? Does it affect the development experience?
A: Small and medium-sized projects (such as Django) take about 6 seconds; very large projects (the Linux kernel, 28 million lines) take about 3 minutes. Memory is released back to the operating system once indexing completes. Later file changes are indexed incrementally by the background watcher and are barely noticeable. You can also enable auto-indexing, which is triggered on the first connection.
Q3: How is this different from "Go to Definition" in VS Code / IntelliJ?
A: IDE code navigation is built on single-file, single-language LSP and mainly serves a person jumping around in an editor. Codebase Memory MCP builds a global knowledge graph across files and languages, designed for an AI Agent's code understanding: the Agent can ask for "the complete call chain of this function", "the list of all HTTP endpoints", or "code similar to this module" in a single query instead of jumping through dozens of files.
Q4: Is quality the same across all 158 supported languages?
A: No. Tier 1 (≥ 90%, 17 languages) includes C/C++, Kotlin, Lua, Groovy, Swift, Zig, and others. Tier 2 (75-89%, 16 languages) includes mainstream languages such as Python, TypeScript, Go, Rust, Java, PHP, and C#. Nine of the supported languages have Hybrid LSP deep type resolution. Haskell and OCaml are currently Tier 3 (< 75%) but continue to improve.
Q5: Can I index multiple repositories at once? Can I analyze across repositories?
A: Yes. A single SQLite store can hold multiple indexed repositories. Nodes in different repositories are linked by `CROSS_*` edges, and the 3D visualization UI supports a multi-galaxy layout to show cross-repository architecture.
Q6: What does it cost?
A: It is MIT open source and completely free. There is no enterprise edition and no SaaS subscription fee. The only "cost" is the compute (CPU and memory) used during installation and indexing.
Q7: What permissions does it need?
A: Read access to your code base (to build the index) and write access to the Agent configuration files (to configure MCP entries automatically during installation). No network access and no sudo are required.
Q8: How do I update?
A: Run `codebase-memory-mcp update`. The server also checks for updates automatically when it starts.
10. PoC Recommendations
10.1 PoC Target
Verify four things on the customer's real code base: (1) indexing speed, (2) query accuracy, (3) integration experience with the existing AI agents, and (4) token savings.
10.2 PoC Plan (1-2 weeks recommended)
Phase 1: Preparation and Setup (1 day)
- Select a representative code repository (recommended: multi-language, > 1000 files, microservice or modular structure, such as the customer's core business repository)
- Install Codebase Memory MCP on the developer machines and integrate it with the customer's existing AI Agent
- Run the first full index and record the indexing time, node count, and edge count
Phase 2: Functional Verification (2-3 days)
Prepare a question set (10-15 questions) covering:
- D1 (definition / API discovery): "List all REST API endpoints" / "Where is the project entry point?"
- D2 (relationships / call graph): "What is the complete call chain of the order process?" / "Which packages does the auth module depend on?"
- D3 (precise search): "Find the implementation of `PaymentService.process`"
- D4 (architecture / structure): "What is the layered architecture of this project, and what are the core modules?"
- D5 (cross-cutting / semantic): "Is there a function similar to `sendEmail` but with a different name?"
Answer the same question set in two ways:
- Control group: AI Agent alone (grep / read / glob)
- Experimental group: AI Agent + Codebase Memory MCP
Record for comparison:
- Token consumption
- Answer accuracy (manual score, 1-5)
- Answer time
- Number of tool invocations
Phase 3: In-Depth Scenario Verification (2-3 days)
Select 2-3 in-depth scenarios based on the customer's actual pain points:
- Microservice governance → verify `get_architecture`, HTTP route matching, and cross-service call chains
- Legacy system maintenance → verify dead code detection, call chain tracing, and impact analysis (`detect_changes`)
- Multi-repository management → verify cross-repository `CROSS_*` edges and the multi-repository architecture view
- Team collaboration → verify the `.codebase-memory/graph.db.zst` artifact sharing workflow
Phase 4: Summary Report (1 day)
Produce a PoC report containing:
- Indexing performance metrics
- Comparison of answer accuracy
- Token savings ratio
- Developer experience feedback
- Recommended rollout path for the team
10.3 PoC Success Criteria
| Indicator | Target |
|---|---|
| Index completion time | < 0.1 seconds per file in the customer's code base |
| Structural query answer accuracy | ≥ 80% (manual scoring) |
| Token savings vs. the pure-Agent baseline | ≥ 80% |
| Developer subjective satisfaction | ≥ 4/5 |
| Time from installation to first use | < 10 minutes |
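The success criteria in the table can be checked mechanically once the PoC measurements are in. The sketch below is one way to do it; the function name, the dictionary keys, and the sample measurements are all hypothetical, and it assumes accuracy has already been converted to a fraction between 0 and 1.

```python
def evaluate_poc(m):
    """Check PoC measurements against the five targets; returns {criterion: passed}."""
    return {
        "index_time": m["index_seconds"] / m["file_count"] < 0.1,
        "accuracy": m["accuracy"] >= 0.80,
        "token_savings": 1 - m["mcp_tokens"] / m["baseline_tokens"] >= 0.80,
        "satisfaction": m["satisfaction"] >= 4,
        "time_to_first_use": m["install_minutes"] < 10,
    }


# Hypothetical measurements, for illustration only.
sample = {
    "index_seconds": 42, "file_count": 1200,
    "accuracy": 0.85,
    "baseline_tokens": 500_000, "mcp_tokens": 60_000,
    "satisfaction": 4.2,
    "install_minutes": 6,
}

checks = evaluate_poc(sample)
print(checks)
print("PoC passed" if all(checks.values()) else "PoC needs follow-up")
```

Reporting each criterion separately, rather than a single pass/fail flag, shows the customer exactly which target was missed.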
10.4 PoC Considerations
- The first index takes time: manage expectations and emphasize "index once, benefit long-term"
- Dynamic languages perform slightly worse than static ones: call graph accuracy for Python/JS/PHP depends on Hybrid LSP type inference. The PoC should also include repositories in static languages such as Go/TypeScript/Java
- Memory use when indexing large repositories: a Linux-kernel-scale index needs several GB of memory, so make sure the PoC machine has enough (16 GB recommended)
- Not a replacement for code review: graph analysis is an aid and should not be the sole basis for security or quality decisions
- File watching overhead: the performance of background git watching on Windows needs extra attention
11. Risks and Considerations
11.1 Technical Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Insufficient call graph precision for dynamic languages | Medium | Tell customers about the difference explicitly; cover both static and dynamic languages in the PoC; follow the continued improvement of Hybrid LSP (Java/Kotlin/Rust support was added in v0.8.0) |
| Memory peak when indexing large repositories | Medium | Optimized in v0.8.0: streaming SQLite dump, string interning, allocator page reclaim; cgroup awareness plus adjustable `CBM_WORKERS` |
| Stability when parsing C++ template code | Low | v0.8.0 fixed a large number of crashes on C++ template code (10+ issues from the community); end-to-end verification passed on 16 large repositories |
| Code synchronization delay | Low | The background git watcher detects changes in real time, and incremental indexing usually takes < 1 second |
| Parsing quality for non-mainstream languages | Low | OCaml, Haskell, and similar languages are Tier 3 (62-72%), which has little impact on the main enterprise languages |
11.2 Commercial and Ecosystem Risks
| Risk | Severity | Mitigation |
|---|---|---|
| The project relies on a single maintainer | Medium | Monitor community activity: v0.8.0 had 16 community contributors and 40 issue reporters; 863 commits and an active release cadence |
| MCP protocol evolution | Low | MCP is already an industry-standard protocol. The project is compatible with MCP JSON-RPC 2.0 and is listed in the official MCP Registry and the Glama directory |
| Emerging competitors | Medium | No direct competitor currently matches it on speed, language coverage, and zero deployment complexity; this needs continued monitoring |
| Customer concerns about maintaining a C project | Low | Pure C is the source of its performance advantage; because it compiles to a single static binary, customers never need to maintain C code |
| No commercial support (MIT open source) | Medium | Tell customers clearly; for enterprise customers who need SLA-backed support, evaluate whether a third-party support plan exists |
11.3 Security Risks
- Code data security: processing is confirmed to be 100% local, and code does not leave the machine ✅
- Binary supply chain security: SLSA Level 3 build provenance + SHA-256 signature + VirusTotal scan ✅
- Committing the graph artifact `graph.db.zst` to the repository: it contains code structure information (function names and call relationships) but no source code content; for sensitive code bases, add it to `.gitignore` so it is not committed
- Graph UI: binds only to 127.0.0.1 and is not exposed to the public network ✅
12. My Pre-Sales Judgment
12.1 Overall Assessment: ★★★★★ Strongly recommended
Codebase Memory MCP is a truly differentiated product. It is not another thin wrapper around "AI for code"; it solves a real and common pain point at the AI Agent infrastructure layer: context explosion and lack of structural awareness when AI Agents face large code bases. It stands out on three dimensions:
- Performance: pure C, a RAM-first pipeline, and LZ4 compression → the Linux kernel indexed in 3 minutes
- Simplicity: a single binary, zero dependencies, installed with one curl command
- Openness: MIT open source, the standard MCP protocol, support for 11 Agents and 158 languages
12.2 Who Should Try It Right Away?
- Every team already using AI coding agents: in the author's assessment, this is the most effective accelerator for Agent code understanding currently available
- Teams managing large code bases (> 1000 files): the benefit grows with code base size
- Multi-language / multi-service architecture teams: a global knowledge graph across languages and services is something single-language LSP cannot provide
- Organizations with strict security and data privacy requirements: local processing is the core selling point
12.3 Who Should Wait and See?
- Teams with a very small code base (< 100 files): letting the Agent read the files directly may be simpler
- Teams not using AI coding agents: traditional IDE navigation is sufficient
- Teams mainly using Tier 3 languages such as Haskell/OCaml: parsing quality still needs to improve
- Customers who need commercial SLA support: this is a purely community-driven MIT project with no official commercial support
12.4 Competitive Landscape
Similar projects currently on the market (such as graphify, Sourcegraph Cody, and GitNexus) fall short of Codebase Memory MCP on openness, deployment simplicity, and language coverage. The project has removed the GitNexus comparison table from its README, which suggests it is confident in its competitive position.
12.5 Pre-Sales Action Recommendations
- Priority 1: recommend a PoC to every customer with an AI coding Agent use case (the cost is very low; one command gets it started)
- Priority 2: create Chinese-language guides and best-practice documents to lower the barrier for Chinese customers
- Priority 3: collect PoC data and build a case library for Chinese customers (real data such as token savings and indexing performance)
- Priority 4: follow the project roadmap, especially Hybrid LSP support for more languages and CI/CD integration capabilities
13. References
- Academic paper: arXiv:2603.27277
- Evaluation Plan (159 languages)
- MCP protocol specification: Model Context Protocol
- Tree-sitter: tree-sitter.github.io