Executive Summary: Codebase Memory MCP positions itself as the fastest AI code-intelligence engine on the market, shipped as a single static binary written in pure C. It uses tree-sitter AST parsing and Hybrid LSP semantic type resolution to turn any code base (158 languages) into a persistent knowledge graph, and exposes 14 tools over the MCP protocol for AI coding agents to call. Core highlights: a full index of the Linux kernel (28 million lines of code, 75,000 files) completes in 3 minutes; structured queries respond in < 1 ms; token consumption drops by 99.2% (about 120×) compared with file-by-file search. Zero dependencies, zero API keys, zero Docker; ready to use on macOS, Linux, and Windows. Eleven mainstream coding agents are supported, including Claude Code, Codex CLI, and Gemini CLI. MIT-licensed open source, backed by an arXiv paper; v0.8.1 passes 5604 test cases and scans by 70+ antivirus engines. Pre-sales positioning: the "knowledge graph middleware" of the AI-assisted programming infrastructure layer, addressing context-window explosion and the lack of structural awareness when AI agents work with large code bases.

1. Project/Product Overview

| Property | Content |
| --- | --- |
| Project name | Codebase Memory MCP |
| GitHub | DeusData/codebase-memory-mcp |
| Organization | DeusData (Germany) |
| License | MIT |
| Current version | v0.8.1 (released 2026-06-12) |
| Implementation language | Pure C (rewritten from Go to C in v0.5.0) |
| Architecture | Single static binary, MCP protocol server |
| Supported programming languages | 158 (tree-sitter syntax parsing); 9 with Hybrid LSP semantic type resolution |
| Hybrid LSP languages | Python, TypeScript/JavaScript/JSX/TSX, Go, C/C++, C#, PHP, Java, Kotlin, Rust |
| Number of MCP tools | 14 |
| Supported agents | Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, Kiro (11) |
| Platforms | macOS (arm64/amd64), Linux (arm64/amd64), Windows (amd64) |
| Dependencies | Zero runtime dependencies (SQLite is compiled into the binary; tree-sitter grammars are vendored) |
| Security certification | OpenSSF Scorecard, SLSA Level 3, 70+ antivirus scans with zero detections, SHA-256 digital signatures |
| Academic paper | arXiv:2603.27277 - Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP |
| Tests | 5604 test cases; end-to-end index verification on 16 large open-source repositories |
| Installation methods | curl one-line script, Homebrew, npm, PyPI, Scoop, Winget, Chocolatey, AUR, Nix |
| Release package size | About 35-37 MB (compressed archive); about 70 MB with UI |
| Data storage | SQLite (WAL mode), persisted to `~/.cache/codebase-memory-mcp/` |

Project Positioning

Codebase Memory MCP is neither another code search tool nor a code assistant with an embedded LLM. It is a purely structural code-intelligence backend: it only builds and queries the code knowledge graph, and leaves the task of "understanding the code" to the AI Agent (the MCP client). The resulting division of labor is clear:

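Developer: "Who calls the ProcessOrder function?"
  → AI Agent calls an MCP tool: trace_path(function_name="ProcessOrder", direction="inbound")
    → Codebase Memory MCP: runs the graph query and returns structured results
      → AI Agent: presents the call chain in natural language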

This design avoids the extra API key, extra cost, and extra model configuration that similar tools incur when they embed an LLM to translate natural language into queries.
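To make the hand-off concrete, here is a minimal sketch of the request an MCP client might send for the flow above. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification; the tool name and the two argument names are taken from the example flow, and the exact parameter schema of the real tool is an assumption.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Tool and argument names follow the example flow; treat them as illustrative.
request = build_tool_call(
    1,
    "trace_path",
    {"function_name": "ProcessOrder", "direction": "inbound"},
)

# The stdio transport carries one JSON message per line.
wire_message = json.dumps(request) + "\n"
print(wire_message, end="")
```

The agent, not the server, decides which tool to call; the server only answers the structured request.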

2. What Does It Do?

Codebase Memory MCP builds the entire code base into a multi-relational knowledge graph, then exposes 14 MCP tools that let an AI Agent run structural and semantic queries:

2.1 Graph Analysis and Structural Understanding

| Capability | MCP tool | Description |
| --- | --- | --- |
| Architecture overview | `get_architecture` | One call returns language distribution, package structure, entry points, routes, hotspot code, boundaries, layering, and clustering (Leiden community detection) |
| Call chain tracing | `trace_path` / `trace_call_path` | Inbound/outbound bidirectional tracing, depth control, cross-file and cross-package resolution |
| Impact analysis | `detect_changes` | Maps uncommitted Git changes to affected symbols, with risk-level classification |
| Dead code detection | Query via Cypher | Finds functions with zero callers (excluding entry points) |
| ADR management | `manage_adr` | Architecture decision records persisted across sessions |
| Leiden community detection | Integrated in `get_architecture` | Functional-module clustering based on the call graph; computed in seconds on kernel-scale graphs |
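Inbound call-chain tracing with depth control can be pictured as a breadth-first search over reversed `CALLS` edges. The sketch below is an illustration on an in-memory edge list, not the project's C implementation; the function name `trace_inbound` and the sample graph are hypothetical.

```python
from collections import deque


def trace_inbound(calls, target, max_depth):
    """Return {caller: depth} for functions that reach `target` within max_depth hops."""
    # Reverse the CALLS edges so we can walk from callee to caller.
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, []).append(caller)

    depth_of = {}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for caller in callers_of.get(node, []):
            if caller != target and caller not in depth_of:
                depth_of[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depth_of


# Hypothetical call graph: (caller, callee) pairs.
calls = [
    ("main", "HandleRequest"),
    ("HandleRequest", "ProcessOrder"),
    ("RetryJob", "ProcessOrder"),
    ("ProcessOrder", "ChargeCard"),
]
print(trace_inbound(calls, "ProcessOrder", max_depth=5))
```

Outbound tracing is the same walk over the unreversed edges.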

2.2 Search Capabilities

| Search type | MCP tool | Technical implementation |
| --- | --- | --- |
| Semantic search | `semantic_query` | Based on Nomic `nomic-embed-code` embeddings (768-dimensional, int8), with a built-in composite score over 11 signals (TF-IDF, RRI, AST profile, data flow, Halstead-lite, MinHash, graph diffusion, etc.) |
| BM25 full-text search | Integrated in the search tools | SQLite FTS5 with the `cbm_camel_split` tokenizer (camelCase/underscore aware) |
| Structured search | `search_graph` | Regex name-pattern matching, label filtering, in-degree/out-degree range, file-scope restriction |
| Code search | `search_code` | Graph-enhanced grep; searches indexed files only |
| Cypher query | `query_graph` | Cypher-like syntax: `MATCH (f:Function)-[:CALLS]->(g) WHERE f.name = 'main' RETURN g.name` |
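A camelCase/underscore-aware tokenizer matters for full-text search because a query for "order" should match `ProcessOrder` and `process_order` alike. The following sketch shows one way such splitting can work; it is an assumption about the general technique, not the actual behavior of `cbm_camel_split`, and `split_identifier` is a hypothetical name.

```python
import re

# Alternatives, in order: an acronym followed by a capitalized word (HTTP|Server),
# a capitalized or lowercase word, a trailing acronym, a run of digits.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(name):
    """Split an identifier into lowercase word tokens."""
    return [token.lower() for token in _TOKEN.findall(name)]


for identifier in ("ProcessOrder", "send_notification", "HTTPServer"):
    print(identifier, "->", split_identifier(identifier))
```

Underscores and other separators are simply skipped, so both naming styles yield the same tokens.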

2.3 Cross-Service Links

- HTTP route matching: routes are matched to call sites, each match with a confidence score

- gRPC / GraphQL / tRPC: service detection and protobuf route extraction

- Channel detection: Socket.IO, EventEmitter, and general pub-sub patterns are detected via `EMITS`/`LISTENS_ON` edges (covering 8 languages)

2.4 Cross-Repository Intelligence

- `CROSS_*` edges link nodes across multiple repositories (within the same SQLite store)

- Multi-galaxy 3D UI layout for cross-repository architecture visualization

- Cross-repository architecture summary that consolidates services, routes, and dependencies for all indexed repositories

2.5 Infrastructure as Code (IaC) Index

- Dockerfiles → graph nodes

- Kubernetes manifests → `Resource` nodes

- Kustomize overlays → `Module` nodes and `IMPORTS` edges

- Helm charts → template and Chart.yaml dependency edges

- HCL (Terraform) → block labels merged into node names

2.6 14 Edge Types (Relationships)

Edge types include `CALLS`, `IMPORTS`, `DEFINES`, `IMPLEMENTS`, `INHERITS`, `HTTP_CALLS`, `ASYNC_CALLS`, `EMITS`, `LISTENS_ON`, `DATA_FLOWS` (including parameter mapping and field access chains), `SIMILAR_TO` (near-duplicate code clone detection via MinHash LSH), and `SEMANTICALLY_RELATED` (lexically dissimilar but semantically related, same language, score ≥ 0.80).
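To give a feel for how a `SIMILAR_TO` edge could be derived, here is a minimal MinHash sketch that estimates Jaccard similarity between two token streams. It illustrates the general technique only: the shingle size, the number of hash functions, the BLAKE2b hashing, and all function names are assumptions, and the LSH banding step that avoids all-pairs comparison is omitted.

```python
import hashlib


def shingles(tokens, k=3):
    """Return the set of k-token shingles of a token list."""
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Two hypothetical function bodies, already tokenized.
body_a = "if order is None : raise ValueError ( msg ) return order . total".split()
body_b = "if order is None : raise ValueError ( msg ) return order . amount".split()
sig_a = minhash_signature(shingles(body_a))
sig_b = minhash_signature(shingles(body_b))
print(f"estimated similarity: {estimated_jaccard(sig_a, sig_b):.2f}")
```

A pair whose estimate clears a chosen threshold would then be linked as a clone candidate.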

2.7 Optional Graph UI

- Built-in 3D interactive visualization (Three.js front end served by an in-house HTTP server)

- Accessed at `localhost:9749`

- Runs as a background thread in parallel with the MCP server

2.8 Team-Shared Graph Artifact

- A single file, `.codebase-memory/graph.db.zst`, is committed to the repository

- Format: SQLite → drop indexes → `VACUUM INTO` compaction → zstd compression (compression ratio 8-13:1)

- Two quality tiers: Best (`zstd -9`, manual indexing) and Fast (`zstd -3`, watcher incremental updates)

- After cloning, team members decompress and import the artifact directly, skip full reindexing, and run only incremental indexing

- `.gitattributes` automatically gets a `merge=ours` entry to eliminate merge conflicts on the binary artifact

- Optional: if the artifact is not committed, each person builds their own full index
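The export pipeline described above (drop indexes, compact with `VACUUM INTO`, compress) can be sketched with the Python standard library. This is an illustration under stated assumptions: `zlib` stands in for zstd because zstd is not available in every standard library, the function and file names are hypothetical, and a plain file copy is only safe here because the demo store is not in WAL mode.

```python
import os
import shutil
import sqlite3
import tempfile
import zlib


def export_artifact(db_path, out_path, level=9):
    """Copy the store, drop secondary indexes, compact it, then compress it."""
    workdir = tempfile.mkdtemp()
    try:
        working = os.path.join(workdir, "working.db")
        compact = os.path.join(workdir, "compact.db")
        shutil.copyfile(db_path, working)  # never modify the live store

        # isolation_level=None gives autocommit; VACUUM cannot run in a transaction.
        conn = sqlite3.connect(working, isolation_level=None)
        try:
            # Auto-created indexes have sql IS NULL and cannot be dropped.
            index_names = [row[0] for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'index' AND sql IS NOT NULL")]
            for name in index_names:
                conn.execute(f'DROP INDEX "{name}"')
            conn.execute("VACUUM INTO ?", (compact,))
        finally:
            conn.close()

        with open(compact, "rb") as src, open(out_path, "wb") as dst:
            dst.write(zlib.compress(src.read(), level))  # zlib stands in for zstd
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
    return os.path.getsize(out_path)


def build_demo_store(path):
    """Create a small hypothetical graph store with one secondary index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX idx_nodes_name ON nodes(name)")
    conn.executemany("INSERT INTO nodes (name) VALUES (?)",
                     [(f"func_{i}",) for i in range(1000)])
    conn.commit()
    conn.close()


demo_dir = tempfile.mkdtemp()
store = os.path.join(demo_dir, "graph.db")
artifact = os.path.join(demo_dir, "graph.db.z")
build_demo_store(store)
size = export_artifact(store, artifact)
print(f"store: {os.path.getsize(store)} bytes, artifact: {size} bytes")
```

Dropping indexes before compression is what makes the artifact small; the importer can rebuild them locally after decompression.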

3. Applicable Scenarios

3.1 Core Scenario: Understanding a Large Code Base in AI-Assisted Programming

Problem: When an AI Agent (such as Claude Code) faces a large code base, every question forces it to read many files and run grep repeatedly. Token consumption is very high and the Agent has no view of the global structure. Solution: Building the knowledge graph takes 3-6 seconds (or about 3 minutes for very large repositories); afterwards, structural queries complete in < 1 ms as graph queries, reducing token consumption by 99.2%.

3.2 Typical Applicable Scenarios List

| Scenario | Description | Representative customer profile |
| --- | --- | --- |
| Large monorepo | Multi-language, multi-service, complex call chains | Large internet companies, fintech platforms |
| Legacy system maintenance and migration | Little documentation, high staff turnover, need to understand the code structure quickly | Banking/insurance/government IT systems |
| Code review and security audit | Impact analysis, dead code detection, call-chain security audit | Security teams, compliance departments |
| Onboarding and knowledge transfer | New members quickly understand the project architecture | Medium and large teams |
| Microservice architecture governance | Cross-service route matching, dependency analysis, and channel detection | Cloud-native teams |
| Multi-repository architecture understanding | Global view across repositories built on `CROSS_*` edges | Platform engineering teams |
| CI/CD integration | `detect_changes` enables accurate change impact analysis | DevOps/platform engineering |
| AI coding agent infrastructure | Serves as the code-understanding base for all coding agents | AI platform teams |
| Code refactoring | Dead code detection, call chain tracing, similar-code clone detection | Architects/senior developers |

3.3 Language Coverage

The tree-sitter parsing layer covers 158 languages, which includes almost all major and niche programming languages. Hybrid LSP (9 languages) adds deeper type-aware analysis, notably for enterprise mainstay languages such as Go, Python, TypeScript, Java, C/C++, C#, PHP, Kotlin, and Rust.

Benchmark data shows that Tier 1 (≥ 90%) languages include Lua, Kotlin, C++, Perl, Objective-C, Groovy, C, Bash, Zig, and Swift; Tier 2 (75-89%) languages include Python, TypeScript, TSX, Go, Rust, Java, R, Dart, JavaScript, Erlang, Elixir, Scala, Ruby, PHP, and C#.

4. Scenarios Where It Is a Poor Fit

| Scenario | Reason | Alternative |
| --- | --- | --- |
| Very small project (< 10 files) | Building a knowledge graph is not worth the cost; the Agent can simply read the files | Use the Agent's built-in search/read without extra tools |
| Plain-text tasks that do not involve understanding code structure | For example, document translation or README generation | Agent native capabilities |
| Workflows that do not involve an AI Agent | For example, code navigation inside an IDE; a traditional IDE already provides jump-to-definition and reference lookup through LSP | IDE built-in LSP (VS Code / IntelliJ) |
| Unstructured code bases (for example, messy configuration or large amounts of dynamically generated code) | Index quality depends on AST parsing quality; files with syntax errors are parsed in degraded form | Clean up the code base first |
| Offline environments with extreme security requirements | Although all processing is local, installation requires access to GitHub Releases; enterprises need intranet distribution | Distribute the binary through an internal artifact repository |
| Need for a 100% accurate call graph | Call graphs for dynamic languages (Python/JS/PHP) rely on Hybrid LSP type inference and lose some precision (in benchmark Q10, attribute queries return null for some languages) | Combine with runtime profiling tools |

5. Core Capabilities

5.1 Performance Metrics

| Metric | Value | Description |
| --- | --- | --- |
| Linux kernel full index | 3 minutes | 28M LOC, 75K files → 4.81M nodes, 7.72M edges |
| Linux kernel fast index | 1 min 12 sec | 1.88M nodes |
| Django full index | ~6 seconds | 49K nodes, 196K edges |
| Cypher query response | < 1 ms | Relationship traversal |
| Dead code detection | ~150 ms | Full graph scan + degree filtering |
| Call path tracing (depth = 5) | < 10 ms | BFS traversal |
| Token efficiency | 99.2% reduction | 5 structured queries consume ~3,400 tokens vs ~412,000 tokens for file-by-file search |
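The token-efficiency figure follows directly from the two token counts quoted in the table. A short worked check, using only those two numbers:

```python
structured_tokens = 3_400      # 5 structured graph queries
file_search_tokens = 412_000   # file-by-file search for the same questions

reduction = 1 - structured_tokens / file_search_tokens
ratio = file_search_tokens / structured_tokens

print(f"reduction: {reduction:.1%}")  # reduction: 99.2%
print(f"ratio: {ratio:.0f}x")         # ratio: 121x
```

This is where both the "99.2%" and the "about 120 times" figures come from.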

5.2 Paper Evaluation Results

The arXiv paper reports the following results from an evaluation on 31 real-world repositories:

- Answer quality: 83% (compared with 92% for file-by-file search)

- Token consumption: reduced by 10×

- Tool calls: reduced by 2.1×

- Graph-native queries (hub detection, caller ranking): equal or better than file search in 19 of 31 languages

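5.3 Indexing Pipeline

File discovery (gitignore-aware)
  → Tree-sitter AST parsing (grammars for 158 languages)
    → Multi-stage extraction pipeline (definitions, calls, imports, usages, HTTP routes, etc.)
      → Package/module resolution (package.json, go.mod, Cargo.toml, and 10+ other manifest files)
        → Hybrid LSP semantic resolution (9 languages, type-aware call resolution)
          → RAM-first pipeline (LZ4 compression → in-memory SQLite → single dump → release memory)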

5.4 Operations and Maintenance

- Automatic sync: background file monitoring (git-based); changes are re-indexed automatically

- Auto index: after `config set auto_index true`, indexing is triggered automatically on the first MCP connection

- Self-update: `codebase-memory-mcp update` updates in one step

- Clean uninstall: `codebase-memory-mcp uninstall` removes all Agent configurations, skills, hooks, and commands

- CLI mode: supports direct command-line calls, such as `codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'`

- cgroup awareness: the `CBM_WORKERS` environment variable controls the degree of parallelism, which suits containerized deployments

5.5 Security and Trust

- 100% local processing: the code never leaves the machine

- No network calls: the embedding model is built in; no API key, no Ollama, no Docker

- Every release: SHA-256 signature + scans by 70+ antivirus engines (zero detections)

- SLSA Level 3 build provenance

- In-house HTTP server (since v0.8.1): binds only to 127.0.0.1, strict HTTP/1.1 parsing, hard request limits

6. Architecture, Deployment, and Integration

6.1 Project Source Structure

src/
├── foundation/      Arena allocator, hash table, string utilities, platform compatibility
├── store/           SQLite graph store (WAL mode, FTS5)
├── cypher/          Cypher query → SQL translation
├── mcp/             MCP server (JSON-RPC 2.0 over stdio, 14 tools)
├── pipeline/        Multi-stage indexing pipeline
├── httplink.c       HTTP route extraction (Go/Express/Laravel/Ktor/Python and other frameworks)
├── discover/        File discovery (gitignore-aware)
├── watcher/         Git-based background auto-sync
├── cli/             CLI subcommands (install, update, uninstall, config)
├── ui/              Graph visualization HTTP server (in-house httpd.c)
internal/cbm/        Tree-sitter AST extraction (158 languages, vendored C grammars)
vendored/            sqlite3, yyjson, mimalloc, xxhash, tree-sitter
graph-ui/            React/Three.js visualization front end
tests/               All C test files (5604 test cases)

6.2 Deployment Architecture

┌─────────────────────────────────────────────┐
│              Developer machine              │
│  ┌─────────────┐    ┌─────────────────────┐ │
│  │ AI Agent    │◄──►│ codebase-memory-mcp │ │
│  │ (MCP Client)│    │ (MCP Server)        │ │
│  │             │    │ - stdio transport   │ │
│  │ Claude Code │    │ - 14 tools          │ │
│  │ Codex CLI   │    │ - SQLite graph store│ │
│  │ Gemini CLI  │    │ - file watcher      │ │
│  │ VS Code     │    │ - 3D Graph UI       │ │
│  │ ...         │    │   (:9749, optional) │ │
│  └─────────────┘    └──────────┬──────────┘ │
│                                │            │
│                        ~/.cache/            │
│                        codebase-memory-mcp/ │
│                        (SQLite database)    │
└─────────────────────────────────────────────┘

6.3 Installation and Integration

One-click installation (macOS / Linux):

# Standard edition
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash

# With the 3D visualization UI
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash -s -- --ui

Windows (PowerShell):

Invoke-WebRequest -Uri https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.ps1 -OutFile install.ps1
.\install.ps1

Package Manager Installation:

# Homebrew
brew install codebase-memory-mcp

# npm
npx codebase-memory-mcp install

# PyPI
uvx codebase-memory-mcp install

# Arch Linux (AUR)
yay -S codebase-memory-mcp-bin

Automatic setup: the installer detects all installed coding agents (Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, Kiro) and configures MCP server entries, instruction files, skills, and pre-tool hooks for each of them.

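6.4 Key Configuration

# Enable auto-indexing (new projects are indexed automatically on first connection)
codebase-memory-mcp config set auto_index true

# File-count ceiling for auto-indexing (default 50000)
codebase-memory-mcp config set auto_index_limit 100000

# Log level
export CBM_LOG_LEVEL=debug

# Number of parallel workers (for containerized environments)
export CBM_WORKERS=4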

6.5 Building from Source

git clone https://github.com/DeusData/codebase-memory-mcp.git
cd codebase-memory-mcp
git config core.hooksPath scripts/hooks  # activate the pre-commit security checks
scripts/build.sh
# Output: build/c/codebase-memory-mcp

Prerequisites: a C compiler (gcc/clang), make, and zlib.

7. How to Use

7.1 Daily Developer Workflow

Step 1: Install and start

After installation, restart your AI coding agent (such as Claude Code). The Agent automatically connects to the MCP server.

Step 2: Index the project

Tell the Agent directly: "Index this project."

Or explicitly call:

codebase-memory-mcp cli index_repository '{"path": "/path/to/project"}'

Step 3: Start asking questions (examples)

- "What is the overall architecture of this project?" → Agent calls `get_architecture`

- "Who calls the ProcessOrder function? What is the call chain?" → Agent calls `trace_path` / `trace_call_path`

- "Are there functions that are never called?" → Agent uses Cypher for dead code detection

- "Which modules are affected by modifying `auth.go`?" → Agent calls `detect_changes`

- "Find functions similar to `sendNotification`" → Agent uses `semantic_query`

- "List all HTTP REST endpoints" → Agent uses `search_graph(label = "Route")`
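The dead-code question in the list above reduces to degree filtering: keep every function whose inbound `CALLS` degree is zero and that is not an entry point. A minimal sketch on an in-memory edge list, with a hypothetical function name and sample data:

```python
def find_dead_functions(functions, calls, entry_points):
    """Return functions with no callers, excluding declared entry points."""
    # A function that only calls itself still counts as uncalled.
    called = {callee for caller, callee in calls if caller != callee}
    return sorted(
        name for name in functions
        if name not in called and name not in entry_points
    )


# Hypothetical graph: "legacyExport" is defined but never called.
functions = {"main", "HandleRequest", "ProcessOrder", "legacyExport"}
calls = [
    ("main", "HandleRequest"),
    ("HandleRequest", "ProcessOrder"),
    ("legacyExport", "legacyExport"),
]
print(find_dead_functions(functions, calls, entry_points={"main"}))
```

Results are candidates for review rather than proof of dead code, since calls made through reflection or dynamic dispatch may not appear as edges.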

7.2 Standalone CLI Use

You can also run queries directly in the terminal using CLI mode:

# Search for functions
codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'

# Cypher query
codebase-memory-mcp cli query_graph '{"query": "MATCH (f:Function)-[:CALLS]->(g) WHERE f.name = \"main\" RETURN g.name"}'

# Get the architecture overview
codebase-memory-mcp cli get_architecture '{}'
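When scripting these calls, building the JSON argument programmatically avoids shell-quoting mistakes. The sketch below only constructs the command line shown above; `build_cli_command` is a hypothetical helper, and nothing is assumed about the format of the tool's output.

```python
import json
import shlex


def build_cli_command(tool, arguments, binary="codebase-memory-mcp"):
    """Return the argv list for `<binary> cli <tool> '<json arguments>'`."""
    return [binary, "cli", tool, json.dumps(arguments)]


command = build_cli_command("search_graph", {"name_pattern": ".*Handler.*"})
print(shlex.join(command))
# codebase-memory-mcp cli search_graph '{"name_pattern": ".*Handler.*"}'
```

With the binary installed, the list can be passed directly to `subprocess.run(command, capture_output=True, text=True)`, which sidesteps shell quoting entirely.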

7.3 Team Shared Workflow

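# Developer A: export the team-shared artifact after indexing
# (.codebase-memory/graph.db.zst is generated automatically during indexing)

# Commit it to the repository
git add .codebase-memory/graph.db.zst .gitattributes
git commit -m "Update code knowledge graph artifact"

# Developer B: use it directly after cloning
# codebase-memory-mcp detects graph.db.zst → decompresses and imports it → runs only incremental indexing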

8. Pre-Sales Talking Points

8.1 Core Value Proposition (Elevator Pitch)

"When your development team uses AI Agents for programming, do you find that the Agent often gets lost in large code bases? It reads files constantly, runs grep over and over, burns through tokens, and still gives answers of uneven quality. Codebase Memory MCP takes about 3 minutes to build a knowledge graph of your code base; after that, every code-structure query completes in milliseconds and token consumption drops by 99%. It is a 35 MB static binary with no Docker and no API key, your code never leaves your machine, and it is ready to use as soon as it is installed."

8.2 Customer Talking Points (by Role)

For the CTO/Technical VP:

- "This is an infrastructure investment, not just another tool. It takes your AI coding Agent from 'blind men feeling an elephant' to a bird's-eye view of the whole system, which directly improves R&D efficiency."

- "MIT open source, zero vendor lock-in. Maintained by a German team, with an academic paper and SLSA Level 3 build provenance."

- "It has been verified on the Linux kernel (28 million lines of code); your code base is unlikely to be larger."

For architects:

- "The Leiden community detection algorithm automatically discovers functional module boundaries in the code base, which helps verify or correct your existing architecture partitioning."

- "Automatic cross-service HTTP/gRPC route matching and cross-repository dependency analysis make it a strong tool for microservice governance."

- "ADR (Architecture Decision Record) management persists architecture knowledge across AI sessions."

For the Security/Compliance Officer:

- "100% local processing; code never leaves your machine. No external API is required."

- "Every release is scanned by 70+ antivirus engines and verified with a SHA-256 signature."

- "There is a dedicated SECURITY.md and a responsible disclosure process."

For the Developer Team Lead:

- "Onboarding time for large projects can shrink from weeks to days: the AI Agent 'understands' the entire code structure before it answers questions."

- "Git changes are automatically mapped to affected symbols, so change impact analysis can run in pre-commit or CI."

- "It supports 11 mainstream AI Agents, so your team can keep using the one it already has."

8.3 Horizontal Comparison Advantage

| Dimension | Codebase Memory MCP | Traditional grep/ripgrep | IDE LSP (single file) | Other code graph tools (e.g. Sourcegraph) |
| --- | --- | --- | --- | --- |
| Cross-file call graph | ✅ Built automatically | ❌ Must be chained manually | ⚠️ Single language | |
| Query speed | < 1 ms (graph query) | Seconds (full-repository search) | Instant (limited to open files) | Depends on the index |
| Token efficiency | Excellent (structured output) | Poor (full text) | N/A | Medium |
| Multi-language support | 158 | Generic text | Single language per LSP | Limited |
| Offline/local | Yes (100% local processing) | | | Mostly SaaS |
| Deployment complexity | Single binary, zero dependencies | Single binary | Complex (one LSP installation per language) | Server required |
| Integration with AI Agents | Native MCP | Agent must assemble it itself | MCP bridging required | Custom integration required |
| Open source | MIT | Varies by tool | Varies by tool | Mostly commercial |

8.4 ROI Calculation Ideas

Consider a 10-person development team:

- Each person asks the AI Agent 20 code-related questions per day.

- With the grep/file-read approach, each question consumes an average of 4,000 tokens → 800,000 tokens per day for the team

- With Codebase Memory MCP, each question averages 170 tokens → 34,000 tokens per day for the team

- Token savings are about 95.8%. At Claude Sonnet pricing of about $3 per million input tokens, this saves about $2.3 per day for the whole team, or roughly $580 per year assuming about 250 working days (about $58 per person per year)

- More important than the token bill are better answers and saved developer time: fewer incorrect answers and fewer follow-up questions caused by the Agent lacking context
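The arithmetic behind the list above can be reproduced in a few lines. All inputs come from the list except the number of working days per year, which is an assumption used to annualize the daily figure.

```python
team_size = 10
questions_per_person_per_day = 20
baseline_tokens_per_question = 4_000   # grep / file-read approach
graph_tokens_per_question = 170        # with Codebase Memory MCP
usd_per_million_input_tokens = 3.0
working_days_per_year = 250            # assumption, not stated in the source

questions_per_day = team_size * questions_per_person_per_day
baseline_daily = questions_per_day * baseline_tokens_per_question
graph_daily = questions_per_day * graph_tokens_per_question
saved_daily = baseline_daily - graph_daily

savings_ratio = saved_daily / baseline_daily
usd_per_day = saved_daily / 1_000_000 * usd_per_million_input_tokens
usd_per_year = usd_per_day * working_days_per_year

print(f"baseline: {baseline_daily:,} tokens/day, with graph: {graph_daily:,} tokens/day")
print(f"savings: {savings_ratio:.2%}, ${usd_per_day:.2f}/day, ${usd_per_year:.0f}/year for the team")
print(f"per person: ${usd_per_year / team_size:.0f}/year")
```

The direct token savings are small in absolute terms, which is why the pitch should lean on answer quality and developer time rather than the token bill.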

9. Frequently Asked Customer Questions

Q1: Is the data safe? Is the code uploaded anywhere?

A: All processing is done locally; the code never leaves your machine. The built-in semantic search uses the Nomic embedding model compiled into the binary and requires no external API. Every release is scanned by 70+ antivirus engines and SHA-256 signed.

Q2: How long does indexing take? Does it affect the development experience?

A: Small and medium-sized projects (such as Django) take about 6 seconds; large projects (the Linux kernel, 28 million lines) take about 3 minutes. Memory is released back to the operating system once indexing completes. Later file changes are indexed incrementally by the background watcher and are barely noticeable. You can also enable auto-indexing, which is triggered on the first connection.

Q3: How is this different from "Go to Definition" in VS Code/IntelliJ?

A: IDE code navigation is based on single-file, single-language LSP and mainly serves people jumping around in an editor. Codebase Memory MCP builds a cross-file, cross-language global knowledge graph designed to serve an AI Agent's understanding of the code: the Agent can ask for "the complete call chain of this function", "the list of all HTTP endpoints", or "code similar to this module" in a single query, without dozens of file-by-file jumps.

Q4: Are all 158 supported languages of the same quality?

A: No. The high-quality Tier 1 (≥ 90%, 17 languages) includes C/C++, Kotlin, Lua, Groovy, Swift, Zig, and others. Tier 2 (75-89%, 16 languages) includes mainstream languages such as Python, TypeScript, Go, Rust, Java, PHP, and C#. Nine languages have Hybrid LSP deep type resolution. Haskell and OCaml are currently Tier 3 (< 75%) but continue to improve.

Q5: Can I index multiple repositories at the same time? Can I analyze across repositories?

A: Yes. The same SQLite store can hold multiple indexed repositories. Cross-repository nodes are linked by `CROSS_*` edges. The 3D visualization UI also supports multi-galaxy layouts that show cross-repository architectures.

Q6: What does it cost?

A: It is MIT open source and completely free. There is no Enterprise Edition and no SaaS subscription fee. The only "cost" is the compute (CPU and memory) used during installation and indexing.

Q7: What permissions does it need?

A: Read access to your code base (to build the index) and write access to the Agent configuration files (to configure MCP entries automatically during installation). No network access and no sudo are required.

Q8: How do I update?

A: Run `codebase-memory-mcp update`; it completes in one step. The server also checks for updates automatically when it starts.

10. PoC Recommendations

10.1 PoC Objectives

On the customer's real code base, verify Codebase Memory MCP's (1) indexing speed, (2) query accuracy, (3) integration experience with existing AI agents, and (4) token savings.

10.2 PoC Plan (1-2 weeks recommended)

Phase 1: Preparation and Setup (1 day)

  1. Select a representative code repository (recommended: multi-language, > 1000 files, microservice or modular structure, such as the customer's core business repository)
  2. Install Codebase Memory MCP on a developer machine (integrated with the customer's existing AI Agent)
  3. Run the first full index and record the indexing time, number of nodes, and number of edges.

Phase 2: Functional Verification (2-3 days)

Prepare a question set (10-15 questions) covering:

- D1 (Definition/API discovery): "List all REST API endpoints" / "Where is the project entry point?"

- D2 (Relationships/call graph): "What is the complete call chain of the order flow?" / "Which packages does the auth module depend on?"

- D3 (Precise search): "Find the implementation of `PaymentService.process`"

- D4 (Architecture/structure): "What is the layered architecture of this project, and what are the core modules?"

- D5 (Cross-cutting/semantic): "Is there a function similar to `sendEmail` but with a different name?"

Answer the same set of questions in two ways:

- Control group: AI Agent alone (grep/read/glob mode)

- Experimental group: AI Agent + Codebase Memory MCP

Record for comparison:

- Token consumption

- Answer accuracy (manual score, 1-5)

- Answer time

- Number of tool invocations

Phase 3: In-Depth Scenario Verification (2-3 days)

Select 2-3 in-depth scenarios based on the customer's actual pain points:

- Microservice governance → verify `get_architecture`, HTTP route matching, and cross-service call chains

- Legacy system maintenance → verify dead code detection, call chain tracing, and impact analysis (`detect_changes`)

- Multi-repository management → verify cross-repository `CROSS_*` edges and the multi-repository architecture view

- Team collaboration → verify the `.codebase-memory/graph.db.zst` artifact sharing process

Phase 4: Summary Report (1 day)

Output a PoC report containing:

- Index performance metrics

- Comparison of answer accuracy

- Token savings ratio

- Developer experience feedback

- Recommended team rollout path

10.3 PoC Success Criteria

| Metric | Target value |
| --- | --- |
| Index completion time | < 0.1 seconds per file in the customer's code base |
| Structural query answer accuracy | ≥ 80% (manual scoring) |
| Token savings vs Agent alone | ≥ 80% |
| Developer subjective satisfaction | ≥ 4/5 |
| Time from install to first use | < 10 minutes |
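The success criteria above are simple thresholds, so the PoC report can check them mechanically. A minimal sketch follows; the thresholds are taken from the table, while the metric names and the sample measurements are hypothetical.

```python
def evaluate_poc(metrics):
    """Check measured PoC metrics against the success criteria; returns {criterion: passed}."""
    token_savings = 1 - metrics["tokens_with_mcp"] / metrics["tokens_baseline"]
    return {
        "index_time": metrics["index_seconds"] / metrics["file_count"] < 0.1,
        "accuracy": metrics["accuracy"] >= 0.80,
        "token_savings": token_savings >= 0.80,
        "satisfaction": metrics["satisfaction"] >= 4.0,
        "time_to_first_use": metrics["install_minutes"] < 10,
    }


# Hypothetical measurements from a PoC run.
measured = {
    "index_seconds": 42.0,
    "file_count": 3_500,
    "accuracy": 0.85,          # share of answers judged correct
    "tokens_baseline": 500_000,
    "tokens_with_mcp": 60_000,
    "satisfaction": 4.2,       # mean developer rating out of 5
    "install_minutes": 6,
}
results = evaluate_poc(measured)
for criterion, passed in results.items():
    print(f"{criterion}: {'PASS' if passed else 'FAIL'}")
print("PoC successful:", all(results.values()))
```

Recording the raw measurements alongside the pass/fail flags keeps the report useful even when a criterion narrowly misses.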

10.4 PoC Considerations

  1. The first index takes time: manage expectations and stress "index once, benefit long-term"
  2. Results for dynamic languages are slightly weaker than for static languages: call graph accuracy for Python/JS/PHP depends on Hybrid LSP type inference. The PoC should also include repositories in static languages such as Go/TypeScript/Java
  3. Memory for large-repository indexing: a Linux-kernel-scale index needs several GB of memory; make sure the PoC machine has enough (16 GB is recommended)
  4. Not a replacement for code review: graph analysis is an aid and should not be the sole basis for security or quality decisions
  5. File-monitoring overhead: the performance of background git monitoring on Windows needs extra attention

11. Risks and Considerations

11.1 Technology Risk

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Insufficient call graph precision for dynamic languages | Medium | Tell customers about the difference explicitly; cover both static and dynamic languages in the PoC; follow the continuing improvement of Hybrid LSP (Java/Kotlin/Rust support was added in v0.8.0) |
| Memory peak when indexing large repositories | Medium | Optimized in v0.8.0: streaming SQLite dump, string interning, allocator page reclaim; cgroup awareness plus adjustable `CBM_WORKERS` |
| Parsing stability for C++ template code | Low | v0.8.0 fixed a large number of crashes on C++ template code (10+ issues from the community); end-to-end verification passed on 16 large repositories |
| Code synchronization delay | Low | The background git watcher detects changes in real time; incremental indexing usually takes < 1 second |
| Parsing quality for non-mainstream languages | Low | OCaml, Haskell, and similar languages are Tier 3 (62-72%), which has little impact on mainstream enterprise languages |

11.2 Commercial/Ecological Risk

| Risk | Severity | Mitigation |
| --- | --- | --- |
| The project relies on a single maintainer | Medium | Monitor community activity: v0.8.0 had 16 community contributors and 40 issue reporters; 863 commits and an active release cadence |
| MCP protocol evolution | Low | MCP is already an industry-standard protocol. The project is compatible with MCP JSON-RPC 2.0 and is listed in the official MCP Registry and the Glama directory |
| Emerging competitors | Medium | No direct competitor currently matches its combination of speed, language coverage, and zero deployment complexity; keep monitoring |
| Customer concerns about maintaining a C project | Low | Pure C is the source of its performance advantage; because it compiles to a single static binary, customers do not need to maintain any C code |
| No commercial support (MIT open source) | Medium | Customers must be told clearly; enterprise customers who need SLA support should evaluate whether a third-party support plan exists |

11.3 Security Risks

- Code data security: 100% local processing is confirmed; the code does not leave the machine ✅

- Binary supply chain security: SLSA Level 3 build provenance + SHA-256 signature + VirusTotal scan ✅

- The graph artifact `graph.db.zst` committed to the repository contains code structure information (function names and call relationships) but no source code content; for sensitive code bases it can be excluded via `.gitignore`

- Graph UI: binds only to 127.0.0.1 and is not exposed publicly ✅

12. My Pre-Sales Judgment

12.1 Overall Assessment: ★★★★★ Strongly Recommended

Codebase Memory MCP is a genuinely differentiated product. It is not another thin "AI for code" wrapper; it addresses a real and common pain point at the AI Agent infrastructure layer: context explosion and the lack of structural awareness when AI Agents face large code bases. It pushes three dimensions to the extreme:

  1. Extreme performance: pure C + RAM-first pipeline + LZ4 compression → Linux kernel indexed in 3 minutes
  2. Extreme simplicity: a single binary, zero dependencies, installed with one curl command
  3. Extreme openness: MIT open source, standard MCP protocol, support for 11 Agents and 158 languages

12.2 Who Should Try It Right Away?

- Every team already using AI coding agents: this is the most effective "Agent code-understanding accelerator" known so far

- Teams managing large code bases (> 1000 files): the benefit grows with code base size

- Multi-language/multi-service architecture teams: a single-language LSP cannot provide a cross-language, cross-service global knowledge graph

- Organizations with strict security and data privacy requirements: local processing is the core selling point

12.3 Who Should Wait and See?

- Teams with a very small code base (< 100 files): letting the Agent read the files directly may be simpler

- Teams not using AI coding agents: traditional IDE navigation is sufficient

- Teams mainly using Tier 3 languages such as Haskell/OCaml: parsing quality still needs to improve

- Customers who need commercial SLA support: this is a purely community-driven MIT project with no official commercial support

12.4 Competitive Landscape

Comparable projects on the market (such as graphify, Sourcegraph Cody, and GitNexus) fall short of Codebase Memory MCP on openness, deployment complexity, and language coverage. The project has removed the GitNexus comparison table from its README, which suggests it is confident in its competitive position.

12.5 Pre-Sales Action Recommendations

  1. Priority 1: Recommend a PoC to every customer that uses AI coding Agents (very low cost; a single command gets it started)
  2. Priority 2: Produce Chinese-language guides and best-practice documents to lower the barrier for Chinese customers
  3. Priority 3: Collect PoC data and build a case library of Chinese customers (real data such as token savings and index performance)
  4. Priority 4: Track the project roadmap, especially Hybrid LSP support for more languages and CI/CD integration capabilities