CLI-Anything - AI Navigation

1. One-Sentence Positioning

CLI-Anything is an open-source project that transforms existing software into an Agent-native CLI.

It is neither an ordinary command-line tool nor a GUI automation framework. It is a set of methodologies, plug-ins, sample harnesses, and a CLI ecosystem that "turns any software with a codebase into an operational tool for AI agents".

It can be explained to a customer as follows:

In the past, we had the AI Agent look at the screen, find buttons, and click coordinates the way a human would. That approach is easily broken by interface changes, window size, and loading delays. CLI-Anything turns the idea around: add a structured command-line interface to the software so the Agent can operate the real software directly through commands, explicit state, and JSON results.

2. What Problems It Mainly Solves

2.1 The Inherent Fragility of GUI Agents

The abstract of the arXiv technical report states the problem clearly: today's mainstream computer-use approaches usually have the Agent operate applications through screenshots, visual recognition, and coordinate clicks. This causes several problems:


Problem	Manifestation	Impact
Pixel-level interaction is fragile	UI theme changes, moved buttons, and changed pop-ups all cause failures	Automation is unstable and costly to maintain
Timing dependence	Slow page loads, animation delays, and network fluctuations all affect clicks	Task success rate is unpredictable
State cannot be read explicitly	The agent can only "see" the interface, so complete structured state is hard to obtain	Reliable decisions and rollback are difficult
Output is hard to verify	Completing a GUI operation does not mean the file or result is actually correct	High risk in production environments
Hard to scale	Every application and version needs its visual path re-adapted	High enterprise rollout cost

The CLI-Anything answer is: do not make the Agent imitate the limits of human vision; have the software expose interfaces that suit the Agent better.
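To make the contrast with coordinate clicking concrete, here is a minimal sketch of how an agent-side helper might call a CLI and read a structured result. The helper name `run_json` and the stand-in command are assumptions for illustration; a real call would pass a generated harness command together with its JSON flag.

```python
import json
import subprocess
import sys


def run_json(argv, timeout=60):
    """Run a CLI command and parse its stdout as JSON.

    A non-zero exit code raises RuntimeError, so a failed command
    is never mistaken for a successful one.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Stand-in for a real harness (assumption): a Python one-liner that prints JSON.
STAND_IN = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"status": "ok", "layers": 3}))',
]

result = run_json(STAND_IN)
print(result["status"], result["layers"])
```

The agent reads named fields from the result instead of inferring state from pixels, and a failure surfaces as an exception with the command's error text.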

2.2 Agents Lack Real Professional Software Capabilities

Many enterprises want agents to use professional software, but they run into two extremes:

-Direct GUI automation: can use existing software, but is not stable.

-Rewriting a lightweight alternative: stable, but far less capable than the real software.

CLI-Anything emphasizes "real software integration": the CLI generates valid project files or scripts and then invokes the real backend for rendering or exporting. For example:

-LibreOffice: Generate ODF, then use 'libreoffice --headless' to export PDF/DOCX/XLSX/PPTX.

-Blender: Generate a bpy script and render it with 'blender --background --python'.

-Inkscape: Operate on SVG and export it with Inkscape.

-Shotcut/Kdenlive: Generate MLT XML and render it with 'melt' or 'ffmpeg'.

-OBS: Control real OBS with obs-websocket.

This principle is key: it is a "structured interface to real software", not a toy rewrite.
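As a rough sketch of what "invoking the real backend" can look like, the helpers below build the argument lists for two of the backends named above. The function names are hypothetical. The `--convert-to` and `--outdir` options are LibreOffice's own conversion flags and are not taken from the CLI-Anything source. The executable name `libreoffice` is also an assumption, since some installations expose it as `soffice`.

```python
import shutil
from pathlib import Path


def libreoffice_export_cmd(source: Path, out_dir: Path, fmt: str = "pdf") -> list[str]:
    """argv for a headless LibreOffice conversion of `source` into `out_dir`."""
    return [
        "libreoffice", "--headless",
        "--convert-to", fmt,
        "--outdir", str(out_dir),
        str(source),
    ]


def blender_render_cmd(script: Path) -> list[str]:
    """argv that runs a generated bpy script without opening the Blender UI."""
    return ["blender", "--background", "--python", str(script)]


def backend_available(cmd: list[str]) -> bool:
    """True when the executable named in argv[0] is found on PATH."""
    return shutil.which(cmd[0]) is not None


cmd = libreoffice_export_cmd(Path("report.odt"), Path("out"))
print(cmd, backend_available(cmd))
```

Keeping the command as a list (rather than a shell string) avoids quoting problems when file names contain spaces.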

3. What It Can Do

3.1 Install Existing CLIs Using CLI-Hub

The CLI-Hub is the CLI-Anything ecosystem portal for browsing, searching, installing, updating, uninstalling, and launching CLIs that the community has already built.

pip install cli-anything-hub

cli-hub list
cli-hub search image
cli-hub info gimp
cli-hub install gimp
cli-hub launch gimp

Pre-Sales Interpretation:

-If the customer wants to verify "whether the Agent can use professional software", they can first check whether the corresponding tool already exists in the CLI-Hub.

-This is faster than building a harness from scratch and suits demos and early PoCs.

-Note: Some CLIs still rely on the real upstream software, such as GIMP, Blender, or LibreOffice, which must still be installed locally or on the server.

3.2 Generate an Agent-Usable CLI for New Software

If the CLI-Hub does not have the target software, the CLI-Anything 7-stage build process can be used:

Phase	Official Process	Pre-Sales Understanding
1. Analyze	Scan the source code to find GUI actions, data models, and backend interfaces	Find out where the software's real capabilities live
2. Design	Design command groups, the state model, and output formats	Organize software capabilities into an operation surface the Agent can understand
3. Implement	Use Click to implement the CLI, REPL, JSON output, and undo/redo	Produce command tools that are both interactive and scriptable
4. Plan Tests	Write the TEST.md test plan first	Avoid building a demo without verification
5. Write Tests	Write unit tests, end-to-end tests, and real-backend tests	Verify real workflows
6. Document	Update test results and documentation	Make the tool easy to hand over and maintain
6.5 Skill	Generate SKILL.md	Let the Agent discover and use the tool automatically later
7. Publish	Generate setup.py and install onto the PATH	Reach a state the team can reuse

Typical commands:

/cli-anything ./gimp
/cli-anything https://github.com/blender/blender
/cli-anything:refine ./gimp "batch processing and filters"
/cli-anything:test ./inkscape
/cli-anything:validate ./audacity
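The Implement stage produces a CLI with command groups and JSON output. The sketch below shows that shape in miniature. Note the swap: the project uses Click according to the table above, while this sketch uses the standard-library argparse so it runs without extra dependencies. The program name `cli-anything-demo` and the `layer add` command are hypothetical.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Top-level parser with one command group ('layer') and one action ('add')."""
    parser = argparse.ArgumentParser(prog="cli-anything-demo")
    parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    groups = parser.add_subparsers(dest="group", required=True)
    layer = groups.add_parser("layer", help="layer operations")
    actions = layer.add_subparsers(dest="action", required=True)
    add = actions.add_parser("add", help="add a layer")
    add.add_argument("--name", required=True)
    return parser


def main(argv=None) -> str:
    """Parse argv and return either JSON or a human-readable line."""
    args = build_parser().parse_args(argv)
    result = {"group": args.group, "action": args.action, "name": args.name, "ok": True}
    if args.json:
        return json.dumps(result)
    return f"Added layer '{args.name}'"


print(main(["--json", "layer", "add", "--name", "background"]))
print(main(["layer", "add", "--name", "background"]))
```

The same command serves both audiences: an Agent passes the JSON flag and parses the result, while a person reads the plain sentence.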

3.3 Verified Harness Examples

The README shows that the project already covers many types of software, including:

Category	Example	Customer Value
Creativity and Media	GIMP, Blender, Inkscape, Audacity, Kdenlive, Shotcut, OBS	Let the Agent automatically produce images, videos, and live-streaming scenes, and process audio
Office and Knowledge Management	LibreOffice, Zotero, Joplin, Calibre	Automatically generate reports, process documents, and manage libraries
Charts and Visualizations	Draw.io, Mermaid	Automatically generate architecture diagrams, flowcharts, and explanatory materials
AI/ML Platforms	ComfyUI, Ollama, etc.	Command-driven model inference and workflows
Development and Debugging	LLDB, RenderDoc, Nsight Graphics, Unreal Insights	Let the Agent take part in debugging, performance analysis, and graphics analysis
Enterprise Tools	Zoom, CloudAnalyzer, AdGuard Home, and more	Automation of meetings, cloud cost, and security/network tools

The latest English README reports that all 2,461 tests pass (100 percent), including unit tests, end-to-end tests, and Node.js tests. The number in the Chinese README is slightly lower, which suggests the Chinese document lags behind. Pre-sales material should cite the latest English README and state the date on which the figure was verified.

4. Applicable Scenarios

4.1 Agent Enablement for Enterprise Internal Software

Suitable for customers who:

-Have a large number of internal tools, backend systems, desktop software, or customized open-source systems.

-Want the AI Agent to perform business tasks directly rather than just answer questions.

-Have systems that lack a complete API, or whose API is too low-level and whose documentation is complicated.

Typical Value:

-Organize scattered functions into a unified CLI.

-Let the Agent use '--help' and '--json' to discover capabilities on its own.

-Use tests to ensure that command results are reliable.
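Self-discovery through '--help' can be pictured with a small sketch: locate a tool on the PATH and capture its help text so the Agent can read which commands exist. The function name `discover` and the returned dictionary layout are assumptions for illustration, and the Python interpreter is used here only as a stand-in for a generated CLI.

```python
import shutil
import subprocess
import sys


def discover(tool: str) -> dict:
    """Locate a tool on PATH and capture its --help text.

    Returns found=False instead of raising, so the caller can decide
    whether a missing tool is an error for the current task.
    """
    path = shutil.which(tool)
    if path is None:
        return {"tool": tool, "found": False, "help": ""}
    proc = subprocess.run([path, "--help"], capture_output=True, text=True, timeout=30)
    # Some tools print help to stderr, so fall back to it when stdout is empty.
    return {"tool": tool, "found": True, "help": proc.stdout or proc.stderr}


info = discover(sys.executable)  # stand-in for a generated CLI
print(info["found"], len(info["help"]) > 0)
```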

Example:

An enterprise has internal reporting tools, data cleaning tools, and document generation tools. Through CLI-Anything, these tools can be packaged into a unified CLI, allowing the Agent to generate reports, export files, and check results automatically instead of relying on people to click through pages every time.

4.2 Replace Fragile RPA / GUI Automation

Suitable for customers who:

-Currently use RPA, recorded screen clicks, or browser automation scripts.

-See automation fail frequently because of interface changes.

-Want to reduce maintenance costs and raise the task success rate.

Pre-sales talking point:

The problem with RPA is not that it cannot do the job, but that it is expensive to maintain over time. The CLI-Anything idea is to replace "where to click" with "which command to call", and to replace on-screen results with JSON and file verification, which suits Agents and production environments better.

4.3 Building an AI Agent Tool Ecosystem

Suitable for customers who:

-Are building an in-house Agent platform.

-Need to provide a stable toolset for the Agent.

-Want to turn multiple business systems, open-source tools, and desktop software into callable tools.

The value of CLI-Anything is that it does not serve just one Agent; it makes the tool itself Agent-native. As long as the CLI output and SKILL.md remain stable, the same capabilities can be reused across compatible environments such as Claude Code, Codex, OpenClaw, OpenCode, Hermes, and Reasonix.

4.4 Content Production Automation

Suitable scenarios:

-Automatic PPT/Document/PDF generation.

-Automatically make video clips, subtitles, covers.

-Automatically render 3D product drawings.

-Automatically generate flowcharts, architecture diagrams, training materials.

CLI-Anything has an advantage in this kind of scenario because it calls the real software to export real results, rather than generating an intermediate file that merely looks right.
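One inexpensive way to check that an exported result is real is to inspect the file's leading bytes. The sketch below is an assumption about how such a check could be written, not the project's own verifier. It relies only on the well-known signatures of PDF ("%PDF-") and PNG files.

```python
import tempfile
from pathlib import Path

# File signatures (magic bytes) for two common artifact types.
MAGIC = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
}


def verify_artifact(path: Path, kind: str) -> bool:
    """True if `path` exists and starts with the signature for `kind`."""
    path = Path(path)
    if not path.is_file():
        return False
    magic = MAGIC[kind]
    with path.open("rb") as handle:
        return handle.read(len(magic)) == magic


# Demonstration with throwaway files: one plausible PDF header, one HTML error page.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "report.pdf"
    good.write_bytes(b"%PDF-1.7\n")
    bad = Path(tmp) / "broken.pdf"
    bad.write_bytes(b"<html>error page</html>")
    checks = (verify_artifact(good, "pdf"), verify_artifact(bad, "pdf"))

print(checks)
```

A signature check only proves the file type. Content checks such as page count, image size, or media duration would still be needed for a full acceptance test.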

4.5 Agent-Ready Transformation for Software Vendors

For software vendors, CLI-Anything can serve as a product reference:

-Add a CLI layer to your own software.

-Provide JSON output and verifiable state.

-Provide SKILL.md or Agent instructions.

-Provide end-to-end task samples and test suites.

This kind of capability may become an important indicator of whether software is suitable for Agent use.

5. Unsuitable Scenarios

Unsuitable scenario	Reason	Suggestion
The target software is fully closed source with no script, API, or file-format entry point	CLI-Anything mainly depends on source code, a real backend, project files, or an existing CLI	First evaluate whether an official API, SDK, MCP, or automation interface exists
Only a one-off, simple web click task is needed	Playwright or RPA may be faster	There is no need to generate a full harness for one-off tasks
The customer lacks a strong model and an engineering team	The README explicitly mentions the need for a strong base model; a weak model may generate an incomplete CLI	Have the implementation team build it first, then hand it over with maintenance guidelines
The target software relies on complex GUI state and has no stable backend	Real output and test stability are hard to guarantee	Run a technical feasibility assessment first
Compliance and security requirements are high but there is no review process	An auto-generated CLI may write files, make system calls, and execute real software	Add code review, permission control, sandboxing, and auditing

6. Architecture and Key Design

[Figure: CLI-Anything Architecture]

6.1 Recommended Architecture View for Pre-Sales

flowchart LR
    Agent["AI Agent"] --> Skill["SKILL.md / --help<br/>Capability discovery"]
    Agent --> CLI["cli-anything-<br/>JSON + human output"]
    CLI --> State["Session / Project State<br/>undo / redo / history"]
    CLI --> Native["Native project files<br/>ODF / SVG / MLT / bpy / JSON"]
    Native --> Backend["Real software backend<br/>LibreOffice / Blender / GIMP / ffmpeg"]
    Backend --> Artifact["Real artifacts<br/>PDF / images / video / audio / reports"]
    CLI --> Tests["Unit + E2E + Subprocess Tests"]
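The "Session / Project State" node with undo / redo / history can be pictured with a snapshot-based sketch. This is one simple way to get undo and redo over a JSON-like project, chosen for clarity. It is an assumption and not a description of how the generated harnesses store state, and the `Session` class and its method names are hypothetical.

```python
import copy


class Session:
    """Holds a JSON-like project dict plus undo/redo snapshot stacks."""

    def __init__(self, project: dict):
        self.project = project
        self._undo = []  # snapshots taken before each change
        self._redo = []  # snapshots of states that were undone

    def apply(self, mutate) -> None:
        """Snapshot the project, then let `mutate` change it in place."""
        self._undo.append(copy.deepcopy(self.project))
        self._redo.clear()  # a new change invalidates the redo history
        mutate(self.project)

    def undo(self) -> bool:
        if not self._undo:
            return False
        self._redo.append(self.project)
        self.project = self._undo.pop()
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        self._undo.append(self.project)
        self.project = self._redo.pop()
        return True


session = Session({"headings": []})
session.apply(lambda p: p["headings"].append("Q1 Report"))
session.undo()
print(session.project)
session.redo()
print(session.project)
```

Full snapshots are easy to reason about but grow with project size; a command log with inverse operations would be the usual alternative for large projects.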

6.2 Core Principles

Principle	Meaning	Why it matters
Real software is a hard dependency	Do not replace real software with toys	Ensures the output is consistent with the customer's real workflow
Dual interaction modes	REPL + subcommands	Supports both long agent sessions and scripted automation
JSON output	Each command supports '--json'	The Agent can parse results reliably
Self-describing interfaces	'--help', 'which', SKILL.md	Agents can discover and learn tools
Strong testing	Unit tests + real-backend E2E + CLI subprocess tests	Avoids fake automation that only "looks like it can run"
No graceful degradation	Fail if the backend is missing and give installation instructions	Prevents the production environment from silently generating wrong results
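The "no graceful degradation" principle can be sketched as a fail-fast backend check: when the real software is missing, raise an error that carries an installation hint rather than falling back to a substitute. The exception class, function name, and hint texts below are hypothetical.

```python
import shutil


class BackendMissingError(RuntimeError):
    """Raised when the real software a command depends on is not installed."""


# Illustrative hints only; a real harness would carry its own instructions.
INSTALL_HINTS = {
    "libreoffice": "Install LibreOffice and make sure 'libreoffice' is on PATH.",
    "blender": "Install Blender and make sure 'blender' is on PATH.",
}


def require_backend(executable: str, which=shutil.which) -> str:
    """Return the backend's path, or fail loudly with an install hint.

    `which` is injectable so the behaviour can be tested without
    installing or uninstalling anything.
    """
    path = which(executable)
    if path is None:
        hint = INSTALL_HINTS.get(executable, f"Install {executable} and add it to PATH.")
        raise BackendMissingError(f"Required backend '{executable}' not found. {hint}")
    return path


try:
    require_backend("libreoffice", which=lambda name: None)  # simulate a missing backend
except BackendMissingError as error:
    print(error)
```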

7. How to Use

7.1 As a CLI-Hub User

pip install cli-anything-hub

cli-hub list
cli-hub search diagram
cli-hub info drawio
cli-hub install drawio
cli-hub launch drawio

Suitable for demos and quick verification.

7.2 As a Claude Code Plug-in User

/plugin marketplace add HKUDS/CLI-Anything
/plugin install cli-anything
/cli-anything ./gimp
/cli-anything:refine ./gimp "batch processing and filters"

7.3 As a Codex Skill User

git clone https://github.com/HKUDS/CLI-Anything.git
bash CLI-Anything/codex-skill/scripts/install.sh

Then trigger it in Codex with natural language:

Use CLI-Anything to build a harness for ./gimp
Use CLI-Anything to refine ./shotcut for picture-in-picture workflows
Use CLI-Anything to validate ./libreoffice

7.4 Using the Generated CLI

cd /agent-harness
pip install -e .

which cli-anything-
cli-anything- --help
cli-anything-
cli-anything- --json

Example: generate a PDF with the LibreOffice CLI.

cli-anything-libreoffice document new -o report.json --type writer
cli-anything-libreoffice --project report.json writer add-heading -t "Q1 Report" --level 1
cli-anything-libreoffice --project report.json writer add-table --rows 4 --cols 3
cli-anything-libreoffice --project report.json export render output.pdf -p pdf --overwrite
cli-anything-libreoffice --json document info --project report.json
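The same sequence could be driven from a script or an Agent tool wrapper. The sketch below only assembles the argument lists shown above and runs them in order. The function names are hypothetical, and the commands will succeed only where the generated 'cli-anything-libreoffice' harness and LibreOffice are actually installed.

```python
import subprocess

CLI = "cli-anything-libreoffice"


def report_pipeline(project="report.json", output="output.pdf", title="Q1 Report"):
    """Return the four build-and-export commands from the example as argv lists."""
    return [
        [CLI, "document", "new", "-o", project, "--type", "writer"],
        [CLI, "--project", project, "writer", "add-heading", "-t", title, "--level", "1"],
        [CLI, "--project", project, "writer", "add-table", "--rows", "4", "--cols", "3"],
        [CLI, "--project", project, "export", "render", output, "-p", "pdf", "--overwrite"],
    ]


def info_cmd(project="report.json"):
    """argv for the JSON status query that closes the example."""
    return [CLI, "--json", "document", "info", "--project", project]


def run_pipeline(steps, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing command."""
    for argv in steps:
        runner(argv, check=True)


for step in report_pipeline():
    print(" ".join(step))
```

Calling `run_pipeline(report_pipeline())` executes the steps for real. Passing a different `runner` makes it possible to dry-run or log the commands first.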

8. Pre-Sales Talking Points

8.1 For Business

The value of CLI-Anything is that it lets AI actually use the tools the enterprise already has, rather than just giving advice in a chat window. It can turn document, design, video, diagram, development, and other software into commands an Agent can call, moving from "AI can talk" to "AI can do".

8.2 For Technical Leaders

It is neither traditional RPA nor a screen-click script. It emphasizes structured commands, explicit state, JSON output, real backend execution, and end-to-end validation. For an enterprise Agent platform, this kind of tool interface is easier to maintain, monitor, and reuse than GUI clicking.

8.3 For CIOs/Management

Enterprises already own many software assets. CLI-Anything offers a low-intrusion path to agent enablement: start with open-source or internal tools, package high-frequency processes into testable CLIs, and then gradually bring them into the enterprise Agent platform. This reduces repetitive manual work and avoids having to refactor every system at the outset.

9. Frequently Asked Customer Questions


Question	Answer
Is it RPA?	Not traditional RPA. RPA typically simulates a human clicking through a GUI; CLI-Anything instead generates a structured command interface for the software, invokes the real backend, and validates the output.
Can it handle closed-source commercial software?	If the commercial software has an API, a scripting interface, a CLI, editable project files, or MCP services, there is an opportunity; if there is only a black-box GUI, the difficulty increases significantly.
Is the generated CLI reliable?	It depends on the complexity of the target software, the capability of the base model, and test quality. The project methodology requires real-backend E2E testing and output verification, but manual review is still required before production.
Will it replace the official software API?	Not necessarily. When the official API is of high quality, it should be used first. CLI-Anything is better suited to combining APIs, project files, and the real software backend into an Agent-friendly workflow.
Is it suitable for enterprise internal tools?	Yes, especially for internal tools with source code or stable scripting interfaces. The internal tools can be packaged as a unified CLI and then called by the Agent.
What are the security risks?	The CLI can read and write files, call real software, and execute external commands, so it must have permission boundaries, audit logs, sandboxes, and code reviews.
Why not use a GUI Agent directly?	GUI Agents suit ad hoc operations and systems with no programmatic interface, but for long-running production tasks a structured CLI is more stable, testable, and replayable.

10. PoC Recommendations

10.1 PoC Topic Selection

It is recommended to choose a "real, valuable, but clearly bounded" task:

-Automatically generate a customer weekly report as a PDF.

-Automatically generate a short video from footage.

-Automatically generate a system architecture diagram.

-Automatically download, transcribe and archive meeting recordings.

-Automatically drive the internal reporting tool to export data.

It is not recommended to start with an entire ERP system, an entire CAD system, or a whole office workflow.

10.2 PoC Phases

Phase	Work	Acceptance
Phase 1: Feasibility Assessment	Check whether the target software has source code, a CLI, an API, project files, or a script interface	A real backend entry point is found
Phase 2: Minimum Harness	Generate 3-5 core commands	The Agent can complete one closed-loop task
Phase 3: Real Output	Call the real software to generate a PDF, image, video, or report	The artifact opens, can be inspected, and has the correct format
Phase 4: Test Completion	Unit tests, E2E tests, CLI subprocess tests	Automated tests pass
Phase 5: Agent Integration	Generate SKILL.md and connect to the enterprise Agent platform	The Agent can discover and call the tool on its own
Phase 6: Security Hardening	Permissions, auditing, sandboxing, error handling	Enterprise go-live requirements are met

10.3 Suggested Metrics

Metric	Meaning
Task success rate	Whether the same task succeeds consistently across repeated runs
End-to-end time	How much faster it is than manual work or RPA
Failure diagnosability	Whether failures come with explicit errors and logs
Output accuracy	Whether file format, content, and pixel/audio/duration checks meet the standard
Number of commands covered	How much of the target business process is covered
Maintenance cost	How many commands and tests must change after a software upgrade
Agent token consumption	Whether JSON/CLI costs fewer tokens than bare APIs or GUI observation
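The first two metrics reduce to simple arithmetic over repeated runs of the same task. The sketch below assumes a hypothetical `Run` record with a success flag and a duration, and the sample numbers are made up purely to show the calculation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    succeeded: bool
    seconds: float


def success_rate(runs) -> float:
    """Fraction of runs that succeeded."""
    if not runs:
        raise ValueError("no runs recorded")
    return sum(1 for run in runs if run.succeeded) / len(runs)


def mean_success_seconds(runs) -> float:
    """Average end-to-end time over successful runs only."""
    durations = [run.seconds for run in runs if run.succeeded]
    if not durations:
        raise ValueError("no successful runs")
    return mean(durations)


# Illustrative data, not measurements.
runs = [Run(True, 10.0), Run(True, 14.0), Run(False, 30.0), Run(True, 12.0)]
print(success_rate(runs), mean_success_seconds(runs))
```

Failed runs are excluded from the timing average so that a timeout does not distort the end-to-end figure; they are still counted in the success rate.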

11. Risks and Considerations

11.1 Model Capability Requirements

The limitations section of the README explicitly states that the project relies on strong base models to generate harnesses reliably; weaker models may generate incomplete or erroneous CLIs that require extensive manual correction.

In pre-sales, do not promise to "generate a CLI for any software with one click and no manual work". It is safer to say:

CLI-Anything provides automated generation and a methodology; production quality still requires code review, test completion, and iterative refinement.

11.2 Source Code and Backend Entry Points Determine the Upper Limit

If the target software has clear source code, a project file format, an official CLI, or a script interface, the success rate will be much higher. Conversely, if there is only a closed-source GUI and no API, SDK, or file-format documentation, the room for CLI-Anything to help will be limited.

11.3 Auto-Generated Code Requires Security Review

Enterprise scenarios must consider:

-Whether sensitive files will be accessed.

-Whether the external command will be executed.

-Whether production data will be written.

-Whether permission isolation is required.

-Whether there is an audit log.

-Whether rollback and replay are possible.
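Two of these checks, permission isolation and audit logging, can be sketched in a few lines: confine file writes to a workspace directory and emit one JSON audit line per decision. The function names, the workspace location, and the audit record layout are all assumptions for illustration, not part of CLI-Anything.

```python
import json
from pathlib import Path


def ensure_inside(workspace: Path, target: Path) -> Path:
    """Resolve `target` against `workspace` and reject paths that escape it."""
    root = workspace.resolve()
    resolved = (root / target).resolve()  # an absolute target replaces root here
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"{resolved} is outside the workspace {root}")
    return resolved


def audit_line(action: str, path, allowed: bool) -> str:
    """One JSON audit record per decision, suitable for an append-only log."""
    return json.dumps({"action": action, "path": str(path), "allowed": allowed}, sort_keys=True)


WORKSPACE = Path.cwd() / "workspace"  # hypothetical sandbox root

for candidate in (Path("out/report.pdf"), Path("../secret.txt")):
    try:
        ensure_inside(WORKSPACE, candidate)
        print(audit_line("write", candidate, True))
    except PermissionError:
        print(audit_line("write", candidate, False))
```

A path check like this is only one layer. Process-level sandboxing and restricted credentials would still be needed when the generated CLI can execute external commands.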

11.4 Testing Is the Key to Success or Failure

CLI-Anything places great emphasis on testing, and this should be made clear in pre-sales. Without real-backend E2E tests and output verification, a generated CLI can easily become a demo that only "looks like it can run".

12. My Pre-Sales Judgment

The strategic significance of CLI-Anything is greater than that of any individual tool.

It captures a key problem in putting Agents into practice: large models can already reason and plan, but an enterprise's real work is embedded in existing software. If the Agent can only look at the screen and click, its reliability is unlikely to reach production grade. If every system has to develop its own API, the cost is too high. CLI-Anything offers a third way: package software capabilities as Agent-native CLIs.

Customer types to focus on:

-Customers who are building an enterprise Agent platform.

-Technical customers with a large number of in-house tools, scripts, and desktop software.

-Customers with high RPA maintenance costs who want to upgrade to Agent automation.

-Customers with strong demand for content production, document generation, or design/video/chart automation.

-Software vendors that want their products to be easier for AI Agents to use.

Pre-sales positioning recommendations:

Do not present CLI-Anything as a "universal one-click automation tool"; present it as a "methodology and engineering framework for Agent tooling and Agent-native software transformation". Its value lies in being structured, testable, reusable, and easy to connect to an Agent platform.

13. References

-GitHub repository: HKUDS/CLI-Anything

-English README: README.md

-Chinese README: README_CN.md

-Methodology Handbook: HARNESS.md

-Technical Report: CLI-Anything: Towards Agent-Native Computer Use

-CLI-Hub: https://hkuds.github.io/CLI-Anything/

-Teaser diagram: teaser.png

-Architecture diagram: architecture.png

Information verification date: 2026-06-30. Anonymous access to the GitHub API was rate limited, so this note does not include real-time star or fork counts. The project status, installation method, test count, and limitations are based mainly on the official README, the Chinese README, HARNESS.md, and the arXiv abstract.