1. One-Sentence Positioning

CLI-Anything is an open source project that transforms existing software into an Agent-native CLI.

It is not an ordinary command-line tool, nor is it a GUI automation framework. It is a set of methodology, plug-ins, sample harnesses, and a CLI ecosystem that "turns any software with a codebase into a tool that AI Agents can operate."

This can be explained to the customer as follows:

In the past, we had the AI Agent look at the screen, find buttons, and click coordinates like a human. That approach is easily broken by interface changes, window size, and loading delays. CLI-Anything takes the opposite approach: add a structured command-line interface to the software so that the Agent operates the real software directly through commands, explicit state, and JSON results.

2. What Problems It Mainly Solves

2.1 Inherent Fragility of GUI Agents

The abstract of the arXiv technical report states the problem clearly: today's mainstream computer-use approaches often have the Agent operate applications through screenshots, visual recognition, and coordinate clicks. This causes several problems:

| Problem | Manifestation | Impact |
| --- | --- | --- |
| Pixel-level interaction is fragile | UI skin changes, moved buttons, and changed pop-ups cause failures | Poor automation stability and high maintenance cost |
| Timing dependence | Slow page loads, animation delays, and network fluctuations all affect clicks | Task success rate is unpredictable |
| State cannot be read explicitly | The Agent can only "see" the interface, so it is hard to obtain complete structured state | Reliable judgment and rollback are difficult |
| Output is hard to verify | Completing a GUI operation does not mean the file or result is actually correct | High risk in production environments |
| Hard to scale | Every application and version must be re-adapted to its visual path | High enterprise rollout cost |

CLI-Anything's answer: rather than making the Agent imitate the limits of human vision, let the software expose an interface that suits the Agent.
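
To make the contrast with pixel clicking concrete, here is a minimal sketch of an Agent-side helper that runs a command and branches on its JSON result. The command is a stand-in: a Python one-liner plays the role of a harness CLI, and the fields 'ok', 'layers', and 'path' are hypothetical, not taken from any real CLI-Anything harness.

```python
import json
import subprocess
import sys

# Stand-in for an Agent-native CLI: a Python one-liner that prints a JSON result.
# A real harness would be an installed command; this stub and its fields are hypothetical.
FAKE_CLI = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'ok': True, 'layers': 3, 'path': 'demo.xcf'}))",
]


def run_json_command(argv):
    """Run a command, raise on a non-zero exit code, and parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


result = run_json_command(FAKE_CLI)

# The Agent branches on explicit fields instead of inferring state from pixels.
if result["ok"] and result["layers"] > 0:
    print(f"project {result['path']} has {result['layers']} layers")
```

Because the exit code and the parsed fields are explicit, a failed step surfaces as an exception or a false flag rather than as a misread screenshot.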

2.2 Agent lacks real professional software capabilities

Many enterprises want Agents to use professional software, but they run into two extremes:

- Direct GUI automation: works with existing software, but is unstable.

- Rewriting a lightweight alternative: stable, but far less capable than the real software.

CLI-Anything emphasizes "real software integration": the CLI generates valid project files or scripts and then invokes the real backend to render or export. For example:

- LibreOffice: generate ODF, then export PDF/DOCX/XLSX/PPTX with 'libreoffice --headless'.

- Blender: generate a bpy script and render it with 'blender --background --python'.

- Inkscape: manipulate SVG and export it with Inkscape.

- Shotcut/Kdenlive: generate MLT XML and render it with 'melt' or 'ffmpeg'.

- OBS: control a real OBS instance through obs-websocket.

This principle is key: it is a "structured interface to real software", not a toy rewrite.
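
As an illustration of the "generate a file, then call the real backend" pattern, the sketch below builds the argument list for a headless LibreOffice conversion. The helper name 'libreoffice_export_argv' and the file names are hypothetical; the flags '--headless', '--convert-to', and '--outdir' are standard LibreOffice command-line options, and the binary may be named 'soffice' on some systems.

```python
import shutil
from pathlib import Path


def libreoffice_export_argv(source, out_dir, fmt="pdf", binary="libreoffice"):
    """Build the argv for a headless LibreOffice conversion of `source` into `out_dir`."""
    return [binary, "--headless", "--convert-to", fmt, "--outdir", str(out_dir), str(source)]


# Hypothetical input and output locations, used only for illustration.
argv = libreoffice_export_argv(Path("report.odt"), Path("build"))
print(" ".join(argv))

# Only hand the command to the real backend if it is actually installed.
if shutil.which(argv[0]):
    print("backend found; pass argv to subprocess.run(argv, check=True)")
else:
    print("backend missing; install LibreOffice before running this step")
```

Keeping the command as a list rather than a shell string avoids quoting problems when file names contain spaces.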

3. What It Mainly Does

3.1 Install Existing CLIs with CLI-Hub

CLI-Hub is the entry point to the CLI-Anything ecosystem, used to browse, search, install, update, uninstall, and launch CLIs that the community has already built.

pip install cli-anything-hub

cli-hub list
cli-hub search image
cli-hub info gimp
cli-hub install gimp
cli-hub launch gimp

Pre-sales interpretation:

- If a customer wants to verify whether an Agent can use professional software, first check whether a matching tool already exists in CLI-Hub.

- This is faster than building a harness from scratch and suits demos and early PoCs.

- Note: some CLIs still depend on the real upstream software, such as GIMP, Blender, or LibreOffice, which must still be installed locally or on the server.

3.2 Generate an Agent-Usable CLI for New Software

If CLI-Hub does not have the target software, use the CLI-Anything 7-stage build process:

| Phase | Official Process | Pre-Sales Understanding |
| --- | --- | --- |
| 1. Analyze | Scan the source code to find GUI actions, data models, and backend interfaces | Find out where the software's real capabilities live |
| 2. Design | Design command groups, the state model, and output formats | Organize software capabilities into an operation surface the Agent can understand |
| 3. Implement | Use Click to implement the CLI, REPL, JSON output, and undo/redo | Produce command tools that are both interactive and scriptable |
| 4. Plan Tests | Write the TEST.md test plan first | Avoid building a demo that is never verified |
| 5. Write Tests | Write unit tests, end-to-end tests, and real-backend tests | Verify real workflows |
| 6. Document | Update test results and documentation | Make the tool easy to hand over and maintain |
| 6.5 Skill | Generate SKILL.md | Let Agents discover and use the tool automatically later |
| 7. Publish | Generate setup.py and install the tool on the PATH | Reach a state the team can reuse |
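
A minimal sketch of what the Design and Implement phases aim for: a command group, a subcommand, and a global '--json' switch. The source states that generated harnesses use Click; this sketch swaps in the standard-library 'argparse' so that it runs without extra dependencies. The program name 'cli-anything-demo' and the 'layer add' command are hypothetical.

```python
import argparse
import json


def build_parser():
    """Create a parser with one command group ('layer') and a global --json flag."""
    parser = argparse.ArgumentParser(prog="cli-anything-demo")
    parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    groups = parser.add_subparsers(dest="group", required=True)
    layer = groups.add_parser("layer", help="layer operations")
    actions = layer.add_subparsers(dest="action", required=True)
    add = actions.add_parser("add", help="add a layer")
    add.add_argument("--name", required=True)
    return parser


def main(argv):
    """Parse argv and return the command result as JSON or as human-readable text."""
    args = build_parser().parse_args(argv)
    result = {"command": f"{args.group} {args.action}", "name": args.name, "status": "ok"}
    if args.json:
        return json.dumps(result, sort_keys=True)
    return f"Added layer '{args.name}'"


print(main(["--json", "layer", "add", "--name", "background"]))
print(main(["layer", "add", "--name", "background"]))
```

The same command serves both audiences: the JSON form is for the Agent to parse, and the plain form is for a person at a terminal.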

Typical commands:

/cli-anything ./gimp
/cli-anything https://github.com/blender/blender
/cli-anything:refine ./gimp "batch processing and filters"
/cli-anything:test ./inkscape
/cli-anything:validate ./audacity

3.3 Verified Harness Examples

The README shows that the project already covers many categories of software, including:

| Category | Examples | Customer Value |
| --- | --- | --- |
| Creativity and media | GIMP, Blender, Inkscape, Audacity, Kdenlive, Shotcut, OBS | Let the Agent automatically produce images, videos, live-streaming scenes, and audio processing |
| Office and knowledge management | LibreOffice, Zotero, Joplin, Calibre | Automatically generate reports, process documents, and manage databases |
| Charts and visualization | Draw.io, Mermaid | Automatically generate architecture diagrams, flowcharts, and explanatory materials |
| AI/ML platforms | ComfyUI, Ollama, etc. | Command-driven model inference and workflows |
| Development and debugging | LLDB, RenderDoc, Nsight Graphics, Unreal Insights | Let the Agent take part in debugging, performance analysis, and graphics analysis |
| Enterprise tools | Zoom, CloudAnalyzer, AdGuard Home, and more | Automation of meetings, cloud cost, and security/network tools |

The latest English README reports 2,461 tests with a 100 percent pass rate, covering unit tests, end-to-end tests, Node.js tests, and more. The figure in the Chinese README is slightly lower, which suggests the Chinese document lags behind. For pre-sales references, use the latest English README and state the date of verification.

4. Applicable Scenarios

4.1 Agent Enablement of Internal Enterprise Software

Suitable for customers who:

- Have a large number of internal tools, backend systems, desktop software, or customized open source systems.

- Want the AI Agent to perform business tasks directly, rather than just answer questions.

- Have existing systems without a complete API, or with an API that is too low-level and poorly documented.

Typical value:

- Organize scattered functions into a unified CLI.

- Let the Agent discover capabilities on its own through '--help' and '--json'.

- Use tests to ensure command results are reliable.

Example:

An enterprise has internal reporting tools, data cleaning tools, and document generation tools. With CLI-Anything, these tools can be packaged into a unified CLI, allowing the Agent to generate reports, export files, and check results automatically instead of relying on people to click through pages every time.

4.2 Replace Fragile RPA / GUI Automation

Suitable for customers who:

- Currently use RPA, recorded screen clicks, or browser automation scripts.

- See automation fail frequently because of interface changes.

- Want to reduce maintenance cost and raise task success rates.

Pre-sales talking point:

The problem with RPA is not that it cannot do the job, but that it is expensive to maintain over time. The CLI-Anything idea is to turn "where to click" into "which command to call" and to turn UI results into JSON and file verification, which suits Agents and production environments better.

4.3 Building an AI Agent Tool Ecosystem

Suitable for customers who:

- Are building an in-house Agent platform.

- Need to provide a stable toolset for the Agent.

- Want to turn multiple business systems, open source tools, and desktop software into callable tools.

The value of CLI-Anything is that it does not serve just one Agent; it makes the tool itself Agent-native. As long as the CLI output and SKILL.md stay stable, the same capabilities can be reused across compatible environments such as Claude Code, Codex, OpenClaw, OpenCode, Hermes, and Reasonix.

4.4 Content Production Automation

Suitable scenarios:

- Automatically generate PPT, document, and PDF files.

- Automatically produce video clips, subtitles, and covers.

- Automatically render 3D product images.

- Automatically generate flowcharts, architecture diagrams, and training materials.

CLI-Anything has an advantage in these scenarios because it calls the real software to export real results, rather than only generating an intermediate file that merely looks right.

4.5 Agent-Ready Transformation for Software Vendors

For software vendors, CLI-Anything can serve as product inspiration:

- Add a CLI layer to their own software.

- Provide JSON output and verifiable state.

- Provide SKILL.md or Agent instructions.

- Provide end-to-end task samples and test suites.

These capabilities may become an important indicator of whether software is suitable for Agent use in the future.

5. Unsuitable Scenarios

| Unsuitable Scenario | Reason | Suggestion |
| --- | --- | --- |
| The target software is completely closed source and has no script, API, or file-format entry point | CLI-Anything depends mainly on source code, a real backend, project files, or an existing CLI | First evaluate whether there is an official API, SDK, MCP, or automation interface |
| The customer only wants a one-time, simple web page click | Playwright or RPA may be faster | There is no need to generate a full harness for one-time tasks |
| The customer lacks a strong model and an engineering team | The README explicitly mentions the need for a strong base model; a weak model may generate an incomplete CLI | Have the implementation team build it first, then hand it over with maintenance guidelines |
| The target software relies on complex GUI state and has no stable backend | It is difficult to ensure real output and stable tests | Do a technical feasibility evaluation first |
| Compliance and security requirements are high but there is no review process | An auto-generated CLI may involve file writes, system calls, and real software execution | Include code review, permission control, sandboxing, and auditing |

6. Architecture and Key Design

[Figure: CLI-Anything Architecture]

6.1 Recommended Architecture View for Pre-Sales

flowchart LR
    Agent["AI Agent"] --> Skill["SKILL.md / --help (capability discovery)"]
    Agent --> CLI["cli-anything- (JSON + human-readable output)"]
    CLI --> State["Session / Project State (undo / redo / history)"]
    CLI --> Native["Native project files (ODF / SVG / MLT / bpy / JSON)"]
    Native --> Backend["Real software backend (LibreOffice / Blender / GIMP / ffmpeg)"]
    Backend --> Artifact["Real artifacts (PDF / images / video / audio / reports)"]
    CLI --> Tests["Unit + E2E + Subprocess Tests"]
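
The "Session / Project State" node in the diagram can be illustrated with a snapshot-based undo/redo sketch. This is one simple way to get undo, redo, and history; it is an assumption for illustration and not a description of how any CLI-Anything harness actually stores state. The class name 'Session' and the 'headings' field are hypothetical.

```python
import copy


class Session:
    """Project state with snapshot-based undo/redo and a command history."""

    def __init__(self, state=None):
        self.state = state or {}
        self._undo = []    # snapshots taken before each applied command
        self._redo = []    # states that were undone and can be restored
        self.history = []  # labels of applied commands, in order

    def apply(self, label, mutate):
        """Snapshot the state, apply `mutate` to it in place, and record the command."""
        self._undo.append(copy.deepcopy(self.state))
        self._redo.clear()
        mutate(self.state)
        self.history.append(label)

    def undo(self):
        if not self._undo:
            raise RuntimeError("nothing to undo")
        self._redo.append(self.state)
        self.state = self._undo.pop()

    def redo(self):
        if not self._redo:
            raise RuntimeError("nothing to redo")
        self._undo.append(self.state)
        self.state = self._redo.pop()


session = Session({"headings": []})
session.apply("add-heading", lambda s: s["headings"].append("Q1 Report"))
session.undo()
print(session.state)
session.redo()
print(session.state)
```

Full snapshots are easy to reason about but copy the whole state on every command; a harness with large projects would more likely store inverse operations or diffs.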

6.2 Core Principles

| Principle | Meaning | Why It Matters |
| --- | --- | --- |
| Real software as a hard dependency | Do not replace real software with toy reimplementations | Ensures artifacts are consistent with the customer's real workflow |
| Dual interaction modes | REPL + subcommands | Supports both long Agent sessions and scripted automation |
| JSON output | Every command supports '--json' | Agents can parse results reliably |
| Self-describing interface | '--help', 'which', SKILL.md | Agents can discover and learn the tool |
| Strong testing | Unit tests + real-backend E2E + CLI subprocess tests | Avoids fake automation that only "looks like it runs" |
| No graceful degradation | Fail when the backend is missing and give installation instructions | Prevents production from silently generating wrong results |
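
The "no graceful degradation" principle can be sketched as a guard that refuses to continue when the real backend is absent. The function name 'require_backend', the exception class, and the deliberately nonexistent binary name are hypothetical; 'shutil.which' is the standard-library PATH lookup.

```python
import shutil


class BackendMissingError(RuntimeError):
    """Raised when the real software backend is not installed."""


def require_backend(binary, install_hint):
    """Return the backend's full path, or raise with installation guidance."""
    path = shutil.which(binary)
    if path is None:
        raise BackendMissingError(f"'{binary}' not found on PATH. {install_hint}")
    return path


try:
    # A binary name chosen so that the lookup fails in this demonstration.
    require_backend("definitely-not-installed-backend", "Install it and retry.")
except BackendMissingError as exc:
    message = str(exc)
    print(message)
```

Raising here, instead of falling back to a simplified renderer, is what keeps a missing installation from turning into a plausible-looking but wrong artifact.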

7. How to Use

7.1 As a CLI-Hub User

pip install cli-anything-hub

cli-hub list
cli-hub search diagram
cli-hub info drawio
cli-hub install drawio
cli-hub launch drawio

Suitable for demos and quick verification.

7.2 As a Claude Code Plug-in User

/plugin marketplace add HKUDS/CLI-Anything
/plugin install cli-anything
/cli-anything ./gimp
/cli-anything:refine ./gimp "batch processing and filters"

7.3 As a Codex Skill User

git clone https://github.com/HKUDS/CLI-Anything.git
bash CLI-Anything/codex-skill/scripts/install.sh

Then trigger it in Codex with natural language:

Use CLI-Anything to build a harness for ./gimp
Use CLI-Anything to refine ./shotcut for picture-in-picture workflows
Use CLI-Anything to validate ./libreoffice

7.4 Using the Generated CLI

cd /agent-harness
pip install -e .

which cli-anything-
cli-anything- --help
cli-anything-
cli-anything- --json 

Example: generate a PDF with the LibreOffice CLI.

cli-anything-libreoffice document new -o report.json --type writer
cli-anything-libreoffice --project report.json writer add-heading -t "Q1 Report" --level 1
cli-anything-libreoffice --project report.json writer add-table --rows 4 --cols 3
cli-anything-libreoffice --project report.json export render output.pdf -p pdf --overwrite
cli-anything-libreoffice --json document info --project report.json

8. Pre-Sales Talking Points

8.1 For Business Stakeholders

The value of CLI-Anything is to let AI actually use the tools the enterprise already has, rather than just giving advice in a chat window. It can turn document, design, video, diagram, development, and other software into commands that the Agent can call, moving from "AI can talk" to "AI can do".

8.2 For Technical Leaders

It is not traditional RPA, nor is it a screen-click script. It emphasizes structured commands, explicit state, JSON output, real backend execution, and end-to-end validation. For an enterprise Agent platform, this kind of tool interface is easier to maintain, monitor, and reuse than GUI clicking.

8.3 For CIOs and Management

Enterprises already own a lot of software assets. CLI-Anything offers a low-intrusion path to Agent enablement: start with open source or internal tools, package high-frequency processes into testable CLIs, and then gradually bring them into the enterprise Agent platform. This reduces repetitive manual work and avoids refactoring every system up front.

9. Frequently Asked Customer Questions

| Question | Answer |
| --- | --- |
| Is it RPA? | Not traditional RPA. RPA typically simulates a human clicking through a GUI; CLI-Anything emphasizes generating a structured command interface for the software, invoking the real backend, and validating the output. |
| Can it handle closed-source commercial software? | If the commercial software has an API, scripting interface, CLI, editable project files, or MCP services, there is an opportunity; if there is only a black-box GUI, the difficulty increases significantly. |
| Is the generated CLI reliable? | It depends on target software complexity, base model capability, and test quality. The project methodology requires real-backend E2E testing and output verification, but manual review is still required before production. |
| Will it replace the official software API? | Not necessarily. When the official API is of high quality, use it first. CLI-Anything is better suited to combining APIs, project files, and the real software backend into an Agent-friendly workflow. |
| Is it suitable for internal enterprise tools? | Yes, especially for internal tools with source code or stable scripting interfaces. Internal tools can be packaged as a unified CLI and then called by the Agent. |
| What are the security risks? | The CLI can read and write files, call real software, and execute external commands, so it must have permission boundaries, audit logs, sandboxing, and code review. |
| Why not use a GUI Agent directly? | A GUI Agent suits ad hoc operations and systems with no programmatic interface, but for long-running production tasks a structured CLI is more stable, testable, and replayable. |

10. PoC Recommendations

10.1 PoC Topic Selection

Choose a task that is real, valuable, and clearly bounded:

- Automatically generate a PDF of a customer weekly report.

- Automatically generate a short video from footage.

- Automatically generate a system architecture diagram.

- Automatically download, transcribe, and archive meeting recordings.

- Automatically drive an internal reporting tool to export data.

Do not start with an entire ERP system, an entire CAD suite, or an entire office workflow.

10.2 PoC Phases

| Phase | Work | Acceptance |
| --- | --- | --- |
| Phase 1: Feasibility assessment | Check whether the target software has source code, a CLI, an API, project files, or a scripting interface | A real backend entry point is identified |
| Phase 2: Minimum harness | Generate 3-5 core commands | The Agent can complete one closed-loop task |
| Phase 3: Real output | Call the real software to generate a PDF, image, video, or report | The artifact opens, can be checked, and has the correct format |
| Phase 4: Test completion | Unit tests, E2E tests, CLI subprocess tests | Automated tests pass |
| Phase 5: Agent integration | Generate SKILL.md and connect to the enterprise Agent platform | The Agent can discover and call the tool on its own |
| Phase 6: Security hardening | Permissions, auditing, sandboxing, error handling | Enterprise go-live requirements are met |

10.3 Suggested Metrics

| Metric | Meaning |
| --- | --- |
| Task success rate | Whether the same task succeeds reliably when repeated |
| End-to-end time | How much faster it is than manual work or RPA |
| Failure diagnosability | Whether failures come with explicit errors and logs |
| Output accuracy | Whether file format, content, and pixels/audio/duration meet the standard |
| Number of commands covered | How much of the target business process is covered |
| Maintenance cost | How many commands and tests must change after a software upgrade |
| Agent token consumption | Whether JSON/CLI is cheaper than bare APIs or GUI observation |
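
The first two metrics can be computed from simple run records. The sketch below assumes each PoC run is logged as a dictionary with an 'ok' flag and a duration in seconds; the record layout and the sample values are hypothetical.

```python
from statistics import mean

# Hypothetical run records collected during a PoC; the values are illustrative only.
runs = [
    {"ok": True, "seconds": 12.0},
    {"ok": True, "seconds": 14.0},
    {"ok": False, "seconds": 30.0},
    {"ok": True, "seconds": 10.0},
]


def summarize(records):
    """Return the task success rate and the mean duration of successful runs."""
    if not records:
        raise ValueError("no runs recorded")
    successes = [r for r in records if r["ok"]]
    return {
        "success_rate": len(successes) / len(records),
        "mean_seconds_ok": mean(r["seconds"] for r in successes) if successes else None,
    }


summary = summarize(runs)
print(summary)
```

Reporting duration only for successful runs keeps timeouts from distorting the speed comparison against manual work or RPA.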

11. Risks and Considerations

11.1 Model Capability Requirements

The limitations section of the README explicitly states that the project relies on strong base models to generate harnesses reliably; weaker models may produce incomplete or erroneous CLIs that require extensive manual correction.

In pre-sales, do not promise "one-click CLI generation for any software with no manual work". A safer statement is:

CLI-Anything provides automated generation and a methodology; production quality still requires code review, additional tests, and iterative refinement.

11.2 Source Code and Backend Entry Points Determine the Ceiling

If the target software has clear source code, a project file format, an official CLI, or a scripting interface, the success rate will be much higher. Conversely, if there is only a closed-source GUI and no API, SDK, or file-format documentation, CLI-Anything has limited room to work.

11.3 Auto-Generated Code Requires Security Review

Enterprise scenarios must consider:

- Whether sensitive files will be accessed.

- Whether external commands will be executed.

- Whether production data will be written.

- Whether permission isolation is required.

- Whether there is an audit log.

- Whether operations can be rolled back and replayed.
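
Two of the checks above, restricting external commands and keeping an audit log, can be sketched as a single authorization step in front of command execution. The allowlist contents, the function name 'authorize', and the log format are hypothetical; a production setup would also need sandboxing and file-permission controls that this sketch does not cover.

```python
import json
import time

# Hypothetical policy: the only binaries the Agent is allowed to launch.
ALLOWED_BINARIES = {"libreoffice", "blender", "inkscape"}


def authorize(argv, user, audit_log):
    """Append an audit record for the request and return whether it is allowed."""
    allowed = bool(argv) and argv[0] in ALLOWED_BINARIES
    audit_log.append(json.dumps({
        "time": time.time(),
        "user": user,
        "argv": argv,
        "allowed": allowed,
    }))
    return allowed


audit_log = []
print(authorize(["libreoffice", "--headless"], "agent-1", audit_log))
print(authorize(["rm", "-rf", "/tmp/x"], "agent-1", audit_log))
print(len(audit_log))
```

Denied requests are logged as well as allowed ones, so a reviewer can see what the Agent attempted and not only what it ran.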

11.4 Testing Decides Success or Failure

CLI-Anything places great emphasis on testing, and this should be made clear in pre-sales. Without real-backend E2E tests and output verification, the generated CLI can easily become a demo that only "looks like it runs".
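
A minimal sketch of output verification for a PDF artifact: confirm that the file exists, is not trivially small, and begins with the PDF header bytes. The function name 'looks_like_pdf' and the size threshold are hypothetical, and this is only a first-line check; it does not validate page content.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path, min_bytes=16):
    """Cheap sanity check: the file exists, is not tiny, and starts with the PDF header."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size < min_bytes:
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


with tempfile.TemporaryDirectory() as tmp:
    # Synthetic files standing in for a real export and a failed export.
    good = Path(tmp) / "good.pdf"
    good.write_bytes(b"%PDF-1.7\n" + b"0" * 32)
    bad = Path(tmp) / "bad.pdf"
    bad.write_bytes(b"<html>error page</html>" + b" " * 32)
    good_ok, bad_ok = looks_like_pdf(good), looks_like_pdf(bad)
    print(good_ok, bad_ok)
```

An E2E test would run the real export first and then apply a check like this to the produced file, so that a command reporting success without a valid artifact still fails the test.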

12. My Pre-Sales Judgment

The strategic significance of CLI-Anything is greater than that of any individual tool.

It addresses a key problem in Agent adoption: large models can already reason and plan, but the real work of the enterprise is embedded in existing software. If the Agent can only look at the screen and click, it is not reliable enough for production. If every system has to develop its own API, the cost is too high. CLI-Anything offers a third way: package software capabilities as Agent-native CLIs.

Customer types to focus on:

  1. Customers who are building an enterprise Agent platform.
  2. Technical customers with a large number of in-house tools, scripts, and desktop software.
  3. Customers with high RPA maintenance costs who want to upgrade to Agent automation.
  4. Customers with strong demand for content production, document generation, and design/video/chart automation.
  5. Software vendors who want their products to be easier for AI Agents to use.

Pre-sales positioning recommendations:

Do not present CLI-Anything as a "universal one-click automation tool". Present it as a "methodology and engineering framework for Agent tooling and Agent-native software transformation". Its value lies in being structured, testable, reusable, and ready to plug into an Agent platform.

13. References

- GitHub repository: HKUDS/CLI-Anything

- English README: README.md

- Chinese README: README_CN.md

- Methodology handbook: HARNESS.md

- Technical report: CLI-Anything: Towards Agent-Native Computer Use

- CLI-Hub: https://hkuds.github.io/CLI-Anything/

- Teaser diagram: teaser.png

- Architecture diagram: architecture.png

Information verification date: 2026-06-30. Anonymous access to the GitHub API triggered rate limiting, so this note does not record real-time stars/forks. The project status, installation method, test counts, and limitations are based mainly on the official README, the Chinese README, HARNESS.md, and the arXiv abstract.