Pixelle-Video is AIDC-AI's open-source, fully automatic AI short-video engine. Its positioning is "enter a topic and it automatically completes the script, images/video, voice-over, BGM, and template-based composition". It is best suited to showing customers a low-barrier content production line, especially for popular-science content, marketing short videos, digital-human narration, and image-and-text videos. In pre-sales it can be described as an "AIGC video factory prototype with replaceable models and workflows", but a production rollout still requires evaluating material copyright, generation quality, review and compliance, cost, and batch stability.

1. Project Overview

| Dimension | Information |
| --- | --- |
| Project | AIDC-AI/Pixelle-Video |
| Positioning | Fully automatic AI short-video engine |
| Main language | Python |
| Open-source license | Apache-2.0 |
| Created | 2025-11-07 |
| Last pushed | 2026-06-14 |
| GitHub popularity | As queried on 2026-06-30: about 23.9k stars, 3.4k forks, 140 open issues |
| How to run | Windows integration package, or `uv run streamlit run web/app.py` |
| Default interface | Streamlit Web UI, default http://localhost:8501 |

The core of Pixelle-Video is not a single model but a short-video production pipeline. The official README summarizes the process as "copy generation -> layout planning -> frame-by-frame processing -> video composition", and the WebUI exposes LLM, image/video generation, TTS, BGM, template, size, and other capabilities as configurable modules.

Key diagrams:

[Figure: Pixelle-Video Web UI]

[Figure: Pixelle-Video flowchart]

2. What does it mainly do?

| Capability | Description | Pre-sales value |
| --- | --- | --- |
| Topic to video | After a topic is entered, automatically generates the narration script, storyboard, images/video, voice-over, and the finished video | Quickly proves that "content production automation" is feasible |
| Fixed-copy generation | Uses existing copy, skipping AI writing and going straight to voice-over and visual generation | Suits enterprises with an existing content library, course scripts, or marketing copy |
| Image/video generation | Supports ComfyUI, RunningHub, and direct API connections such as DashScope, OpenAI, ARK, and Kling | The underlying capabilities can be swapped to match the customer's existing model supplier |
| TTS/voice | Supports workflows such as Edge-TTS and Index-TTS; some modes support reference audio | Can demonstrate brand voice-over, explainer audio, and voice cloning |
| Template system | Templates such as `static_*.html`, `image_*.html`, and `video_*.html` | Enterprise brand video styles can be customized |
| Multi-size output | Portrait, landscape, square, etc. | Covers channels such as Douyin, WeChat Channels, Xiaohongshu, official websites, and large screens |
| Extension modules | Digital-human narration, image-and-text video, motion transfer, custom materials | Extended demos suited to an "AI content middle platform" |

3. Applicable Scenarios

| Scenario | Fit | Typical customer |
| --- | --- | --- |
| Enterprise short-video mass production | High | Marketing departments, new-media teams, e-commerce operations |
| Popular-science / training videos | High | Education, enterprise training, government and enterprise publicity |
| Digital-human narration demo | Medium-high | Brand promotion, customer-service training, content for overseas markets |
| Local AIGC workflow demo | High | Customers who need private deployment, model replacement, or workflow orchestration |
| Serious commercials / film-and-TV-grade production | Medium-low | Teams with very high demands on cinematic language, aesthetic consistency, and copyright review |
| High-concurrency SaaS production platform | Medium | Requires secondary development of task queues, billing, review, permissions, and monitoring |

The recommended pre-sales angle is not to package it as a "mature editing SaaS" but as an "AIGC video pipeline reference implementation that can be deployed quickly". If a customer asks "can we automatically turn our enterprise knowledge base, product catalog, and marketing copy into short videos?", Pixelle-Video is well suited to a visual PoC.

4. Less Suitable Scenarios

| Poor-fit point | Reason |
| --- | --- |
| Strong director-level control over the film's aesthetics is required | The project is closer to an automated assembly line; fine shot planning and manual post-production still need professional tools |
| Strict compliance review requirements | Generated content needs additional checks for sensitive words, copyright, portrait rights, material sources, and content review |
| Large-scale commercial concurrency | The official focus is the WebUI and local workflows; production queues, elastic resources, and failure compensation must be built separately |
| Customer has no model budget | Local Ollama + ComfyUI is available, but actual quality and speed depend on local hardware and model capabilities |

5. Architecture and Integration Understanding

Pixelle-Video can be split into five layers:

  1. Content planning layer: the LLM generates scripts, storyboards, and prompts from a topic or fixed copy.
  2. Media generation layer: calls ComfyUI/RunningHub, or connects directly to image and video models.
  3. Voice layer: a TTS workflow generates the narration, with support for reference audio and multilingual voices.
  4. Template rendering layer: an HTML template determines the screen layout, subtitles, background, and media presentation.
  5. Video composition layer: relies on tools such as ffmpeg to produce the final video file.

When explaining this to customers, emphasize that "every layer can be replaced": enterprises can plug in their own large model, their own cloud vendor's media models, their own template system, and their own review system. This is also where it offers more pre-sales value than closed short-video tools.
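To make the "every layer can be replaced" point concrete in a demo conversation, the five layers can be sketched as plain callables passed into one driver function. This is an illustrative sketch only, not Pixelle-Video's actual code: `Scene`, `run_pipeline`, and every `stub_*` function are hypothetical names invented here, and the stubs return placeholder file names instead of calling real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One storyboard entry (hypothetical structure, not the project's own)."""
    narration: str         # text the voice layer will speak
    prompt: str            # prompt for the media generation layer
    media_path: str = ""   # filled in by the media generation layer
    audio_path: str = ""   # filled in by the voice layer
    frame_path: str = ""   # filled in by the template rendering layer


def run_pipeline(
    topic: str,
    plan: Callable[[str], List[Scene]],
    make_media: Callable[[Scene, int], str],
    make_voice: Callable[[Scene, int], str],
    render: Callable[[Scene, int], str],
    compose: Callable[[List[Scene]], str],
) -> str:
    """Run the five layers in order; each layer is an injected callable."""
    scenes = plan(topic)                           # 1. content planning
    for i, scene in enumerate(scenes):
        scene.media_path = make_media(scene, i)    # 2. media generation
        scene.audio_path = make_voice(scene, i)    # 3. voice
        scene.frame_path = render(scene, i)        # 4. template rendering
    return compose(scenes)                         # 5. video composition


# Stub layers (assumptions for illustration; real ones would call an LLM,
# ComfyUI or a media API, a TTS workflow, an HTML renderer, and ffmpeg).
def stub_plan(topic: str) -> List[Scene]:
    return [
        Scene(narration=f"{topic}: point {n}", prompt=f"illustration of {topic}, part {n}")
        for n in (1, 2, 3)
    ]


def stub_media(scene: Scene, i: int) -> str:
    return f"media_{i}.png"


def stub_voice(scene: Scene, i: int) -> str:
    return f"voice_{i}.mp3"


def stub_render(scene: Scene, i: int) -> str:
    return f"frame_{i}.html"


def stub_compose(scenes: List[Scene]) -> str:
    return f"final_{len(scenes)}_scenes.mp4"


result = run_pipeline("solar energy", stub_plan, stub_media, stub_voice, stub_render, stub_compose)
print(result)
```

Swapping a supplier then means passing a different callable for one argument while the other four stay unchanged, which is the property worth showing to the customer.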

6. How to Use

Windows users can download the officially released one-click integration package, unzip it, run `start.bat`, and open http://localhost:8501 in a browser.

From source:

git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
uv run streamlit run web/app.py
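Before running the commands above on a demo machine, it can help to confirm that the command-line tools they rely on are installed. The following is a minimal sketch under stated assumptions: that `git` and `uv` are needed for the source route, and that `ffmpeg` must be on PATH for video composition (the Windows integration package may bundle its own copy). `missing_tools` is a hypothetical helper written for this note, not part of Pixelle-Video.

```python
import shutil


def missing_tools(tools=("git", "uv", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All tools found; the source route should be able to start.")
```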

On first use, the following must be configured in the WebUI:

| Configuration item | Role |
| --- | --- |
| LLM API | Generates copy, storyboards, and prompts |
| ComfyUI / RunningHub | Generates images, video, or voice from a workflow |
| API media models | Connects OpenAI, DashScope, Seedream, Seedance, Kling, and more |
| TTS workflow | Selects Edge-TTS, Index-TTS, or a custom voice workflow |
| Template and size | Selects a portrait, landscape, or square template to determine the output style |

7. Pre-Sales Talking Points

One-sentence positioning:

"Pixelle-Video is an open-source AI short-video production line that automatically turns topics, scripts, or enterprise materials into videos with visuals, narration, BGM, and template packaging."

Customer value:

| Customer pain point | Pixelle-Video value |
| --- | --- |
| Short-video production depends on editing staff | Automatically handles script breakdown, image matching, voice-over, and composition, reducing the cost of a first cut |
| Multi-platform content must be updated frequently | Well suited to quickly generating multi-version, multi-size, multi-topic content |
| The enterprise wants to reuse existing model assets | Supports multiple model vendors and ComfyUI workflows, making it easier to plug into an existing AI foundation |
| Brand visuals must be consistent | The template mechanism can lock in enterprise layouts, subtitles, and visual style |
| Private deployment or local demos are wanted | The Python + Streamlit + ffmpeg structure is clear, which is convenient for a PoC and secondary development |

8. Demo/PoC Suggestions

| PoC item | Acceptance method |
| --- | --- |
| Topic to video | Generate portrait videos for 5 real business topics; evaluate copy usability and generation time |
| Fixed copy to video | Use the customer's existing training or marketing copy to test image matching and subtitle accuracy |
| Brand template | Create an enterprise template to verify that logo, color, and subtitle specifications are controllable |
| Model replacement | Connect local ComfyUI and cloud media APIs in turn to compare cost, speed, and quality |
| Review process | Add a manual confirmation or content review node to evaluate the pre-release risk-control loop |

Suggested indicators:

| Indicator | Description |
| --- | --- |
| Generation time per video | Measured separately for 30-second, 60-second, and 90-second videos |
| First-pass rate of finished videos | The proportion usable without manual regeneration |
| Cost per video | Broken down into LLM, image/video, TTS, and cloud compute costs |
| Template reuse efficiency | How quickly an enterprise template can be applied to new topics |
| Labor savings | First-cut production time compared with the traditional editing process |
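The first three indicators are simple to compute once each PoC run is logged. The sketch below shows one possible way to do it; the `VideoRun` record, the function names, and all numbers in `sample_runs` are invented for illustration and are not measurements of Pixelle-Video.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VideoRun:
    """One logged PoC generation (hypothetical record layout)."""
    duration_s: int        # target video length: 30, 60, or 90 seconds
    generation_s: float    # wall-clock time to generate the video
    regenerated: bool      # True if any part had to be regenerated by hand
    llm_cost: float
    media_cost: float      # image/video generation
    tts_cost: float
    compute_cost: float    # cloud compute

    @property
    def total_cost(self) -> float:
        return self.llm_cost + self.media_cost + self.tts_cost + self.compute_cost


def first_pass_rate(runs: List[VideoRun]) -> float:
    """Share of videos usable without manual regeneration."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if not r.regenerated) / len(runs)


def mean_generation_time(runs: List[VideoRun]) -> Dict[int, float]:
    """Average generation time, grouped by target video length."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for r in runs:
        grouped[r.duration_s].append(r.generation_s)
    return {length: sum(times) / len(times) for length, times in grouped.items()}


def mean_cost_per_video(runs: List[VideoRun]) -> float:
    """Average total cost per video across all cost components."""
    if not runs:
        return 0.0
    return sum(r.total_cost for r in runs) / len(runs)


# Invented sample data, only to show the calculation.
sample_runs = [
    VideoRun(30, 120.0, False, 0.02, 0.40, 0.01, 0.05),
    VideoRun(30, 180.0, True, 0.02, 0.80, 0.01, 0.05),
    VideoRun(60, 300.0, False, 0.04, 0.90, 0.02, 0.10),
    VideoRun(90, 420.0, False, 0.06, 1.30, 0.03, 0.15),
]

print("first-pass rate:", first_pass_rate(sample_runs))
print("mean generation time by length:", mean_generation_time(sample_runs))
print("mean cost per video:", round(mean_cost_per_video(sample_runs), 2))
```

Keeping the cost components as separate fields makes it easy to show the customer which layer dominates the bill when comparing local ComfyUI against cloud media APIs.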

9. Frequently Asked Customer Questions

| Question | Answer |
| --- | --- |
| Can it be used commercially as-is? | The code is Apache-2.0, but commercial use depends on the licensing and compliance of the connected models, TTS voices, materials, and generated content. |
| Can it be deployed privately? | Yes. The local ComfyUI/Ollama/ffmpeg route suits a PoC; production additionally needs queues, permissions, logging, and review. |
| Is the output stable? | Pipeline stability depends on the model services, network, prompts, templates, and retry mechanism, and must be tested with real business samples. |
| Can it do digital humans? | The project has an extension demo for digital-human narration, but lip sync, character consistency, and copyright authorization need to be verified separately. |
| How does it differ from MoneyPrinterTurbo? | Both automate short-video production. Pixelle-Video puts more emphasis on combining atomic capabilities (ComfyUI, RunningHub, direct media APIs) and on WebUI configuration. |

10. Risks and Considerations

  1. Content compliance: AI-generated copy, images, videos, and voice-over may contain factual errors, sensitive content, or material that raises copyright disputes.
  2. Model dependency: the project itself is an orchestration tool; final quality depends heavily on the LLM, image model, video model, and TTS.
  3. Cost is not negligible: the cost and latency of video generation models can be much higher than for text generation.
  4. Limited production engineering: large customers usually need a task queue, user permissions, a material library, a review flow, failure retry, and monitoring.
  5. Fast-moving open-source project: it was updated frequently between 2025-12 and 2026-06, so pin the version and manage dependencies during secondary development.
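Item 4 lists failure retry among the pieces a production deployment would have to add. As a rough idea of the smallest version of that piece, here is a generic retry helper with exponential backoff. It is a sketch, not code from Pixelle-Video: `call_with_retry` and `flaky_generate` are hypothetical names, and a real system would also need to decide which errors are worth retrying.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**k seconds and try again.

    Re-raises the last exception once `attempts` calls have all failed.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a stand-in for a model call that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_generate():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary model-service error")
    return "clip.mp4"


# delays.append replaces time.sleep, so the demo records waits instead of sleeping.
output = call_with_retry(flaky_generate, attempts=3, base_delay=1.0, sleep=delays.append)
print(output, delays)
```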

11. My Pre-Sales Judgment

Pixelle-Video is well suited to customer demos of "AI content production automation". When customers have clear needs in short-video operations, popular science, education and training, or mass production of marketing materials, it can quickly turn abstract AIGC capabilities into finished videos they can watch.

Its greatest value lies in its ability to assemble a pipeline, rather than in the quality of any single model. In pre-sales, use it as a PoC prototype: generate 3-5 videos on the customer's real topics so that the customer can see the gains in process, cost, and staff efficiency, and then discuss whether to build it into an enterprise-grade system. Do not start by promising film-and-TV quality or large-scale automated publishing.

12. References

- GitHub: https://github.com/AIDC-AI/Pixelle-Video
- Official documentation: https://aidc-ai.github.io/Pixelle-Video/zh
- Releases: https://github.com/AIDC-AI/Pixelle-Video/releases
- License: https://github.com/AIDC-AI/Pixelle-Video/blob/main/LICENSE