Trusted Local AI Actor Teams — Part 2: Codebase Gap Analysis & Business Case
With the vision set, which existing open-source foundation should we build on? Part 2 assesses PraisonAI, ChatDev 2.0 and Magec against the requirements, recommends a foundation, and lays out the gaps, architecture and business case.
This is Part 2 of a two-part series. It builds directly on the vision and requirements in Part 1: Trusted Local AI Actor Teams — Part 1: The Vision & Requirements.
A codebase and documentation assessment, shared for discussion — not a penetration test, production-readiness certification or guarantee of API stability. 3 July 2026.
Recommendation Use a pinned and independently testable fork of PraisonAI as the orchestration and agent-execution kernel. Build the actual product as a separate local-first Actor Team Control Plane around it. Borrow ChatDev’s visual workflow and human-review ideas, and Magec’s approachable self-hosted administration patterns, but do not adopt either repository unchanged as the full product foundation.
The decision is not that PraisonAI already is the envisioned product. None of the three projects provides the complete combination of versioned AI and Human Actors, explicit role boundaries, Agile work decomposition, Kanban task ownership, long-running watchdogs, trusted data egress, full prompt transparency, local organisational memory and enterprise-grade governance. The selection question is therefore: which project supplies the hardest reusable execution primitives while leaving us free to own the product, user experience and values?
PraisonAI is the strongest technical starting point because its documented capabilities already cover per-agent models, local and cloud providers, agent hand-offs, routing, loops, evaluator-optimiser patterns, persistent sessions, checkpoints, scheduling, background tasks, policy rules, human approval hooks, sandboxing, telemetry and cost-aware model routing. These are difficult foundations that would otherwise consume a large part of the build. Its MIT licence also permits a genuinely local and client-controlled product. [P1-P5]
ChatDev 2.0 is the strongest conceptual and visual runner-up. It most clearly demonstrates agents communicating through a visible workflow, supports human nodes, review loops, shared memory attachments, subgraphs and local Docker deployment. It would be the fastest route to a compelling visual demonstration. However, it is weaker as the long-term core because the current product model is a workflow canvas rather than a durable organisational control plane. Cost governance, scheduled watchdogs, first-class project/task management, Actor lifecycle, access control and production-grade resumability would still require substantial new backend work. [C1-C3]
Magec is attractive but currently too immature to carry the product vision. Its Go/Vue application is easy to self-host and already provides per-agent models, prompts, tools, visual flows, cron triggers and local memory services. Its gaps are more fundamental: no first-class Human Actor workflow, weak documented multi-user governance, isolated agent memories rather than a designed shared project memory, no cost-aware model router, no Agile work plane and a much smaller contributor and adoption base. [M1-M7]
What we would actually build
- An Actor Registry for versioned AI Actors and Human Actors, including responsibilities, exclusions, escalation rules, model, tools, memory, authority and budget.
- A Project and Work Plane for outcomes, epics, stories, tasks, dependencies, acceptance criteria, Kanban states and human work queues.
- A Trusted Model Plane that chooses local or cloud models by capability, price, privacy and policy, while minimising and recording what leaves the client environment.
- A durable orchestration service that can pause, resume, schedule, redirect and audit long-running work, including architecture, scope, quality, security and cost watchdogs.
- A Prompt and Trace Laboratory showing the complete rendered request, inserted memory, tool definitions, raw output, tool calls, cost, review decisions and course corrections.
Decision status: Recommended foundation subject to a short technical spike. This is a repository and documentation assessment, not a penetration test, production readiness certification or guarantee of API stability.
Comparative result
| Candidate | ** Weighted fit** | ** Best use** | ** Decision** |
|---|---|---|---|
| PraisonAI | ** 84/100** | Best execution kernel | Strongest reuse of difficult backend primitives; weak unified product UX. |
| ChatDev 2.0 | ** 78/100** | Best visual runner-up | Closest visible team-workflow experience; more backend control-plane work. |
| Magec | ** 65/100** | Promising prototype shell | Approachable and local, but material maturity, memory and governance gaps. |
The score measures documented fit against the requirements, not general popularity. It weights local open-source control, Actor/team semantics, workflow durability, model and cost routing, human participation, memory, auditability, visual usability, extensibility and maturity.
The product opportunity
The product can be understood as a “Virtual Private AI Team”. A virtual private cloud allowed organisations to use shared infrastructure while retaining their own networks, policies and direction. This platform would allow organisations to use local models and upstream cloud models while retaining their own orchestration, Actor personas, knowledge, approvals, values, audit record and ability to change providers.
For enterprises, the commercial value is vendor optionality, lower and measurable model expenditure, controlled data disclosure, reusable organisational knowledge, auditable human oversight and a practical route from disconnected AI experiments to managed delivery teams. Revenue can come from discovery, private deployment, Actor-team design, workflow engineering, integrations, model benchmarking, governance, training, support and managed operation inside the client’s own environment.
For community groups, not-for-profits and individuals, the same open platform can provide access to a structured “ten-person team” of specialised Actors without handing ownership of their data, prompts or accumulated knowledge to a proprietary platform. Enterprise revenue can cross-subsidise pro bono deployments, community templates, training clinics and sponsored model usage. The social purpose is capacity building rather than dependency.
Broad work division
| Lead | ** Primary workstream** | ** Early deliverables** |
|---|---|---|
| Geoff | Product vision, Actor/persona model, Human Actor experience, ethics, community and enterprise discovery. | Reference requirements; Actor schema; use cases; human approval patterns; social-enterprise offer. |
| Steve | Execution kernel, model/provider integration, durable jobs, policies, sandboxing, deployment and technical assurance. | PraisonAI spike and fork strategy; provider gateway; checkpoint test; security baseline; Docker deployment. |
| Davin | Control-plane application, visual workflow, Kanban/task experience, dashboards, trace views and integration APIs. | Actor registry UI; project board; run monitor; prompt/cost trace; human work inbox. |
Immediate decision gate
- Run the same six-to-ten Actor reference project through PraisonAI, ChatDev and Magec.
- Prove mixed local/cloud models, reviewer loops, human approval, persistent shared state, restart/resume, scheduling, cost capture and exact prompt tracing.
- Fork and pin PraisonAI only if those tests pass without a mandatory hosted service. Place all upstream calls behind our own compatibility interface so the kernel can later be replaced.
Bottom line PraisonAI should be treated as replaceable open-source infrastructure, not as the product identity. The durable value we create is the Actor model, trusted control plane, reusable team designs, human governance, community knowledge and implementation capability.
Contents and reading guide
-
Purpose, scope and conclusion
-
Requirements used for assessment
-
Evaluation method and weighted result
-
Candidate overview: PraisonAI
-
Candidate overview: ChatDev 2.0
-
Candidate overview: Magec
-
Comparative requirements gap matrix
-
Detailed gap analysis for the recommended PraisonAI foundation
-
Why the other candidates are not the primary starting point
-
Other codebases reviewed and why they were not selected
-
Recommended target product architecture
-
Delivery backlog and division of work
-
Proof-of-concept and decision gates
-
Business case and social-purpose model
-
Benefits by stakeholder
-
Risks, ethics and governance
-
Recommended next steps
Appendix A. Detailed scoring
Appendix B. Source register
How to use this document Pages 1 and 2 contain the decision and business rationale. Sections 4 to 10 explain the evidence. Sections 11 to 13 translate the gaps into architecture and delivery work. Sections 14 to 16 connect the technical choice to enterprise value, community benefit and ethical governance.
1. Purpose, scope and conclusion
This report assesses three open-source projects as possible foundations for the Trusted Local AI Actor Team Orchestration Platform described in the requirements document: PraisonAI, ChatDev 2.0 and Magec. It also briefly revisits other projects reviewed earlier to explain why they are better treated as components, inspirations or later integrations than as the primary codebase.
The assessment focuses on the full product vision rather than on whether a repository can run a simple multi-agent demonstration. The intended product must organise specialised AI and Human Actors into accountable teams, decompose work, enforce role boundaries, use different local and cloud models, support long-running critique and course correction, retain local knowledge, expose prompts and costs, and remain usable without an indispensable proprietary control plane.
Conclusion Select PraisonAI as the provisional execution kernel, behind an internal compatibility layer and subject to a technical spike. Build a new application control plane and domain model rather than attempting to stretch PraisonAI’s current user interfaces into the complete product. Keep ChatDev 2.0 as the main visual and interaction reference, and Magec as a secondary reference for simple self-hosted administration and scheduled clients.
This recommendation deliberately separates the foundation from the product. PraisonAI would provide reusable orchestration primitives. The product’s identity and defensible value would come from the Actor definition, trusted model gateway, project and Kanban layer, Human Actor experience, local memory and audit model, prompt laboratory, watchdog patterns, templates and implementation expertise.
The recommendation is provisional because documentation is not a substitute for operating the code. Before an irreversible commitment, the team should complete the reference implementation described in Section 13 and verify licence, dependency, security, state recovery and provider behaviour directly.
2. Requirements used for assessment
The executive requirements were converted into ten assessment themes. These are not generic agent-framework criteria. They reflect the specific aim of creating a locally controlled organisational system in which AI and humans act through defined roles and governance.
| Assessment theme | ** Meaning for this product** |
|---|---|
| Open-source and local control | All indispensable orchestration, scheduling, task state, prompt inspection, memory, approvals, cost tracking, export and recovery must run under a recognised open-source licence without a compulsory hosted control plane. |
| First-class Actors and teams | AI Actors and Human Actors require persistent identities, responsibilities, exclusions, escalation rules, models, tools, permissions, memory scopes, budgets and review relationships. |
| Work decomposition and orchestration | The system must translate outcomes into epics, stories, tasks and dependencies, then execute sequential, parallel, conditional, looping, scheduled and human-gated work. |
| Model and provider optionality | Different Actors must use different local or cloud models. Selection should consider capability, cost, privacy, context, latency and reliability, with fallback and provider independence. |
| Human participation | Humans must be able to own work, provide experience, approve, reject, modify, pause and redirect AI activity rather than merely respond to a terminal prompt. |
| Memory and learning | Project, role, organisation and task memory must be separated, locally retained, permissioned, versioned and available for improving Actor personas and workflows. |
| Transparency and audit | The product should reveal the complete rendered request, inserted context, tool definitions, raw output, tool actions, costs, evaluations, reviews and changes of direction. |
| Trusted operation and security | Data egress, secrets, filesystem and tool permissions, code execution, network access and destructive actions require explicit policy and containment. |
| Product usability | A practical control plane needs an Actor registry, team builder, project/Kanban view, visual workflow, run monitor, human inbox, cost dashboard and export. |
| Maturity and maintainability | The base should have sufficient activity, documentation, tests, contributors and architectural clarity to justify extension, while being isolated from upstream churn. |
3. Evaluation method and weighted result
Each candidate was scored from 0 to 5 against the ten themes, then multiplied by the weighting below. The result is an architectural selection aid, not a claim that one framework is objectively superior for every purpose. A visual low-code prototype and a durable multi-tenant product would naturally weight criteria differently.
| Criterion | ** Weight** | ** PraisonAI** | ** ChatDev 2.0** | ** Magec** |
|---|---|---|---|---|
| Open source and local control | 15 | 5.0 | 5.0 | 5.0 |
| Actor and team semantics | 15 | 4.0 | 4.0 | 4.0 |
| Workflow capability and durability | 15 | 4.5 | 3.5 | 3.5 |
| Multi-model and cost routing | 10 | 5.0 | 3.0 | 2.5 |
| Human Actor participation | 10 | 3.0 | 4.0 | 1.5 |
| Memory and persistence | 10 | 4.5 | 3.5 | 2.5 |
| Prompt transparency and audit | 10 | 3.5 | 3.5 | 2.5 |
| Visual/product usability | 5 | 2.5 | 5.0 | 4.5 |
| Extensibility and integrations | 5 | 5.0 | 4.0 | 3.5 |
| Maturity and community | 5 | 3.5 | 3.5 | 1.0 |
| Weighted total | ** 100** | ** 84** | ** 78** | ** 65** |
Scores are based on official repository and documentation evidence available on 3 July 2026. Missing documentation was treated as a gap or uncertainty rather than assumed functionality.
4. Candidate overview: PraisonAI
PraisonAI is an MIT-licensed agent framework and ecosystem spanning Python, JavaScript and Rust components. It presents itself as a way to operate agents and agent teams with memory, workflows, tools and broad model-provider support. Its official material documents local Ollama and vLLM access alongside major hosted providers, and lists workflow patterns such as routing, parallel execution, loops, evaluator-optimiser cycles and orchestrator-worker designs. [P1]
Why it maps well to the envisioned execution layer
- Per-Actor model choice: agents can be configured separately, and the model router supports task-based, cost-optimised and performance-oriented selection with fallbacks and provider health patterns. [P2]
- Long-running primitives: sessions can auto-save across restarts; workflow checkpoints, background tasks and scheduling are documented; repeated work and evaluator loops are supported. [P1, P4]
- Control hooks: a policy engine can allow, deny or request approval for tools, while human approval examples support approve, deny or modify decisions. [P3, P5]
- Operational instrumentation: OpenTelemetry, cost tracking, tracing integrations, model profiles, rate limits, thinking budgets and context compaction are documented. [P1, P6]
- Broad extensibility: MCP, A2A, custom tools, external coding agents, database persistence, sandboxing and multiple model providers reduce the amount of low-level integration code required. [P1]
Where it falls short as a product
- The ecosystem contains several interfaces and integrations rather than one coherent organisational control plane. A framework, CLI, dashboard, chat UI and Langflow integration do not by themselves create the intended Actor Team product.
- Agents are technical runtime objects, not yet the complete versioned Actor records required by the vision. Responsibilities, explicit exclusions, authority, escalation contracts, team membership, human equivalence and organisational lifecycle need a separate domain model.
- Human approval is primarily documented as a control around tool use. The product needs Human Actors with work ownership, queues, comments, lived-experience testing, acceptance decisions and accountability across the project lifecycle.
- PraisonAI does not supply the required project hierarchy, Agile planning model or Kanban board. It can execute a workflow but does not natively manage why a task exists, which outcome it supports, its dependencies, acceptance criteria or portfolio status.
- Prompt and trace data can be instrumented, but a dedicated local prompt laboratory that reconstructs the exact provider request, shows all injected memory and compares Actor versions would still need to be built.
- The very rapid release cadence is both a strength and a risk. A production product should not directly bind its database or UI to changing framework internals.
Evidence note: PraisonAI’s official repository lists persistent sessions, model routing, checkpoints, telemetry, policy, background tasks, sandbox execution, human approval, cost tracking and scheduling. Its advanced multi-provider documentation describes different LLMs per agent and cost/performance routing. [P1-P6]
5. Candidate overview: ChatDev 2.0
ChatDev began as a virtual software company in which named roles communicated through staged development, testing and review. ChatDev 2.0, released in January 2026, broadens that concept into a zero-code multi-agent platform with a FastAPI backend, Vue web console, YAML workflows and Apache 2.0 licensing. The original software-company metaphor remains highly relevant to the Actor Team vision even though the current platform is general purpose. [C1]
Why it is the closest visible representation of the idea
- The visual canvas directly represents multi-agent work rather than hiding it behind one chat window.
- Agent nodes can specify provider, model, base URL, prompts, tools, thinking and memories. Human nodes can pause the workflow and loop work back until the reviewer accepts it. [C2]
- Workflows support conditions, edge processors, loop counters and timers, subgraphs, parallel map/tree execution and shared memory declarations. [C2, C3]
- The web interface exposes real-time execution, intermediate artefacts and context snapshots, making it a strong reference for transparent inter-Actor communication. [C1-C3]
- The Apache 2.0 repository can be run locally through Docker Compose and is modular enough to extend with providers, nodes and tools. [C1]
Why it is not the recommended core
- Its central abstraction is still the workflow graph. The envisioned product needs a persistent organisation containing Actors, teams, projects, policies, budgets and memories that can participate in many workflows over time.
- There is no documented native Agile project and task plane, cost-aware model catalogue, enterprise budget governance or local portfolio dashboard.
- A human node is useful, but it does not yet equal a Human Actor with profile, role, workload, authority, notifications and responsibility across multiple projects.
- The official workflow guide documents session context snapshots and loop guards, but durable queue semantics, restart-safe jobs, distributed workers and operational recovery need hands-on verification and likely extension.
- The platform’s recent transformation means its large historical popularity should not be mistaken for years of production experience with the 2.0 architecture.
Evidence note: The official workflow guide documents agent and human nodes, per-node provider settings, shared memory, conditions, subgraphs, loop guards and map/tree execution. Context snapshots can be inspected in the UI and local session warehouse. [C1-C3]
6. Candidate overview: Magec
Magec is an Apache 2.0 self-hosted platform written primarily in Go and Vue. It packages an Admin UI, Voice UI, agents, visual flows, local and cloud backends, MCP tools, Redis session memory, PostgreSQL/pgvector long-term memory, webhooks and cron clients. Its one-line local installation and “your server, your data, your rules” positioning align strongly with the local-first aspiration. [M1]
Why it is attractive
- An agent can have its own backend, model, system prompt, memory, tools and output key, which closely resembles an early AI Actor profile. [M2]
- The visual flow editor supports sequential, parallel, looped and nested steps, and scheduled commands can trigger an agent or flow through cron. [M3, M5]
- Its single self-contained application and local Docker Compose deployment provide a clearer starting shell than a code-only framework. [M1]
- The Go server, Vue interface, REST API and MCP support may appeal to a team wanting a compact, deployable product footprint.
Why it is not yet a safe primary foundation
- The project is very young and has a much smaller contributor and user base. This increases bus-factor, upgrade, security-review and long-term maintenance risk.
- The documented memory design isolates each agent’s long-term memory. That protects separation, but the product vision requires controlled shared project memory, organisational memory and role learning across Actors. [M2, M4]
- No first-class Human Actor or human approval node is documented. Human participation would need to be designed across the API, workflow engine, database and interface.
- There is no documented model router based on cost, capability, data policy or budget, and no project/Kanban control plane.
- The current administration model is comparatively simple. A production service requires multi-user identity, roles, client tenancy, granular permissions, audit history and secure secret management. [M6, M7]
- Voice and chat integrations are useful but are not central to the initial product. They do not compensate for missing governance and durable work-management foundations.
Evidence note: Magec’s official repository and documentation describe per-agent models and tools, visual flows, local Docker deployment, Redis/PostgreSQL memory, cron clients and administrative configuration. The maturity and missing-feature judgements are conclusions from the documented architecture, not claims by the project. [M1-M7]
7. Comparative requirements gap matrix
The matrix below distinguishes between functionality that appears reusable, functionality that exists only partially and functionality that would need to be built as part of the product. “Strong” does not mean production-ready without testing.
| Requirement | ** PraisonAI** | ** ChatDev** | ** Magec** | ** Gap implication** |
|---|---|---|---|---|
| Recognised open-source licence | ** Strong** | ** Strong** | ** Strong** | All three qualify at repository level; dependency audit still required. |
| Fully local orchestration | ** Strong** | ** Strong** | ** Strong** | All can run locally; optional cloud models remain possible. |
| Per-Actor model/provider | ** Strong** | ** Strong** | ** Strong** | Praison has the richest routing and fallback story. |
| Persistent Actor registry | ** Partial** | ** Partial** | ** Partial** | Each has agent configuration, but not the full Actor lifecycle/domain model. |
| AI and Human Actors as peers | ** Weak** | ** Partial** | ** Weak** | ChatDev has human nodes; none has the intended human work model. |
| Responsibilities and explicit exclusions | ** Partial** | ** Partial** | ** Partial** | Prompts can express boundaries; structured enforceable fields are absent. |
| Dynamic team creation | ** Partial** | ** Partial** | ** Weak** | Agent generation exists in Praison; product governance remains missing. |
| Sequential/parallel/conditional flows | ** Strong** | ** Strong** | ** Strong** | All support core patterns, with different maturity. |
| Critique and revision loops | ** Strong** | ** Strong** | ** Strong** | All can construct loops; Praison and ChatDev have stronger evaluation patterns. |
| Durable restart/resume | ** Strong** | ** Partial** | ** Partial** | Praison documents autosave/checkpoints; others need operational proof. |
| Scheduled watchdog work | ** Strong** | ** Weak** | ** Strong** | Praison and Magec document scheduling/cron. |
| Pause, redirect and cancel active work | ** Partial** | ** Partial** | ** Weak** | Needs product-level job control and event model. |
| Projects, epics, stories and tasks | ** Weak** | ** Weak** | ** Weak** | Major common gap. |
| Kanban and task ownership | ** Weak** | ** Weak** | ** Weak** | Major common gap. |
| Human approval and work inbox | ** Partial** | ** Partial** | ** Weak** | Must be elevated into a first-class product layer. |
| Shared project memory | ** Partial** | ** Strong** | ** Weak** | ChatDev’s graph memory is closest; governance/versioning still absent. |
| Role and organisational memory | ** Partial** | ** Partial** | ** Weak** | Requires dedicated scoped memory model. |
| Persona/version learning | ** Partial** | ** Partial** | ** Weak** | Evaluation exists, but controlled Actor evolution is not a product feature. |
| Exact prompt envelope inspection | ** Partial** | ** Partial** | ** Weak** | Needs a dedicated trace and reconstruction layer. |
| Token and cost accounting | ** Strong** | ** Weak** | ** Weak** | Praison has documented cost/model-routing primitives. |
| Budget limits and efficiency policy | ** Partial** | ** Weak** | ** Weak** | Must become project and Actor-level governance. |
| Trusted data egress gateway | ** Weak** | ** Weak** | ** Weak** | Major common gap and product differentiator. |
| Tool policy and sandboxing | ** Strong** | ** Partial** | ** Partial** | Praison provides strongest documented control hooks. |
| Multi-user RBAC and tenancy | ** Weak** | ** Weak** | ** Weak** | Required for organisations and client-hosted service. |
| Visual workflow editor | ** Partial** | ** Strong** | ** Strong** | Praison can integrate with visual tools, but lacks one unified product canvas. |
| Enterprise audit and reporting | ** Partial** | ** Partial** | ** Weak** | Instrumentation exists, but governance reports need building. |
| Community templates/Actor packs | ** Partial** | ** Partial** | ** Partial** | Technically possible; curation, portability and governance are product work. |
8. Detailed gap analysis for the recommended PraisonAI foundation
The recommended architecture treats PraisonAI as a replaceable kernel behind a product-owned API. This section identifies what can probably be reused, what should be wrapped or adapted, and what must be designed as original product capability.
8.1 Reuse with validation
| Reusable primitive | ** Required validation or adaptation** |
|---|---|
| Agent execution and teams | Use PraisonAI’s agent, multi-agent and hand-off primitives to execute AI Actor work. Validate concurrency, cancellation, error propagation and deterministic structured outputs. |
| Workflow patterns | Reuse route, parallel, loop, repeat, evaluator-optimiser and orchestrator-worker patterns. Expose them through a product-owned workflow definition rather than storing framework-specific objects in the application database. |
| Provider access | Reuse broad local and cloud provider support, including Ollama, vLLM and OpenAI-compatible endpoints. All calls should pass through the Trusted Model Gateway. |
| Model routing | Reuse cost/performance routing concepts and fallback mechanisms. Replace model-price data and data-policy decisions with client-controlled catalogues. |
| Sessions, checkpoints and scheduling | Reuse after proving restart behaviour, idempotency, lock handling and persistence. The product must add a durable job and event record independent of the framework. |
| Policies, guardrails and approvals | Reuse tool allow/deny/ask hooks as enforcement points. Add organisational policies, Human Actor routing, audit context and a safer default-deny posture. |
| Memory, retrieval and knowledge connectors | Reuse adapters where their licences and deployment modes fit. Maintain product-owned memory metadata, scopes, retention and provenance. |
| Telemetry and evaluation | Reuse OpenTelemetry and evaluation hooks. Store essential traces locally in the product database rather than requiring an external tracing SaaS. |
| Sandbox and code tools | Reuse only after security testing. Place coding Actors in disposable containers or worktrees with constrained filesystem and network access. |
8.2 Build as product-owned capability
Actor Registry: Versioned AI and Human Actors; identity, role, purpose, responsibilities, exclusions, escalation, authority, tools, memory scopes, model policy, budget, quality rules and ownership.
Team Designer: Reusable teams, reporting lines, consultation rules, substitutions, human membership, templates and an Orchestration Actor that proposes teams subject to approval.
Project and Work Plane: Outcomes, requirements, epics, stories, tasks, subtasks, dependencies, acceptance criteria, risk, estimate, priority, status and evidence.
Kanban and Human Workbench: Boards, queues, notifications, comments, attachments, approvals, rejection, modification, lived-experience testing and responsibility history.
Trusted Model Gateway: Provider catalogue, local/cloud routing, data classification, minimisation, redaction, disclosure preview, policy enforcement, budgets, provider health and complete request logging.
Durable Job and Event Service: Product-owned run IDs, leases, queues, state transitions, checkpoints, pause/resume/cancel, idempotency, retries, scheduled triggers and event-driven watchdogs.
Scoped Memory Service: Project, task, Actor-role, team and organisational memory with provenance, permissions, retention, review, promotion and deletion.
Prompt and Trace Laboratory: Versioned prompt assembly, exact message envelope, injected memory, tools, model parameters, cache use, raw responses, post-processing, cost and side-by-side comparisons.
Watchdog and Evaluation Layer: Architecture, scope, security, quality, usability, cost and ethics watchdogs with read-only defaults and controlled escalation.
Security and Governance Plane: Identity, RBAC, client tenancy, secrets, tool permissions, network egress, code sandboxing, audit, backup, export, retention and policy reports.
Template and Community Library: Portable Actor profiles, team blueprints, workflow patterns, evaluation sets and deployment packs without creating a proprietary marketplace dependency.
8.3 Architecture isolation required
The product should not allow PraisonAI objects to become the permanent application schema. That would exchange vendor lock-in for framework lock-in. A narrow “Execution Kernel Interface” should translate product concepts into PraisonAI calls and translate results back into product events.
- Define product-owned JSON schemas for Actors, teams, work items, policies, model profiles, memory references, approvals and run events.
- Persist product state in PostgreSQL before invoking the kernel. Treat the kernel as a worker, not the system of record.
- Wrap model, tool, session, checkpoint and scheduler APIs behind internal interfaces with contract tests.
- Pin a known upstream version, maintain a small fork only where necessary and periodically rebase after automated compatibility testing.
- Create an alternative kernel adapter proof, even if minimal, using ChatDev or a small internal executor. This proves replaceability and protects the anti-lock-in promise.
8.4 Critical technical uncertainties
| Uncertainty | ** Required proof** |
|---|---|
| Checkpoint semantics | Confirm whether active workflows survive hard process termination, container restart and partial tool failure without duplicating side effects. |
| Scheduler persistence | Confirm schedules, missed-run behaviour, timezone handling, concurrency and high-availability semantics. |
| Cancellation and interruption | Confirm that a Human Actor or watchdog can stop an in-flight model/tool operation and leave a coherent recoverable state. |
| Prompt completeness | Confirm access to every message actually sent to each provider, including framework-added system instructions, tool schemas and compaction summaries. |
| Cost accuracy | Confirm provider usage metadata and local-model cost models; implement estimates when providers do not return reliable prices. |
| Policy coverage | Confirm policy checks apply to all tool paths, external agents and MCP transports, with no bypass through custom tools. |
| Memory provenance | Confirm retrieval results can retain source, version, permission and reason for inclusion. |
| Thread and tenant safety | Confirm sessions and global registries do not leak state across users, projects or concurrent runs. |
| Licence and dependencies | Generate a software bill of materials and identify source-available, copyleft, model and data licences before distribution. |
9. Why the other candidates are not the primary starting point
9.1 ChatDev 2.0: more visible product, less complete execution control
ChatDev would be a reasonable choice if the immediate objective were a compelling desktop demonstration of agents talking, reviewing and looping through a visual canvas. It should remain part of the proof-of-concept comparison because it may reveal that its runtime is more durable than its documentation currently demonstrates.
For the full vision, however, adopting ChatDev as the base would likely require replacing or heavily extending the backend precisely where the product differentiates: budget-aware provider choice, long-running schedules, organisational task state, policy enforcement, Human Actor work, enterprise access control and reliable operational recovery. Its interface could therefore become an attractive shell around a growing parallel control plane. That creates a risk of maintaining two overlapping workflow models.
The better use of ChatDev is as a design reference and optional adapter. Its human review loop, visible inter-agent context, graph authoring and “virtual software company” heritage should directly inform the Actor Team experience. Selected frontend concepts may be reused subject to architectural and licence review, but the product should not inherit its workflow graph as the sole organisational model.
9.2 Magec: strong shell, high foundation risk
Magec is appealing because it already looks like a self-contained product rather than a library. Its Go binary, Vue UI, local deployment, visual flows and cron access could make an early prototype feel integrated quickly.
The danger is that the team would need to build several hard platform capabilities directly inside a young codebase: shared scoped memory, Human Actors, model and cost routing, job governance, multi-user security, audit and project work management. That work would be coupled to a smaller upstream community and would involve both product and foundational runtime changes at once.
Magec is best treated as a secondary prototype and source of practical ideas. If its maintainer community and governance features grow, it could later become another execution-kernel adapter or deployment option. It should not currently be the single point on which the wider service and social mission depend.
10. Other codebases reviewed and why they were not selected
| Project | ** Reason not selected as primary foundation** |
|---|---|
| Flowise | A strong Apache 2.0, self-hosted visual workflow platform. It is an excellent reference or potential embedded workflow editor, but its core abstraction is a general component graph rather than persistent organisational Actors, Human Actors and project work. |
| Langflow | Very useful for visual prompt, model and tool experimentation, and it can complement PraisonAI. It remains a development canvas rather than the complete governed work-management product. |
| CrewAI | Its MIT-licensed role, goal, task and crew concepts are close to Actor personas. The open core is strong, but the polished AMP control plane is commercial, and the local core would still require the same project, human, memory, governance and UI layers. PraisonAI’s broader documented routing, checkpoint, scheduling and policy set made it the better initial kernel for this requirement. |
| Mission Control | A promising MIT-licensed self-hosted dashboard for agents, tasks, costs and Kanban-style work. It is relevant inspiration and may provide reusable concepts, but it is a control plane rather than the reasoning and orchestration engine required underneath. |
| Kandev | A useful AI Kanban and development environment, but focused on coding agents rather than a general Actor system for community, organisational and personal work. |
| AgentTeams / HiClaw | Interesting for containerised manager-worker teams and separate workspaces. It is operationally heavier and lacks the intended visual project, human and business-governance layers. |
| Hermes Agent and OpenClaw | Potentially valuable worker runtimes, gateways and personal agents. Their current multi-agent models do not provide the structured, durable project-team orchestration and work plane required here. |
| CAMEL and Microsoft Agent Framework | Capable engineering libraries for multi-agent patterns. They would require more original framework and product development than PraisonAI for the target feature set. |
| LangGraph | A capable open-source graph runtime, but much of the polished operational experience is associated with LangSmith. Starting here would involve building more model routing, Actor abstractions, scheduling, human experience and local observability. |
| n8n and Dify | Both are useful self-hosted products, but their licensing does not meet the strict objective of a fully open, non-crippled foundation with no essential proprietary layer. |
Evidence note: These projects were considered in the earlier landscape review. The brief descriptions reflect their official repositories and documentation available at the assessment date. [O1-O10]
11. Recommended target product architecture
The product should be a local-first control plane with a replaceable execution kernel. The architecture below keeps client knowledge, governance and project state independent of any model provider and independent of PraisonAI itself.
1. Experience layer Actor Registry; Team Designer; project and Kanban boards; visual workflow; Human Actor inbox; run monitor; prompt/cost explorer; administration.
2. Application control plane Product-owned APIs and domain services for Actors, teams, work, approvals, policies, budgets, templates and client tenancy.
3. Durable orchestration service Job queue, event store, scheduler, checkpoints, leases, retries, pause/resume/cancel and watchdog events.
4. Execution Kernel Interface Adapters translating product-owned workflow commands to PraisonAI and, later, alternative kernels.
5. Trusted Model Gateway Provider and model catalogue; local/cloud endpoints; cost and capability routing; data classification; redaction; disclosure policy; request/response capture.
6. Tool and sandbox plane MCP and native tools; per-Actor permission; isolated containers/worktrees; network policy; secret brokering; approval interception.
7. Memory and knowledge plane PostgreSQL metadata, object storage, vector/keyword retrieval, provenance, access scopes, retention and promotion of lessons.
8. Audit and evaluation plane OpenTelemetry, local trace store, prompt reconstruction, cost accounting, evaluation sets, reviewer outcomes and governance reports.
9. Deployment plane Single-user local Docker; team server; client VPC/private cloud; optional managed operation inside the client’s account.
Virtual Private AI Team analogy Just as a virtual private cloud combines shared infrastructure with an organisation’s own network boundaries and policies, the Trusted AI Actor Platform would combine local and upstream models with client-controlled orchestration, memory, permissions, values, approvals and provider choice. The client can gain scale without surrendering the organisational brain of the system.
Core data objects
| Object | ** Purpose** |
|---|---|
| Actor | AI or human identity, persona, capability, responsibility, exclusion, authority, memory, model/tool policy, budget and version. |
| Team | Purpose, membership, reporting and consultation routes, shared memory, policies and reusable template. |
| Work Item | Outcome, epic, story, task or subtask; owner; dependencies; acceptance evidence; risk; status and budget. |
| Workflow Definition | Reusable pattern connecting Actor roles, conditions, reviews, human gates, schedules and watchdogs. |
| Run | A durable execution instance with state, events, messages, costs, artefacts, approvals and checkpoints. |
| Policy | Rules controlling model data egress, tool access, spend, autonomy, memory and required human review. |
| Memory Record | Content, provenance, scope, permission, confidence, retention and promotion status. |
| Artefact | Code, document, image, report, test result or decision with ownership and version. |
12. Delivery backlog and division of work
The gap can be divided into coherent workstreams. The allocations below are suggested starting points based on the roles implied in the discussion and should be adjusted to actual strengths and availability. Every workstream should have a primary owner and a peer reviewer.
| Phase | ** Workstream** | ** Suggested lead** | ** Deliverable** |
|---|---|---|---|
| P0 | ** Kernel spike and licence baseline** | Steve | Clone, pin and test PraisonAI; SBOM; dependency/licence report; internal execution interface; alternate-kernel stub. |
| P0 | ** Reference scenario and acceptance suite** | Geoff + Steve | Ten-Actor software project; model mix; human gates; budget; restart; reviewer loops; expected evidence. |
| P0 | ** Product domain model** | Geoff + Davin | Actor, Human Actor, team, work item, policy, run, memory and artefact schemas. |
| P1 | ** Actor Registry and Team Designer** | Davin | Create/version Actors, boundaries, model/tool/memory policy, templates and team membership. |
| P1 | ** Trusted Model Gateway** | Steve | Provider registry, local/cloud endpoints, capability and price catalogue, egress policy, fallback and request capture. |
| P1 | ** Project/Kanban work plane** | Davin | Outcomes, epics, stories, tasks, dependencies, acceptance criteria, status and Actor assignment. |
| P1 | ** Human Actor inbox** | Davin + Geoff | Approvals, modifications, review, feedback, lived-experience tests, notifications and audit. |
| P1 | ** Local persistence and export** | Steve | PostgreSQL, local artefact store, backup, portable JSON/YAML export and restore. |
| P1 | ** Prompt and run trace** | Steve + Davin | Final rendered messages, memory/tool injection, raw outputs, tool calls, costs and review decisions. |
| P2 | ** Durable job service** | Steve | Queue, state machine, pause/resume/cancel, retries, schedules, idempotency and worker leases. |
| P2 | ** Scoped memory service** | Steve + Geoff | Project, task, role and organisation scopes, provenance, retention, promotion and permissions. |
| P2 | ** Orchestration Actor** | Geoff + Steve | Team proposal, decomposition, model allocation, budget planning, escalation and human approval. |
| P2 | ** Watchdog Actors** | Geoff + Steve | Architecture, scope, cost, security and quality checks with safe intervention rules. |
| P2 | ** Sandbox and tool permissions** | Steve | Container/worktree isolation, network egress, secrets, default-deny policy and approval hooks. |
| P2 | ** Evaluation and persona improvement** | Geoff + Davin | Actor benchmarks, prompt versions, reviewer outcomes, regression tests and controlled promotion. |
| P3 | ** Multi-user, tenancy and RBAC** | Steve + Davin | Organisations, clients, roles, SSO options, audit and private deployments. |
| P3 | ** Community template library** | Geoff + Davin | Open Actor packs, team blueprints, NFP deployment packs, documentation and contribution governance. |
| P3 | ** Enterprise governance reports** | Geoff + Steve | Data egress, provider use, spend, human approvals, risk, model choice and policy compliance. |
Suggested team responsibilities
Geoff: product, Actor design and social purpose
Own the definition of value, Actor semantics, human participation, persona boundaries, community and enterprise use cases, ethics, data-disclosure principles, evaluation criteria and service proposition. Translate real projects into reference workflows and make sure the system supports people rather than merely displaying technical cleverness.
Steve: execution, model infrastructure and assurance
Own the PraisonAI spike and fork strategy, provider/model gateway, durable execution, checkpoints, scheduler, tool policy, sandboxing, persistence, deployment, security baselines and technical evidence. Keep the kernel replaceable and prevent hidden cloud dependencies.
Davin: control-plane experience and application integration
Own the web application and interaction model for Actors, teams, projects, Kanban, visual workflows, run monitoring, prompt/cost inspection and Human Actor work. Ensure technical events become understandable decisions and actions for ordinary users.
Each major feature should be reviewed by one of the other two. The product, execution and interface layers are tightly coupled enough that isolated ownership would recreate the silos the platform is intended to avoid.
13. Proof-of-concept and decision gates
A four-part technical spike should precede full development. The same reference project should be run through each candidate so that the decision is based on observable behaviour rather than feature lists.
Reference project
Build a small local-first volunteer-management application using FastAPI, PostgreSQL, a JavaScript frontend, authentication, Docker Compose and automated tests. Use at least the following Actors: Product Owner, Business Analyst, Scrum Master, Architect, JavaScript Developer, API Developer, Database Developer, Infrastructure Engineer, Tester, Reviewer and Cost Watchdog.
Mandatory proof points
| Proof point | ** Pass condition** |
|---|---|
| Mixed models | At least two local models and two optional cloud providers, assigned by Actor and switchable without workflow redesign. |
| Boundaries | The JavaScript Actor can request but not independently make database or infrastructure redesigns. |
| Structured hand-off | Each Actor passes defined artefacts and decisions, not an uncontrolled transcript. |
| Iteration | Build, test, critique, fix and retest loops occur until acceptance or budget exhaustion. |
| Human control | A human can approve architecture, modify a proposed tool action, reject a result, pause a run and redirect work. |
| Watchdogs | Architecture, scope and cost checks run on schedules or events and can raise a pause request. |
| Restart recovery | Terminate the process mid-work and prove consistent resumption without duplicate commits or payments. |
| Prompt evidence | Capture the exact request sent to every model, including memory and tool schemas. |
| Cost evidence | Show per-Actor, per-task and per-provider usage, including local compute estimates. |
| Local ownership | Disconnect from all optional cloud services and retain projects, prompts, memories, logs and artefacts. |
| Portability | Export Actors, team, workflow, project and run history, then restore into a clean deployment. |
| Security | Demonstrate default-deny tools, isolated code execution and blocked unauthorised data egress. |
Go/no-go rules for PraisonAI
- Go if the native framework can support per-Actor provider choice, loops, checkpoints, scheduling and policy hooks without a compulsory hosted service.
- Proceed with conditions if gaps can be isolated inside the kernel adapter and covered by tests without maintaining a broad fork.
- Do not use it as the primary kernel if prompt construction cannot be inspected, state leaks between sessions, recovery duplicates side effects, policy hooks can be bypassed, or essential capabilities depend on source-available or proprietary services.
- Retain ChatDev as the fallback for a visual MVP and as the second adapter used to prove the product is not married to one engine.
14. Business case and social-purpose model
14.1 The problem being solved
Many organisations are accumulating disconnected subscriptions, pilot projects and model-specific assistants. Their prompts, operational knowledge, evaluations and workflows become dispersed across providers. They may gain impressive demonstrations without gaining an organisational capability that they own.
At the same time, small organisations and individuals cannot afford a literal team of product, research, engineering, testing, finance and governance specialists. They may have access to a chatbot, but not to a managed process that decomposes work, checks quality, controls cost and retains learning.
The proposed platform addresses both problems by turning models into an accountable team of Actors controlled by the user or organisation. It makes model providers substitutable resources while making the client’s Actor personas, workflows, memory and governance the durable asset.
14.2 Enterprise value proposition
Reduce avoidable model spend: Use small local or cheaper hosted models for bounded work, reserve expensive reasoning models for difficult decisions, and stop loops when value no longer justifies cost.
Retain provider optionality: Move between local models, cloud APIs and specialist providers without rebuilding the organisation’s workflow or surrendering its knowledge base.
Keep organisational IP: Store prompts, Actor designs, evaluations, decisions, memory and artefacts in the client’s environment.
Make AI governable: Apply human gates, data-egress rules, tool permissions, budgets and auditable reasons for model and workflow decisions.
Turn experiments into delivery: Connect AI work to outcomes, tasks, acceptance criteria, reviews and releases rather than isolated chat sessions.
Develop reusable capability: Improve Actor personas and team patterns over time, creating a compounding organisational asset.
14.3 Community and individual value
Access to a structured team: A community organisation can assemble grant research, budgeting, policy review, communications and volunteer coordination Actors rather than relying on one general chatbot.
Capacity building: Open templates and training help people understand and modify their own systems. The service should leave them more capable, not permanently dependent.
Local ownership and dignity: Sensitive organisational or personal work can remain local or be selectively disclosed according to the user’s own risk judgement.
Fair access: Paid enterprise work can fund free or low-cost deployments, training and model credits for public-benefit projects.
Values-led workflows: Teams can encode accessibility, environmental responsibility, care, inclusion and community accountability as review roles and acceptance criteria.
14.4 Commercial model without open-core lock-in
The same complete open-source core should be available to community and enterprise users. Commercial value should come from skilled implementation and dependable service, not from withholding the crucial orchestration component.
| Service | ** What the client buys** |
|---|---|
| AI capability and workflow assessment | Map current costs, providers, risks, knowledge and candidate work processes. |
| Private deployment | Install locally, in a client virtual private cloud or in a client-owned cloud account. |
| Actor-team and workflow design | Create personas, boundaries, hand-offs, review loops, evaluation sets and human roles for specific work. |
| Integration services | Connect documents, databases, Git, ticketing, communications and line-of-business systems. |
| Model benchmarking and cost optimisation | Test local and cloud models against the client’s tasks and configure routing and budgets. |
| Governance and security | Configure data egress, tool permissions, logging, retention, approvals, audit and safe deployment patterns. |
| Training and enablement | Teach staff to create, evaluate and improve Actor teams rather than consume a black-box service. |
| Support and managed operation | Provide upgrades, monitoring and service levels while the client retains data and deployment ownership. |
14.5 Cross-subsidy model
Enterprise revenue could support a transparent public-benefit programme rather than a vague promise. Examples include:
- a defined proportion of consulting time allocated to not-for-profit and community deployments;
- an enterprise “sponsor a community team” option covering setup, training or model usage;
- open Actor packs for volunteer coordination, grant preparation, community events, accessibility review and public-interest research;
- regular free installation and design clinics;
- public documentation and evaluation sets that remain useful without purchasing support.
This model aligns revenue with impact. Enterprises pay for reliability, integration, governance and expertise. Community users receive the same open foundation, with assistance targeted according to capacity and social benefit.
14.6 Environmental and social responsibility
The platform should not claim that AI is automatically environmentally beneficial. It can, however, make waste visible and reduce unnecessary use by routing bounded tasks to smaller models, using existing local hardware, caching repeated context, stopping unproductive loops and measuring the compute and provider spend associated with accepted results. Any claim of carbon reduction should be based on measured evidence rather than marketing.
Values and ethics should be operational rather than decorative. A client should be able to require accessibility review, environmental impact consideration, privacy minimisation, human consent or community benefit as actual workflow gates and Actor responsibilities.
15. Benefits by stakeholder
| Stakeholder | ** Benefit** |
|---|---|
| Geoff, Steve and Davin | Lower personal cloud spend; better use of local AI hardware; a reusable product and consulting capability; deeper technical understanding; a practical project aligned with social purpose. |
| Enterprise clients | Provider choice, private deployment, auditable governance, cost control, reusable organisational intelligence, integration with existing work and reduced dependence on mega-vendor control planes. |
| Not-for-profits and community groups | Access to structured specialist capacity, reusable templates, transparent operation, ownership of data and knowledge, and affordable or sponsored implementation. |
| Individuals and small teams | A manageable AI team for research, planning, building, review and learning, with local memory and the ability to choose which information leaves their machine. |
| Open-source contributors | A meaningful platform where contributions to Actors, workflows, evaluation and governance remain broadly reusable. |
| Model providers | A fair opportunity to compete on capability, cost, privacy and reliability rather than winning solely through ecosystem lock-in. |
16. Risks, ethics and governance
| Risk | ** Why it matters** | ** Mitigation** |
|---|---|---|
| Upstream churn | PraisonAI has a rapid release cadence and broad scope. | Pin versions; isolate behind interfaces; maintain contract tests; adopt upgrades deliberately. |
| Maintainer concentration | A project may depend heavily on a small number of contributors. | Keep fork capability, document internals, contribute upstream and prove an alternative adapter. |
| Security of tools and code execution | Autonomous tools can modify files, systems or external services. | Default-deny, least privilege, sandboxing, worktrees, network controls, secrets brokering and human gates. |
| False confidence and circular review | Multiple Actors may repeat the same model’s blind spots. | Use heterogeneous models, independent tests, evidence-based acceptance and Human Actors. |
| Cost escalation | Loops, long context and frontier models can consume funds without proportional value. | Per-task budgets, cost watchdogs, stop rules, model routing and accepted-value metrics. |
| Privacy and unwanted disclosure | Cloud calls can expose more context than the user realises. | Data classification, disclosure preview, minimisation, redaction, local defaults and complete egress logs. |
| Mission drift | Enterprise priorities may crowd out community benefit. | Public-benefit commitments, transparent reporting, cross-subsidy targets and separate community governance. |
| Open-source dilution | A future commercial layer could become essential and recreate lock-in. | Commit to a complete open core, open formats and client-owned deployment; sell expertise and support. |
| Scope overload | Attempting every feature at once could prevent delivery. | Reference scenario, narrow MVP, staged backlog and measurable acceptance gates. |
| Ethical misuse | The same orchestration can scale harmful or exploitative work. | Acceptable-use principles, client due diligence, permissioned tools, transparency and refusal of harmful engagements. |
Governance principles
- The client owns its data, prompts, Actor definitions, workflows, memory, evaluations and outputs.
- The platform must remain functional without a specific model vendor, hosted tracing service or proprietary orchestration service.
- Every Actor must have visible authority, limits, escalation rules and a responsible human owner.
- Humans are not treated as inconvenient exceptions. They are Actors with essential judgement, lived experience and accountability.
- Local operation is the default capability; cloud use is an explicit, inspectable choice.
- Values such as accessibility, environmental care, privacy and community benefit can be encoded as work and acceptance criteria.
- Automation should increase human capacity and agency rather than conceal responsibility or eliminate meaningful consent.
17. Recommended next steps
1. Freeze the reference requirement: Approve the executive requirements and the mandatory proof points before choosing implementation details.
2. Create a shared technical repository: Store architecture decisions, candidate adapters, benchmark tasks, deployment files and evidence.
3. Complete the three-candidate spike: Run the same project through PraisonAI, ChatDev and Magec and record actual behaviour.
4. Decide the kernel: Confirm PraisonAI, choose ChatDev instead, or implement a narrower kernel based on evidence.
5. Build the product-owned domain model: Actors, teams, work, policies, memory, runs and approvals before investing heavily in UI.
6. Deliver a local single-user MVP: Actor registry, project board, human approval, one provider gateway, one local model, one cloud model and complete local trace.
7. Pilot on a real internal project: Use the platform to help build itself or deliver a bounded community project; measure cost, course correction and accepted outcomes.
8. Develop the service offer: Create an enterprise assessment package and a community deployment package using the same open core.
9. Publish principles and contribution rules: Make the anti-lock-in and public-benefit commitments visible before commercial pressures shape the platform.
Final recommendation Proceed with a PraisonAI-centred technical spike, while naming and designing the product independently. The goal is not to resell an agent framework. It is to create an open, trusted and locally controlled way to assemble AI and Human Actors into teams that plan, deliver, review, learn and remain accountable to human values.
Appendix A. Detailed scoring rationale
| Criterion | ** Wt** | ** P** | ** PraisonAI rationale** | ** C** | ** ChatDev rationale** | ** M** | ** Magec rationale** |
|---|---|---|---|---|---|---|---|
| Open source/local control | 15 | 5.0 | MIT and fully local core; audit optional dependencies. | 5.0 | Apache 2.0 and local Docker. | 5.0 | Apache 2.0 and self-contained local deployment. |
| Actor/team semantics | 15 | 4.0 | Strong configurable agents/teams, but product Actor lifecycle absent. | 4.0 | Strong visible agent nodes and organisational metaphor. | 4.0 | Per-agent model/prompt/tools, but limited organisation layer. |
| Workflow/durability | 15 | 4.5 | Rich patterns, autosave, checkpoints and scheduling documented. | 3.5 | Rich graph patterns; production restart durability uncertain. | 3.5 | Flows and cron strong; durable governance less proven. |
| Multi-model/cost routing | 10 | 5.0 | Best documented provider breadth, routing, fallback and cost concepts. | 3.0 | Per-node providers, but no equivalent cost governance layer. | 2.5 | Multiple backends, no documented cost/capability router. |
| Human participation | 10 | 3.0 | Approval hooks, but not Human Actors or workbench. | 4.0 | Human nodes and review loops are visually native. | 1.5 | No documented human workflow primitive. |
| Memory/persistence | 10 | 4.5 | Sessions, persistence and multiple memory approaches. | 3.5 | Shared graph memories and snapshots; scope/lifecycle gaps. | 2.5 | Good local services but agent memories are isolated. |
| Prompt/audit | 10 | 3.5 | Telemetry/tracing hooks, but unified exact-envelope UX absent. | 3.5 | Visible context/logs, but full envelope and enterprise audit uncertain. | 2.5 | Admin/logging exists; limited documented trace depth. |
| Visual/product UX | 5 | 2.5 | Fragmented UI ecosystem; requires new control plane. | 5.0 | Best integrated visual workflow experience. | 4.5 | Strong compact admin and visual flows. |
| Extensibility | 5 | 5.0 | Broad providers, tools, MCP, A2A, adapters and languages. | 4.0 | Modular nodes/providers/tools and SDK. | 3.5 | MCP, REST, clients and clean Go/Vue stack. |
| Maturity/community | 5 | 3.5 | Active and broad, but extreme release velocity increases churn risk. | 3.5 | Large community, but 2.0 architecture is recent. | 1.0 | Very young and small community. |
P = PraisonAI, C = ChatDev 2.0, M = Magec. Weighted totals are rounded to whole numbers in the executive summary.
Appendix B. Source register
Sources were accessed for the assessment dated 3 July 2026. Repository statistics and features can change rapidly. Official project repositories and documentation were preferred. URLs are included so the team can reproduce and update the review.
[P1] PraisonAI GitHub repository.Open source. MIT licence, providers, workflows, sessions, checkpoints, policy, telemetry, cost tracking, scheduling and feature index.
[P2] PraisonAI Advanced Multi-Provider Patterns.Open source. Different models per agent; routing strategies, fallbacks, load balancing, provider health and cost optimisation.
[P3] PraisonAI Policy Engine.Open source. Declarative allow, deny and approval controls for agent behaviour and tools.
[P4] PraisonAI Workflow Checkpoints and Sessions.Open source. Checkpoint/resume and persistent-session concepts requiring direct verification.
[P5] PraisonAI Human Approval examples.Open source. Approval, denial and modification patterns in official examples.
[P6] PraisonAI CLI and telemetry documentation.Open source. Scheduler, model router, checkpoints, background tasks, UI and operational features.
[C1] ChatDev 2.0 GitHub repository.Open source. Apache 2.0; zero-code visual platform; local Docker; FastAPI/Vue architecture; repository activity.
[C2] ChatDev Workflow Authoring Guide.Open source. Agent/human nodes, provider settings, conditions, subgraphs, loops, map/tree execution and local context snapshots.
[C3] ChatDev Memory Guide.Open source. Shared memory declarations, local memory options, attachments and session context.
[M1] Magec GitHub repository.Open source. Apache 2.0; local installation; Go/Vue stack; per-agent models; visual flows; cron; local memory infrastructure.
[M2] Magec Agents.Open source. Agent prompts, models, tools, memory and configuration.
[M3] Magec Agentic Flows.Open source. Sequential, parallel, looped and nested visual workflows.
[M4] Magec Memory.Open source. Session and semantic memory configuration and agent isolation.
[M5] Magec Cron and Commands.Open source. Scheduled commands against agents and flows.
[M6] Magec Flow Control.Open source. Shared state and control within flows.
[M7] Magec Administration and Security.Open source. Current administration authentication model and deployment warnings.
[O1] Flowise.Open source. Apache 2.0 self-hosted visual AI workflow builder.
[O2] Langflow.Open source. Open-source visual AI application and prompt workflow builder.
[O3] CrewAI.Open source. MIT-licensed role-based agent and crew framework; commercial AMP control plane is separate.
[O4] Mission Control.Open source. Self-hosted task, agent, cost and Kanban-style management concepts.
[O5] Kandev.Open source. Self-hosted AI Kanban and coding-agent environment.
[O6] AgentTeams / HiClaw.Open source. Containerised manager-worker multi-agent runtime.
[O7] Hermes Agent.Open source. Local agent runtime and sub-agent delegation.
[O8] OpenClaw.Open source. Multi-agent workspaces and model configuration.
[O9] Microsoft Agent Framework.Open source. Open-source agent and workflow framework.
[O10] LangGraph.Open source. Open-source graph execution library with commercial operational ecosystem around LangSmith.
Assessment limitations
- The review primarily examines official repositories, documentation and examples. It does not replace source-code review, load testing, threat modelling or penetration testing.
- Repository stars, forks and release counts are contextual signals, not evidence of production suitability.
- Features described in documentation may be experimental, adapter-specific or subject to breaking changes.
- The recommended architecture intentionally assumes that no single upstream project should own the permanent product data model.