PuranOS Architecture

Building an AI-Native Industrial Firm

Industrial firms don't lack software. They lack a shared operating model connecting engineering computation, project execution, commercial follow-through, and operational memory. PuranOS is our attempt to close that gap.

The Problem

Most industrial firms attempting AI adoption start with chatbot wrappers or document retrieval systems. These fail at scale because they have no structured understanding of the business domain. There is no schema for equipment. No ontology for process units. No typed relationship between a vendor quote and the equipment it prices.

The result: every project reinvents workflows, knowledge evaporates between projects, and AI integration attempts fail because there is no schema for the AI to operate on.

PuranOS takes a different path. Instead of bolting AI onto existing ad-hoc systems, it builds the schema'd substrate first and makes AI a native consumer of that substrate.

Five Architectural Bets

1. Schema'd state over memory

A properly schema'd database substrate is the primary knowledge store — not a vector store or RAG layer. The ontology comes from three sources: enterprise OSS schemas (OpenProject, CRM, inventory, CMMS), purpose-built engineering schemas (31-component plant-state model, ISA 5.1 instrumentation, DEXPI equipment classes), and custom domain schemas (procurement, compliance).

When an engineer writes a work order in the CMMS, that IS the memory of what maintenance was done. A separate "memory layer" is architecturally redundant.
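The idea can be sketched in a few lines. This is a hypothetical, heavily simplified slice of a CMMS schema — the names (`Equipment`, `WorkOrder`, `maintenance_history`) are illustrative, not the actual PuranOS ontology:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch: the work order row itself is the maintenance
# memory. "What was done to P-101?" is a query over schema'd state,
# not a retrieval call against a separate memory layer.

@dataclass(frozen=True)
class Equipment:
    tag: str          # equipment tag, e.g. "P-101"
    dexpi_class: str  # DEXPI equipment class

@dataclass(frozen=True)
class WorkOrder:
    equipment_tag: str  # typed link back to Equipment.tag
    performed_on: date
    procedure: str
    findings: str

def maintenance_history(orders: list[WorkOrder], tag: str) -> list[WorkOrder]:
    """Institutional memory recovered as an ordinary query."""
    return sorted((o for o in orders if o.equipment_tag == tag),
                  key=lambda o: o.performed_on)

pump = Equipment(tag="P-101", dexpi_class="CentrifugalPump")
orders = [
    WorkOrder("P-101", date(2025, 3, 2), "seal replacement", "minor scoring on shaft"),
    WorkOrder("P-101", date(2024, 11, 8), "quarterly inspection", "vibration within spec"),
]
history = maintenance_history(orders, pump.tag)
```

The same query that serves a human planner serves an AI agent — that is what "native consumer of the substrate" means in practice.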

2. PM tool as coordination substrate

OpenProject serves as the shared board for bidirectional task delegation between humans and AI agents. Explicit per-task state beats prompt memory — in our multi-agent coordination benchmarks, moving from scratchpad-based state to externalized state took target recovery from 0.213 to 1.000.

The PM tool is not a reporting dashboard. It is the runtime coordination layer.
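A minimal sketch of what "externalized per-task state" means, with an in-memory dict standing in for the PM tool and a hypothetical `Status` state machine (the actual transition set in OpenProject is richer):

```python
from enum import Enum

# Every status change is written to the shared board. An agent that
# crashes and restarts recovers from board state, not prompt memory.

class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    DONE = "done"

# Explicit, enforced transitions -- illegal moves fail loudly.
ALLOWED = {
    Status.OPEN: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.DONE, Status.OPEN},
}

class Board:
    def __init__(self) -> None:
        self._tasks: dict[str, Status] = {}

    def create(self, task_id: str) -> None:
        self._tasks[task_id] = Status.OPEN

    def transition(self, task_id: str, to: Status) -> None:
        frm = self._tasks[task_id]
        if to not in ALLOWED.get(frm, set()):
            raise ValueError(f"illegal transition {frm.value} -> {to.value}")
        self._tasks[task_id] = to

    def status(self, task_id: str) -> Status:
        # Recovery point: read the board, not the scratchpad.
        return self._tasks[task_id]
```

The point of the sketch is the recovery path: because state lives outside the agent, "where was I?" always has a deterministic answer.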

3. Skills as captured expertise

Reusable operating procedures encode how work should be done, not just what tools to call. Each successful project produces workflows that can be encoded as skills. An experienced process engineer describing "how I size an MLE bioreactor" produces a more valuable skill than a developer writing Python — because the engineer encodes decision logic, not just computation.

Skills are not prompt templates. They are structured operating procedures with typed inputs, typed outputs, and decision trees.
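In code, a skill looks roughly like this. Everything here is a placeholder — the thresholds and the `size_reactor` logic are invented for illustration and are not real process-engineering criteria:

```python
from dataclasses import dataclass

# Hypothetical skill: typed inputs, typed outputs, and explicit decision
# logic. Each branch records *why* it was taken, because that rationale
# is what the experienced engineer contributes.

@dataclass(frozen=True)
class SizingInput:
    flow_m3_per_d: float
    bod_mg_per_l: float

@dataclass(frozen=True)
class SizingOutput:
    reactor_count: int
    rationale: str

def size_reactor(inp: SizingInput) -> SizingOutput:
    if inp.flow_m3_per_d <= 0:
        raise ValueError("flow must be positive")
    if inp.flow_m3_per_d < 500:                     # illustrative threshold
        return SizingOutput(1, "single train adequate at low flow")
    count = 2 if inp.bod_mg_per_l < 300 else 3      # illustrative threshold
    return SizingOutput(count, "parallel trains for redundancy at high flow")
```

Contrast this with a prompt template: the decision tree is inspectable, testable, and versionable, and the typed contract means an agent cannot call it with a malformed request.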

4. Engineering engines as first-class citizens

Deterministic simulation engines (QSDsan, WaterTAP) are infrastructure, not afterthoughts. Their integrations provide session persistence, cross-engine handoffs via typed converters, and model-credibility metadata on every result — so downstream consumers know whether a number came from a screening estimate or a calibrated model.

An agent-generated sizing is not automatically "engineering grade." The credibility metadata is what makes it auditable.
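A sketch of the result envelope, under assumed names (`EngineResult`, `Credibility`, `require_calibrated` are illustrative, not the actual API):

```python
from dataclasses import dataclass
from enum import Enum

# Every engine output carries credibility metadata, so a downstream
# consumer can distinguish a screening estimate from a calibrated result
# without re-deriving the number's provenance.

class Credibility(Enum):
    SCREENING = "screening_estimate"
    CALIBRATED = "calibrated_model"

@dataclass(frozen=True)
class EngineResult:
    value: float
    unit: str
    engine: str
    credibility: Credibility

def require_calibrated(result: EngineResult) -> float:
    """Gate for engineering-grade deliverables: refuse screening numbers."""
    if result.credibility is not Credibility.CALIBRATED:
        raise ValueError(
            f"{result.engine} result is only a {result.credibility.value}")
    return result.value
```

The gate is the audit mechanism: a deliverable that consumed only `require_calibrated` values is, by construction, free of un-flagged screening estimates.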

5. Tool-level governance

Side-effect permissions are enforced at the MCP server boundary, not in prompts. When an agent sends an email, the server enforces which identity it sends from. When an agent modifies a vendor record, the action is logged with agent identity and trigger task.

Prompt-level governance ("please don't do bad things") is not governance. Server-boundary enforcement with audit trails is governance.
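The enforcement point can be sketched as follows — a hypothetical tool-server class, with a dict standing in for the real permission store and a list standing in for the real audit table:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Server-boundary governance: the tool server, not the prompt, decides
# which identity an agent may act as, and every side effect is appended
# to an audit log with agent identity and trigger task.

@dataclass
class ToolServer:
    allowed_senders: dict[str, set[str]]  # agent_id -> permitted from-addresses
    audit_log: list[dict] = field(default_factory=list)

    def send_email(self, agent_id: str, task_id: str,
                   sender: str, to: str, body: str) -> None:
        # Enforcement happens here, regardless of what the prompt said.
        if sender not in self.allowed_senders.get(agent_id, set()):
            raise PermissionError(f"{agent_id} may not send as {sender}")
        self.audit_log.append({
            "action": "send_email", "agent": agent_id, "task": task_id,
            "sender": sender, "to": to,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        # ...actual delivery would happen here...
```

Note what is absent: no appeal to the agent's good behavior. A jailbroken prompt changes nothing, because the permission check never consults the prompt.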

Architecture at a Glance

External systems and human operators

            ↓

Communication / orchestration runtime

            ↓

Persona layer + reusable skills

            ↓

MCP server layer

            ↓

Business systems + engineering engines

The most important design choice is the hybrid state model: OpenProject stores long-running collaborative project state; PostgreSQL stores execution reliability state (deduplication, attempts, transitions, leases, side-effect audit records). That split keeps project work legible to humans without sacrificing the exact semantics required for reliable agent execution.

What This Is Not

Not a chatbot wrapper. There is no "ask PuranOS" interface. Agents execute structured workflows against typed tools.

Not Palantir-style data unification. The ontology is not retrofitted over existing ad-hoc systems. It is the schema of the tools themselves, from day one.

Not a SaaS product. PuranOS is self-hosted infrastructure for a specific operating firm, designed so other firms with similar needs can adopt the architecture.

Not "just add AI" to existing tools. The tools, schemas, and operating model are co-designed. AI is a native participant, not an afterthought.

Research Backing

Every architectural decision is anchored to specific research, including the counter-evidence. The evidence stack spans peer-reviewed publications, preprints, technical reports, and experiment repos — research-informed, not research-settled.

Citations are tagged by evidence type: peer-reviewed, preprint, technical report, repo, experiment.

Production evidence: bounded autonomy wins

MAP: Measuring Agents in Production technical report (2025)

306 practitioners, 86 deployed systems. 68% execute ≤10 steps before human intervention. 79% use manual prompt construction. 74% depend on human evaluation. Best current reality check for "bounded autonomy + explicit workflow + human checkpoints."

Why Do Multi-Agent LLM Systems Fail? peer-reviewed — NeurIPS 2025

MAST-Data: 1,600+ annotated traces, 14 failure modes across 3 categories. Coordination failures are often architectural, not merely model-capability failures. 41-87% failure rates in evaluated SOTA MAS.

AgentArch preprint (2025)

18 architectures evaluated on enterprise tasks. Best scores peak at 35.3% (customer routing) and 70.8% (time-off requests). Function calling generally beats ReAct, but model-specific preferences exist.

Enterprise-specific benchmarks (2026)

WoW-bench: World of Workflows preprint (Jan 2026)

ServiceNow-based: 4,000+ business rules, 55 active workflows, 234 evaluation tasks. Much closer to enterprise operations than older generic agent benchmarks.

Agent-Diff preprint (Feb 2026)

224 enterprise API tasks (scheduling, messaging, file management, project management). Tests effect of API documentation — relevant to typed tool contracts and state-diff verification.

FireBench preprint (Mar 2026)

2,400+ samples across output format compliance, ordered responses, ranking, overconfidence, required inclusions/exclusions. Supports emphasis on deterministic artifact envelopes.

Memory, schema, and structured state

AMA-Bench preprint (Feb 2026)

Many current agent memory systems underperform long-context baselines on long-horizon tasks. Bottleneck is memory-system design, not base model capability. Important corrective against simplistic memory claims.

StructMemEval preprint (Feb 2026)

Structured memory outperforms unstructured only when organized into task-appropriate structure. Application schemas ARE task-appropriate structure — a procurement schema is a purpose-built ledger.

AgentSM preprint (Jan 2026)

Stores prior execution traces as structured programs for text-to-SQL, enabling reuse across larger schemas and harder questions. Best current citation for structured reusable reasoning in enterprise data tasks.

Skills: curated helps, self-generated mostly doesn't

SkillsBench preprint (Feb 2026)

84 tasks, 11 domains. Curated skills improve average pass rate by 16.2 percentage points. Self-generated skills provide no average benefit and often regress. Exactly the pattern PuranOS implies: skills help when curated and scoped.

SWE-Skills-Bench preprint — work in progress (Mar 2026)

49 SWE skills, 565 task instances. Average pass-rate gain only ~1.2pp; 39 of 49 skills show zero improvement. Skills should be narrow, selective, and compatibility-checked.

For Curious Industrial Engineers

This is not a product pitch. There is no signup page, no demo request form, no pricing tier.

We publish the architecture because the reasoning behind it is worth sharing — and because the intersection of AI and industrial process engineering is underserved. We are interested in whether anyone else is building something similar.

Who this is for

Process engineers who wonder how AI could make sizing and deliverable generation more consistent without replacing engineering judgment.

Design-build project managers who see the coordination overhead from disconnected systems.

Firm principals who recognize that institutional knowledge walks out the door with senior engineers.

Technical leaders evaluating AI strategy for industrial firms who want an architecture that treats engineering computation as a first-class concern.

Let's compare notes

If any of this resonates — whether you want to discuss the architecture, explore how it applies to your firm, or just compare notes on AI and industrial engineering — get in touch.

No form to fill out. No sales funnel. Just a conversation between practitioners.

Schedule a Conversation