PuranOS Architecture

Building an AI-Native Industrial Firm

Industrial firms don't lack software. They lack a shared operating model connecting engineering computation, project execution, commercial follow-through, and operational memory. PuranOS is our attempt to close that gap.

The Problem

Most industrial firms attempting AI adoption start with chatbot wrappers or document retrieval systems. These fail at scale because they have no structured understanding of the business domain. There is no schema for equipment. No ontology for process units. No typed relationship between a vendor quote and the equipment it prices.

The result: every project reinvents workflows, knowledge evaporates between projects, and AI integration attempts fail because there is no schema for the AI to operate on.

PuranOS takes a different path. Instead of bolting AI onto existing ad-hoc systems, it builds the schema'd substrate first and makes AI a native consumer of that substrate.

Five Architectural Bets

1. Schema'd state over memory

A properly schema'd database substrate is the primary knowledge store — not a vector store or RAG layer. The ontology comes from three sources: enterprise OSS schemas (OpenProject, CRM, inventory, CMMS), purpose-built engineering schemas (31-component plant-state model, ISA 5.1 instrumentation, DEXPI equipment classes), and custom domain schemas (procurement, compliance).

When an engineer writes a work order in the CMMS, that IS the memory of what maintenance was done. A separate "memory layer" is architecturally redundant.
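The idea can be sketched in a few lines. This is a hypothetical, heavily simplified slice of a CMMS schema — the names (`Equipment`, `WorkOrder`, `maintenance_history`) are illustrative, not the actual PuranOS ontology:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch: the work order row itself is the maintenance
# memory. "What was done to P-101?" is a query over schema'd state,
# not a retrieval call against a separate memory layer.

@dataclass(frozen=True)
class Equipment:
    tag: str          # equipment tag, e.g. "P-101"
    dexpi_class: str  # DEXPI equipment class

@dataclass(frozen=True)
class WorkOrder:
    equipment_tag: str  # typed link back to Equipment.tag
    performed_on: date
    procedure: str
    findings: str

def maintenance_history(orders: list[WorkOrder], tag: str) -> list[WorkOrder]:
    """Institutional memory recovered as an ordinary query."""
    return sorted((o for o in orders if o.equipment_tag == tag),
                  key=lambda o: o.performed_on)

pump = Equipment(tag="P-101", dexpi_class="CentrifugalPump")
orders = [
    WorkOrder("P-101", date(2025, 3, 2), "seal replacement", "minor scoring on shaft"),
    WorkOrder("P-101", date(2024, 11, 8), "quarterly inspection", "vibration within spec"),
]
history = maintenance_history(orders, pump.tag)
```

The same query that serves a human planner serves an AI agent — that is what "native consumer of the substrate" means in practice.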

2. PM tool as coordination substrate

OpenProject serves as the shared board for bidirectional task delegation between humans and AI agents. Explicit per-task state beats prompt memory — in our multi-agent coordination benchmarks, moving from scratchpad-based state to externalized state took target recovery from 0.213 to 1.000.

The PM tool is not a reporting dashboard. It is the runtime coordination layer.
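A minimal sketch of what "externalized per-task state" means, with an in-memory dict standing in for the PM tool and a hypothetical `Status` state machine (the actual transition set in OpenProject is richer):

```python
from enum import Enum

# Every status change is written to the shared board. An agent that
# crashes and restarts recovers from board state, not prompt memory.

class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    DONE = "done"

# Explicit, enforced transitions -- illegal moves fail loudly.
ALLOWED = {
    Status.OPEN: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.DONE, Status.OPEN},
}

class Board:
    def __init__(self) -> None:
        self._tasks: dict[str, Status] = {}

    def create(self, task_id: str) -> None:
        self._tasks[task_id] = Status.OPEN

    def transition(self, task_id: str, to: Status) -> None:
        frm = self._tasks[task_id]
        if to not in ALLOWED.get(frm, set()):
            raise ValueError(f"illegal transition {frm.value} -> {to.value}")
        self._tasks[task_id] = to

    def status(self, task_id: str) -> Status:
        # Recovery point: read the board, not the scratchpad.
        return self._tasks[task_id]
```

The point of the sketch is the recovery path: because state lives outside the agent, "where was I?" always has a deterministic answer.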

3. Skills as captured expertise

Reusable operating procedures encode how work should be done, not just what tools to call. Each successful project produces workflows that can be encoded as skills. An experienced process engineer describing "how I size an MLE bioreactor" produces a more valuable skill than a developer writing Python — because the engineer encodes decision logic, not just computation.

Skills are not prompt templates. They are structured operating procedures with typed inputs, typed outputs, and decision trees.
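In code, a skill looks roughly like this. Everything here is a placeholder — the thresholds and the `size_reactor` logic are invented for illustration and are not real process-engineering criteria:

```python
from dataclasses import dataclass

# Hypothetical skill: typed inputs, typed outputs, and explicit decision
# logic. Each branch records *why* it was taken, because that rationale
# is what the experienced engineer contributes.

@dataclass(frozen=True)
class SizingInput:
    flow_m3_per_d: float
    bod_mg_per_l: float

@dataclass(frozen=True)
class SizingOutput:
    reactor_count: int
    rationale: str

def size_reactor(inp: SizingInput) -> SizingOutput:
    if inp.flow_m3_per_d <= 0:
        raise ValueError("flow must be positive")
    if inp.flow_m3_per_d < 500:                     # illustrative threshold
        return SizingOutput(1, "single train adequate at low flow")
    count = 2 if inp.bod_mg_per_l < 300 else 3      # illustrative threshold
    return SizingOutput(count, "parallel trains for redundancy at high flow")
```

Contrast this with a prompt template: the decision tree is inspectable, testable, and versionable, and the typed contract means an agent cannot call it with a malformed request.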

4. Engineering engines as first-class citizens

Deterministic simulation engines (QSDsan, WaterTAP) are infrastructure, not afterthoughts. Their integrations provide session persistence, cross-engine handoffs via typed converters, and model-credibility metadata on every result — so downstream consumers know whether a number came from a screening estimate or a calibrated model.

An agent-generated sizing is not automatically "engineering grade." The credibility metadata is what makes it auditable.
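A sketch of the result envelope, under assumed names (`EngineResult`, `Credibility`, `require_calibrated` are illustrative, not the actual API):

```python
from dataclasses import dataclass
from enum import Enum

# Every engine output carries credibility metadata, so a downstream
# consumer can distinguish a screening estimate from a calibrated result
# without re-deriving the number's provenance.

class Credibility(Enum):
    SCREENING = "screening_estimate"
    CALIBRATED = "calibrated_model"

@dataclass(frozen=True)
class EngineResult:
    value: float
    unit: str
    engine: str
    credibility: Credibility

def require_calibrated(result: EngineResult) -> float:
    """Gate for engineering-grade deliverables: refuse screening numbers."""
    if result.credibility is not Credibility.CALIBRATED:
        raise ValueError(
            f"{result.engine} result is only a {result.credibility.value}")
    return result.value
```

The gate is the audit mechanism: a deliverable that consumed only `require_calibrated` values is, by construction, free of un-flagged screening estimates.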

5. Tool-level governance

Side-effect permissions are enforced at the MCP server boundary, not in prompts. When an agent sends an email, the server enforces which identity it sends from. When an agent modifies a vendor record, the action is logged with agent identity and trigger task.

Prompt-level governance ("please don't do bad things") is not governance. Server-boundary enforcement with audit trails is governance.
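The enforcement point can be sketched as follows — a hypothetical tool-server class, with a dict standing in for the real permission store and a list standing in for the real audit table:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Server-boundary governance: the tool server, not the prompt, decides
# which identity an agent may act as, and every side effect is appended
# to an audit log with agent identity and trigger task.

@dataclass
class ToolServer:
    allowed_senders: dict[str, set[str]]  # agent_id -> permitted from-addresses
    audit_log: list[dict] = field(default_factory=list)

    def send_email(self, agent_id: str, task_id: str,
                   sender: str, to: str, body: str) -> None:
        # Enforcement happens here, regardless of what the prompt said.
        if sender not in self.allowed_senders.get(agent_id, set()):
            raise PermissionError(f"{agent_id} may not send as {sender}")
        self.audit_log.append({
            "action": "send_email", "agent": agent_id, "task": task_id,
            "sender": sender, "to": to,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        # ...actual delivery would happen here...
```

Note what is absent: no appeal to the agent's good behavior. A jailbroken prompt changes nothing, because the permission check never consults the prompt.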

Architecture at a Glance

External systems and human operators

            ↓

Communication / orchestration runtime

            ↓

Persona layer + reusable skills

            ↓

MCP server layer

            ↓

Business systems + engineering engines

The most important design choice is the hybrid state model: OpenProject stores long-running collaborative project state; PostgreSQL stores execution reliability state (deduplication, attempts, transitions, leases, side-effect audit records). That split keeps project work legible to humans without sacrificing the exact semantics required for reliable agent execution.

What This Is Not

Not a chatbot wrapper. There is no "ask PuranOS" interface. Agents execute structured workflows against typed tools.

Not Palantir-style data unification. The ontology is not retrofitted over existing ad-hoc systems. It is the schema of the tools themselves, from day one.

Not a SaaS product. PuranOS is self-hosted infrastructure for a specific operating firm, designed so other firms with similar needs can adopt the architecture.

Not "just add AI" to existing tools. The tools, schemas, and operating model are co-designed. AI is a native participant, not an afterthought.

Research Backing

Every architectural decision is anchored to specific research, including the counter-evidence. The evidence stack spans peer-reviewed publications, preprints, technical reports, and experiment repos — research-informed, not research-settled.

Citations are tagged by evidence type: peer-reviewed, preprint, technical report, repo, experiment.

Production evidence: bounded autonomy wins

MAP: Measuring Agents in Production technical report (2025)

306 practitioners, 86 deployed systems. 68% execute ≤10 steps before human intervention. 79% use manual prompt construction. 74% depend on human evaluation. Best current reality check for "bounded autonomy + explicit workflow + human checkpoints."

Why Do Multi-Agent LLM Systems Fail? peer-reviewed — NeurIPS 2025

MAST-Data: 1,600+ annotated traces, 14 failure modes across 3 categories. Coordination failures are often architectural, not merely model-capability failures. 41-87% failure rates in evaluated SOTA MAS.

AgentArch preprint (2025)

18 architectures evaluated on enterprise tasks. Best scores peak at 35.3% (customer routing) and 70.8% (time-off requests). Function calling generally beats ReAct, but model-specific preferences exist.

Enterprise-specific benchmarks (2026)

WoW-bench: World of Workflows preprint (Jan 2026)

ServiceNow-based: 4,000+ business rules, 55 active workflows, 234 evaluation tasks. Much closer to enterprise operations than older generic agent benchmarks.

Agent-Diff preprint (Feb 2026)

224 enterprise API tasks (scheduling, messaging, file management, project management). Tests effect of API documentation — relevant to typed tool contracts and state-diff verification.

FireBench preprint (Mar 2026)

2,400+ samples across output format compliance, ordered responses, ranking, overconfidence, required inclusions/exclusions. Supports emphasis on deterministic artifact envelopes.

Memory, schema, and structured state

AMA-Bench preprint (Feb 2026)

Many current agent memory systems underperform long-context baselines on long-horizon tasks. Bottleneck is memory-system design, not base model capability. Important corrective against simplistic memory claims.

StructMemEval preprint (Feb 2026)

Structured memory outperforms unstructured only when organized into task-appropriate structure. Application schemas ARE task-appropriate structure — a procurement schema is a purpose-built ledger.

AgentSM preprint (Jan 2026)

Stores prior execution traces as structured programs for text-to-SQL, enabling reuse across larger schemas and harder questions. Best current citation for structured reusable reasoning in enterprise data tasks.

Skills: curated helps, self-generated mostly doesn't

SkillsBench preprint (Feb 2026)

84 tasks, 11 domains. Curated skills improve average pass rate by 16.2 percentage points. Self-generated skills provide no average benefit and often regress. Exactly the pattern PuranOS implies: skills help when curated and scoped.

SWE-Skills-Bench preprint — work in progress (Mar 2026)

49 SWE skills, 565 task instances. Average pass-rate gain only ~1.2pp; 39 of 49 skills show zero improvement. Skills should be narrow, selective, and compatibility-checked.

For Curious Industrial Engineers

This is not a product pitch. There is no signup page, no demo request form, no pricing tier.

We publish the architecture because the reasoning behind it is worth sharing — and because the intersection of AI and industrial process engineering is underserved. We are interested in whether anyone else is building something similar.

Who this is for

Process engineers who wonder how AI could make sizing and deliverable generation more consistent without replacing engineering judgment.

Design-build project managers who see the coordination overhead from disconnected systems.

Firm principals who recognize that institutional knowledge walks out the door with senior engineers.

Technical leaders evaluating AI strategy for industrial firms who want an architecture that treats engineering computation as a first-class concern.

Let's compare notes

If any of this resonates — whether you want to discuss the architecture, explore how it applies to your firm, or just compare notes on AI and industrial engineering — get in touch.

No form to fill out. No sales funnel. Just a conversation between practitioners.

Schedule a Conversation