// full feature list — live + proposed — v1.0 build
THE COMPLETE
ENRICHMENT FLOW
Every action, every channel, every capture surface — live and proposed together. One entry point. Two parallel pipelines. Nothing invisible.
Live — Core
Live — KB+KG
Live — Scanner
Live — Symbols
Live — Scaffold
Capture Surface
Proposed
01 — intake — single entry point
helix_action — Unified Action Tool
helix-cortex / routers / action.py — the only tool the model needs to call
The model calls one thing. Helix decides what to do with it. No cognitive load about pipelines, scanning, or which tool to use. All intelligence lives inside Helix — not in the model's head.
file_write — Full pipeline: version → scan → KB → KG → observer → scaffold returned. (building)
file_patch — Targeted str_replace. JSON payload — no shell escaping issues. (building)
file_move — Moves on disk + updates KG path labels. Content hash is identity, path is a label. (building)
file_delete — Deletes + logs to observer. Deletions are never invisible. (building)
file_read — Returns content with metadata. Read-only — no pipeline fires. (live)
file_list — Directory listing. Read-only — no pipeline fires. (live)
command — Executes on server. Auto-detects output: stack traces, service names, package installs → KG. Always → observer. (building)
scaffold_query — Explicit scaffold request. Returns imports[], boilerplate, related_files[], confidence. (building)
every action passes through the smart content detector
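The action set above reduces to a small routing contract. A minimal sketch, assuming the action names from the list; the handler names and return shape are illustrative, not the real action.py:

```python
# Hypothetical routing contract for helix_action: read-only actions skip
# the pipelines, every other action fans out to both. Names mirror the
# feature list; the dict/return shape is an assumption for illustration.
READ_ONLY = {"file_read", "file_list"}
ACTIONS = READ_ONLY | {"file_write", "file_patch", "file_move",
                       "file_delete", "command", "scaffold_query"}

def route(action: str) -> dict:
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return {
        "action": action,
        # write/command/scaffold actions trigger Pipelines A + B
        "fires_pipelines": action not in READ_ONLY,
    }
```

The point of the single entry point is visible here: the model only ever supplies `action` plus params; whether pipelines fire is Helix's decision, not the model's.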
Smart Content Detector
helix-cortex / services / detector.py — shared by both pipelines and all 13 capture surfaces
Classifies any content: code | config | prose | structured-data | binary. Selects the pipeline automatically. (building)
Used by every input surface: helix_action, file scanner, SSH capture, conversation chunks, git diffs, log lines. (building)
Config: YAML/TOML/INI → infra atoms. Docker Compose → container + port + network entities in KG. (building)
Stack traces: Python/Node/Go format detection → file, function, error type → failure record. (building)
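A classifier like this can be sketched with cheap heuristics. This is a minimal illustration, not the real detector.py; the regexes, thresholds, and function name are assumptions:

```python
import json
import re

# Hypothetical sketch of the detector's classification step.
CODE_HINTS = re.compile(
    r"^\s*(def |class |import |from \w+ import|function |#include|package )", re.M)
CONFIG_HINTS = re.compile(r"^\s*[\w.-]+\s*[:=]\s*\S+", re.M)

def classify(content: bytes) -> str:
    """Return one of: code | config | prose | structured-data | binary."""
    if b"\x00" in content:                    # NUL bytes → treat as binary
        return "binary"
    text = content.decode("utf-8", errors="replace")
    try:                                      # valid JSON → structured data
        json.loads(text)
        return "structured-data"
    except ValueError:
        pass
    if CODE_HINTS.search(text):               # language keywords at line start
        return "code"
    lines = [l for l in text.splitlines() if l.strip()]
    # mostly key: value / key=value lines → config (YAML/TOML/INI shape)
    if lines and sum(bool(CONFIG_HINTS.match(l)) for l in lines) / len(lines) > 0.6:
        return "config"
    return "prose"
```

The production detector presumably layers more signals (file extension, shebangs, magic bytes), but the ordering matters either way: binary first, then structured data, then code, with prose as the fallback.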
fans into two parallel pipelines simultaneously — both fire on every action
02 — parallel pipelines — fire simultaneously on every action
Pipeline A — Intelligence · async · fire-and-forget · never blocks response
Atom Scanner
SCANNER / ATOMS — tier 1 · 2 · 3
Tier 1: Tree-sitter — 100+ languages, ~$0.00/file. (live)
Tier 2: Haiku LLM — purpose, type, quality, reuse potential. ~$0.001/file. (live)
Tier 3: Self-generated heuristics. After N analyses, generates its own rules. $0.00 forever. (live)
Structural fingerprint per atom. Matching fingerprints auto-register as templates. (live)
Config parser plugin: Docker Compose → KG entities. (building)
Import resolution (P3): scan imports → package public API as atoms. (p3)
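The structural fingerprint mentioned above can be illustrated as "normalize, then hash": strip comments and literals, collapse identifiers to a placeholder, and hash what remains, so two functions with the same shape but different names collide. A sketch under those assumptions (the normalization rules and function name are illustrative, not the shipped scanner):

```python
import hashlib
import re

def fingerprint(source: str) -> str:
    """Hash the structure of a Python snippet, ignoring names and literals."""
    s = re.sub(r"#.*", "", source)                 # drop line comments
    s = re.sub(r"(['\"]).*?\1", "STR", s)          # collapse string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "NUM", s)       # collapse numeric literals
    keywords = {"def", "return", "if", "else", "for", "while", "class",
                "import", "from", "in", "not", "and", "or", "STR", "NUM"}
    # collapse every non-keyword identifier to a single placeholder
    s = re.sub(r"\b\w+\b",
               lambda m: m.group(0) if m.group(0) in keywords else "ID", s)
    s = re.sub(r"\s+", " ", s).strip()             # normalize whitespace
    return hashlib.sha256(s.encode()).hexdigest()[:16]
```

Atoms whose fingerprints match are structurally identical up to naming, which is exactly the condition for auto-registering a template.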
Knowledge Base + Knowledge Graph
CORTEX / KB + KG — postgres (pgvector) + neo4j
KB: FTS5 + pgvector semantic search. 158+ docs indexed. (live)
Entity upsert: container, service, port, domain, project, tool, person, dependency. 463 entities. (live)
Relationships: DEPENDS_ON, CONTAINS, TOUCHES, PRODUCES. (live)
Exchange extraction: decisions/failures/patterns auto-extracted async. (building)
Operational patterns: restart sequences, port conflicts — learned from the SSH stream. (building)
Symbol Engine
COMPRESSION — § promotion · token savings
High-frequency phrase tracking. Daily promotion. 400+ active symbols. (live)
§ prefix namespace. The LLM receives the full phrase — context carries the compressed form. (live)
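The § mechanism is a two-way substitution table. A toy sketch, where the table contents are made up and the real promotion logic (frequency tracking, daily jobs) is omitted:

```python
# Illustrative § symbol table — entries are invented for the example.
SYMBOLS = {
    "§dc": "docker compose up -d",
    "§hc": "helix-cortex",
    "§kg": "knowledge graph",
}

def compress(text: str) -> str:
    """Replace known high-frequency phrases with their § symbols."""
    for sym, phrase in SYMBOLS.items():
        text = text.replace(phrase, sym)
    return text

def expand(text: str) -> str:
    """Restore full phrases before the LLM ever sees the context."""
    # longest symbols first, so a longer symbol is never clobbered
    # by a shorter one that happens to be its prefix
    for sym in sorted(SYMBOLS, key=len, reverse=True):
        text = text.replace(sym, SYMBOLS[sym])
    return text
```

Stored context stays compressed; `expand()` runs at injection time, which is why the savings are invisible to the model.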
Observer
LOGGING — every action · archive collections
Logs every tool call: type, params, result, timing, session_id. (live)
Archive collections: decisions, failures, patterns, sessions, project_archive, snapshots. (live)
Auto conversation extraction (P1): async worker — helix_exchange_post becomes optional. (p1)
Pipeline B — Scaffolding · fast · returns with action result · <200ms
Intent Parser
helix-cortex / services / intent_parser.py
Signal 1 — Message intent: tokens matched against the atom vocabulary from your codebase. (building)
Signal 2 — Current context: file/directory/project from recent helix_action calls. (building)
Signal 3 — Recency window: just wrote a model → bias the scaffold toward a router or test. (building)
Vocabulary grows automatically. Skip logic: no tokens, read-only, or low confidence → skip. (building)
Atom Matcher
pgvector similarity search — 990 atoms live
Structural similarity search on pgvector embeddings. Returns top-N atoms by type. (building)
Context bias: same-project/directory atoms weighted higher. (building)
Confidence 0.0–1.0. 990 atoms live today, growing every session.
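In production this ranking is a pgvector query (ordering by the cosine-distance operator `<=>`); the same ranking, plus the same-project bias described above, can be shown in plain Python. The function name, atom dict shape, and bias factor are assumptions for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def match_atoms(query_vec, atoms, project=None, top_n=3, bias=1.2):
    """Rank atoms by cosine similarity; same-project atoms weighted higher.

    atoms: list of dicts with 'embedding', 'project', 'name'.
    """
    scored = []
    for atom in atoms:
        score = cosine(query_vec, atom["embedding"])
        if project and atom.get("project") == project:
            score *= bias                    # context bias from the source text
        scored.append((score, atom["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]
```

The returned score doubles as the 0.0–1.0 confidence signal (before bias), which is what feeds the skip logic upstream.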
Dependency Cascader
KG traversal — related files + cascade suggestions
KG traversal: what files relate to this path? What imports this? What depends on this config? (building)
Cascade: new router → register in main.py + test file. New service → compose + Traefik + health check. (building)
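The cascade rules above amount to a small lookup table from atom type to follow-up suggestions. A sketch where the rule table and function name are illustrative, not the shipped ruleset:

```python
# Hypothetical cascade rule table, mirroring the two examples in the text.
CASCADE_RULES = {
    "router": ["register in main.py", "create tests/test_{name}.py"],
    "service": ["add to docker-compose.yml", "add Traefik labels",
                "add health check"],
}

def cascade(atom_type: str, name: str) -> list[str]:
    """Return the follow-up suggestions for a newly created atom."""
    return [step.format(name=name) for step in CASCADE_RULES.get(atom_type, [])]
```

Unknown atom types cascade to nothing, which keeps the suggestion list conservative by default.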
Scaffold Assembler
boilerplate + imports + type signatures + test stubs
Combines matched atoms into a scaffold. Returns imports[], boilerplate, related_files[], confidence. (building)
Test stub generation: implementation written → test file scaffold returned automatically. (building)
Flywheel: Claude completes the scaffold → Pipeline A learns → the next scaffold improves. (building)
pipeline A stores atoms · pipeline B scaffold + context → MemBrain injection
03 — capture surfaces — all inputs to pipeline A
13 CAPTURE SURFACES · everything flows through the enrichment engine · nothing is invisible · you never configure what gets scanned
File Operations
helix_action file_write / patch / move / delete
file_write: full pipeline. Blob stored in Garage. (live)
file_patch: targeted str_replace. No sed mangling. (live)
file_move: disk move + KG path label update by content hash. (live)
file_delete: delete + observer log. (live)
SSH Command Capture
POST /capture/ssh — auto from gateway
Every gateway__ssh_execute auto-posts. One webhook = 100% SSH coverage. (building)
Parsed: file paths → scan. Stack traces → failure records. Service names → KG entities. (building)
pip/npm → dependency entities. Port/hostname → network topology. Always → observer. (building)
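The dependency branch of that parsing can be sketched with two regexes: spot `pip install` / `npm install` in a captured command and emit package names as candidate KG dependency entities. The regexes and function name are assumptions, not the real capture endpoint:

```python
import re

# Illustrative patterns for the pip/npm branch of SSH capture.
PIP = re.compile(r"\bpip3?\s+install\s+(?P<pkgs>[\w\[\],=<>.~ -]+)")
NPM = re.compile(r"\bnpm\s+(?:install|i)\s+(?P<pkgs>[@\w/ .^~-]+)")

def dependencies(command: str) -> list[str]:
    """Return package names mentioned in an install command, else []."""
    m = PIP.search(command) or NPM.search(command)
    if not m:
        return []
    # drop flags like -r / --upgrade; keep bare package specs
    return [p for p in m.group("pkgs").split() if not p.startswith("-")]
```

Everything else in the stream still goes to the observer; this branch only decides what additionally becomes a dependency entity.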
Existing Files on Disk
background file scanner — scheduled worker
A worker walks configured directories on a schedule. Hash-indexed — only rescans changed files. (building)
Skips: node_modules, .git, __pycache__, binaries, >1MB. Config via .helix/config.yml. (building)
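The hash-indexed walk can be sketched directly: skip the configured directories, hash each file, and only report paths whose hash changed since the last run. The function name and index shape are assumptions (the real worker also skips binaries and reads its config from .helix/config.yml):

```python
import hashlib
import os

SKIP_DIRS = {"node_modules", ".git", "__pycache__"}
MAX_BYTES = 1_000_000                        # skip files over ~1 MB

def changed_files(root: str, index: dict) -> list[str]:
    """Return paths whose content hash differs from the stored index."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune skip-dirs in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if os.path.getsize(path) > MAX_BYTES:
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if index.get(path) != digest:    # new or modified → rescan
                changed.append(path)
                index[path] = digest
    return changed
```

Because the index is keyed by path but stores a content hash, renames show up as one removal plus one "new" file, which is why the move surface updates labels by hash instead.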
Git Events
POST /capture/git — post-commit hook
helix init installs a post-commit hook. Diff + commit message → decision record. (building)
Stack Traces
auto via SSH capture stderr
Python/Node/Go format detection. Extracts file, function, line, error type → failure record + KG. (building)
Conversation Extraction
async worker — automatic post-session
An async worker extracts decisions, failures, and patterns. helix_exchange_post becomes optional. (building)
Desktop Sidecar — Claude Desktop Capture
C:/tools/helix-sidecar/ · log_turn + flush_session · MCP stdio · installed
Installed at C:/tools/helix-sidecar/server.py. Registered as the helix-sidecar MCP server in Desktop config.json alongside helix-node and ssh-vps2. (live)
log_turn() — Claude calls it after every response to capture conversation turns into Helix Pipeline A. (live)
flush_session() — batch-writes buffered turns at natural breakpoints. (live)
backfill.py — one-time backfill of existing Desktop conversation history. (live)
Requires a system prompt: “After every response call log_turn(). Call flush_session() at breakpoints.” Relies on Claude cooperation — interim until the HTTPS proxy (Layer 3) ships. (needs system prompt)
Desktop needs a restart to activate. The node agent (helix-node) is also installed: file_write/read/patch/list/move/delete/command on the local machine. (needs restart)
Package Dependencies
SSH capture + watcher
SSH capture catches pip/npm installs. Periodic scan of requirements.txt, package.json, go.mod. (building)
Config Files
infra atom parser — YAML / TOML / INI
Docker Compose → container/port/network entities in KG. Infrastructure topology atoms. (building)
Runtime Logs
log tailer — signal only
Tails configured containers. Signal-only: errors + warnings → log_event KB entries. (building)
File Moves / Renames
POST /capture/move
Updates path labels on atoms by content hash. No migration — label update only. (building)
Import Resolution
post-scan dependency resolver
Resolve imports → index package public APIs as external atoms. (building)
Browser / MemBrain Hook
POST /capture/browser — opt-in
Visiting GitHub/docs.*/npm → optionally POST page content. Opt-in per domain. (building)
DB Schema / Migrations
POST /capture/migration
sql_diff + name → schema entities in KG. Migration name → decision record. (building)
pipeline B scaffold + context → MemBrain pre-generation injection
04 — pre-generation scaffold injection — before the llm generates
MemBrain — Pre-Generation Injection
extension content script — intercepts outgoing messages before Claude
The LLM never sees a blank page. MemBrain intercepts the outgoing message, queries the enrichment engine, injects scaffold context before the message reaches Claude. User workflow unchanged.
Signal 1 — Message intent: tokens matched against the atom vocabulary from your codebase. (building)
Signal 2 — Context: file/directory/project from recent helix_action calls. (building)
Signal 3 — Recency window: just wrote a model → the scaffold biases toward a router or test. (building)
Injected as <helix_scaffold>: imports[], boilerplate, related_files[], confidence. Round trip <200ms. (building)
llm completes scaffold · output → helix_action → pipeline A · cycle improves
LLM Output → Compression Loop
CONTENT OUTPUT — response loop · symbol injection
The LLM completes the scaffold. Output → helix_action → Pipeline A compresses → the atom store improves. (live)
§ symbols replace high-frequency phrases. Token savings compound every session. (live)
Every generation makes the next one cheaper. Scaffolds improve. The LLM writes less from scratch. (improving)
Action Arrives
helix_action receives any operation
→
Pipeline A Compresses
Raw code → atoms → KG → KB → embeddings
→
Pipeline B Decompresses
Intent → embeddings → scaffold in <200ms
→
LLM Completes
Fills what's novel. Output feeds Pipeline A. Cycle improves.
04b — output layer — synapse · cockpit · runbook — live
The entire output layer is live — it was previously missing from this site. Synapse (9,077 chunks searchable), Cockpit (692 anomalies, 340 nudges, 12 project states), Runbook (8 pages, dynamic injection), and the Exchange System (420 entries, auto-compiled) are all running in production.
Synapse — Context Assembler + Conversation RAG
services/synapse.py · 9,077 chunks · FTS5 + pgvector hybrid
assemble_context() calls conversation_store.hybrid_search() — real chunks returned. (live)
/context/inject endpoint — tiered injection: Tier 1 (400 tokens, every message), Tier 2 (800 tokens, session start), Tier 3 (on-demand search). (live)
ext_ingest auto-summarization: every MemBrain flush → Haiku background job → decisions → structured_archive. (live)
9,077 conversation chunks / 601 sessions indexed. 6,928 structured archive entries. (live)
Cockpit — Intelligence Dashboard
692 anomalies · 340 nudges · 12 project states · API live
692 anomalies (detected contradictions/regressions with lifecycle state). (live)
340 nudges (actionable suggestions with lifecycle state). (live)
12 project_state rows — a one-line status per active project. Replaces the master context blob. (live)
/cockpit/overview, /cockpit/anomalies, /cockpit/nudges, /cockpit/sessions APIs are live. Dashboard UI specced. (live)
Runbook — Dynamic Context Injection
8 pages live · trigger list · Tier 1 always-on (400 tokens)
Replaces the 3,000-token master context blob. 8 pages. Tier 1: runbook + alerts + project_state + dictionary on every call. (live)
Trigger list: SSH work → operating-procedures, writing code → conventions, gateway changes → nohup warning, etc. (live)
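The trigger list is essentially a keyword-to-page map. A toy sketch, where the keywords, the page name "nohup-warning", and the substring-matching logic are all assumptions made for the example:

```python
# Hypothetical trigger table, mirroring the examples in the text.
TRIGGERS = {
    "ssh": "operating-procedures",
    "code": "conventions",
    "gateway": "nohup-warning",
}

def pages_for(message: str) -> list[str]:
    """Return the runbook pages whose triggers appear in the message."""
    msg = message.lower()
    return sorted({page for kw, page in TRIGGERS.items() if kw in msg})
```

Only matched pages are injected on top of the always-on Tier 1 material, which is how the runbook stays under the old blob's token budget.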
Exchange System
420 exchanges · auto-compiled · helix_exchange_post + helix_exchange_search
Auto-exchange compilation: observer.js detects a 30s gap → compiles the exchange → auto-POSTs. No manual saves needed. (live)
Intelligence write tools: helix_archive_record, helix_entity_upsert, helix_relationship_create — all live MCP tools. (live)
1,036 per-user compression_profiles. 619 decisions. 584 conventions. 692 anomalies. All in Postgres. (live)
05 — storage + self-contained backup — no host cron
blob store
Garage (building)
helix-workspace/ (file version blobs)
helix-backups/postgres · neo4j · redis
Apache 2.0 · ~40MB idle · internal only
STORAGE_MODE: embedded / external / milly
relational + vector
Postgres + pgvector (live)
KB documents (FTS5 + semantic)
Atom pool (990+ atoms)
Symbol dictionary (400+)
Backup: pg_dump → Garage every 6hrs
graph database
Neo4j (live)
463 entities (live)
DEPENDS_ON · CONTAINS · TOUCHES · PRODUCES
Backup → Garage daily
cache + scheduler
Redis + APScheduler (building)
Cache + worker queue
Postgres: 6hrs · Neo4j+Redis: daily → Garage
POST /admin/restore · No host cron
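One piece of the self-contained backup story is retention enforcement: keep the newest N backup objects and delete the rest from Garage. A minimal sketch under assumed naming conventions (the keep-count, key format, and function name are illustrative; the real scheduler runs this alongside the pg_dump jobs):

```python
# Hypothetical retention policy: keys are timestamped object names,
# e.g. "postgres/2025-01-02T00.sql.gz", so lexical order is time order.
def stale_backups(keys: list[str], keep: int = 7) -> list[str]:
    """Return the backup keys that should be deleted, keeping the newest N."""
    return sorted(keys, reverse=True)[keep:]
```

Because enforcement lives inside helix-cortex next to the scheduler, no host cron entry is ever needed; the same process that writes backups prunes them.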
06 — build phases — in order
Phase 1 — Core Pipeline (~16 hrs)
enrichment engine · both pipelines · unified file ops
1a. helix_action unified tool — all 8 action types. (✅ live)
1b. Smart content detector — code|config|prose|data|binary. (~2 hrs)
1c. Background file scanner — scheduled worker, hash-indexed. (~4 hrs)
1d. Pipeline B — intent parser + atom matcher + cascader + assembler. (~6 hrs)
Phase 2 — Knowledge Capture (~11 hrs)
git · ssh · auto extraction · stack traces
2a. SSH capture endpoint + gateway webhook. (~3 hrs)
2b. Git hook — helix init + post-commit. (~2 hrs)
2c. Auto conversation extraction worker. (~3 hrs)
2d. Stack trace parser. (~2 hrs)
2e. Package dependency watcher. (~1 hr)
Phase 3 — Runtime Intelligence (~6 hrs)
log tailer · config parser
3a. Log tailer — signal-only filtering. (~3 hrs)
3b. Config file infra atom parser — YAML/TOML/INI → KG. (~3 hrs)
Phase 4 — Forge Internalization (~1 day)
absorb scanner + workspace into helix-cortex
Move scanner.py and workspace.py into helix-cortex. Remove external HTTP to The Forge. (~1 day)
Deprecate the the-forge container. Wire Garage into blob storage. Remove MinIO from the product.
Phase 5 — Backup Scheduler (~6 hrs)
APScheduler · Garage · restore endpoint
APScheduler in helix-cortex. Postgres every 6hrs, Neo4j + Redis daily → Garage. (~3 hrs)
POST /admin/restore. Retention enforcement. No host cron. (~3 hrs)
Phase 6 — MemBrain Phase 2 (~half day)
pre-generation scaffold injection
Intercept layer + intent parser (JS) + scaffold injection before send. Transparent to the user. (~4 hrs)
Phases 7–9 — P3 + Federation
import resolution · browser hook · multi-node
Import resolution · Browser hook · DB migration hook · Multi-node federation. (post-v1)
To v1.0: Phases 1–5 ≈ 3–4 focused sessions. ~45 hours total. All inside helix-cortex. Only new container: helix-garage — already running.
07 — product layers — path to multi-tenant platform
The product is built in 7 layers. Each layer unlocks the next. Layer 0 is the foundation — without it nothing else ships. To first paying client: Layers 0 + 1 + 2 + 6d ≈ 6–8 sessions. Full product: ~15–20 sessions.
Layer 0 — Multi-Tenancy
foundation — everything else requires this
tenants table — tenant_id, subdomain, api_key_hash, plan, helix_schema, node_agent_token. (building)
Schema-per-tenant in the existing Postgres. Not container-per-tenant until enterprise demand requires it. (building)
RLS policies on all existing tables. Every query scoped to tenant_id automatically. (building)
Traefik wildcard: *.helix.millyweb.com → Helix, with the tenant extracted from the subdomain. (building)
API key auth middleware on all Helix endpoints. Tenant provisioning endpoint. (building)
Unlocks: Ashley's instance, client instances, installer token flow, usage metering, collective intelligence isolation.
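The subdomain-to-tenant step behind the Traefik wildcard can be sketched in a few lines. The function name, base domain handling, and the single-label rule are assumptions for illustration:

```python
# Hypothetical tenant extraction for *.helix.millyweb.com requests.
def tenant_from_host(host: str, base: str = "helix.millyweb.com"):
    """Return the tenant subdomain, or None if the host isn't a tenant host."""
    host = host.split(":")[0].lower()          # drop any port
    if host == base or not host.endswith("." + base):
        return None
    sub = host[: -(len(base) + 1)]             # strip ".helix.millyweb.com"
    # exactly one label allowed: "ashley", not "a.b"
    return sub if sub and "." not in sub else None
```

The extracted tenant then scopes every downstream query (via RLS and the per-tenant schema), so isolation holds even though all tenants share one Postgres.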
Layer 1 — Helix as Gateway
100% observer coverage — core product differentiator
1a. stdio transport — Desktop spawns Helix directly. One config entry replaces three. 100% observer coverage on all Desktop tool calls. (~2 hrs)
1b. Tool routing in /mcp/ — Helix proxies any registered MCP server. The provisioner becomes optional. Every tool call is logged before routing. (~1 session)
1c. helix_init() tool — fires on first Desktop connect. Checks whether a node agent is registered; returns a setup URL with a pre-auth token if not. (~2 hrs)
Why it matters: Helix can only learn from what it sees. Making it the gateway means it sees everything. 30% coverage today → 100%.
Layer 2 — Frictionless Installer
OAuth-style onboarding — non-technical clients
Setup landing page — helix.millyweb.com/setup?token=XYZ. OS detection. The token pre-authenticates — the user never types a URL or API key. (building)
Windows installer (.exe / MSIX) — installs the node agent as a service, generates + installs a CA cert, configures the proxy, patches the Desktop config. One UAC prompt. (building)
Mac installer (.pkg) — same flow. LaunchAgent for the background service. Keychain cert install. (building)
User experience: click link in Claude → browser opens → download .exe → run → one Yes → restart Desktop. Identical to installing Zoom.
Layer 3 — Desktop Capture
fully transparent — MemBrain parity on Desktop
HTTPS proxy (bidirectional) — intercepts Desktop → api.anthropic.com. Injects <helix_context> into outgoing messages. Captures responses. Transparent — no Claude cooperation needed. (building)
IndexedDB watcher (fallback) — watches Desktop LevelDB files. Catches any turns the proxy misses. Runs as part of the node agent service. (building)
Desktop does NOT pin certificates. Uses system cert store. Electron apps respect installed trusted CAs.
Layer 4 — Enrichment Engine
intelligence gets better — phases 1b through 2
Phase 1b: Smart content detector — code|config|prose|data|binary. Shared by both pipelines. (~2 hrs)
Phase 1c: Background file scanner — scheduled worker, hash-indexed, walks /opt/projects. (~4 hrs)
Phase 1d: Pipeline B scaffolding engine — intent parser + atom matcher + cascader + assembler. (~6 hrs)
Phase 2: Capture surfaces — SSH capture, git hooks, conversation extraction, stack trace parser. (~11 hrs)
Layer 5 — Collective Intelligence
network effect — gets smarter with every client
Shared atom catalog — structural fingerprints (normalized hashes) federated opt-in. No code shared. Client A solves a pattern → Client B gets the scaffold on day 1. (building)
Failure pattern federation — error signature + known fixes shared anonymously. “2,400 Helix users hit this error. Here are the 3 solutions.” (building)
Symbol compression sharing — high-frequency phrases promoted to a global symbol table. Every tenant benefits from everyone’s compression savings. (building)
Private layer always: file content, conversation prose, credentials. Shared layer: structural fingerprints, boilerplate, error signatures. Opt-in: patterns, decisions.
Layer 6 — Product Surface
paying clients — marketplace — onboarding
MillyGate tenant routing — *.helix.millyweb.com subdomain per tenant. ryan / ashley / client-X all isolated, all on the same infrastructure. (building)
MCP server marketplace — catalog of 47+ servers. “Connect” button → credential form → Infisical injection → provisioner registration. The Smithery experience + Helix intelligence. (building)
Usage metering dashboard — the observer already logs every tool call with timing. Tool calls/day, tokens saved, atoms extracted, sessions. Billing hooks built in. (building)
Tenant onboarding flow — sign up → subdomain → API key → pre-filled config snippet → helix_init() handles the rest. (building)
Client Uses Claude
Any surface — Desktop, web, API
→
Helix Sees Everything
Gateway = 100% coverage. Every tool call observed.
→
Intelligence Compounds
Atoms, patterns, failures, symbols — all improve
→
Everyone Benefits
Shared catalog gets richer. New clients start smarter.
The pitch: “AI that gets smarter every time you use it — and the more people use it, the smarter it gets for everyone.” Training data is frozen. Helix compounds daily from real users solving real problems.