// full feature list — live + proposed — v1.0 build
THE COMPLETE
ENRICHMENT FLOW
Every action, every channel, every capture surface — live and proposed together. One entry point. Two parallel pipelines. Nothing invisible.
Live — Core
Live — KB+KG
Live — Scanner
Live — Symbols
Live — Scaffold
Capture Surface
Proposed
01 — intake — single entry point
helix_action — Unified Action Tool
helix-cortex / routers / action.py — the only tool the model needs to call
The model calls one thing. Helix decides what to do with it. No cognitive load about pipelines, scanning, or which tool to use. All intelligence lives inside Helix — not in the model's head.
file_write — Full pipeline: version → scan → KB → KG → observer → scaffold returned. (building)
file_patch — Targeted str_replace. JSON payload — no shell escaping issues. (building)
file_move — Moves on disk + updates KG path labels. Content hash is identity, path is a label. (building)
file_delete — Deletes + logs to observer. Deletions are never invisible. (building)
file_read — Returns content with metadata. Read-only — no pipeline fires. (live)
file_list — Directory listing. Read-only — no pipeline fires. (live)
command — Executes on server. Auto-detects output: stack traces, service names, package installs → KG. Always → observer. (building)
scaffold_query — Explicit scaffold request. Returns imports[], boilerplate, related_files[], confidence. (building)
every action passes through the smart content detector
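The action set above reduces to a small routing contract. A minimal sketch, assuming the action names from the list; the handler names and return shape are illustrative, not the real action.py:

```python
# Hypothetical routing contract for helix_action: read-only actions skip
# the pipelines, every other action fans out to both. Names mirror the
# feature list; the dict/return shape is an assumption for illustration.
READ_ONLY = {"file_read", "file_list"}
ACTIONS = READ_ONLY | {"file_write", "file_patch", "file_move",
                       "file_delete", "command", "scaffold_query"}

def route(action: str) -> dict:
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return {
        "action": action,
        # write/command/scaffold actions trigger Pipelines A + B
        "fires_pipelines": action not in READ_ONLY,
    }
```

The point of the single entry point is visible here: the model only ever supplies `action` plus params; whether pipelines fire is Helix's decision, not the model's.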
Smart Content Detector
helix-cortex / services / detector.py — shared by both pipelines and all 13 capture surfaces
Classifies any content: code | config | prose | structured-data | binary. Selects the pipeline automatically. (building)
Used by every input surface: helix_action, file scanner, SSH capture, conversation chunks, git diffs, log lines. (building)
Config: YAML/TOML/INI → infra atoms. Docker Compose → container + port + network entities in KG. (building)
Stack traces: Python/Node/Go format detection → file, function, error type → failure record. (building)
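A classifier like this can be sketched with cheap heuristics. This is a minimal illustration, not the real detector.py; the regexes, thresholds, and function name are assumptions:

```python
import json
import re

# Hypothetical sketch of the detector's classification step.
CODE_HINTS = re.compile(
    r"^\s*(def |class |import |from \w+ import|function |#include|package )", re.M)
CONFIG_HINTS = re.compile(r"^\s*[\w.-]+\s*[:=]\s*\S+", re.M)

def classify(content: bytes) -> str:
    """Return one of: code | config | prose | structured-data | binary."""
    if b"\x00" in content:                    # NUL bytes → treat as binary
        return "binary"
    text = content.decode("utf-8", errors="replace")
    try:                                      # valid JSON → structured data
        json.loads(text)
        return "structured-data"
    except ValueError:
        pass
    if CODE_HINTS.search(text):               # language keywords at line start
        return "code"
    lines = [l for l in text.splitlines() if l.strip()]
    # mostly key: value / key=value lines → config (YAML/TOML/INI shape)
    if lines and sum(bool(CONFIG_HINTS.match(l)) for l in lines) / len(lines) > 0.6:
        return "config"
    return "prose"
```

The production detector presumably layers more signals (file extension, shebangs, magic bytes), but the ordering matters either way: binary first, then structured data, then code, with prose as the fallback.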
fans into two parallel pipelines simultaneously — both fire on every action
02 — parallel pipelines — fire simultaneously on every action
Pipeline A — Intelligence · async · fire-and-forget · never blocks response
Atom Scanner
SCANNER / ATOMS — tier 1 · 2 · 3
Tier 1: Tree-sitter — 100+ languages, ~$0.00/file. (live)
Tier 2: Haiku LLM — purpose, type, quality, reuse potential. ~$0.001/file. (live)
Tier 3: Self-generated heuristics. After N analyses, generates its own rules. $0.00 forever. (live)
Structural fingerprint per atom. Matching fingerprints auto-register as templates. (live)
Config parser plugin: Docker Compose → KG entities. (building)
Import resolution (P3): scan imports → package public API as atoms. (p3)
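The structural fingerprint mentioned above can be illustrated as "normalize, then hash": strip comments and literals, collapse identifiers to a placeholder, and hash what remains, so two functions with the same shape but different names collide. A sketch under those assumptions (the normalization rules and function name are illustrative, not the shipped scanner):

```python
import hashlib
import re

def fingerprint(source: str) -> str:
    """Hash the structure of a Python snippet, ignoring names and literals."""
    s = re.sub(r"#.*", "", source)                 # drop line comments
    s = re.sub(r"(['\"]).*?\1", "STR", s)          # collapse string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "NUM", s)       # collapse numeric literals
    keywords = {"def", "return", "if", "else", "for", "while", "class",
                "import", "from", "in", "not", "and", "or", "STR", "NUM"}
    # collapse every non-keyword identifier to a single placeholder
    s = re.sub(r"\b\w+\b",
               lambda m: m.group(0) if m.group(0) in keywords else "ID", s)
    s = re.sub(r"\s+", " ", s).strip()             # normalize whitespace
    return hashlib.sha256(s.encode()).hexdigest()[:16]
```

Atoms whose fingerprints match are structurally identical up to naming, which is exactly the condition for auto-registering a template.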
Knowledge Base + Knowledge Graph
CORTEX / KB + KG — postgres (pgvector) + neo4j
KB: FTS5 + pgvector semantic search. 158+ docs indexed. (live)
Entity upsert: container, service, port, domain, project, tool, person, dependency. 463 entities. (live)
Relationships: DEPENDS_ON, CONTAINS, TOUCHES, PRODUCES. (live)
Exchange extraction: decisions/failures/patterns auto-extracted async. (building)
Operational patterns: restart sequences, port conflicts — learned from the SSH stream. (building)
Symbol Engine
COMPRESSION — § promotion · token savings
High-frequency phrase tracking. Daily promotion. 400+ active symbols. (live)
§ prefix namespace. The LLM receives the full phrase — context carries the compressed form. (live)
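The § mechanism is a two-way substitution table. A toy sketch, where the table contents are made up and the real promotion logic (frequency tracking, daily jobs) is omitted:

```python
# Illustrative § symbol table — entries are invented for the example.
SYMBOLS = {
    "§dc": "docker compose up -d",
    "§hc": "helix-cortex",
    "§kg": "knowledge graph",
}

def compress(text: str) -> str:
    """Replace known high-frequency phrases with their § symbols."""
    for sym, phrase in SYMBOLS.items():
        text = text.replace(phrase, sym)
    return text

def expand(text: str) -> str:
    """Restore full phrases before the LLM ever sees the context."""
    # longest symbols first, so a longer symbol is never clobbered
    # by a shorter one that happens to be its prefix
    for sym in sorted(SYMBOLS, key=len, reverse=True):
        text = text.replace(sym, SYMBOLS[sym])
    return text
```

Stored context stays compressed; `expand()` runs at injection time, which is why the savings are invisible to the model.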
Observer
LOGGING — every action · archive collections
Logs every tool call: type, params, result, timing, session_id. (live)
Archive collections: decisions, failures, patterns, sessions, project_archive, snapshots. (live)
Auto conversation extraction (P1): async worker — helix_exchange_post becomes optional. (p1)
Pipeline B — Scaffolding · fast · returns with action result · <200ms
Intent Parser
helix-cortex / services / intent_parser.py
Signal 1 — Message intent: tokens matched against the atom vocabulary from your codebase. (building)
Signal 2 — Current context: file/directory/project from recent helix_action calls. (building)
Signal 3 — Recency window: just wrote a model → bias the scaffold toward a router or test. (building)
Vocabulary grows automatically. Skip logic: no tokens, read-only, or low confidence → skip. (building)
Atom Matcher
pgvector similarity search — 990 atoms live
Structural similarity search on pgvector embeddings. Returns top-N atoms by type. (building)
Context bias: same-project/directory atoms weighted higher. (building)
Confidence 0.0–1.0. 990 atoms live today, growing every session.
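In production this ranking is a pgvector query (ordering by the cosine-distance operator `<=>`); the same ranking, plus the same-project bias described above, can be shown in plain Python. The function name, atom dict shape, and bias factor are assumptions for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def match_atoms(query_vec, atoms, project=None, top_n=3, bias=1.2):
    """Rank atoms by cosine similarity; same-project atoms weighted higher.

    atoms: list of dicts with 'embedding', 'project', 'name'.
    """
    scored = []
    for atom in atoms:
        score = cosine(query_vec, atom["embedding"])
        if project and atom.get("project") == project:
            score *= bias                    # context bias from the source text
        scored.append((score, atom["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]
```

The returned score doubles as the 0.0–1.0 confidence signal (before bias), which is what feeds the skip logic upstream.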
Dependency Cascader
KG traversal — related files + cascade suggestions
KG traversal: what files relate to this path? What imports this? What depends on this config? (building)
Cascade: new router → register in main.py + test file. New service → compose + Traefik + health check. (building)
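The cascade rules above amount to a small lookup table from atom type to follow-up suggestions. A sketch where the rule table and function name are illustrative, not the shipped ruleset:

```python
# Hypothetical cascade rule table, mirroring the two examples in the text.
CASCADE_RULES = {
    "router": ["register in main.py", "create tests/test_{name}.py"],
    "service": ["add to docker-compose.yml", "add Traefik labels",
                "add health check"],
}

def cascade(atom_type: str, name: str) -> list[str]:
    """Return the follow-up suggestions for a newly created atom."""
    return [step.format(name=name) for step in CASCADE_RULES.get(atom_type, [])]
```

Unknown atom types cascade to nothing, which keeps the suggestion list conservative by default.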
Scaffold Assembler
boilerplate + imports + type signatures + test stubs
Combines matched atoms into a scaffold. Returns imports[], boilerplate, related_files[], confidence. (building)
Test stub generation: implementation written → test file scaffold returned automatically. (building)
Flywheel: Claude completes the scaffold → Pipeline A learns → the next scaffold improves. (building)
pipeline A stores atoms · pipeline B scaffold + context → MemBrain injection
03 — capture surfaces — all inputs to pipeline A
13 CAPTURE SURFACES · everything flows through the enrichment engine · nothing is invisible · you never configure what gets scanned
File Operations
helix_action file_write / patch / move / delete
file_write: full pipeline. Blob stored in Garage. (live)
file_patch: targeted str_replace. No sed mangling. (live)
file_move: disk move + KG path label update by content hash. (live)
file_delete: delete + observer log. (live)
SSH Command Capture
POST /capture/ssh — auto from gateway
Every gateway__ssh_execute auto-posts. One webhook = 100% SSH coverage. (building)
Parsed: file paths → scan. Stack traces → failure records. Service names → KG entities. (building)
pip/npm → dependency entities. Port/hostname → network topology. Always → observer. (building)
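The dependency branch of that parsing can be sketched with two regexes: spot `pip install` / `npm install` in a captured command and emit package names as candidate KG dependency entities. The regexes and function name are assumptions, not the real capture endpoint:

```python
import re

# Illustrative patterns for the pip/npm branch of SSH capture.
PIP = re.compile(r"\bpip3?\s+install\s+(?P<pkgs>[\w\[\],=<>.~ -]+)")
NPM = re.compile(r"\bnpm\s+(?:install|i)\s+(?P<pkgs>[@\w/ .^~-]+)")

def dependencies(command: str) -> list[str]:
    """Return package names mentioned in an install command, else []."""
    m = PIP.search(command) or NPM.search(command)
    if not m:
        return []
    # drop flags like -r / --upgrade; keep bare package specs
    return [p for p in m.group("pkgs").split() if not p.startswith("-")]
```

Everything else in the stream still goes to the observer; this branch only decides what additionally becomes a dependency entity.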
Existing Files on Disk
background file scanner — scheduled worker
A worker walks configured directories on a schedule. Hash-indexed — only rescans changed files. (building)
Skips: node_modules, .git, __pycache__, binaries, >1MB. Config via .helix/config.yml. (building)
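The hash-indexed walk can be sketched directly: skip the configured directories, hash each file, and only report paths whose hash changed since the last run. The function name and index shape are assumptions (the real worker also skips binaries and reads its config from .helix/config.yml):

```python
import hashlib
import os

SKIP_DIRS = {"node_modules", ".git", "__pycache__"}
MAX_BYTES = 1_000_000                        # skip files over ~1 MB

def changed_files(root: str, index: dict) -> list[str]:
    """Return paths whose content hash differs from the stored index."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune skip-dirs in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if os.path.getsize(path) > MAX_BYTES:
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if index.get(path) != digest:    # new or modified → rescan
                changed.append(path)
                index[path] = digest
    return changed
```

Because the index is keyed by path but stores a content hash, renames show up as one removal plus one "new" file, which is why the move surface updates labels by hash instead.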
Git Events
POST /capture/git — post-commit hook
helix init installs a post-commit hook. Diff + commit message → decision record. (building)
Stack Traces
auto via SSH capture stderr
Python/Node/Go format detection. Extracts file, function, line, error type → failure record + KG. (building)
Conversation Extraction
async worker — automatic post-session
An async worker extracts decisions, failures, and patterns. helix_exchange_post becomes optional. (building)
Desktop Sidecar — Claude Desktop Capture
C:/tools/helix-sidecar/ · log_turn + flush_session · MCP stdio · installed
Installed at C:/tools/helix-sidecar/server.py. Registered as the helix-sidecar MCP server in Desktop config.json alongside helix-node and ssh-vps2. (live)
log_turn() — Claude calls it after every response to capture conversation turns into Helix Pipeline A. (live)
flush_session() — batch-writes buffered turns at natural breakpoints. (live)
backfill.py — one-time backfill of existing Desktop conversation history. (live)
Requires a system prompt: “After every response call log_turn(). Call flush_session() at breakpoints.” Relies on Claude cooperation — interim until the HTTPS proxy (Layer 3) ships. (needs system prompt)
Desktop needs a restart to activate. The node agent (helix-node) is also installed: file_write/read/patch/list/move/delete/command on the local machine. (needs restart)
Package Dependencies
SSH capture + watcher
SSH capture catches pip/npm installs. Periodic scan of requirements.txt, package.json, go.mod. (building)
Config Files
infra atom parser — YAML / TOML / INI
Docker Compose → container/port/network entities in KG. Infrastructure topology atoms. (building)
Runtime Logs
log tailer — signal only
Tails configured containers. Signal-only: errors + warnings → log_event KB entries. (building)
File Moves / Renames
POST /capture/move
Updates path labels on atoms by content hash. No migration — label update only. (building)
Import Resolution
post-scan dependency resolver
Resolve imports → index package public APIs as external atoms. (building)
Browser / MemBrain Hook
POST /capture/browser — opt-in
Visiting GitHub/docs.*/npm → optionally POST page content. Opt-in per domain. (building)
DB Schema / Migrations
POST /capture/migration
sql_diff + name → schema entities in KG. Migration name → decision record. (building)
pipeline B scaffold + context → MemBrain pre-generation injection
04 — pre-generation scaffold injection — before the llm generates
MemBrain — Pre-Generation Injection
extension content script — intercepts outgoing messages before Claude
The LLM never sees a blank page. MemBrain intercepts the outgoing message, queries the enrichment engine, injects scaffold context before the message reaches Claude. User workflow unchanged.
Signal 1 — Message intent: tokens matched against the atom vocabulary from your codebase. (building)
Signal 2 — Context: file/directory/project from recent helix_action calls. (building)
Signal 3 — Recency window: just wrote a model → the scaffold biases toward a router or test. (building)
Injected as <helix_scaffold>: imports[], boilerplate, related_files[], confidence. Round trip <200ms. (building)
llm completes scaffold · output → helix_action → pipeline A · cycle improves
LLM Output → Compression Loop
CONTENT OUTPUT — response loop · symbol injection
The LLM completes the scaffold. Output → helix_action → Pipeline A compresses → the atom store improves. (live)
§ symbols replace high-frequency phrases. Token savings compound every session. (live)
Every generation makes the next one cheaper. Scaffolds improve. The LLM writes less from scratch. (improving)
Action Arrives
helix_action receives any operation
→
Pipeline A Compresses
Raw code → atoms → KG → KB → embeddings
→
Pipeline B Decompresses
Intent → embeddings → scaffold in <200ms
→
LLM Completes
Fills what's novel. Output feeds Pipeline A. Cycle improves.
04b — output layer — synapse · cockpit · runbook — live
The entire output layer is live — it was previously missing from this site. Synapse (9,077 chunks searchable), Cockpit (692 anomalies, 340 nudges, 12 project states), Runbook (8 pages, dynamic injection), and the Exchange System (420 entries, auto-compiled) are all running in production.
Synapse — Context Assembler + Conversation RAG
services/synapse.py · 9,077 chunks · FTS5 + pgvector hybrid
assemble_context() calls conversation_store.hybrid_search() — real chunks returned. (live)
/context/inject endpoint — tiered injection: Tier 1 (400 tokens, every message), Tier 2 (800 tokens, session start), Tier 3 (on-demand search). (live)
ext_ingest auto-summarization: every MemBrain flush → Haiku background job → decisions → structured_archive. (live)
9,077 conversation chunks / 601 sessions indexed. 6,928 structured archive entries. (live)
Cockpit — Intelligence Dashboard
692 anomalies · 340 nudges · 12 project states · API live
692 anomalies (detected contradictions/regressions with lifecycle state). (live)
340 nudges (actionable suggestions with lifecycle state). (live)
12 project_state rows — a one-line status per active project. Replaces the master context blob. (live)
/cockpit/overview, /cockpit/anomalies, /cockpit/nudges, /cockpit/sessions APIs are live. Dashboard UI specced. (live)
Runbook — Dynamic Context Injection
8 pages live · trigger list · Tier 1 always-on (400 tokens)
Replaces the 3,000-token master context blob. 8 pages. Tier 1: runbook + alerts + project_state + dictionary on every call. (live)
Trigger list: SSH work → operating-procedures, writing code → conventions, gateway changes → nohup warning, etc. (live)
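The trigger list is essentially a keyword-to-page map. A toy sketch, where the keywords, the page name "nohup-warning", and the substring-matching logic are all assumptions made for the example:

```python
# Hypothetical trigger table, mirroring the examples in the text.
TRIGGERS = {
    "ssh": "operating-procedures",
    "code": "conventions",
    "gateway": "nohup-warning",
}

def pages_for(message: str) -> list[str]:
    """Return the runbook pages whose triggers appear in the message."""
    msg = message.lower()
    return sorted({page for kw, page in TRIGGERS.items() if kw in msg})
```

Only matched pages are injected on top of the always-on Tier 1 material, which is how the runbook stays under the old blob's token budget.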
Exchange System
420 exchanges · auto-compiled · helix_exchange_post + helix_exchange_search
Auto-exchange compilation: observer.js detects a 30s gap → compiles the exchange → auto-POSTs. No manual saves needed. (live)
Intelligence write tools: helix_archive_record, helix_entity_upsert, helix_relationship_create — all live MCP tools. (live)
1,036 per-user compression_profiles. 619 decisions. 584 conventions. 692 anomalies. All in Postgres. (live)
05 — storage + self-contained backup — no host cron
blob store
Garage (building)
helix-workspace/ (file version blobs)
helix-backups/postgres · neo4j · redis
Apache 2.0 · ~40MB idle · internal only
STORAGE_MODE: embedded / external / milly
relational + vector
Postgres + pgvector (live)
KB documents (FTS5 + semantic)
Atom pool (990+ atoms)
Symbol dictionary (400+)
Backup: pg_dump → Garage every 6hrs
graph database
Neo4j (live)
463 entities (live)
DEPENDS_ON · CONTAINS · TOUCHES · PRODUCES
Backup → Garage daily
cache + scheduler
Redis + APScheduler (building)
Cache + worker queue
Postgres: 6hrs · Neo4j+Redis: daily → Garage
POST /admin/restore · No host cron
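One piece of the self-contained backup story is retention enforcement: keep the newest N backup objects and delete the rest from Garage. A minimal sketch under assumed naming conventions (the keep-count, key format, and function name are illustrative; the real scheduler runs this alongside the pg_dump jobs):

```python
# Hypothetical retention policy: keys are timestamped object names,
# e.g. "postgres/2025-01-02T00.sql.gz", so lexical order is time order.
def stale_backups(keys: list[str], keep: int = 7) -> list[str]:
    """Return the backup keys that should be deleted, keeping the newest N."""
    return sorted(keys, reverse=True)[keep:]
```

Because enforcement lives inside helix-cortex next to the scheduler, no host cron entry is ever needed; the same process that writes backups prunes them.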
06 — build phases — in order
Phase 1 — Core Pipeline (~16 hrs)
enrichment engine · both pipelines · unified file ops
1a. helix_action unified tool — all 8 action types. (✅ live)
1b. Smart content detector — code|config|prose|data|binary. (~2 hrs)
1c. Background file scanner — scheduled worker, hash-indexed. (~4 hrs)
1d. Pipeline B — intent parser + atom matcher + cascader + assembler. (~6 hrs)
Phase 2 — Knowledge Capture (~11 hrs)
git · ssh · auto extraction · stack traces
2a. SSH capture endpoint + gateway webhook. (~3 hrs)
2b. Git hook — helix init + post-commit. (~2 hrs)
2c. Auto conversation extraction worker. (~3 hrs)
2d. Stack trace parser. (~2 hrs)
2e. Package dependency watcher. (~1 hr)
Phase 3 — Runtime Intelligence (~6 hrs)
log tailer · config parser
3a. Log tailer — signal-only filtering. (~3 hrs)
3b. Config file infra atom parser — YAML/TOML/INI → KG. (~3 hrs)
Phase 4 — Forge Internalization (~1 day)
absorb scanner + workspace into helix-cortex
Move scanner.py and workspace.py into helix-cortex. Remove external HTTP to The Forge. (~1 day)
Deprecate the the-forge container. Wire Garage into blob storage. Remove MinIO from the product.
Phase 5 — Backup Scheduler (~6 hrs)
APScheduler · Garage · restore endpoint
APScheduler in helix-cortex. Postgres every 6hrs, Neo4j + Redis daily → Garage. (~3 hrs)
POST /admin/restore. Retention enforcement. No host cron. (~3 hrs)
Phase 6 — MemBrain Phase 2 (~half day)
pre-generation scaffold injection
Intercept layer + intent parser (JS) + scaffold injection before send. Transparent to the user. (~4 hrs)
Phases 7–9 — P3 + Federation
import resolution · browser hook · multi-node
Import resolution · Browser hook · DB migration hook · Multi-node federation. (post-v1)
To v1.0: Phases 1–5 ≈ 3–4 focused sessions. ~45 hours total. All inside helix-cortex. Only new container: helix-garage — already running.
07 — product layers — path to multi-tenant platform
The product is built in 7 layers. Each layer unlocks the next. Layer 0 is the foundation — without it nothing else ships. To first paying client: Layers 0 + 1 + 2 + 6d ≈ 6–8 sessions. Full product: ~15–20 sessions.
Layer 0 — Multi-Tenancy
foundation — everything else requires this
tenants table — tenant_id, subdomain, api_key_hash, plan, helix_schema, node_agent_token. (building)
Schema-per-tenant in the existing Postgres. Not container-per-tenant until enterprise demand requires it. (building)
RLS policies on all existing tables. Every query scoped to tenant_id automatically. (building)
Traefik wildcard: *.helix.millyweb.com → Helix, with the tenant extracted from the subdomain. (building)
API key auth middleware on all Helix endpoints. Tenant provisioning endpoint. (building)
Unlocks: Ashley's instance, client instances, installer token flow, usage metering, collective intelligence isolation.
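The subdomain-to-tenant step behind the Traefik wildcard can be sketched in a few lines. The function name, base domain handling, and the single-label rule are assumptions for illustration:

```python
# Hypothetical tenant extraction for *.helix.millyweb.com requests.
def tenant_from_host(host: str, base: str = "helix.millyweb.com"):
    """Return the tenant subdomain, or None if the host isn't a tenant host."""
    host = host.split(":")[0].lower()          # drop any port
    if host == base or not host.endswith("." + base):
        return None
    sub = host[: -(len(base) + 1)]             # strip ".helix.millyweb.com"
    # exactly one label allowed: "ashley", not "a.b"
    return sub if sub and "." not in sub else None
```

The extracted tenant then scopes every downstream query (via RLS and the per-tenant schema), so isolation holds even though all tenants share one Postgres.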
Layer 1 — Helix as Gateway
100% observer coverage — core product differentiator
1a. stdio transport — Desktop spawns Helix directly. One config entry replaces three. 100% observer coverage on all Desktop tool calls. (~2 hrs)
1b. Tool routing in /mcp/ — Helix proxies any registered MCP server. The provisioner becomes optional. Every tool call is logged before routing. (~1 session)
1c. helix_init() tool — fires on first Desktop connect. Checks whether a node agent is registered; returns a setup URL with a pre-auth token if not. (~2 hrs)
Why it matters: Helix can only learn from what it sees. Making it the gateway means it sees everything. 30% coverage today → 100%.
Layer 2 — Frictionless Installer
OAuth-style onboarding — non-technical clients
Setup landing page — helix.millyweb.com/setup?token=XYZ. OS detection. The token pre-authenticates — the user never types a URL or API key. (building)
Windows installer (.exe / MSIX) — installs the node agent as a service, generates + installs a CA cert, configures the proxy, patches the Desktop config. One UAC prompt. (building)
Mac installer (.pkg) — same flow. LaunchAgent for the background service. Keychain cert install. (building)
User experience: click link in Claude → browser opens → download .exe → run → one Yes → restart Desktop. Identical to installing Zoom.
Layer 3 — Desktop Capture
fully transparent — MemBrain parity on Desktop
HTTPS proxy (bidirectional) — intercepts Desktop → api.anthropic.com. Injects <helix_context> into outgoing messages. Captures responses. Transparent — no Claude cooperation needed. (building)
IndexedDB watcher (fallback) — watches Desktop LevelDB files. Catches any turns the proxy misses. Runs as part of the node agent service. (building)
Desktop does NOT pin certificates. Uses system cert store. Electron apps respect installed trusted CAs.
Layer 4 — Enrichment Engine
intelligence gets better — phases 1b through 2
Phase 1b: Smart content detector — code|config|prose|data|binary. Shared by both pipelines. (~2 hrs)
Phase 1c: Background file scanner — scheduled worker, hash-indexed, walks /opt/projects. (~4 hrs)
Phase 1d: Pipeline B scaffolding engine — intent parser + atom matcher + cascader + assembler. (~6 hrs)
Phase 2: Capture surfaces — SSH capture, git hooks, conversation extraction, stack trace parser. (~11 hrs)
Layer 5 — Collective Intelligence
network effect — gets smarter with every client
Shared atom catalog — structural fingerprints (normalized hashes) federated opt-in. No code shared. Client A solves a pattern → Client B gets the scaffold on day 1. (building)
Failure pattern federation — error signature + known fixes shared anonymously. “2,400 Helix users hit this error. Here are the 3 solutions.” (building)
Symbol compression sharing — high-frequency phrases promoted to a global symbol table. Every tenant benefits from everyone’s compression savings. (building)
Private layer always: file content, conversation prose, credentials. Shared layer: structural fingerprints, boilerplate, error signatures. Opt-in: patterns, decisions.
Layer 6 — Product Surface
paying clients — marketplace — onboarding
MillyGate tenant routing — *.helix.millyweb.com subdomain per tenant. ryan / ashley / client-X all isolated, all on the same infrastructure. (building)
MCP server marketplace — catalog of 47+ servers. “Connect” button → credential form → Infisical injection → provisioner registration. The Smithery experience + Helix intelligence. (building)
Usage metering dashboard — the observer already logs every tool call with timing. Tool calls/day, tokens saved, atoms extracted, sessions. Billing hooks built in. (building)
Tenant onboarding flow — sign up → subdomain → API key → pre-filled config snippet → helix_init() handles the rest. (building)
Client Uses Claude
Any surface — Desktop, web, API
→
Helix Sees Everything
Gateway = 100% coverage. Every tool call observed.
→
Intelligence Compounds
Atoms, patterns, failures, symbols — all improve
→
Everyone Benefits
Shared catalog gets richer. New clients start smarter.
The pitch: “AI that gets smarter every time you use it — and the more people use it, the smarter it gets for everyone.” Training data is frozen. Helix compounds daily from real users solving real problems.