Architecture

EchOS is structured as a pnpm monorepo where a central agent core handles all reasoning, and separate packages implement each interface and storage layer. This document covers how a message moves through the system, how packages depend on each other, and how the three-layer storage architecture keeps data consistent.

Data Flow

User Input (any interface)
    │
    ▼
┌─────────────────────────────┐
│  Interface Adapter           │
│  (Telegram / Web / CLI)      │
│  - Auth verification         │
│  - Message normalization     │
│  - Response streaming        │
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Agent Core (pi-agent-core)  │
│  - System prompt + context   │
│  - LLM reasoning (pi-ai)    │
│  - Tool selection & calling  │
│  - Session persistence       │
└──────────┬──────────────────┘
           │
     ┌─────┼─────────┐
     ▼     ▼         ▼
┌───────┐ ┌────────┐ ┌──────────┐
│ Core  │ │ Plugin │ │ Scheduler│
│ Tools │ │ Tools  │ │ (BullMQ) │
└───┬───┘ └───┬────┘ └─────┬───┘
    │         │            │
    └─────────┴────────────┘
              │
              ▼
┌─────────────────────────────┐
│  Storage Layer               │
│  - Markdown files (source    │
│    of truth, git-friendly)   │
│  - SQLite (metadata + FTS5)  │
│  - LanceDB (vector search)  │
└─────────────────────────────┘

Package Dependencies

@echos/shared          ← no dependencies (types, config, security, logging, NotificationService)
@echos/core            ← shared (storage, agent, plugin system)
@echos/telegram        ← shared, core (grammY bot, notification service)
@echos/web             ← shared, core (Fastify server)
@echos/cli             ← shared, core, plugin-article, plugin-youtube, plugin-twitter (CLI binary — standalone terminal interface)
@echos/scheduler       ← shared, core, plugin-article, plugin-youtube (BullMQ workers)
@echos/plugin-youtube  ← shared, core (YouTube transcript extraction)
@echos/plugin-article  ← shared, core (web article extraction)
@echos/plugin-twitter  ← shared, core (Twitter/X tweet and thread extraction)

Scheduler & Notifications

The scheduler package (@echos/scheduler) runs background jobs via BullMQ + Redis. It is always enabled and requires a running Redis instance.

Notification delivery is decoupled via NotificationService (defined in @echos/shared). The Telegram package provides the concrete implementation; the scheduler receives it via dependency injection and never imports @echos/telegram directly. When Telegram is disabled, a log-only fallback is used.

Workers:

Digest: Creates a throwaway AI agent to summarize recent notes and reminders, broadcasts the result
Reminder check: Queries SQLite for overdue reminders and sends notifications
Content processing: Processes article/YouTube URLs queued by the agent

See Scheduler for configuration and usage details.
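The dependency-injection arrangement described above can be sketched roughly as follows. The interface shape, the `broadcast` method, and both helper functions are hypothetical names chosen for illustration; the real NotificationService in @echos/shared may look different.

```typescript
// Hypothetical shape; the real NotificationService in @echos/shared may differ.
interface NotificationService {
  broadcast(text: string): Promise<void>;
}

// Log-only fallback, as used when no delivery channel (e.g. Telegram) is enabled.
function createLogOnlyNotifier(log: (line: string) => void): NotificationService {
  return {
    async broadcast(text: string): Promise<void> {
      log(`[notification] ${text}`);
    },
  };
}

// A worker depends only on the interface, never on a concrete transport package.
async function runReminderCheck(
  overdueTitles: string[],
  notifier: NotificationService,
): Promise<number> {
  for (const title of overdueTitles) {
    await notifier.broadcast(`Reminder overdue: ${title}`);
  }
  return overdueTitles.length;
}
```

Because the worker only sees the interface, swapping the Telegram implementation for the fallback requires no change in scheduler code.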

Export Utility

Location: packages/core/src/export/index.ts

The export utility provides pure serialization functions for converting notes into downloadable file formats. It is used by the export_notes agent tool and is independent of any interface or storage layer.

Formats

Format	Function	Output
`markdown`	`exportToMarkdown(note)`	Full markdown file with YAML frontmatter (reads raw file from disk when available; reconstructs from SQLite otherwise)
`text`	`exportToText(note)`	Plain text with markdown syntax stripped (headings, bold, links, list markers, etc. removed)
`json`	`exportToJson(notes)`	JSON array of `{ metadata, content }` objects
`zip`	`exportToZip(notes)`	ZIP archive of `.md` files (one per note), deduplicated filenames

`export_notes` Agent Tool

Location: packages/core/src/agent/tools/export-notes.ts

The tool selects notes (by ID or filter), serializes them, and returns an ExportFileResult JSON string in its tool result. Interfaces intercept this via the tool_execution_end agent event (which exposes event.result) and deliver the file:

Single note, markdown/text → returned inline (no file written to disk)
Multiple notes, or json/zip format → written to data/exports/export-{timestamp}.{ext}

Auto-format upgrade: if multiple notes are requested with markdown or text format, the tool automatically upgrades to zip.
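The delivery and upgrade rules above can be expressed as a small decision function. This is a minimal sketch; `planExport` and `ExportPlan` are hypothetical names, not identifiers from the EchOS codebase.

```typescript
type ExportFormat = "markdown" | "text" | "json" | "zip";

interface ExportPlan {
  format: ExportFormat; // format after any automatic upgrade
  delivery: "inline" | "file"; // inline content vs. a file under data/exports/
}

// Hypothetical helper mirroring the rules described above.
function planExport(noteCount: number, requested: ExportFormat): ExportPlan {
  const isTextual = requested === "markdown" || requested === "text";
  // Several notes cannot be returned as one markdown/text document: upgrade to zip.
  const format: ExportFormat = noteCount > 1 && isTextual ? "zip" : requested;
  // Only a single markdown/text note is returned inline; everything else is a file.
  const inline = noteCount === 1 && (format === "markdown" || format === "text");
  return { format, delivery: inline ? "inline" : "file" };
}
```

For example, `planExport(3, "text")` yields a zip file, while `planExport(1, "json")` keeps the json format but is still written to disk.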

Exports Directory

Export files are written to data/exports/ (configurable via exportsDir in AgentDeps). Files are cleaned up automatically after 1 hour by the export cleanup cron job in the scheduler (export-cleanup, runs hourly at 0 * * * *).

Interface Delivery

Interface	Delivery mechanism
Telegram	`ctx.replyWithDocument(new InputFile(buffer, fileName))` after agent completes; temp file deleted immediately
CLI	Inline content → stdout; file exports → `--output` path or `./fileName` in CWD; path printed to stderr
Web	`GET /api/export/:fileName` download endpoint; agent includes the URL in its text response

Plugin Architecture

Content processors live in plugins/ as separate workspace packages. Each plugin:

Implements the EchosPlugin interface from @echos/core
Returns agent tools from its setup(context) method
Receives a PluginContext with storage, embeddings, logger, and config
Is registered via PluginRegistry in the application entry point

Core tools (create_note, search, get, list, update, delete, reminders, memory, linking, categorize_note, save_conversation, mark_content, export_notes) remain in @echos/core. Domain-specific processors (YouTube, article, image, etc.) are plugins. Plugins can optionally use the AI categorization service from @echos/core to automatically extract category, tags, gist, summary, and key points from content. See Categorization for details.

Available Plugins

article: Web article extraction using Readability
youtube: YouTube video transcript extraction
twitter: Twitter/X tweet and thread extraction via FxTwitter API (no API key required)
image: Image storage with metadata extraction (format, dimensions, EXIF)
content-creation: Content generation tools

See Building a Plugin for detailed plugin documentation.

Storage Architecture

SQLite (better-sqlite3): Structured metadata index, FTS5 full-text search, memory store, reminders. The memory table stores long-term personal facts with a confidence score (0–1) and kind (fact, preference, person, project, expertise). Notes also store a content_hash (SHA-256) used to detect changes and skip unnecessary re-embedding. The status column tracks content lifecycle (saved, read, archived) and input_source records how content was captured (text, voice, url, file, image). For images, additional columns store image_path (local file path), image_url (source URL), image_metadata (JSON with dimensions, format, EXIF), and ocr_text (for future OCR support).
LanceDB (embedded): Vector embeddings for semantic search. No server process needed.
Markdown files: Source of truth. YAML frontmatter with structured metadata. Directory layout: knowledge/{type}/{category}/{date}-{slug}.md. Images are stored in knowledge/image/{category}/{hash}.{ext} and referenced from markdown notes.

Storage Sync

EchOS keeps the three storage layers in sync automatically, even when markdown files are added or edited outside the application.

Startup reconciliation (reconcileStorage in packages/core/src/storage/reconciler.ts): Runs once at boot. Scans all .md files in the knowledge directory and compares them against SQLite using the content_hash column:

New file → full upsert to SQLite + generate embedding in LanceDB
Content changed → update SQLite + re-embed (OpenAI called only when content hash differs)
File moved (same hash, different path) → update file path in SQLite only, no re-embed
No change → skipped entirely
SQLite record with no file on disk → deleted from SQLite and LanceDB
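The reconciliation cases above amount to a per-file classification. The sketch below assumes each note carries a stable ID (for instance in its frontmatter) that links a file to its SQLite row; that matching strategy, along with the names `classifyFile`, `findOrphans`, and `IndexedNote`, is an assumption for illustration and may not match reconcileStorage.

```typescript
type ReconcileAction = "insert" | "update" | "move" | "skip";

interface IndexedNote {
  id: string;
  filePath: string;
  contentHash: string; // e.g. SHA-256 of the note content
}

// Hypothetical classifier: decides what to do with one file found on disk.
function classifyFile(
  file: IndexedNote,
  indexed: Map<string, IndexedNote>, // SQLite rows keyed by note ID (assumed)
): ReconcileAction {
  const row = indexed.get(file.id);
  if (!row) return "insert"; // new file: upsert + embed
  if (row.contentHash !== file.contentHash) return "update"; // changed: re-embed
  if (row.filePath !== file.filePath) return "move"; // same hash, new path
  return "skip"; // nothing changed
}

// Rows whose note was not seen on disk should be deleted from SQLite and LanceDB.
function findOrphans(indexed: Map<string, IndexedNote>, idsOnDisk: Set<string>): string[] {
  return Array.from(indexed.values())
    .filter((row) => !idsOnDisk.has(row.id))
    .map((row) => row.id);
}
```

Only the "insert" and "update" outcomes would require a call to the embeddings API.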

Live file watcher (createFileWatcher in packages/core/src/storage/watcher.ts): Uses chokidar to watch knowledge/**/*.md while the app is running. Events are debounced (500 ms) and awaitWriteFinish is enabled to handle atomic saves from editors (VS Code, Obsidian, etc.):

add / change → parse, compare content hash, upsert if changed (re-embed only on content change)
unlink → look up note by file path in SQLite, delete from SQLite + LanceDB

Both paths use the same content hash check, so the OpenAI embeddings API is only called when note body text actually changes — metadata-only edits (frontmatter, tags, title) do not trigger re-embedding.

Search

Search supports three modes. Hybrid mode merges the keyword and semantic rankings via Reciprocal Rank Fusion (RRF):

Keyword (FTS5): BM25-ranked full-text search across title, content, tags
Semantic (LanceDB): Cosine similarity on OpenAI embeddings
Hybrid: RRF fusion of keyword + semantic results
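As an illustration of how RRF merges two ranked lists, here is a minimal sketch. The function name `rrfFuse` is hypothetical, and the constant k = 60 is the value commonly used for RRF in the literature, not a value confirmed for EchOS.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1 in each list. Higher score ranks first.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const keywordHits = ["a", "b", "c"]; // e.g. FTS5 order
const semanticHits = ["b", "d", "a"]; // e.g. vector-search order
const fused = rrfFuse([keywordHits, semanticHits]);
// fused is ["b", "a", "d", "c"]: items found by both strategies rise to the top.
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized against each other.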

Memory System

Long-term memory (remember_about_me / recall_knowledge tools) uses a hybrid strategy to balance cost and recall:

At agent creation (including after /reset): the top 15 memories ranked by confidence DESC, updated DESC are injected directly into the system prompt as “Known Facts About the User”. This ensures core personal facts are always available without an explicit tool call.
On-demand retrieval: if more than 15 memories exist, recall_knowledge searches the full memory table using word-tokenised LIKE queries. The system prompt notes additional memories are available so the agent knows to use the tool.

This means /reset only clears the conversation history — all stored memories persist in SQLite and are reloaded into the next session automatically.
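The selection of memories for the system prompt can be sketched as below. The `Memory` fields and the function name are hypothetical; only the ordering (confidence descending, then most recently updated) and the limit of 15 come from the description above.

```typescript
interface Memory {
  subject: string;
  confidence: number; // 0 to 1
  updatedAt: number; // epoch milliseconds (assumed representation)
}

const PROMPT_MEMORY_LIMIT = 15;

// Rank by confidence DESC, then updated DESC, and keep the top `limit`.
function selectPromptMemories(all: Memory[], limit = PROMPT_MEMORY_LIMIT) {
  const ranked = all
    .slice() // do not mutate the caller's array
    .sort((a, b) => b.confidence - a.confidence || b.updatedAt - a.updatedAt);
  return {
    injected: ranked.slice(0, limit),
    // When true, the prompt should mention that more memories are available
    // through the recall tool.
    hasMore: ranked.length > limit,
  };
}
```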

Custom Agent Message Types

EchOS extends the AgentMessage union from @mariozechner/pi-agent-core via TypeScript declaration merging in packages/core/src/agent/messages.ts.

`echos_context`

declare module '@mariozechner/pi-agent-core' {
  interface CustomAgentMessages {
    echos_context: EchosContextMessage;
  }
}

Used to inject structured context (e.g. current date/time) into each turn without string-concatenating it onto the user message in every interface adapter. The custom convertToLlm function (echosConvertToLlm) prepends the context content to the immediately following user message before the LLM call. Custom messages are preserved in agent.state.messages for debugging but never sent standalone to the LLM.

Helpers exported from @echos/core:

createContextMessage(content) — creates an echos_context message
createUserMessage(content) — creates a typed user message

Usage in interfaces:

await agent.prompt([
  createContextMessage(`Current date/time: ${now.toISOString()} UTC`),
  createUserMessage(userInput),
]);

All interfaces (Telegram, Web, CLI) use this pattern.

AI Categorization — Streaming with Progressive JSON

The categorization service (packages/core/src/agent/categorization.ts) uses streamSimple from @mariozechner/pi-ai instead of a blocking fetch. As the LLM streams its JSON response, parseStreamingJson parses the partial output received so far; it never throws, falling back to {} when nothing can be parsed yet. When new fields become fully formed in the partial JSON, an optional onProgress callback fires:

"Category: programming" — as soon as category is resolved
"Tags: typescript, api" — updated each time a new tag appears
"Gist: One sentence summary." — once the gist looks complete (>20 chars, ends with punctuation) — full mode only

Both categorizeLightweight and processFull accept onProgress?: (message: string) => void. Callers that don’t need progressive updates (e.g. the scheduler digest worker) pass no callback and get the same blocking behaviour as before.
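The gist heuristic above (more than 20 characters, ending with punctuation) could be implemented along these lines. Both function names are hypothetical, and the punctuation set (`.`, `!`, `?`) is an assumption.

```typescript
// A gist "looks complete" when it is long enough and ends with punctuation.
function gistLooksComplete(gist: string): boolean {
  const trimmed = gist.trim();
  return trimmed.length > 20 && /[.!?]$/.test(trimmed);
}

// Returns a function to call with each partial parse result; it reports the
// gist at most once, and does nothing when no callback was supplied.
function createGistReporter(onProgress?: (message: string) => void) {
  let sent = false;
  return (partial: { gist?: string }): void => {
    if (sent || !onProgress || !partial.gist) return;
    if (gistLooksComplete(partial.gist)) {
      sent = true;
      onProgress(`Gist: ${partial.gist.trim()}`);
    }
  };
}
```

Tracking what has already been reported keeps the callback from firing again on every later chunk of the stream.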

Context Overflow Detection

The agent uses a two-layer approach to context window management:

Layer 1, proactive pruning (createContextWindow in context-manager.ts): Runs before every LLM call via transformContext. Estimates token usage and slides the message window back to the nearest user-turn boundary until the budget fits. This should prevent overflows under normal operation.

Layer 2, reactive detection (isAgentMessageOverflow in context-manager.ts): If a provider rejects the request despite pruning (e.g. single oversized message, model switch, token estimation drift), the last assistant message is checked against isContextOverflow from @mariozechner/pi-ai, which matches provider-specific error patterns for Anthropic, OpenAI, Gemini, Groq, Mistral, OpenRouter, and others.

On overflow detection:

Telegram: Replies with “Conversation history is too long. Use /reset to start a new session.” instead of a raw provider error string.
Web API: Returns HTTP 413 with a structured error body ({ error: "Conversation history is too long. Please reset your session." }).

The helper isAgentMessageOverflow(message, contextWindow) is exported from @echos/core for use in any interface adapter.
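To make Layer 1 concrete, the sketch below drops whole turns from the front of the history until the estimated size fits. The message shape, the function names, and the estimate of roughly 4 characters per token are assumptions for illustration, not the logic of createContextWindow.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "toolResult";
  text: string;
}

// Rough token estimate; about 4 characters per token is an assumption.
const estimateTokens = (m: ChatMessage): number => Math.ceil(m.text.length / 4);

// Slide the window start forward, one user turn at a time, until the budget fits.
function pruneToBudget(messages: ChatMessage[], budget: number): ChatMessage[] {
  let start = 0;
  let total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (total > budget) {
    // Locate the next user-turn boundary after the current start.
    let next = start + 1;
    while (next < messages.length && messages[next].role !== "user") next++;
    if (next >= messages.length) break; // only one turn left; cannot prune further
    for (let i = start; i < next; i++) total -= estimateTokens(messages[i]);
    start = next;
  }
  return messages.slice(start);
}
```

When a single remaining turn still exceeds the budget, the loop gives up and returns it unchanged, which is the kind of case the reactive layer exists to catch.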

Prompt Caching

Each agent instance is assigned a sessionId at creation time, forwarded to LLM providers that support session-aware prompt caching:

Interface	Session ID format
Telegram	`telegram-{userId}`
Web	`web-{userId}`
CLI (`pnpm echos`)	`cli-local`

pi-ai applies cache_control markers to the system prompt and last user message automatically. The TTL is controlled via the cacheRetention option passed to streamSimple — pi-ai also reads a legacy PI_CACHE_RETENTION env var for backward compatibility, but EchOS passes it explicitly so behavior is consistent and provider-aware.

Cache retention

`CACHE_RETENTION`	TTL	Description
`long` (default)	1 hour	Best for normal usage — nearly every message hits the cache
`short`	5 minutes	Matches pi-ai’s own default; useful during development
`none`	disabled	No `cache_control` markers sent

Provider support:

Anthropic models: Full prompt caching with configurable TTL. The stable system prompt (~800 tokens + up to 15 injected memories) is cached, reducing input token costs by ~90% on cache hits.
Custom OpenAI-compatible endpoints (LLM_BASE_URL set): cacheRetention is forced to 'none' regardless of CACHE_RETENTION. These endpoints do not support Anthropic-style prompt caching.
Caching is not applied to the categorization service (categorize_note tool) — it has no stable system prompt, so caching would add no value.
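The retention rules above reduce to a small resolver. This is a sketch under assumptions: `resolveCacheRetention` is a hypothetical name, and treating unrecognized values as the default `long` is a guess about behavior the text does not specify.

```typescript
type CacheRetention = "long" | "short" | "none";

function resolveCacheRetention(env: Record<string, string | undefined>): CacheRetention {
  // Custom OpenAI-compatible endpoints do not support Anthropic-style prompt
  // caching, so caching is forced off regardless of CACHE_RETENTION.
  if (env["LLM_BASE_URL"]) return "none";
  const value = env["CACHE_RETENTION"];
  // Assumption: anything other than "short" or "none" falls back to "long".
  return value === "short" || value === "none" ? value : "long";
}
```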

Startup log

On startup, cacheRetention is logged alongside the model and thinking level:

{ "model": "claude-haiku-4-5-20251001", "thinkingLevel": "off", "cacheRetention": "long", ... }

Verifying cache hits

With LOG_LLM_PAYLOADS=true, the raw request payload will show cache_control markers on the system prompt. After the second message in a session, usage stats should show non-zero cacheReadTokens.

Multi-Provider LLM Support

EchOS supports any LLM provider available through pi-ai (23+ providers) as well as custom OpenAI-compatible endpoints such as DeepInfra and Groq. The provider is selected entirely through environment variables — no code changes are needed.

Configuration

Environment variable	Purpose
`ANTHROPIC_API_KEY`	API key for Anthropic (existing behaviour)
`LLM_API_KEY`	API key for any other provider (Groq, DeepInfra, etc.)
`LLM_BASE_URL`	Base URL for a custom OpenAI-compatible endpoint
`DEFAULT_MODEL`	Model spec string (see formats below)

At least one of ANTHROPIC_API_KEY or LLM_API_KEY must be set. If LLM_BASE_URL is set, LLM_API_KEY is required. Validation is enforced at startup.
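The two startup rules can be checked with a few lines. The function name and error messages below are hypothetical; only the rules themselves come from the text above.

```typescript
// Returns a list of configuration problems; an empty list means the env is valid.
function validateLlmEnv(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  if (!env["ANTHROPIC_API_KEY"] && !env["LLM_API_KEY"]) {
    errors.push("Set at least one of ANTHROPIC_API_KEY or LLM_API_KEY");
  }
  if (env["LLM_BASE_URL"] && !env["LLM_API_KEY"]) {
    errors.push("LLM_BASE_URL requires LLM_API_KEY");
  }
  return errors;
}
```

Collecting every problem before failing lets a misconfigured deployment report all issues in one startup attempt.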

Model Spec Formats

Format	Example	Provider used
Bare model ID	`claude-haiku-4-5-20251001`	Inferred (Anthropic)
`provider/model`	`groq/llama-3.3-70b-versatile`	Explicit (Groq)
Any string + `LLM_BASE_URL`	`meta-llama/Meta-Llama-3.1-70B-Instruct`	Custom OpenAI-compatible endpoint

Usage Examples

Existing Anthropic setup — no change needed:

ANTHROPIC_API_KEY=sk-ant-...
DEFAULT_MODEL=claude-haiku-4-5-20251001

DeepInfra (LLaMA via OpenAI-compatible endpoint):

LLM_API_KEY=<deepinfra_key>
LLM_BASE_URL=https://api.deepinfra.com/v1/openai
DEFAULT_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct

Groq (native pi-ai provider):

LLM_API_KEY=<groq_key>
DEFAULT_MODEL=groq/llama-3.3-70b-versatile

Implementation

resolveModel(spec, baseUrl?) in packages/core/src/agent/model-resolver.ts handles the translation from a spec string to a pi-ai Model object. When baseUrl is provided it constructs a Model with api: 'openai-completions' pointing at the custom endpoint; otherwise it delegates to getModel() from pi-ai using inferred or explicit provider/model-ID pairs. The resolved API key is injected into every LLM call via agent.streamFn, which wraps streamSimple and adds the apiKey option. This covers the main conversation agent. The categorize_note tool resolves its own model via resolveModel() and picks the matching key by inspecting model.provider directly.
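The spec formats handled by resolveModel can be distinguished with a simple parse step. The sketch below covers only that classification; `parseModelSpec` and `ParsedSpec` are hypothetical names, and the real resolver goes on to build a pi-ai Model object, which is not shown here.

```typescript
type ParsedSpec =
  | { kind: "custom"; baseUrl: string; modelId: string }
  | { kind: "explicit"; provider: string; modelId: string }
  | { kind: "inferred"; modelId: string };

function parseModelSpec(spec: string, baseUrl?: string): ParsedSpec {
  // With a custom endpoint the whole string is the model ID, slashes included.
  if (baseUrl) return { kind: "custom", baseUrl, modelId: spec };
  const slash = spec.indexOf("/");
  if (slash > 0) {
    // "provider/model": split on the first slash only.
    return { kind: "explicit", provider: spec.slice(0, slash), modelId: spec.slice(slash + 1) };
  }
  // Bare model ID: the provider has to be inferred from the ID itself.
  return { kind: "inferred", modelId: spec };
}
```

Checking `baseUrl` first matters: a DeepInfra ID such as `meta-llama/Meta-Llama-3.1-70B-Instruct` contains a slash but must not be read as a `provider/model` pair.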

Security

User authentication via Telegram user ID whitelist
SSRF prevention on all URL fetching
HTML sanitization via DOMPurify
Rate limiting (token bucket per user)
Structured audit logging
Secret redaction in Pino logs
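For readers unfamiliar with the token bucket algorithm mentioned above, here is a generic per-user sketch. The class names, capacity, and refill rate are illustrative assumptions and do not reflect the limits configured in EchOS.

```typescript
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastMs = nowMs;
  }

  // Refill based on elapsed time, then spend one token if available.
  tryConsume(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per user, created lazily on first request.
class PerUserRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  allow(userId: string, nowMs: number): boolean {
    let bucket = this.buckets.get(userId);
    if (!bucket) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, nowMs);
      this.buckets.set(userId, bucket);
    }
    return bucket.tryConsume(nowMs);
  }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real adapter would typically pass `Date.now()`.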
