
Architecture

EchOS is structured as a pnpm monorepo where a central agent core handles all reasoning, and separate packages implement each interface and storage layer. This document covers how a message moves through the system, how packages depend on each other, and how the three-layer storage architecture keeps data consistent.

Data Flow

User Input (any interface)
              │
              ▼
┌─────────────────────────────┐
│  Interface Adapter          │
│  (Telegram / Web / CLI)     │
│  - Auth verification        │
│  - Message normalization    │
│  - Response streaming       │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  Agent Core (pi-agent-core) │
│  - System prompt + context  │
│  - LLM reasoning (pi-ai)    │
│  - Tool selection & calling │
│  - Session persistence      │
└─────────────┬───────────────┘
              │
    ┌─────────┼────────────┐
    ▼         ▼            ▼
┌───────┐ ┌────────┐ ┌──────────┐
│ Core  │ │ Plugin │ │ Scheduler│
│ Tools │ │ Tools  │ │ (BullMQ) │
└───┬───┘ └───┬────┘ └─────┬────┘
    │         │            │
    └─────────┴────────────┘
              │
              ▼
┌─────────────────────────────┐
│  Storage Layer              │
│  - Markdown files (source   │
│    of truth, git-friendly)  │
│  - SQLite (metadata + FTS5) │
│  - LanceDB (vector search)  │
└─────────────────────────────┘

Package Dependencies

@echos/shared          ← no dependencies (types, config, security, logging, NotificationService)
@echos/core            ← shared (storage, agent, plugin system)
@echos/telegram        ← shared, core (grammY bot, notification service)
@echos/web             ← shared, core (Fastify server)
@echos/cli             ← shared, core, plugin-article, plugin-youtube, plugin-twitter (CLI binary — standalone terminal interface)
@echos/scheduler       ← shared, core, plugin-article, plugin-youtube (BullMQ workers)
@echos/plugin-youtube  ← shared, core (YouTube transcript extraction)
@echos/plugin-article  ← shared, core (web article extraction)
@echos/plugin-twitter  ← shared, core (Twitter/X tweet and thread extraction)

Daemon Entry Point (src/)

The daemon entry point (src/index.ts) is a thin orchestrator (~78 lines) that delegates to focused modules:
  • src/plugin-loader.ts — Auto-discovers plugins by scanning the plugins/ directory at runtime. No manual imports needed.
  • src/redis-check.ts — Redis TCP pre-flight check (checkRedisConnection). Exits with a fatal error if Redis is unreachable.
  • src/storage-init.ts — Initializes SQLite, Markdown, Vector, and Search storage; runs reconciliation; starts the file watcher.
  • src/scheduler-setup.ts — Creates BullMQ queue, processors, workers, and schedule manager.
  • src/shutdown.ts — Graceful shutdown handler that closes all resources in order.
  • src/agent-deps.ts — Builds the AgentDeps object and plugin config from the application config.

Scheduler & Notifications

The scheduler package (@echos/scheduler) runs background jobs via BullMQ + Redis. It is always enabled and requires a running Redis instance. Notification delivery is decoupled via NotificationService (defined in @echos/shared). The Telegram package provides the concrete implementation; the scheduler receives it via dependency injection and never imports @echos/telegram directly. When Telegram is disabled, a log-only fallback is used. Workers:
  • Digest: Creates a throwaway AI agent to summarize recent notes and reminders, broadcasts the result
  • Reminder check: Queries SQLite for overdue reminders and sends notifications
  • Content processing: Processes article/YouTube URLs queued by the agent
  • Update check: Checks GitHub for new EchOS releases daily and notifies the user with install-method-specific update instructions. Disable with DISABLE_UPDATE_CHECK=true.
  • Trash purge: Permanently removes notes that have been in the trash for more than 30 days. Runs daily at 3 AM (0 3 * * *). Clears all three storage layers: markdown file (from knowledge/.trash/), SQLite record, and LanceDB vector.
See Scheduler for configuration and usage details.

Export Utility

Location: packages/core/src/export/index.ts
The export utility provides pure serialization functions for converting notes into downloadable file formats. It is used by the export_notes agent tool and is independent of any interface or storage layer.

Formats

| Format | Function | Output |
| --- | --- | --- |
| markdown | exportToMarkdown(note) | Full markdown file with YAML frontmatter (reads raw file from disk when available; reconstructs from SQLite otherwise) |
| text | exportToText(note) | Plain text with markdown syntax stripped (headings, bold, links, list markers, etc.) |
| json | exportToJson(notes) | JSON array of { metadata, content } objects |
| zip | exportToZip(notes) | ZIP archive of .md files (one per note), deduplicated filenames |

export_notes Agent Tool

Location: packages/core/src/agent/tools/export-notes.ts
The tool selects notes (by ID or filter), serializes them, and returns an ExportFileResult JSON string in its tool result. Interfaces intercept this via the tool_execution_end agent event (which exposes event.result) and deliver the file:
  • Single note, markdown/text → returned inline (no file written to disk)
  • Multiple notes, or json/zip format → written to data/exports/export-{timestamp}.{ext}
Auto-format upgrade: if multiple notes are requested with markdown or text format, the tool automatically upgrades to zip.
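The delivery and auto-upgrade rules above can be condensed into a small pure function. This is an illustrative sketch, not the actual tool internals; the names and return shape are assumptions.

```typescript
type ExportFormat = "markdown" | "text" | "json" | "zip";

// Sketch of the export decision logic (illustrative names, not the
// real implementation): single-note text formats go inline, multi-note
// text formats upgrade to zip, everything else is written to a file.
function resolveExport(
  requested: ExportFormat,
  noteCount: number,
): { format: ExportFormat; delivery: "inline" | "file" } {
  let format = requested;
  // Multiple notes with a single-file text format are upgraded to zip.
  if (noteCount > 1 && (format === "markdown" || format === "text")) {
    format = "zip";
  }
  const inline = noteCount === 1 && (format === "markdown" || format === "text");
  return { format, delivery: inline ? "inline" : "file" };
}
```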

Exports Directory

Export files are written to data/exports/ (configurable via exportsDir in AgentDeps). Files are cleaned up automatically after 1 hour by the export cleanup cron job in the scheduler (export-cleanup, runs hourly at 0 * * * *).

Interface Delivery

| Interface | Delivery mechanism |
| --- | --- |
| Telegram | ctx.replyWithDocument(new InputFile(buffer, fileName)) after the agent completes; temp file deleted immediately |
| CLI | Inline content → stdout; file exports → --output path or ./fileName in CWD; path printed to stderr |
| Web | GET /api/export/:fileName download endpoint; the agent includes the URL in its text response |

Note Version History

Location: packages/core/src/storage/revisions.ts
Every call to update_note automatically snapshots the previous state in SQLite’s revisions table before applying changes. No configuration is required — revisioning is always on.

Revision storage

| Column | Type | Description |
| --- | --- | --- |
| id | TEXT (UUID string) | Unique revision identifier |
| note_id | string | FK to notes.id |
| title | string | Note title at snapshot time |
| content | string | Full note body at snapshot time |
| tags | string | Comma-separated tags at snapshot time |
| category | string | Category at snapshot time |
| created_at | ISO timestamp | When the snapshot was taken |

Auto-prune

Each save checks the total revision count for the note. If it exceeds 50, the oldest revisions are deleted as part of the same save operation. The cap is enforced in RevisionStorage.saveRevision().
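The cap logic can be sketched as a pure function over revision rows. This is a sketch of the idea only; RevisionStorage.saveRevision() performs the equivalent deletion in SQL, and the names here are illustrative.

```typescript
interface RevisionRow {
  id: string;
  created_at: string; // ISO timestamp, lexicographically sortable
}

// Sketch of the auto-prune step: keep the newest `cap` revisions for a
// note and return the ids of the oldest ones to delete.
function pruneOldest(revisions: RevisionRow[], cap = 50): string[] {
  const sorted = [...revisions].sort(
    (a, b) => a.created_at.localeCompare(b.created_at), // oldest first
  );
  const excess = sorted.length - cap;
  return excess > 0 ? sorted.slice(0, excess).map((r) => r.id) : [];
}
```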

Agent tools

| Tool | Location | Description |
| --- | --- | --- |
| note_history | packages/core/src/agent/tools/note-history.ts | Lists past revisions with timestamps and diff summaries |
| restore_version | packages/core/src/agent/tools/restore-version.ts | Restores a note to a past revision; saves current state first |

restore_version updates markdown files, SQLite, and re-generates the LanceDB vector embedding after restoring.

Soft Delete / Trash

Notes are soft-deleted by default — moved to a recoverable trash state rather than permanently removed.

How it works

  1. delete_note (without permanent=true) calls markdown.moveToTrash(filePath), which moves the .md file from knowledge/{type}/{category}/ to knowledge/.trash/
  2. SQLite is updated: status = 'deleted', deleted_at = <now>, and file_path is updated to the .trash/ path
  3. LanceDB vectors are retained (cheap, removed on permanent purge)
To permanently delete without going through trash, pass permanent=true — this calls markdown.remove(), sqlite.purgeNote(), and vectorDb.remove() immediately.

Recovery

restore_note reverses the process: the .md file is moved back from .trash/ to its original path, and in SQLite the file_path is updated and status is set back to 'saved' (previous statuses such as read/archived are not currently preserved). The .trash/ subdirectory is inside the knowledge directory, so it remains visible in Obsidian and other markdown editors.

Automatic purge

The trash_purge background job (scheduler worker, runs daily at 3 AM) queries listDeletedNotes(), checks deleted_at age, and permanently removes any note older than 30 days from all three storage layers.
Location: packages/scheduler/src/workers/trash-purge.ts
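The age check itself is a one-liner; a minimal sketch, assuming deleted_at is an ISO timestamp (the function name is illustrative, not the worker’s actual code):

```typescript
const TRASH_RETENTION_DAYS = 30;

// Sketch of the purge eligibility check: a note qualifies once its
// deleted_at timestamp is more than 30 days in the past.
function isPurgeEligible(deletedAt: string, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - new Date(deletedAt).getTime();
  return ageMs > TRASH_RETENTION_DAYS * 24 * 60 * 60 * 1000;
}
```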

Filtering

All list_notes and search_knowledge queries exclude status = 'deleted' notes automatically. Only list_trash surfaces deleted notes.

Automated Backups

Backups are created by packages/core/src/backup/index.ts and triggered by the scheduler or the manage_backups agent tool.

Backup Contents

Each backup is a .tar.gz archive containing:
  • knowledge/ — all markdown notes
  • db/echos.db — SQLite database (consistent snapshot via the .backup() API)
  • db/vectors/ — LanceDB vector data
  • backup-manifest.json — version, timestamp, note count

Configuration

| Env var | Default | Description |
| --- | --- | --- |
| BACKUP_ENABLED | true | Enable/disable scheduled backups |
| BACKUP_CRON | 0 2 * * * | Cron schedule (default: daily at 2 AM) |
| BACKUP_DIR | ~/echos/backups | Where backup archives are stored |
| BACKUP_RETENTION_COUNT | 7 | Number of most-recent backups to keep |

manage_backups Tool

Location: packages/core/src/agent/tools/backup.ts
Accepts an action parameter:
  • create — trigger a manual backup immediately
  • list — list existing backups with size and age
  • prune — remove backups beyond the retention count

Backup Scheduler

The backup cron job runs at the configured schedule (default 0 2 * * *). After creating a backup it automatically prunes old backups beyond the retention count. The schedule is only registered when BACKUP_ENABLED=true.

Restore

restoreBackup(backupPath, targetDir) extracts a backup to a target directory. It does not overwrite live data — the user must manually swap directories after verifying the restored data.

Reading Queue

Two core tools provide reading queue intelligence for saveable content types (articles, YouTube videos, tweets):

reading_queue Tool

Location: packages/core/src/agent/tools/reading-queue.ts
Lists unread items (status = 'saved') sorted by relevance to recent reading interests — scored by tag overlap (×2), category match (×1), and recency (×0.5). The tool fetches up to 200 candidates, queries the last 20 read items from the past 30 days via sqlite.db to build an interest profile, scores in-process, and slices to the limit. When ≥3 recent reads exist, the response includes a “Sorted by relevance” note. Accepts an optional type filter and a limit (default 10).
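The documented weights can be illustrated with a small scoring function. This is a sketch of the weighting scheme only — the real tool’s recency term and data shapes may differ; all names here are assumptions.

```typescript
interface QueueItem {
  tags: string[];
  category: string;
  createdAt: string; // ISO timestamp
}

interface InterestProfile {
  tags: Set<string>;       // tags from recently read items
  categories: Set<string>; // categories from recently read items
}

// Sketch of the relevance score: tag overlap ×2, category match ×1,
// recency ×0.5 (here modeled as a linear falloff over 30 days).
function relevanceScore(item: QueueItem, profile: InterestProfile, now: Date): number {
  let score = 0;
  for (const tag of item.tags) if (profile.tags.has(tag)) score += 2;
  if (profile.categories.has(item.category)) score += 1;
  const ageDays = (now.getTime() - new Date(item.createdAt).getTime()) / 86_400_000;
  score += 0.5 * Math.max(0, 1 - ageDays / 30); // newer items score higher
  return score;
}
```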

reading_stats Tool

Location: packages/core/src/agent/tools/reading-stats.ts
Aggregates reading progress via direct SQLite COUNT queries on the notes table. Returns total saved/read/archived counts, a per-type breakdown, last-7-day activity, and a computed read-rate percentage. Uses sqlite.db directly (already a public property on SqliteStorage).

The digest plugin (plugins/digest) includes a reading queue section in its prompt, calling reading_queue with limit=3 to surface top unread items in the daily digest.

knowledge_stats Tool

Location: packages/core/src/agent/tools/knowledge-stats.ts
Returns a comprehensive overview of the entire knowledge base. Uses seven query methods added to SqliteStorage:
  • getContentTypeCounts() — GROUP BY type count of non-deleted notes
  • getStatusCounts() — single-pass CASE SUM for saved/read/archived/unset buckets
  • getDistinctTagCount() — recursive CTE to enumerate and COUNT(DISTINCT) all comma-separated tags
  • getLinkCount() — SUM of comma-count+1 per note link list
  • getWeeklyCreationCounts(weeks) — strftime('%Y-W%W', created) GROUP BY for the past N weeks
  • getTagFrequencies(limit) — delegates to the existing getTopTagsWithCounts prepared statement
  • getCategoryFrequencies(limit) — GROUP BY category ORDER BY count DESC
The week-fill loop mirrors SQLite’s strftime('%W') semantics (week 0 = days before the year’s first Monday) so weekMap.get(week) matches correctly. Storage sizes are computed asynchronously via fs/promises (readdir + stat) with all three paths running in parallel. Requires knowledgeDir and dbPath on AgentDeps (both optional, defaulting to ./data/knowledge and ./data/db).
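The %W bucketing the week-fill loop must match can be reproduced in TypeScript. This is a sketch for illustration (the storage code computes the key in SQL, not in TS): Monday-based weeks, with days before the year’s first Monday falling in week 00.

```typescript
// Mirrors SQLite's strftime('%Y-W%W') bucketing: Monday is the first
// day of the week, and days before the year's first Monday are week 00.
function weekKey(date: Date): string {
  const year = date.getUTCFullYear();
  const startOfYear = Date.UTC(year, 0, 1);
  const yday = Math.floor((date.getTime() - startOfYear) / 86_400_000); // 0-based day of year
  const mondayBased = (date.getUTCDay() + 6) % 7; // Monday = 0 … Sunday = 6
  const week = Math.floor((yday + 7 - mondayBased) / 7);
  return `${year}-W${String(week).padStart(2, "0")}`;
}
```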

Note Templates

Location: packages/core/src/templates/index.ts
The template module allows users to create notes from pre-defined structures. Templates are stored as markdown files with YAML frontmatter in <knowledgeDir>/templates/.

Built-in Templates

Five templates are scaffolded automatically on first use: Meeting Notes, Book Review, Project Brief, Weekly Review, and Decision Log. Each template uses {{placeholder}} syntax for variable substitution.

API

| Function | Description |
| --- | --- |
| listTemplates(knowledgeDir) | Scan the templates directory, parse frontmatter, return Template[] |
| getTemplate(knowledgeDir, name) | Find a template by name or filename slug |
| applyTemplate(template, variables) | Replace {{key}} placeholders with values |
| createDefaultTemplates(knowledgeDir) | Scaffold built-in templates if they don’t exist |
| saveCustomTemplate(knowledgeDir, ...) | Save a user-created template |

Tool: use_template (packages/core/src/agent/tools/use-template.ts). Actions: list (show templates), use (create a note from a template), create (save a custom template). Using a template creates a real note persisted to markdown, SQLite, and the vector store.
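The {{placeholder}} substitution can be sketched in a few lines. This is an illustrative stand-in, not the real applyTemplate — how the actual implementation handles unknown placeholders is an assumption here.

```typescript
// Sketch of {{placeholder}} substitution: replace known keys, leave
// unknown placeholders intact so the user can still see them.
function applyTemplateSketch(body: string, variables: Record<string, string>): string {
  return body.replace(/\{\{(\w+)\}\}/g, (match, key: string) =>
    key in variables ? variables[key] : match,
  );
}
```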

Knowledge Graph

Location: packages/core/src/graph/index.ts
The graph module builds an in-memory undirected graph from note links stored in SQLite. Links are stored bidirectionally by link_notes (both A→B and B→A are persisted), so the graph deduplicates them into a single undirected edge per pair.

Data Model

KnowledgeGraph
├── nodes: GraphNode[]   — id, title, type, tags, category
└── edges: GraphEdge[]   — source, target, label?
Tags and links are stored as comma-separated strings in SQLite (upsertNote uses tags.join(',')) and parsed with parseCommaSeparated() — no JSON parsing.
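Both steps above — comma-separated parsing and collapsing bidirectional links into one undirected edge — are simple enough to sketch. The parsing matches the documented behaviour (split, trim, filter empty); the dedup function is an illustrative sketch with assumed names, not the graph builder’s actual code.

```typescript
// Shared-utility sketch: split a comma-separated SQLite column into values.
function parseCommaSeparated(raw: string | null): string[] {
  return (raw ?? "").split(",").map((s) => s.trim()).filter(Boolean);
}

// Because link_notes persists both A→B and B→A, the graph builder must
// collapse each pair into a single undirected edge. Sketch: key each
// pair order-independently and keep the first occurrence.
function dedupeEdges(pairs: Array<[string, string]>): Array<[string, string]> {
  const seen = new Set<string>();
  const edges: Array<[string, string]> = [];
  for (const [a, b] of pairs) {
    const key = a < b ? `${a}|${b}` : `${b}|${a}`; // same key for A→B and B→A
    if (!seen.has(key)) {
      seen.add(key);
      edges.push([a, b]);
    }
  }
  return edges;
}
```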

API

| Function | Description |
| --- | --- |
| buildGraph(sqlite) | Load all notes via listNotes(), build adjacency + deduplicated edge list |
| getSubgraph(graph, centerId, depth) | BFS from a center node to N hops; returns a subgraph |
| exportMermaid(graph) | graph TD with undirected --- edges |
| exportDot(graph) | graph knowledge { } (undirected) with -- edges |
| exportJson(graph) | { nodes, links } — D3 node-link format; links key matches D3 convention |
| getTopology(graph) | Cluster count (union-find), top-10 hubs by degree, orphan nodes |
| parseCommaSeparated(raw) | Split on commas, trim, filter empty — shared utility |

explore_graph Agent Tool

Location: packages/core/src/agent/tools/explore-graph.ts
Accepts an action parameter:
  • around — searches for a note by note_id or by topic (hybrid search), then runs BFS to depth hops (1–5, default 2). Builds a Map<id, GraphNode> from the subgraph for O(1) lookups per hop level and groups results by hop distance.
  • export — renders the full graph in mermaid (default), dot, or json format. Returns the full output without truncation.
  • stats — calls getTopology() and formats clusters, hubs (degree > 1), and orphans as a markdown report.
Each action returns a typed details object (StatsDetails, ExportDetails, or AroundDetails) rather than any.
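The around traversal — BFS over an adjacency map, grouping results by hop distance — can be sketched as follows. This is an illustration of the algorithm, not the tool’s actual code; the adjacency-map input shape is an assumption.

```typescript
// Sketch of the `around` action's core: breadth-first search to a depth
// limit, returning node ids grouped by hop distance from the center.
function bfsByHops(
  adjacency: Map<string, string[]>,
  centerId: string,
  depth: number,
): Map<number, string[]> {
  const visited = new Set<string>([centerId]);
  const byHop = new Map<number, string[]>([[0, [centerId]]]);
  let frontier = [centerId];
  for (let hop = 1; hop <= depth && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    if (next.length > 0) byHop.set(hop, next);
    frontier = next;
  }
  return byHop;
}
```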

Auto-Linking

Location: packages/core/src/graph/auto-linker.ts
The auto-linker module suggests links between semantically similar notes to help the knowledge graph grow organically.

API

| Function | Description |
| --- | --- |
| suggestLinks(noteId, sqlite, vectorStore, generateEmbedding, limit?, threshold?) | Find semantically similar, not-yet-linked notes |
suggestLinks generates a fresh embedding for the source note, queries the vector store for nearest neighbours, and filters out:
  • The note itself
  • Notes already linked (from the source note’s links field)
  • Notes sharing the same sourceUrl (avoid self-links on split content)
  • Notes below the configurable similarity threshold (default 0.82)
Each result is a LinkSuggestion: { targetId, targetTitle, similarity, reason } where reason is derived from shared tags, shared category, or falls back to “semantically similar content”.

Location: packages/core/src/agent/tools/suggest-links.ts
A standalone tool for on-demand link suggestions. Parameters: noteId (required), limit (default 5, max 20). Returns suggestions with similarity percentages and reasons. Use link_notes to accept any suggestions.
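The filtering stage of suggestLinks can be sketched as a pure predicate over vector-store neighbours. This is an illustrative sketch with assumed parameter names and shapes, not the module’s actual signature.

```typescript
interface Neighbor {
  id: string;
  similarity: number;
  sourceUrl?: string;
}

// Sketch of the candidate filter: drop the note itself, already-linked
// notes, notes sharing the same sourceUrl, and anything below the
// similarity threshold (default 0.82, per the docs).
function filterCandidates(
  neighbors: Neighbor[],
  sourceId: string,
  linkedIds: Set<string>,
  sourceUrl: string | undefined,
  threshold = 0.82,
): Neighbor[] {
  return neighbors.filter(
    (n) =>
      n.id !== sourceId &&
      !linkedIds.has(n.id) &&
      (sourceUrl === undefined || n.sourceUrl !== sourceUrl) &&
      n.similarity >= threshold,
  );
}
```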

Auto-suggestions in categorize_note

After a note is categorized (both lightweight and full modes), the categorize tool automatically calls suggestLinks with limit: 3 and appends the top suggestions to the response text. This is non-fatal — if the vector store query fails the categorization result is still returned. This lets the agent proactively mention related notes to the user after categorization.

Plugin Architecture

Content processors live in plugins/ as separate workspace packages. Each plugin:
  • Implements the EchosPlugin interface from @echos/core
  • Returns agent tools from its setup(context) method
  • Receives a PluginContext with storage, embeddings, logger, and config
  • Is auto-discovered at runtime by src/plugin-loader.ts (scans plugins/ directory) and registered via PluginRegistry
Core tools (create_note, search, get, list, update, delete_note, restore_note, list_trash, note_history, restore_version, reminders, memory, linking, categorize_note, save_conversation, mark_content, export_notes, manage_tags) remain in @echos/core. Domain-specific processors (YouTube, article, image, etc.) are plugins.

Plugins can optionally use the AI categorization service from @echos/core to automatically extract category, tags, gist, summary, and key points from content. The categorization pipeline is vocabulary-aware: before calling the LLM, it fetches the top 50 most-used tags from SQLite and injects them into the prompt, steering the model to reuse existing tags rather than coining synonyms. See Categorization for details.

Available Plugins

  • article: Web article extraction using Readability
  • youtube: YouTube video transcript extraction
  • twitter: Twitter/X tweet and thread extraction via FxTwitter API (no API key required)
  • image: Image storage with metadata extraction (format, dimensions, EXIF)
  • content-creation: Content generation tools
See Building a Plugin for detailed plugin documentation.

Storage Architecture

SQLite (better-sqlite3): Structured metadata index, FTS5 full-text search, memory store, reminders, and tag management. Tags are stored as comma-separated strings in the tags column. The getAllTagsWithCounts() method uses a recursive CTE to split and aggregate tags across all notes. renameTag() and mergeTags() use the ',' || tags || ',' wrapping trick to avoid substring false matches when replacing tag values in-place. Both operations automatically keep the FTS5 index in sync via the existing notes_au AFTER UPDATE trigger. The memory table stores long-term personal facts with a confidence score (0–1) and kind (fact, preference, person, project, expertise). Notes also store a content_hash (SHA-256) used to detect changes and skip unnecessary re-embedding. The status column tracks content lifecycle (saved, read, archived, deleted) and input_source records how content was captured (text, voice, url, file, image). When a note is soft-deleted, status is set to deleted, a deleted_at timestamp is recorded, and file_path is updated to point to the note’s new location in knowledge/.trash/. The revisions table stores snapshots of previous note states: each row captures note_id, title, content, tags, category, and created_at. Revisions are written automatically by update_note before applying changes, capped at 50 per note (oldest pruned on save). For images, additional columns store image_path (local file path), image_url (source URL), image_metadata (JSON with dimensions, format, EXIF), and ocr_text (for future OCR support).

LanceDB (embedded): Vector embeddings for semantic search. No server process needed.

Markdown files: Source of truth. YAML frontmatter with structured metadata. Directory layout: knowledge/{type}/{category}/{date}-{slug}.md. Images are stored in knowledge/image/{category}/{hash}.{ext} and referenced from markdown notes.
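The comma-wrapping trick behind renameTag() can be illustrated in pure TypeScript — a sketch of the idea only, not the actual SQL implementation:

```typescript
// Sketch of the ',' || tags || ',' trick: wrapping the stored string in
// sentinel commas means the match pattern ",oldTag," can never hit a
// substring of another tag (renaming "ai" must not touch "bonsai").
function renameTagIn(tags: string, oldTag: string, newTag: string): string {
  const wrapped = `,${tags},`;
  return wrapped
    .split(`,${oldTag},`)
    .join(`,${newTag},`)
    .replace(/^,|,$/g, ""); // strip the sentinel commas
}
```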

Storage Sync

EchOS keeps the three storage layers in sync automatically, even when markdown files are added or edited outside the application.

Startup reconciliation (reconcileStorage in packages/core/src/storage/reconciler.ts): Runs once at boot. Scans all .md files in the knowledge directory and compares them against SQLite using the content_hash column:
  • New file → full upsert to SQLite + generate embedding in LanceDB
  • Content changed → update SQLite + re-embed (OpenAI called only when content hash differs)
  • File moved (same hash, different path) → update file path in SQLite only, no re-embed
  • No change → skipped entirely
  • SQLite record with no file on disk → deleted from SQLite and LanceDB
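The per-file rules above condense into a small decision function. This is a sketch of the decision table only (illustrative names; the real reconciler also handles the orphaned-row deletion case separately):

```typescript
type ReconcileAction = "insert" | "update" | "move" | "skip";

interface FileState {
  path: string;
  hash: string; // content_hash of the note body
}

// Sketch of the reconciliation decision for one markdown file, given
// the matching SQLite record (if any).
function reconcileAction(file: FileState, existing?: FileState): ReconcileAction {
  if (!existing) return "insert";                   // new file → upsert + embed
  if (existing.hash !== file.hash) return "update"; // content changed → re-embed
  if (existing.path !== file.path) return "move";   // same hash, new path → no re-embed
  return "skip";                                    // nothing to do
}
```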
Live file watcher (createFileWatcher in packages/core/src/storage/watcher.ts): Uses chokidar to watch knowledge/**/*.md while the app is running. Events are debounced (500 ms) and awaitWriteFinish is enabled to handle atomic saves from editors (VS Code, Obsidian, etc.):
  • add / change → parse, compare content hash, upsert if changed (re-embed only on content change)
  • unlink → look up note by file path in SQLite, delete from SQLite + LanceDB
Both paths use the same content hash check, so the OpenAI embeddings API is only called when note body text actually changes — metadata-only edits (frontmatter, tags, title) do not trigger re-embedding.

Hybrid search combines multiple strategies in a pipeline (packages/core/src/storage/search.ts):
  1. Keyword (FTS5): BM25-ranked full-text search across title, content, tags
  2. Semantic (LanceDB): Cosine similarity on OpenAI embeddings
  3. Hybrid: RRF fusion of keyword + semantic results
  4. Temporal decay (optional, default on): exponential decay factor 2^(-age/halfLife) applied to RRF scores so recent notes rank higher. Configurable half-life (default 90 days). Set temporalDecay: false for archival searches.
  5. Hotness scoring (optional, default on): after temporal decay, applies a frequency-recency boost score *= (1 + 0.15 * sigmoid(log1p(retrievalCount)) * temporalDecay(lastAccessed)). Tracks retrieval count and last access time per note in the note_hotness SQLite table. Notes that consistently surface in search results get a small but compounding boost. Disable with hotnessBoost: false. Implemented in packages/core/src/storage/sqlite-hotness.ts.
  6. Cross-encoder reranking (optional, default off): after all scoring stages, the top-N candidates are sent to Claude Haiku with a relevance-scoring prompt. Scores are parsed and results re-sorted. Adds one API call per search but yields the highest-quality ranking. Enable with rerank: true in SearchOptions or the search_knowledge tool. Requires ANTHROPIC_API_KEY. Implemented in packages/core/src/storage/reranker.ts.
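The scoring math in stages 3–5 can be written out directly. This is a sketch of the documented formulas, not the pipeline’s actual code; in particular, the RRF constant k = 60 is an assumption (a common default), and the parameter names are illustrative.

```typescript
const RRF_K = 60; // assumed constant; a common default for reciprocal-rank fusion

// Stage 3 sketch: reciprocal-rank fusion of keyword and semantic ranks
// (1-based ranks; a missing rank contributes nothing).
function rrfScore(keywordRank?: number, semanticRank?: number, k = RRF_K): number {
  let score = 0;
  if (keywordRank !== undefined) score += 1 / (k + keywordRank);
  if (semanticRank !== undefined) score += 1 / (k + semanticRank);
  return score;
}

// Stage 4: exponential decay 2^(-age/halfLife), default half-life 90 days.
function temporalDecay(ageDays: number, halfLifeDays = 90): number {
  return 2 ** (-ageDays / halfLifeDays);
}

// Stage 5: frequency-recency boost, per the documented formula
// 1 + 0.15 * sigmoid(log1p(retrievalCount)) * temporalDecay(lastAccessed).
function hotnessBoost(retrievalCount: number, lastAccessAgeDays: number): number {
  const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));
  return 1 + 0.15 * sigmoid(Math.log1p(retrievalCount)) * temporalDecay(lastAccessAgeDays);
}
```

The boost is deliberately small: it tops out well under 1.15×, so hot notes rise without drowning out relevance.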

Search Benchmark

A reproducible benchmark suite lives in benchmarks/search/ and measures search quality (Precision@5, Recall@10, MRR) and latency across all pipeline configurations and corpus sizes.
# Generate corpus fixtures locally (one-time, gitignored)
pnpm tsx benchmarks/search/generate-corpus.ts all

# Run the benchmark (all 3 scales × 5 pipeline configs)
pnpm bench:search

# Generate RESULTS.md report
pnpm tsx benchmarks/search/report.ts
The benchmark uses deterministic pseudo-embeddings derived from topic-cluster assignments — no OpenAI API key required. Results are fully reproducible: the same corpus always produces the same scores. The hybrid+decay+hotness+rerank configuration requires ANTHROPIC_API_KEY and is skipped otherwise.

Key files:
  • benchmarks/search/generate-corpus.ts — generates synthetic notes (100/1000/10k) across 10 topic clusters
  • benchmarks/search/queries.json — 55 annotated test queries with expected note IDs
  • benchmarks/search/run.ts — loads corpus into temp SQLite+LanceDB, runs queries, emits results/{timestamp}.json
  • benchmarks/search/report.ts — reads latest results JSON, writes RESULTS.md

Memory System

Long-term memory (remember_about_me / recall_knowledge tools) uses a hybrid strategy to balance cost and recall:
  • At agent creation (including after /reset): the top 15 memories ranked by confidence DESC, updated DESC are injected directly into the system prompt as “Known Facts About the User”. This ensures core personal facts are always available without an explicit tool call.
  • On-demand retrieval: if more than 15 memories exist, recall_knowledge searches the full memory table using word-tokenised LIKE queries. The system prompt notes additional memories are available so the agent knows to use the tool.
This means /reset only clears the conversation history — all stored memories persist in SQLite and are reloaded into the next session automatically.

Custom Agent Message Types

EchOS extends the AgentMessage union from @mariozechner/pi-agent-core via TypeScript declaration merging in packages/core/src/agent/messages.ts.

echos_context

declare module '@mariozechner/pi-agent-core' {
  interface CustomAgentMessages {
    echos_context: EchosContextMessage;
  }
}
Used to inject structured context (e.g. current date/time) into each turn without string-concatenating it onto the user message in every interface adapter. The custom convertToLlm function (echosConvertToLlm) prepends the context content to the immediately following user message before the LLM call. Custom messages are preserved in agent.state.messages for debugging but never sent standalone to the LLM.

Helpers exported from @echos/core:
  • createContextMessage(content) — creates an echos_context message
  • createUserMessage(content) — creates a typed user message
Usage in interfaces:
await agent.prompt([
  createContextMessage(`Current date/time: ${now.toISOString()} UTC`),
  createUserMessage(userInput),
]);
All interfaces (Telegram, Web, CLI) use this pattern.

AI Categorization — Streaming with Progressive JSON

The categorization service (packages/core/src/agent/categorization.ts) uses streamSimple from @mariozechner/pi-ai instead of a blocking fetch. As the LLM streams its JSON response, parseStreamingJson parses each partial chunk; it never throws, returning {} on incomplete input. When new fields become fully formed in the partial JSON, an optional onProgress callback fires:
  • "Category: programming" — as soon as category is resolved
  • "Tags: typescript, api" — updated each time a new tag appears
  • "Gist: One sentence summary." — once the gist looks complete (>20 chars, ends with punctuation) — full mode only
Both categorizeLightweight and processFull accept onProgress?: (message: string) => void. Callers that don’t need progressive updates (e.g. the scheduler digest worker) pass no callback and get the same blocking behaviour as before.
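The never-throws contract can be illustrated with a simplified stand-in. This is not the real parseStreamingJson — it only recovers completed top-level string fields and is not robust to nested objects or escaped edge cases — but it shows the shape of the behaviour: a full parse when possible, a partial result otherwise, and never an exception.

```typescript
// Simplified stand-in for a streaming-tolerant JSON parser: try a full
// parse first, then fall back to extracting completed "key":"value"
// string fields. Always returns an object; never throws.
function parsePartialJson(chunk: string): Record<string, unknown> {
  try {
    return JSON.parse(chunk);
  } catch {
    const out: Record<string, unknown> = {};
    // Match only fully closed string fields; a half-streamed value like
    // "gist":"Par has no closing quote and is skipped.
    const fieldRe = /"(\w+)"\s*:\s*"((?:[^"\\]|\\.)*)"/g;
    for (const m of chunk.matchAll(fieldRe)) {
      out[m[1]] = m[2];
    }
    return out;
  }
}
```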

Context Overflow Detection

The agent uses a two-layer approach to context window management: Layer 1 — Proactive pruning (createContextWindow in context-manager.ts): Runs before every LLM call via transformContext. Estimates token usage and slides the message window back to the nearest user-turn boundary until the budget fits. This should prevent overflows under normal operation. Layer 2 — Reactive detection (isAgentMessageOverflow in context-manager.ts): If a provider rejects the request despite pruning (e.g. single oversized message, model switch, token estimation drift), the last assistant message is checked against isContextOverflow from @mariozechner/pi-ai, which matches provider-specific error patterns for Anthropic, OpenAI, Gemini, Groq, Mistral, OpenRouter, and others. On overflow detection:
  • Telegram: Replies with “Conversation history is too long. Use /reset to start a new session.” instead of a raw provider error string.
  • Web API: Returns HTTP 413 with a structured error body ({ error: "Conversation history is too long. Please reset your session." }).
The helper isAgentMessageOverflow(message, contextWindow) is exported from @echos/core for use in any interface adapter.

Prompt Caching

Each agent instance is assigned a sessionId at creation time, forwarded to LLM providers that support session-aware prompt caching:
| Interface | Session ID format |
| --- | --- |
| Telegram | telegram-{userId} |
| Web | web-{userId} |
| CLI (pnpm echos) | cli-local |
pi-ai applies cache_control markers to the system prompt and last user message automatically. The TTL is controlled via the cacheRetention option passed to streamSimple. pi-ai also reads a legacy PI_CACHE_RETENTION env var for backward compatibility, but EchOS passes the option explicitly so behavior is consistent and provider-aware.

Cache retention

| CACHE_RETENTION | TTL | Description |
| --- | --- | --- |
| long (default) | 1 hour | Best for normal usage — nearly every message hits the cache |
| short | 5 minutes | Matches pi-ai’s own default; useful during development |
| none | disabled | No cache_control markers sent |
Provider support:
  • Anthropic models: Full prompt caching with configurable TTL. The stable system prompt (~800 tokens + up to 15 injected memories) is cached, reducing input token costs by ~90% on cache hits.
  • Custom OpenAI-compatible endpoints (LLM_BASE_URL set): cacheRetention is forced to 'none' regardless of CACHE_RETENTION. These endpoints do not support Anthropic-style prompt caching.
  • Caching is not applied to the categorization service (categorize_note tool) — it has no stable system prompt, so caching would add no value.

Startup log

On startup, cacheRetention is logged alongside the model and thinking level:
{ "model": "claude-haiku-4-5-20251001", "thinkingLevel": "off", "cacheRetention": "long", ... }

Verifying cache hits

With LOG_LLM_PAYLOADS=true, the raw request payload will show cache_control markers on the system prompt. After the second message in a session, usage stats should show non-zero cacheReadTokens.

Multi-Provider LLM Support

EchOS supports any LLM provider available through pi-ai (23+ providers) as well as custom OpenAI-compatible endpoints such as DeepInfra and Groq. The provider is selected entirely through environment variables — no code changes are needed.

Configuration

| Environment variable | Purpose |
| --- | --- |
| ANTHROPIC_API_KEY | API key for Anthropic (existing behaviour) |
| LLM_API_KEY | API key for any other provider (Groq, DeepInfra, etc.) |
| LLM_BASE_URL | Base URL for a custom OpenAI-compatible endpoint |
| DEFAULT_MODEL | Model spec string (see formats below) |
At least one of ANTHROPIC_API_KEY or LLM_API_KEY must be set. If LLM_BASE_URL is set, LLM_API_KEY is required. Validation is enforced at startup.

Model Spec Formats

| Format | Example | Provider used |
| --- | --- | --- |
| Bare model ID | claude-haiku-4-5-20251001 | Inferred (Anthropic) |
| provider/model | groq/llama-3.3-70b-versatile | Explicit (Groq) |
| Any string + LLM_BASE_URL | meta-llama/Meta-Llama-3.1-70B-Instruct | Custom OpenAI-compatible endpoint |

Usage Examples

Existing Anthropic setup — no change needed:
ANTHROPIC_API_KEY=sk-ant-...
DEFAULT_MODEL=claude-haiku-4-5-20251001
DeepInfra (LLaMA via OpenAI-compatible endpoint):
LLM_API_KEY=<deepinfra_key>
LLM_BASE_URL=https://api.deepinfra.com/v1/openai
DEFAULT_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct
Groq (native pi-ai provider):
LLM_API_KEY=<groq_key>
DEFAULT_MODEL=groq/llama-3.3-70b-versatile

Implementation

resolveModel(spec, baseUrl?) in packages/core/src/agent/model-resolver.ts handles the translation from a spec string to a pi-ai Model object. When baseUrl is provided it constructs a Model with api: 'openai-completions' pointing at the custom endpoint; otherwise it delegates to getModel() from pi-ai using inferred or explicit provider/model-ID pairs. The resolved API key is injected into every LLM call via agent.streamFn, which wraps streamSimple and adds the apiKey option. This covers the main conversation agent. The categorize_note tool resolves its own model via resolveModel() and picks the matching key by inspecting model.provider directly.
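The spec-string rules can be sketched as a small parser. This is an illustration of the three formats only — the real resolveModel returns a pi-ai Model object and delegates inference to getModel(); the simplified shape, the claude-prefix heuristic, and the "unknown"/"openai-compatible" provider labels here are assumptions.

```typescript
interface ModelSpec {
  provider: string;
  modelId: string;
  customBaseUrl?: string;
}

// Sketch of the spec-string translation rules (illustrative, not the
// actual resolveModel implementation).
function parseModelSpec(spec: string, baseUrl?: string): ModelSpec {
  if (baseUrl) {
    // Any string + LLM_BASE_URL → custom OpenAI-compatible endpoint.
    return { provider: "openai-compatible", modelId: spec, customBaseUrl: baseUrl };
  }
  const slash = spec.indexOf("/");
  if (slash > 0) {
    // provider/model → explicit provider.
    return { provider: spec.slice(0, slash), modelId: spec.slice(slash + 1) };
  }
  // Bare model id → inferred provider (claude-* ids imply Anthropic;
  // the fallback label here is a placeholder for pi-ai's inference).
  return { provider: spec.startsWith("claude") ? "anthropic" : "unknown", modelId: spec };
}
```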

Agent Factory Module Structure

The agent factory in packages/core/src/agent/ is split into focused modules:
| File | Responsibility |
| --- | --- |
| index.ts | createEchosAgent() — wires model, API key, memories, and delegates tool creation |
| create-agent-tools.ts | createAgentTools() — all core tool imports and instantiation (~38 tools). Adding a new core tool only requires editing this file and the tools/index.ts barrel |
| types.ts | AgentDeps and AgentToolDeps interfaces — stable, widely-imported types |
AgentDeps is the top-level dependency bag passed to createEchosAgent(). AgentToolDeps is a resolved subset (with defaults applied) passed to createAgentTools().

Security

  • User authentication via Telegram user ID whitelist
  • SSRF prevention on all URL fetching
  • HTML sanitization via DOMPurify
  • Rate limiting (token bucket per user)
  • Structured audit logging
  • Secret redaction in Pino logs