23. Agent to Agent Intercom: The Necropolis Protocol
Context and Problem Statement
The LychD architecture prioritizes single-node sovereignty to enable deterministic resource management, yet a single substrate is physically finite. Traditional multi-node GPU clustering introduces synchronization latency that destroys high-order intelligence. To scale without the "Cluster Tax," sovereign nodes require a mechanism to function as a decentralized swarm. A protocol is required that allows independent "Cells" of intelligence to delegate labor, share stateful context, and trade artifacts while strictly respecting the absolute priority of the local Magus and the physical limits of the local VRAM.
Requirements
- Standard Interoperability: Mandatory adoption of the open Agent2Agent (A2A) standard via fasta2a to ensure communication with any compliant entity.
- The Emissary Pattern: Client-side manifestation of remote agents as EmissaryTool instances within the local cognitive arsenal.
- Native Hosting: Provision of ASGI-compliant endpoints via the agent.to_a2a() protocol to expose internal souls to the swarm.
- Asynchronous Deferral: Mandatory support for Pydantic AI’s CallDeferred mechanism, allowing the Daemon to return DeferredToolRequests and free resources.
- Swarm Workload Pooling: Mandatory implementation of a dedicated "Remote Pool" in the Orchestrator (21) to prevent peer requests from overrunning local VRAM.
- Context Preservation: Support for context_id to ensure multi-turn negotiations between nodes share a unified message and reasoning history (see the wire-level sketch after this list).
- Anatomical Persistence: Persistence of all A2A tasks and results within the Phylactery (06) binary JSONB chambers.
- Standard Discovery Path: Implementation of the /.well-known/ai-agent protocol for autonomous coordinate and capability identification.
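To make the context_id requirement concrete on the wire, the following minimal sketch shows a multi-turn A2A exchange. The peer URL, IDs, and response shape are assumptions for illustration; field names follow the public A2A JSON-RPC specification (message/send with camelCase keys).

```python
# A minimal sketch of a multi-turn A2A exchange reusing a context_id.
# The peer URL is an assumption; field names follow the public A2A
# JSON-RPC specification.
import uuid

import httpx

PEER_URL = "http://great-lych.local:8000/"  # hypothetical swarm peer

def send_to_peer(text: str, context_id: str | None = None) -> dict:
    message = {
        "role": "user",
        "parts": [{"kind": "text", "text": text}],
        "messageId": str(uuid.uuid4()),
    }
    if context_id is not None:
        # Reusing contextId threads this turn into the same negotiation
        # history on the remote node.
        message["contextId"] = context_id
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",
        "params": {"message": message},
    }
    return httpx.post(PEER_URL, json=payload).json()

first = send_to_peer("Open the negotiation.")
ctx = first["result"]["contextId"]  # response shape assumed (Task object)
second = send_to_peer("Continue where we left off.", context_id=ctx)
```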
Considered Options
Option 1: Distributed GPU Clustering
Splitting a single model across multiple network nodes.
- Cons:
  - Latency Paralysis: Jitter is too high for real-time reasoning.
  - Violates the "Single-Node Sovereignty" doctrine by creating external uptime dependencies for core cognition.
Option 2: Decentralized A2A Swarm
Treating each node as a complete, sovereign "Brain" that collaborates via high-level delegation.
- Pros:
  - Intelligence Layering: Allows a "Small Lych" to delegate complex tasks to a "Great Lych" without sharing raw VRAM.
  - Resilience: Each node remains independent; the swarm survives individual node failures.
  - Substrate Compatibility: Standardizes the swarm interface while maintaining 100% local resource determinism.
Decision Outcome
Agent2Agent (A2A) is adopted as the lingua franca of the Swarm, implemented via FastA2A and deeply integrated into the system's asynchronous worker substrate.
1. The Intercom Architecture (FastA2A Binding)
The system implements native A2A mount points on the Vessel. Local agents are converted into routers via agent.to_a2a(), while the Phylactery (06) serves as the permanent anchor for Tasks and Contexts. Incoming requests are transmuted into Ritual Intents and placed in a dedicated Swarm Workload Pool for the Ghouls (14). This ensures that external labor is queued and processed only when it does not interfere with the primary user interface.
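As a concrete anchor for the binding, a minimal hosting sketch follows. The model string and instructions are placeholders rather than LychD configuration; to_a2a() is Pydantic AI's documented bridge to a FastA2A ASGI application, and the Swarm Workload Pool wiring is elided.

```python
# A minimal sketch of exposing a local soul via agent.to_a2a().
# The model string and instructions are illustrative placeholders.
from pydantic_ai import Agent

local_soul = Agent(
    "openai:gpt-4o",  # stand-in; a LychD node would bind its local model
    instructions="Serve rituals delegated by peer nodes of the swarm.",
)

# to_a2a() wraps the agent in a FastA2A ASGI application speaking the
# A2A protocol; any ASGI server can host it, e.g.:
#   uvicorn intercom:app --host 0.0.0.0 --port 8000
app = local_soul.to_a2a()
```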
2. Emissary Tools and Deferred Execution
Remote agents are manifested as EmissaryTool instances. If a peer requires significant time to complete a ritual, the tool raises a Pydantic AI CallDeferred exception. The agent run ends with a DeferredToolRequests object. This triggers the "Long Sleep" of the local Graph (22), which serializes its state to the database via BaseStatePersistence. Once the peer delivers the result, the local worker rehydrates the mind with the DeferredToolResults and resumes the thought exactly where it halted.
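A hedged sketch of this flow using Pydantic AI's deferred-tool types: submit_to_peer() is a hypothetical stand-in for the EmissaryTool's A2A dispatch, and the Graph persistence step is reduced to a comment.

```python
# A minimal sketch of deferral and resumption; the model string is a
# placeholder and submit_to_peer() is a hypothetical helper.
from pydantic_ai import (
    Agent,
    CallDeferred,
    DeferredToolRequests,
    DeferredToolResults,
)

agent = Agent("openai:gpt-4o", output_type=[str, DeferredToolRequests])

def submit_to_peer(question: str) -> None:
    """Hypothetical: enqueue the ritual as an A2A task on a peer node."""

@agent.tool_plain
def consult_great_lych(question: str) -> str:
    submit_to_peer(question)
    raise CallDeferred  # end this run; the answer arrives asynchronously

result = agent.run_sync("Delegate the hard part of this ritual.")
if isinstance(result.output, DeferredToolRequests):
    # "Long Sleep": persist result.all_messages() and the pending call id,
    # then later rehydrate and resume once the peer delivers its answer.
    call = result.output.calls[0]
    results = DeferredToolResults(calls={call.tool_call_id: "peer's answer"})
    final = agent.run_sync(
        message_history=result.all_messages(),
        deferred_tool_results=results,
    )
```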
3. The Ward of Resource Sovereignty
The Intercom is protected by an internal "Ward" logic managed by the Orchestrator (21). Swarm requests are subject to a Resource Lease. If a local user Reflex (high priority) requires the VRAM currently held by a remote task, the Orchestrator revokes the lease, pauses the Ghoul, and persists the swarm context. Discovery is facilitated via a standard manifest at /.well-known/ai-agent, with trust levels segmented by the network interface to allow privileged A2A collaboration (e.g., source code transfer) on trusted paths.
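The Ward's interfaces are not specified in this ADR; the following is a purely hypothetical sketch of the lease-revocation logic, with Lease and Ward as illustrative stand-ins for Orchestrator (21) internals.

```python
# A hypothetical sketch of lease revocation; all names are assumed.
from dataclasses import dataclass, field

@dataclass
class Lease:
    task_id: str      # the swarm task holding the resources
    vram_mb: int      # VRAM currently held by the remote task
    revocable: bool = True

@dataclass
class Ward:
    active: dict[str, Lease] = field(default_factory=dict)

    def grant(self, lease: Lease) -> None:
        self.active[lease.task_id] = lease

    def preempt_for_reflex(self, needed_mb: int) -> list[str]:
        """Revoke swarm leases until the local Reflex's VRAM need is met."""
        freed, revoked = 0, []
        for task_id, lease in list(self.active.items()):
            if freed >= needed_mb:
                break
            if lease.revocable:
                # In LychD this would pause the Ghoul and persist the
                # swarm context; here we only record the revocation.
                del self.active[task_id]
                revoked.append(task_id)
                freed += lease.vram_mb
        return revoked
```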
Consequences
Positive
- Scalable Intelligence: Enables a hierarchy of specialized nodes collaborating without the synchronization overhead of clustering.
- Contextual Immortality: Swarm conversations span days and survive system restarts through rehydration rituals.
- Leased Resources: The Orchestrator ensures the Magus remains master of the local iron, preempting the swarm for immediate local needs.
Negative
- Protocol Overhead: The A2A handshake and artifact serialization introduce slightly higher latency than a local function call.
- Recursive Deadlocks: Cyclic swarm requests (Node A → B → A) require a global TTL (Time To Live) on context transfers to prevent resource-locking loops (see the sketch below).
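One way such a TTL could work, as a minimal sketch with assumed field names: each context transfer carries a hop budget that is decremented on every forward and rejected at zero, bounding cycles like A → B → A.

```python
# A hypothetical TTL guard on context transfers; field names assumed.
from dataclasses import dataclass

@dataclass
class ContextTransfer:
    context_id: str
    ttl: int  # remaining hops this context may travel

def forward(transfer: ContextTransfer) -> ContextTransfer:
    """Decrement the hop budget before relaying to the next node."""
    if transfer.ttl <= 0:
        raise RuntimeError(f"TTL exhausted for context {transfer.context_id}")
    return ContextTransfer(transfer.context_id, transfer.ttl - 1)

hop = ContextTransfer("ritual-42", ttl=2)
hop = forward(hop)   # A → B, ttl now 1
hop = forward(hop)   # B → A, ttl now 0
# forward(hop)       # would raise: the loop is broken here
```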