methodology · gate 0–4 · clear-dim

architecture decisions come before code.

Five gates the build cannot skip, grounded in research from MIT and Google Research and in the CLEAR evaluation dimensions. This is the methodology that lets us reach production in six to eight weeks instead of six to eighteen months.

01 · the five gates
every build passes these gates before any code is written.
  • gate 0 · agent needed? Default to a single strong LLM + retrieval. Escalate to agents only when task properties (long horizon · branching · tool use · maker-checker) actually require them. Most systems don't need agents.
  • gate 1 · multi-agent? MIT signal test: each agent must add new exogenous signal. If specialists are just bounded capabilities a manager invokes, build a single agent. (Google Research 180-config study: multi-agent variants degraded 39–70% on sequential tasks.)
  • gate 2 · topology: Flow (pipeline) for regulated sequential work. Orchestration for parallelisable breadth. Peer mesh only when agents genuinely produce independent evidence. Choose by task shape, not by aesthetic.
  • gate 3 · instrumentation: Structured outputs as data contracts. Per-stage evaluators. Token budget audit. HITL checkpoints at every high-stakes boundary. CLEAR telemetry plumbed from week one.
  • gate 4 · protocol: MCP for tool access. A2A for peer-to-peer. ACP for governance. Never dynamically mutate the tool set mid-iteration: KV cache invalidation is a 10x cost penalty.
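
Gates 0 to 2 read naturally as a decision function. The sketch below is one possible encoding, not our production tooling: the `TaskProfile` fields, the `choose_architecture` name, and the precedence of flow over orchestration when both apply are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical task description; every field name is illustrative."""
    long_horizon: bool = False
    branching: bool = False
    tool_use: bool = False
    maker_checker: bool = False
    # Number of agents that would each contribute new exogenous signal.
    independent_signal_sources: int = 1
    regulated_sequential: bool = False
    parallelisable: bool = False


def choose_architecture(task: TaskProfile) -> str:
    # Gate 0: default to a single strong LLM + retrieval unless a task
    # property actually requires an agent.
    needs_agent = any(
        [task.long_horizon, task.branching, task.tool_use, task.maker_checker]
    )
    if not needs_agent:
        return "single LLM + retrieval"

    # Gate 1: go multi-agent only when more than one agent adds new signal.
    if task.independent_signal_sources < 2:
        return "single agent"

    # Gate 2: pick topology by task shape (precedence here is an assumption).
    if task.regulated_sequential:
        return "multi-agent: flow (pipeline)"
    if task.parallelisable:
        return "multi-agent: orchestration"
    return "multi-agent: peer mesh"
```

For example, `choose_architecture(TaskProfile(tool_use=True))` stops at gate 1 and returns `"single agent"`, which matches the claim that most systems never reach the topology question.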
02 · clear-dim eval
the evaluation harness that ships with every blueprint.

CLEAR-dim is our evaluation harness: five dimensions (Comprehensiveness, Latency, Explainability, Auditability, Reproducibility), scored per stage and plumbed into the system from week one. Every deployment gets a baseline scorecard before going live; every release re-runs against it.

  • C · Comprehensiveness: Does the system handle the full task surface? Per-stage evaluators check coverage against a golden set.
  • L · Latency: p50 and p95 per stage. Cost-aware: faster ≠ better if it costs 10x.
  • E · Explainability: Can the system tell you why? Reasoning traces. HITL approval requires reasoning visible.
  • A · Auditability: Every external call, model invocation, human decision written to immutable store. DPDP-compliant retention by class.
  • R · Reproducibility: Same input → same output (within stochasticity bounds). Cascade test replays prior failures against every release.
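
A per-stage scorecard and its release check can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the shipped harness: the `StageScore` fields, the 0 to 1 scoring scale, the nearest-rank percentile definition, and the 2% tolerance are all hypothetical choices.

```python
import math
from dataclasses import dataclass


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass(frozen=True)
class StageScore:
    """Hypothetical scorecard row for one pipeline stage."""
    comprehensiveness: float  # share of golden-set cases handled, 0..1
    latency_p50_ms: float
    latency_p95_ms: float
    explainability: float     # share of outputs carrying a reasoning trace, 0..1
    auditability: float       # share of calls written to the audit store, 0..1
    reproducibility: float    # share of replayed cases matching prior output, 0..1


def regressions(
    baseline: StageScore, candidate: StageScore, tolerance: float = 0.02
) -> list[str]:
    """Return the dimensions where the candidate is worse than the baseline."""
    failed = []
    # Higher is better: flag drops larger than the absolute tolerance.
    for name in ("comprehensiveness", "explainability", "auditability", "reproducibility"):
        if getattr(candidate, name) < getattr(baseline, name) - tolerance:
            failed.append(name)
    # Lower is better: flag increases larger than the relative tolerance.
    for name in ("latency_p50_ms", "latency_p95_ms"):
        if getattr(candidate, name) > getattr(baseline, name) * (1 + tolerance):
            failed.append(name)
    return failed
```

A release gate would then block the deploy whenever `regressions(baseline, candidate)` is non-empty for any stage. Cost is deliberately left out of this sketch; the "faster ≠ better if it costs 10x" rule would need a cost field alongside the latency ones.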
03 · the citations
this isn't original. we just enforce it.
  • arXiv:2512.08296 · 180-config multi-agent study: Google Research benchmark showed multi-agent variants degraded 39–70% on sequential tasks vs single-agent baselines.
  • arXiv:2603.04474 · cascade failure modes: MIT / CityU 2026 found hub fragility, translation loss at supervisors, and confidence drift across handoffs, present in every major framework.
  • CLEAR · evaluation dimensions: Five-dimension production benchmark we extended into the CLEAR-dim eval harness.
rfp incoming

we'll send the methodology whitepaper.

Eight pages. Gate-by-gate detail. Citations. CLEAR scorecard methodology. Useful for procurement teams comparing vendors.

DPDP-ready by design | AWS · Azure · GCP | blueprint · patent pending | India residency · on-prem option