ROME (RFQ)

A Request-for-Quote workflow for AX — participants negotiate block-sized trades off the lit book and have them booked atomically in EP3 — built as a new internal service rome that proxies through order_gateway over the existing WebSocket, with no new public network surface.

RFC: rome.md · Epic: A-3209 · Milestone: Features & Functionality (Perpetuals exchange) · Owner: Tin · Status snapshot: 2026-06-08

§0 Principles

Five invariants govern the design. Everything else in the tracker is mechanics that follow from them.

No new public network surface

Participants speak only to order_gateway over the existing WebSocket. rome is an internal service — it has no public listener, no new ports, no new serialization format. The gateway proxies to rome over a remoc RFn channel on TCP loopback, reusing the same IPC pattern as marketdata-publisher.

§2a

EP3 booking is the only external hop on the critical path

Request creation, quote submission, and subscription fanout never leave the gateway ↔ rome loopback path. The single unavoidable external hop is ep3_admin.InsertTwoSidedBlockTrade at AcceptQuote time; everything else inside rome is DashMap lookups and tokio channels.

§2b

Latch, don't retry — EP3 does not deduplicate

Connamara confirmed EP3 does not dedup InsertTwoSidedBlockTrade on cross_id or any other field. Any retry on an ambiguous outcome (5xx, DEADLINE_EXCEEDED, crash) guarantees a double-book. The contract: definitive 4xx → roll back to Active; ambiguous → latch to needs_manual_reconciliation, never auto-retry. Operator reconciles.

§2c, §2f

In-memory matching, ClickHouse audit trail

Live RFQ state lives in-memory in rome (DashMap + per-entry tokio::Mutex). Redis stores durable snapshots only for in-flight Settling requests so a crash is reconcilable. ClickHouse rfq_log is the compliance audit trail. No new database is introduced.

§2d

Gateway enforces risk; rome enforces lifecycle

Margin checks run in the gateway before rome is ever called — check_margin_requirement_using_position_cache at every entry point (submit request, submit quote, accept quote). rome owns the state machine and lifecycle transitions; it trusts the gateway's risk attestation via risk_checked_at_ns.

§2e

§1 Progress & Dependencies

Eight parallel tracks from protocol design through production rollout. The core engine, integration, and visibility tracks (B, C, D) are in review; pre-rollout hardening, GUI, observability, and rollout (E–H) are backlog, gated on the core landing. Snapshot 2026-06-08.

Done — 1 · In progress — 3 · Not started — 4 · 8 tracks

Dependency map

How the tracks block one another. Solid arrows are hard blockers; dashed arrows are soft dependencies. The critical path runs B (engine core) → C (integration) → E (hardening) → H (rollout); D (visibility) soft-unblocks E, and F (GUI) and G (observability) converge before rollout.

[Dependency map figure. Nodes: Track A · RFC & Protocol · done; Track B · Engine Core · A-3215 · in review; Track C · EP3 + OG Integration · A-3216 · in review; Track D · Visibility & Anonymity · A-3301 · in review; Track E · Pre-rollout Hardening · A-3294 + A-3295; Track F · GUI · A-3211 · backlog; Track G · Observability & Tests · backlog; Track H · Feature Flag & Rollout · A-3296. Legend: done, in progress, not started, hard blocker, soft / foundation.]
A · RFC & Public Protocol · Tin · done

Design spec and SDK protocol types. Foundation for all other tracks.

A1 · ROME RFC — design spec · Tin · done

The founding design document covering architecture, state machine, EP3 booking contract, failure modes, and the feature comparison matrix. Merged 2026-05-28.

  • #1858 docs(rome): spec for Request for Order Matching Engine (ROME)
A2 · SDK public RFQ protocol types · Tin · done

Public protocol types in rs/sdk/src/protocol/rfq.rs: RfqRequest, RfqResponse, and RfqEvent enums following the existing OrderGatewayRequest convention. Merged 2026-05-29; follow-up SDK work for the order-gateway WS commands continues in open PRs.

  • Merged to main
  • #2068 feat(sdk): add RFQ public protocol and types
  • Open — follow-up SDK work
  • #2087 feat(sdk): add RFQ public protocol and types (follow-up)
  • #2094 feat(sdk): add RFQ order-gateway websocket commands
B · Engine Core — rome service · Tin · in review

The core rome service: IPC protocol, state machine, matching logic, expiration, ClickHouse writer, and Redis durable state. Epic A-3215 (In Review). Hard-blocks integration (C).

B1 · Internal IPC protocol, ClickHouse schema, Redis keys · Tin · in review

Internal IPC types in sdk-internal/src/protocol.rs under ipc::rome: ToRomeIpc, RomeIpcResponse, RomeIpcError, RomeIpcFn. ClickHouse rfq_log table schema and migration. Redis key layout for durable-state snapshots.

  • #2069 feat(sdk-internal): add RFQ IPC protocol, ClickHouse log schema, Redis keys
B2 · Engine skeleton, state machine & IPC tasks · Tin · in review

The rs/rome/ crate: AppState (DashMap-based), RequestEntry state machine (Active → Settling → Settled / Cancelled / Expired), per-request tokio::Mutex, expiration heap task, batched ClickHouse writer, ID generation (u128 with node prefix), remoc IPC accept loop, and the durable_state Redis writer for Settling snapshots.

  • Open
  • #2070 feat(rome): add RFQ engine skeleton with state machine and IPC tasks
  • Superseded
  • #1843 feat(rome): core state machine and IPC implementation (superseded by split PRs)
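B2 describes ID generation only as "u128 with node prefix". A minimal std-only sketch of that idea follows, assuming a hypothetical layout (a 16-bit node prefix in the top bits, a per-process counter below it). IdGenerator, next_id, and node_of are illustrative names; the real layout in rs/rome/ may differ, for example by mixing in a timestamp.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical layout: [ node_prefix: 16 bits | sequence: 112 bits ].
/// The sequence is driven by a 64-bit atomic counter for simplicity.
const NODE_SHIFT: u32 = 112;

pub struct IdGenerator {
    node_prefix: u16,
    next: AtomicU64,
}

impl IdGenerator {
    pub fn new(node_prefix: u16) -> Self {
        Self { node_prefix, next: AtomicU64::new(1) }
    }

    /// Returns a u128 whose top 16 bits identify the node that minted it.
    pub fn next_id(&self) -> u128 {
        // fetch_add is atomic, so concurrent callers never share a sequence number.
        let seq = self.next.fetch_add(1, Ordering::Relaxed) as u128;
        ((self.node_prefix as u128) << NODE_SHIFT) | seq
    }

    /// Recovers the node prefix from an ID minted by any node.
    pub fn node_of(id: u128) -> u16 {
        (id >> NODE_SHIFT) as u16
    }
}
```

Because the prefix is recoverable from any ID, two rome nodes could mint IDs concurrently without coordination, provided each is configured with a distinct prefix.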
B3 · Property tests & integration tests · Tin · in review

State-machine property tests and integration tests under rs/rome/tests/: happy-path bid/ask, concurrent accept (exactly-one-winner), EP3 failure branches, expiration, side-enforcement matrix. Uses ax_test_utils containers and Ep3Mock.

  • #2071 test(rome): add property tests and integration tests
C · EP3 + Order Gateway Integration · Tin · in review

Wire rome through order_gateway WS and EP3 block-trade booking. Epic A-3216 (In Review). The biggest remaining technical piece — 8 open PRs forming a deep stack.

C1 · Order-gateway + EP3 + Rome end-to-end wiring · Tin · in review

Integration branch #1844 (root): connects order_gateway WS dispatch, the RomeIpcFn handle, the ep3-mock block-trade stub, and the local docker compose stack. Stacked PRs break out each layer.

Sub-step · PR · What
SDK WS commands · #2094 · Add OrderGatewayRequest::Rfq variants
IPC connect helper · #2095 · Rome IPC connect helper for order-gateway
ep3-mock block trade · #2096 · Implement InsertTwoSidedBlockTrade in ep3-mock
RFQ-log alignment · #2101 · Align rfq_log quantity with ClickHouse schema
EP3 booking on accept · #2102 · EP3 block-trade booking on AcceptQuote
Wire WS commands · #2103 · Wire RFQ WebSocket commands through Rome IPC
Docker compose · #2104 · Add Rome to local docker stack
D · Visibility & Anonymity · Tin · in review

Targeted RFQs, protocol-level anonymity, and maker discovery. Epic A-3301 (In Review). Soft-unblocks pre-rollout hardening (E).

D1 · RFQ visibility, anonymity & targeted makers · Tin · in review

target_makers: Vec<UserId> on SubmitQuoteRequest for directed RFQs that skip the public stream. disclose_identity: bool for protocol-level anonymity — per-request pseudonym on outgoing events, real user_id in ClickHouse. Maker list API (GET /rfq/makers).

  • Open
  • #1914 feat(rome): RFQ visibility, anonymity, and targeted makers
  • Superseded / closed
  • #1845 feat(rome): anonymity support and target makers (folded into #1914)
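D1's two visibility controls can be sketched as follows. This assumes, without confirmation from the PR, that an empty target_makers list means a public RFQ and that pseudonyms are sequential per-request labels. Pseudonyms, Audience, audience, and displayed_identity are hypothetical names, and UserId is aliased to u64 for illustration.

```rust
use std::collections::HashMap;

type UserId = u64; // hypothetical stand-in for the SDK's UserId

/// Per-request pseudonym table (hypothetical). Outgoing events carry the
/// label; the real user_id is still what gets written to ClickHouse.
#[derive(Default)]
pub struct Pseudonyms {
    by_user: HashMap<UserId, String>,
}

impl Pseudonyms {
    /// Returns the same label for the same user within one request.
    pub fn label_for(&mut self, user: UserId) -> String {
        let next = self.by_user.len() + 1;
        self.by_user
            .entry(user)
            .or_insert_with(|| format!("Party-{next}"))
            .clone()
    }
}

/// Who an outgoing event is addressed to (hypothetical routing enum).
#[derive(Debug, PartialEq)]
pub enum Audience {
    PublicStream,
    Directed(Vec<UserId>),
}

/// Assumption: empty target_makers means public; otherwise skip the public stream.
pub fn audience(target_makers: &[UserId]) -> Audience {
    if target_makers.is_empty() {
        Audience::PublicStream
    } else {
        Audience::Directed(target_makers.to_vec())
    }
}

/// Identity shown to other participants, honoring disclose_identity.
pub fn displayed_identity(disclose_identity: bool, user: UserId, table: &mut Pseudonyms) -> String {
    if disclose_identity {
        user.to_string()
    } else {
        table.label_for(user)
    }
}
```

One table per request keeps a participant's label stable within that request while making labels unlinkable across requests.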
E · Pre-rollout Hardening P0 + P1 · Tin · not started

Size/rate limits, public tape, side enforcement, connection-state hardening, forward-compat legs shim. Epics A-3294 (backlog) + A-3295 (backlog). Gates rollout (H).

E1 · Size limits, rate limits, public tape, quote-side enforcement · Tin · not started

P0 pre-rollout blockers (A-3294): per-instrument RFQ minimum block size, per-user submission rate limit, public trade-tape policy (condition: "block"), and the 6-case quote-side enforcement integration matrix.

  • No PRs — backlog, gated on core engine and integration landing.
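E1 names a per-user submission rate limit and a per-instrument minimum block size but not the algorithms. A sketch under stated assumptions: a token bucket (one common rate-limiting technique, not confirmed as the one A-3294 will use) with the clock passed in explicitly for testability, and quantities as plain u64. SubmissionLimiter and meets_min_block are hypothetical names.

```rust
use std::collections::HashMap;

type UserId = u64; // hypothetical stand-in for the SDK's UserId
type InstrumentId = String; // hypothetical

/// Token bucket per user: bursts up to `capacity`, refilled at `refill_per_sec`.
pub struct SubmissionLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<UserId, (f64, u64)>, // (tokens, last_refill_ns)
}

impl SubmissionLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Consumes one token and returns true if the user may submit at `now_ns`.
    pub fn try_submit(&mut self, user: UserId, now_ns: u64) -> bool {
        let capacity = self.capacity;
        let rate = self.refill_per_sec;
        let bucket = self.buckets.entry(user).or_insert((capacity, now_ns));
        // Refill for the time elapsed since the last call, capped at capacity.
        let elapsed_s = now_ns.saturating_sub(bucket.1) as f64 / 1e9;
        bucket.0 = (bucket.0 + elapsed_s * rate).min(capacity);
        bucket.1 = now_ns;
        if bucket.0 >= 1.0 {
            bucket.0 -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Rejects RFQs below the instrument's minimum block size.
/// Unknown instruments fail closed rather than allowing an unbounded RFQ.
pub fn meets_min_block(min_block: &HashMap<InstrumentId, u64>, instrument: &str, qty: u64) -> bool {
    min_block.get(instrument).map_or(false, |min| qty >= *min)
}
```

Rejecting an unknown instrument, rather than allowing it, is a deliberate fail-closed choice in the sketch.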
E2 · Connection-state hardening & forward-compat legs shim · Tin · not started

P1/P2 polish (A-3295): close_only on MakerInfo, forward-compat legs: Vec<Leg> shim, counterparty margin re-check on AcceptQuote, cancel-on-disconnect hardening (gateway crash recovery, server-side heartbeat), and the counterparty mid-flow disconnect policy decision.

Policy decision blocks design. What happens when a responder's WS disconnects between SubmitQuote and AcceptQuote — cancel the quote (mirroring order behavior) or leave it live? Decision shapes the cancel-on-disconnect implementation. See Q1.
  • Superseded / closed
  • #1915 feat(rome): cancel-on-disconnect (closed, will be re-done under A-3295)
  • No active PRs — backlog.
F · GUI · Tin · not started

Requester "Create Strategy" modal, maker selection sidebar, responder inbox, quote composer. Epic A-3211 (Backlog). 7 milestones (G1–G7).

F1 · GUI milestones G1–G7 · Tin · not started
# · Title · Est · Depends · Ticket
G1 · Static modal shell + asset selector + template buttons · S · (none) · A-3316
G2 · Editable legs table + template pre-population + client-side Greeks · M · G1 · A-3317
G3 · Maker sidebar against stubbed maker list · M · G1 · A-3318
G4 · Wire to order_gateway WS (single-leg via legs shim) · M · G2, G3, A-3295 · A-3319
G5 · Confirm-then-send modal for submit + accept · S · G4 · A-3320
G6 · Hedge leg UI, anonymity toggle, favorites · M · G4 · A-3321
G7 · Responder surface: inbox, quote composer, history · L · G4 · A-3322
  • Superseded / closed
  • #1903 feat(gui): RFQ (closed, will be re-done post-integration)
  • No active PRs — backlog, gated on the integration (C) and hardening (E) tracks.
G · Observability & Testing · Tin · not started

Prometheus metrics, dashboards, alerts, and batteries-included test coverage. Epics A-3219 + A-3260 (Backlog). Gates rollout (H).

G1 · Observability — metrics, dashboards, alerts · Tin · not started

Prometheus metrics (rome_active_requests, rome_accept_latency_seconds, rome_ep3_book_latency_seconds, rome_log_drops_total, etc.), Grafana dashboards, and incident.io alert wiring. A-3219.

  • No PRs — backlog.
G2 · Batteries-included testing · Tin · not started

Broader integration and scenario coverage: the 16 required test scenarios from the RFC (§13.1), including connection-state tests (client disconnect, gateway crash, rome restart, gateway restart) that must pass before rollout. A-3260.

  • No PRs — backlog.
H · Feature Flag & Rollout · Tin · not started

Last gate before GA. Feature flag on OrderGatewayRequest::Rfq, staged rollout plan, rollback playbook. Epic A-3296 (Backlog).

H1 · Feature flag + staged production rollout · Tin · not started

Feature flag on the order_gateway WS endpoints so the RFQ surface can be enabled per-environment / per-user-tier. Staged rollout plan: canary → opt-in beta → GA, with rollback playbook. Depends on all Tier 1–2 items, observability, and testing. A-3296.

  • No PRs — backlog, last gate.
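H1's gate can be sketched as a pure function of rollout stage and user tier. The stage names follow the canary → opt-in beta → GA plan above; the UserTier enum and the rule that canary means internal users only are assumptions, and rfq_enabled is a hypothetical name.

```rust
/// Rollout stages from the plan; Off doubles as the rollback target.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    Off,
    Canary,
    OptInBeta,
    Ga,
}

/// Hypothetical user tiers; the real tiering scheme is not specified here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UserTier {
    Internal,
    BetaOptIn,
    General,
}

/// Decides whether OrderGatewayRequest::Rfq commands are accepted for this user.
pub fn rfq_enabled(stage: Stage, tier: UserTier) -> bool {
    match stage {
        Stage::Off => false,
        // Assumption: canary exposes RFQ to internal users only.
        Stage::Canary => tier == UserTier::Internal,
        Stage::OptInBeta => matches!(tier, UserTier::Internal | UserTier::BetaOptIn),
        Stage::Ga => true,
    }
}
```

Under this shape, the rollback playbook's first step is a single config change back to Stage::Off, with no code deploy.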
Reading the tracks. A (RFC + protocol) is done and underlies everything. The three in-review tracks — B (engine core), C (EP3 + OG integration), D (visibility) — form the core implementation, all owned by Tin, all composed of split PRs in review. The critical path is B → C → E → H: engine core must land before integration can merge, integration must land before pre-rollout hardening begins, and hardening + GUI + observability all gate the rollout. V2 features (multi-leg, quote aggregation, restart recovery, fee rebates) are deferred under A-3297 and not tracked here.

Notebook

Reference design — the mechanics and reasoning behind the tracker and the open questions. Drawn from the RFC and working decisions. Sections 2a–2f.

§2a Architecture & data flow

rome is an internal Rust service following the existing AX convention for service-to-service IPC (marketdata-publisher): TCP loopback listener + remoc::Connect::io + a typed RFn shipped over the connection. No .proto, no generated client crate. tonic is used only for the EP3 admin call.

   requester WS ─► gateway (R) ──remoc──► rome ──tonic gRPC──► EP3 admin
                       │                  │              │
                       ▼ broadcast        │              │
              public RFQs ◄─── gateway (P) ◄── responder WS
                                          │
                                          ▼ async insert (batched)
                                     ClickHouse rfq_log

Three channel types out of rome:

  • Public RFQ events: tokio::sync::broadcast<PublicRfqEvent>, one per rome process. Each gateway holds one subscription and rebroadcasts. A slow consumer receives RecvError::Lagged and skips ahead rather than back-pressuring the sender.
  • User RFQ events: a broadcast::Sender<UserRfqEvent> per user, carrying directed quotes and fills. Every connection a user has open sees every event.
  • Log: mpsc::Sender<RfqLogRow> to the bounded ClickHouse writer. Off the hot path; never blocks matching.
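The "never blocks matching" property of the log channel can be sketched with std's bounded sync_channel and try_send, which behave like the tokio::sync::mpsc::Sender::try_send the real writer would presumably use. This sketch assumes rows are dropped and counted (mirroring the rome_log_drops_total metric named in track G) when the buffer is full; LogHandle, drain_batch, and the RfqLogRow fields are illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Stand-in for the real RfqLogRow; these fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct RfqLogRow {
    pub request_id: u128,
    pub event: &'static str,
}

/// Matching-path handle to the ClickHouse writer; it never waits on the writer.
pub struct LogHandle {
    tx: SyncSender<RfqLogRow>,
    /// Mirrors the rome_log_drops_total counter.
    pub drops: AtomicU64,
}

impl LogHandle {
    pub fn new(bound: usize) -> (Self, Receiver<RfqLogRow>) {
        let (tx, rx) = sync_channel(bound);
        (Self { tx, drops: AtomicU64::new(0) }, rx)
    }

    /// Enqueues without blocking; a full or closed channel counts as a drop.
    pub fn log(&self, row: RfqLogRow) {
        match self.tx.try_send(row) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.drops.fetch_add(1, Ordering::Relaxed);
            }
        }
    }
}

/// Writer side: drains up to `max` queued rows into one batch for a single insert.
pub fn drain_batch(rx: &Receiver<RfqLogRow>, max: usize) -> Vec<RfqLogRow> {
    rx.try_iter().take(max).collect()
}
```

The trade-off is explicit: under sustained writer stalls the audit trail loses rows rather than the matching path losing latency, which is why the drop counter needs an alert.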

§2b The AcceptQuote critical path

The only flow that leaves the gateway ↔ rome path (it calls out to EP3), and the only one whose correctness depends on a specific lock-release ordering.

Step · Actor · Operation · Lock state
1 · Client · WS frame AcceptQuote { quote_id, side } · (none)
2 · Gateway · Risk-check acceptor + counterparty · local margin cache (read)
3 · Gateway · RomeIpcFn::call(AcceptQuote) · (none)
4 · rome · Resolve quote_id → request_id; acquire per-entry mutex · mutex held
5 · rome · Verify Active; flip → Settling; persist cross_id to Redis · mutex held
6 · rome · Drop mutex · released
7 · rome → EP3 · InsertTwoSidedBlockTrade { cross_id } · no locks
8 · rome · Reacquire mutex; flip → Settled { trade_id } · mutex held
9 · rome · Broadcast Filled on both user channels; log; return ack · clone-and-release
Why release-around-RPC. Holding the mutex across the EP3 call would serialize all activity on that request (including new quotes from other responders) for as long as EP3 takes to ack. Because state is Settling during the gap, concurrent AcceptQuote and Cancel on the same request both reject cleanly with AlreadySettling.
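The release-around-RPC ordering in steps 4–8 can be sketched with a std::sync::Mutex standing in for the per-entry tokio::sync::Mutex and a closure standing in for the EP3 call. The Redis persist and the step 9 broadcast are elided, and accept_quote, Ep3Outcome, and AcceptError are hypothetical names.

```rust
use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
pub enum RequestState {
    Active,
    Settling { cross_id: u128 },
    Settled { trade_id: u64 },
    NeedsManualReconciliation { cross_id: u128 },
}

#[derive(Debug, PartialEq)]
pub enum AcceptError {
    AlreadySettling,
    NotActive,
    Ep3Rejected,        // definitive reject: rolled back to Active
    TradeBookAmbiguous, // latched; operator reconciles
}

/// Hypothetical summary of what the EP3 call reported.
pub enum Ep3Outcome {
    Booked(u64),
    Definitive,
    Ambiguous,
}

/// Steps 4 to 8, with the per-entry mutex released around the EP3 call.
pub fn accept_quote(
    entry: &Mutex<RequestState>,
    cross_id: u128,
    book: impl FnOnce(u128) -> Ep3Outcome,
) -> Result<u64, AcceptError> {
    {
        // Steps 4-5: verify Active and flip to Settling under the lock.
        let mut state = entry.lock().unwrap();
        match *state {
            RequestState::Active => {}
            RequestState::Settling { .. } => return Err(AcceptError::AlreadySettling),
            _ => return Err(AcceptError::NotActive),
        }
        *state = RequestState::Settling { cross_id };
        // (The real flow persists the Settling snapshot to Redis here.)
    } // Step 6: guard dropped; other callers now observe Settling.

    // Step 7: EP3 call with no locks held.
    let outcome = book(cross_id);

    // Step 8: reacquire and resolve the outcome.
    let mut state = entry.lock().unwrap();
    match outcome {
        Ep3Outcome::Booked(trade_id) => {
            *state = RequestState::Settled { trade_id };
            Ok(trade_id)
        }
        Ep3Outcome::Definitive => {
            *state = RequestState::Active;
            Err(AcceptError::Ep3Rejected)
        }
        Ep3Outcome::Ambiguous => {
            *state = RequestState::NeedsManualReconciliation { cross_id };
            Err(AcceptError::TradeBookAmbiguous)
        }
    }
}
```

std::sync::Mutex is not reentrant, so calling accept_quote on the same entry from inside the book closure would deadlock or panic if the guard were still held; with the guard dropped, that nested call returns AlreadySettling immediately.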

§2c EP3 booking & the no-retry contract

AdminAPI.InsertTwoSidedBlockTrade is the only EP3 RPC rome calls. Connamara confirmed (2026-05-26) that EP3 does not deduplicate this RPC on cross_id or any other field. Identical retry produces two block trades with distinct trade IDs.

The implication is binary. Any rome→EP3 retry on an ambiguous outcome — 5xx, DEADLINE_EXCEEDED, mid-call rome crash, lost response — guarantees a double-book. There is no client-side trick (fresh ULID, retry tokens, etc.) that recovers safety. Anti-patterns explicitly ruled out: retrying with a fresh cross_id, retrying with the same cross_id, and auto-retrying 5xx with exponential backoff.

The contract:

  • Definitive 4xx (e.g. INVALID_ARGUMENT, FAILED_PRECONDITION) — EP3 did not commit. Roll back Settling → Active, return typed reject, client may retry.
  • Ambiguous outcome (5xx, DEADLINE_EXCEEDED, crash, lost response) — leave the Settling snapshot in Redis, log trade_book_ambiguous, return typed error. No auto-retry. Operator reconciles.

cross_id is a fresh ULID minted at accept time for each cross and persisted on the Settling snapshot before the EP3 call. It is a correlation key, not an idempotency token: it would enable a future SearchOrders-based reconciliation if the vendor confirms that it propagates to order records.
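The contract reduces to a total function from the observed EP3 result to one of three dispositions, with no retry variant at all. A sketch, assuming a hand-written GrpcCode enum as a std-only stand-in for tonic::Code; only the two codes the contract names are allow-listed as definitive, and every other status, or a missing response, is treated as ambiguous. CallResult, Disposition, and classify are hypothetical names.

```rust
/// Subset of gRPC status codes (std-only stand-in for tonic::Code).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GrpcCode {
    InvalidArgument,
    FailedPrecondition,
    DeadlineExceeded,
    Unavailable,
    Internal,
    Unknown,
}

/// What the caller of InsertTwoSidedBlockTrade observed.
pub enum CallResult {
    Ok { trade_id: u64 },
    Status(GrpcCode),
    /// Connection reset, rome crash mid-call, or a response that never arrived.
    NoResponse,
}

#[derive(Debug, PartialEq)]
pub enum Disposition {
    /// EP3 committed: flip Settling to Settled.
    Settled { trade_id: u64 },
    /// EP3 definitely did not commit: roll back to Active; the client may retry.
    RollBackToActive,
    /// Unknown whether EP3 committed: keep the Redis snapshot, latch, page.
    LatchForReconciliation,
}

/// Deliberately has no "retry" outcome: no automatic retry is safe.
pub fn classify(result: CallResult) -> Disposition {
    match result {
        CallResult::Ok { trade_id } => Disposition::Settled { trade_id },
        // Allow-list: only codes known to mean "not committed".
        CallResult::Status(GrpcCode::InvalidArgument | GrpcCode::FailedPrecondition) => {
            Disposition::RollBackToActive
        }
        // Everything else, including codes added later, fails safe to ambiguous.
        CallResult::Status(_) | CallResult::NoResponse => Disposition::LatchForReconciliation,
    }
}
```

An allow-list is the conservative direction here: misclassifying a definitive reject as ambiguous costs an operator page, while misclassifying an ambiguous outcome as definitive reopens the request and invites a double-book.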

§2d State machine & concurrency

Live RFQ state lives in-memory in rome. No new database is introduced.

Store · What · Lifetime
DashMap<RequestId, Arc<RequestEntry>> · Per-request state machine · In-memory; lost on restart (v1)
DashMap<QuoteId, RequestId> · Quote-to-request index · In-memory
Redis (durable_state) · Settling snapshots with cross_id · Short-lived; survives rome crash
ClickHouse rfq_log · Compliance audit trail · Permanent
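The two in-memory indexes, and the Layer 2 protection described in §2f, can be sketched with std RwLock<HashMap> standing in for DashMap. The text does not pin down exactly when a quote leaves the index; this sketch retires it once the accept completes. Indexes, resolve, retire_quote, and the placeholder RequestEntry are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type RequestId = u128;
type QuoteId = u128;

/// Placeholder for the real RequestEntry (state machine and per-entry mutex elided).
#[derive(Debug)]
pub struct RequestEntry {
    pub id: RequestId,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    QuoteNotFound,
    RequestNotFound,
}

/// std RwLock<HashMap> stand-ins for the two DashMaps in the table above.
#[derive(Default)]
pub struct Indexes {
    requests: RwLock<HashMap<RequestId, Arc<RequestEntry>>>,
    quotes: RwLock<HashMap<QuoteId, RequestId>>,
}

impl Indexes {
    pub fn insert_request(&self, id: RequestId) {
        self.requests.write().unwrap().insert(id, Arc::new(RequestEntry { id }));
    }

    pub fn insert_quote(&self, quote: QuoteId, request: RequestId) {
        self.quotes.write().unwrap().insert(quote, request);
    }

    /// Resolves quote_id to request_id to entry. The Arc is cloned so both map
    /// locks are released before the caller locks the entry itself.
    pub fn resolve(&self, quote: QuoteId) -> Result<Arc<RequestEntry>, LookupError> {
        let request = self
            .quotes
            .read()
            .unwrap()
            .get(&quote)
            .copied()
            .ok_or(LookupError::QuoteNotFound)?;
        let entry = self.requests.read().unwrap().get(&request).cloned();
        entry.ok_or(LookupError::RequestNotFound)
    }

    /// After an accept completes, the quote leaves the index, so a replayed
    /// AcceptQuote gets QuoteNotFound instead of reaching EP3 again.
    pub fn retire_quote(&self, quote: QuoteId) {
        self.quotes.write().unwrap().remove(&quote);
    }
}
```

Handing out an Arc rather than a reference is what keeps map-level locking short: the only long-held lock in the accept path is the per-entry one.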

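RequestState transitions: Active → Settling { accepted } → Settled { trade_id } on the happy path, with Cancelled and Expired as terminal states and needs_manual_reconciliation as the stuck state for ambiguous EP3 outcomes; a definitive EP3 reject rolls Settling back to Active. A single request's lifecycle is serialized by its per-entry tokio::sync::Mutex, which is held across the Redis persist await in the AcceptQuote critical section but released around the EP3 call. DashMap shards its internal locks, so lookups and inserts for different requests rarely contend.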
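The transitions above, plus the restart rule from §2f Layer 5, as a sketch. The allowed-transition set is one reading of the text (for instance, it assumes Cancel and expiry cannot fire while a request is Settling, consistent with the AlreadySettling rejection in §2b); can_transition and on_restart are hypothetical names.

```rust
/// Request lifecycle states from the text (payloads trimmed to ids).
#[derive(Clone, Debug, PartialEq)]
pub enum RequestState {
    Active,
    Settling { cross_id: u128 },
    Settled { trade_id: u64 },
    Cancelled,
    Expired,
    NeedsManualReconciliation { cross_id: u128 },
}

/// Legal transitions; anything else is either a bug or a rejected command.
pub fn can_transition(from: &RequestState, to: &RequestState) -> bool {
    use RequestState::*;
    matches!(
        (from, to),
        (Active, Settling { .. })
            | (Active, Cancelled)
            | (Active, Expired)
            | (Settling { .. }, Settled { .. })
            // Definitive EP3 reject rolls back.
            | (Settling { .. }, Active)
            // Ambiguous EP3 outcome latches.
            | (Settling { .. }, NeedsManualReconciliation { .. })
    )
}

/// After a restart only Settling snapshots survive (in Redis); they are latched
/// for operator reconciliation and never re-sent to EP3.
pub fn on_restart(snapshot: &RequestState) -> Option<RequestState> {
    match snapshot {
        RequestState::Settling { cross_id } => {
            Some(RequestState::NeedsManualReconciliation { cross_id: *cross_id })
        }
        _ => None,
    }
}
```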

§2e Risk checks & gateway enforcement

Risk checks are enforced in the gateway, not in rome. rome never calls EP3 for margin — it trusts the gateway's attestation.

Entry point · What's checked
SubmitQuoteRequest · Requester can honor any fill: synthetic worst-case at last mark × slippage buffer, on each requested side.
SubmitQuote · Responder can honor a fill at the offered price on each side they offered. Two checks if two-sided.
AcceptQuote · Acceptor at the locked price on the accepted side. Counterparty re-checked if reachable on the same gateway shard.

If any check fails the gateway short-circuits with RfqReject and never calls rome. Risk-check attestation lives in each proxy struct (risk_checked_at_ns) since remoc has no headers.
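The SubmitQuoteRequest check can be sketched as one margin check per requested side at a synthetic worst-case price. This assumes "worst case" means the last mark moved against the requester by the slippage buffer (up for a buy, down for a sell), and it leaves the margin rule itself to a closure standing in for check_margin_requirement_using_position_cache, whose real signature is not shown in this document. worst_case_price, gate_request, and RfqReject's fields are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Side {
    Buy,
    Sell,
}

#[derive(Debug, PartialEq)]
pub struct RfqReject {
    pub side: Side,
}

/// Assumed worst-case fill for a requester who has not seen a quote yet:
/// last mark pushed against them by `buffer` (a fraction, e.g. 0.02).
pub fn worst_case_price(mark: f64, buffer: f64, side: Side) -> f64 {
    match side {
        Side::Buy => mark * (1.0 + buffer),
        Side::Sell => mark * (1.0 - buffer),
    }
}

/// Gateway-side gate: one check per requested side, short-circuiting on the
/// first failure so rome is never called. Returns the attestation timestamp.
pub fn gate_request(
    sides: &[Side],
    mark: f64,
    buffer: f64,
    qty: f64,
    now_ns: u64,
    // Hypothetical stand-in for check_margin_requirement_using_position_cache.
    mut margin_ok: impl FnMut(Side, f64, f64) -> bool,
) -> Result<u64, RfqReject> {
    for &side in sides {
        let price = worst_case_price(mark, buffer, side);
        if !margin_ok(side, qty, price) {
            return Err(RfqReject { side });
        }
    }
    // Carried to rome in the proxy struct as risk_checked_at_ns.
    Ok(now_ns)
}
```

A two-sided request therefore costs two checks, and failing either side rejects the whole request before any IPC happens.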

§2f Failure modes & retry safety

Five layers where a "retry" can fire in the end-to-end booking flow. Only Layer 4 (rome→EP3) can reach EP3 unguarded; the policy is "never auto-retry on ambiguous."

Layer · When it fires · Double-book risk? · Protection
1. User clicks Accept twice · UI debounce miss · No · State machine: second click sees Settling/Settled
2. Client auto-retry on reconnect · WS drops mid-Accept · No · After Accept, quote removed from quotes_index; retry returns QuoteNotFound
3. Gateway → rome IPC retry · remoc connection drops · No · Symmetric to Layer 2
4. rome → EP3 retry · Tonic 5xx / timeout / crash · Yes (load-bearing) · Latch-and-page; no auto-retry
5. rome restart with Settling on disk · OOM, panic, deploy · No (no auto-retry) · durable_state::load_requests routes to manual-recon
UX gap at Layers 2–3. A reconnecting client cannot reliably learn whether their Accept booked: every retry returns QuoteNotFound, whether or not the trade went through. Recovery requires a client_accept_id + GetAcceptStatus query, analogous to clord_id + GetOrderStatus on the order path. Tracked separately; does not block V1 correctness.
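The proposed client_accept_id + GetAcceptStatus path is not built yet. A sketch of its core, assuming a per-user registry keyed like clord_id and a hypothetical AcceptStatus reply; AcceptRegistry and its methods are illustrative names only.

```rust
use std::collections::HashMap;

type UserId = u64; // hypothetical stand-in for the SDK's UserId

/// Reply a reconnecting client could receive (hypothetical GetAcceptStatus).
#[derive(Clone, Debug, PartialEq)]
pub enum AcceptStatus {
    Pending,                 // Settling: EP3 call in flight
    Booked { trade_id: u64 },
    Rejected,                // definitive EP3 reject
    NeedsReconciliation,     // ambiguous outcome, operator-owned
}

/// Hypothetical per-user registry keyed by a client-chosen id, like clord_id.
#[derive(Default)]
pub struct AcceptRegistry {
    by_key: HashMap<(UserId, String), AcceptStatus>,
}

impl AcceptRegistry {
    /// Records the accept before the EP3 call. A duplicate id returns the
    /// existing status instead of starting a second accept.
    pub fn begin(&mut self, user: UserId, client_accept_id: &str) -> Result<(), AcceptStatus> {
        let key = (user, client_accept_id.to_string());
        if let Some(existing) = self.by_key.get(&key) {
            return Err(existing.clone());
        }
        self.by_key.insert(key, AcceptStatus::Pending);
        Ok(())
    }

    /// Records the final outcome once the EP3 result is classified.
    pub fn finish(&mut self, user: UserId, client_accept_id: &str, status: AcceptStatus) {
        self.by_key.insert((user, client_accept_id.to_string()), status);
    }

    /// What GetAcceptStatus would answer after a reconnect.
    pub fn status(&self, user: UserId, client_accept_id: &str) -> Option<AcceptStatus> {
        self.by_key.get(&(user, client_accept_id.to_string())).cloned()
    }
}
```

The key point is that begin runs before the EP3 call, so a client that replays the same client_accept_id after reconnecting gets the recorded status back rather than an uninformative QuoteNotFound.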

§3 Open Questions

The questions the code can't answer. Each carries its current resolution status.

  1. Counterparty disconnect between SubmitQuote and AcceptQuote — cancel the quote, or leave it live?
    • Needs discussion. Current proposal: cancel the quote when the responder's WS drops, mirroring order behavior. Decision unblocks the cancel-on-disconnect design in A-3295 item 5.
  2. Two-sided requests — can a requester partially accept (accept the bid, leave the ask live)?
    • Decided — no. Accepting closes the entire request. Partial acceptance deferred to v2 (A-3297 F3).
  3. Does EP3 deduplicate InsertTwoSidedBlockTrade?
    • Resolved — no dedup. Connamara confirmed (2026-05-26): EP3 does not dedup on cross_id or any other field. Identical retry produces two block trades. The idempotency_key pattern from AdjustAccountBalanceRequest is not exposed on this RPC. Policy: latch-and-page on ambiguous outcomes (§2c).
  4. Does EP3 commit-before-respond, and does cross_id propagate to drop-copy order records?
    • Open — vendor follow-up. Sub-questions (iii) and (iv) from Q9. Affects whether drop-copy self-healing is viable (auto-transition Settling → Settled by matching Execution.order.cross_id). Does not block V1 — operator reconciliation is the contract regardless.
  5. Multi-gateway counterparty re-check on AcceptQuote — what if the counterparty is on a different shard?
    • Open. Current behavior: if the counterparty's user replica is not reachable on the caller's gateway, rome refuses the accept. UX cliff at scale. Partially tracked under A-3295.
  6. Does BTP support GTC or post_only time-in-force? (for future RFQ-on-Bitnomial)
    • Not applicable to v1. Rome v1 books through EP3 InsertTwoSidedBlockTrade, not BTP. Relevant only if RFQ is extended to the Bitnomial edition.
  7. Fee schedule for RFQ fills?
    • Commercial gate. Same as standard fees for v1; RFQ-specific rebates are v2 (A-3297 F7). Needs commercial sign-off before scoping.
  8. Risk-check attestation trust — risk_checked_at_ns is unsigned, "stale" is undefined, and gateways can lie.
    • Open. Cross-cutting concern (C3 in the RFC). Not yet a blocking question, but shapes what the code looks like if rome needs to independently verify margin.

§4 Documentation

The reference material this project is built against — the RFC, Linear epics, and related AX plans.

Internal — design & specs

Linear — epics & tickets

Epic · Area · Owner · Status
A-3209 · ROME — master epic · Tin · In Progress
A-3210 · Core RFQ logic in rome + OG WS · Tin · In Progress
A-3215 · ROME skeleton: protocol, IPC, state machine · Tin · In Review
A-3216 · ROME ↔ EP3 ↔ Order Gateway integration · Tin · In Review
A-3301 · RFQ visibility, anonymity, targeted makers · Tin · In Review
A-3294 · Pre-rollout: size/rate limits, tape, sides · Tin · Backlog
A-3295 · Polish & connection-state hardening · Tin · Backlog
A-3211 · GUI: requester + responder UX · Tin · Backlog
A-3219 · Observability: metrics, dashboards, alerts · Tin · Backlog
A-3260 · Batteries-included testing · Tin · Backlog
A-3296 · Feature flag + staged rollout · Tin · Backlog
A-3297 · V2 advanced features (post-rollout) · Tin · Backlog

All under the Perpetuals exchange project · Features & Functionality milestone.

Feature comparison

A 25-row feature matrix (AX v1 vs Bybit RFQ vs Deribit Block RFQ) is maintained in the RFC (§17). Key v1 gaps: no multi-leg strategies, no quote aggregation, no partial fills, no restart recovery, no fee rebates — all deferred to v2 under A-3297.

Related AX plans

  • Bitnomial DCM Edition (AIEX) — the parallel Bitnomial integration; Rome v1 books through EP3, not BTP.
  • Multi-Accounts — the identity/ownership rekey; not directly blocking Rome but informs account-keying discussions.