ROME (RFQ)
A Request-for-Quote workflow for AX — participants negotiate block-sized trades off the lit book and have them booked atomically in EP3 — built as a new internal service rome that proxies through order_gateway over the existing WebSocket, with no new public network surface.
§0 Principles
Five invariants govern the design. Everything in the tracker is mechanics that follows from these.
No new public network surface
Participants speak only to order_gateway over the existing WebSocket. rome is an internal service — it has no public listener, no new ports, no new serialization format. The gateway proxies to rome over a remoc RFn channel on TCP loopback, reusing the same IPC pattern as marketdata-publisher.
EP3 booking is the only cross-process hop on the critical path
The rest of the RFQ flow (request creation, quote submission, subscription fanout) runs entirely between the gateway and rome over the loopback IPC channel. The single unavoidable external call is ep3_admin.InsertTwoSidedBlockTrade at accept time. Everything else is DashMap lookups and tokio channels.
Latch, don't retry — EP3 does not deduplicate
Connamara confirmed EP3 does not dedup InsertTwoSidedBlockTrade on cross_id or any other field. Any retry on an ambiguous outcome (5xx, DEADLINE_EXCEEDED, crash) double-books whenever the first attempt actually committed, and rome cannot tell whether it did. The contract: definitive 4xx → roll back to Active; ambiguous → latch to needs_manual_reconciliation, never auto-retry. The operator reconciles.
In-memory matching, ClickHouse audit trail
Live RFQ state lives in-memory in rome (DashMap + per-entry tokio::Mutex). Redis stores durable snapshots only for in-flight Settling requests so a crash is reconcilable. ClickHouse rfq_log is the compliance audit trail. No new database is introduced.
Gateway enforces risk; rome enforces lifecycle
Margin checks run in the gateway before rome is ever called — check_margin_requirement_using_position_cache at every entry point (submit request, submit quote, accept quote). rome owns the state machine and lifecycle transitions; it trusts the gateway's risk attestation via risk_checked_at_ns.
§1 Progress & Dependencies
Eight parallel tracks from protocol design through production rollout. The core engine and integration tracks (B, C) are in review; pre-rollout hardening, GUI, and rollout (E–H) are backlog, gated on the core landing. Snapshot 2026-06-08.
Dependency map
How the tracks block one another. Solid arrows are hard blockers; dashed arrows are soft dependencies. The critical path runs B (engine core) → C (integration) → E (hardening) → H (rollout), with D (visibility) and F (GUI) converging before rollout.
Track A. Design spec and SDK protocol types. Foundation for all other tracks.
A1 · ROME RFC: design spec · Tin · done
The founding design document covering architecture, state machine, EP3 booking contract, failure modes, and the feature comparison matrix. Merged 2026-05-28.
- #1858 docs(rome): spec for Request for Order Matching Engine (ROME)
A2 · SDK public RFQ protocol types · Tin · done
Public protocol types in rs/sdk/src/protocol/rfq.rs — RfqRequest, RfqResponse, RfqEvent enums following the existing OrderGatewayRequest convention. Merged 2026-05-29; follow-up SDK work for the order-gateway WS commands continues in open PRs.
Track B: rome service
The core rome service: IPC protocol, state machine, matching logic, expiration, ClickHouse writer, and Redis durable state. Epic A-3215 (In Review). Hard-blocks integration (C).
B1 · Internal IPC protocol, ClickHouse schema, Redis keys · Tin · in review
Internal IPC types in sdk-internal/src/protocol.rs under ipc::rome: ToRomeIpc, RomeIpcResponse, RomeIpcError, RomeIpcFn. ClickHouse rfq_log table schema and migration. Redis key layout for durable-state snapshots.
- #2069 feat(sdk-internal): add RFQ IPC protocol, ClickHouse log schema, Redis keys
B2 · Engine skeleton, state machine & IPC tasks · Tin · in review
The rs/rome/ crate: AppState (DashMap-based), RequestEntry state machine (Active → Settling → Settled / Cancelled / Expired), per-request tokio::Mutex, expiration heap task, batched ClickHouse writer, ID generation (u128 with node prefix), remoc IPC accept loop, and the durable_state Redis writer for Settling snapshots.
B3 · Property tests & integration tests · Tin · in review
State-machine property tests and integration tests under rs/rome/tests/: happy-path bid/ask, concurrent accept (exactly-one-winner), EP3 failure branches, expiration, side-enforcement matrix. Uses ax_test_utils containers and Ep3Mock.
- #2071 test(rome): add property tests and integration tests
Track C. Wire rome through order_gateway WS and EP3 block-trade booking. Epic A-3216 (In Review). The biggest remaining technical piece: 8 open PRs forming a deep stack.
C1 · Order-gateway + EP3 + Rome end-to-end wiring · Tin · in review
Integration branch #1844 (root): connects order_gateway WS dispatch, the RomeIpcFn handle, the ep3-mock block-trade stub, and the local docker compose stack. Stacked PRs break out each layer.
| Sub-step | PR | What |
|---|---|---|
| SDK WS commands | #2094 | Add OrderGatewayRequest::Rfq variants |
| IPC connect helper | #2095 | Rome IPC connect helper for order-gateway |
| ep3-mock block trade | #2096 | Implement InsertTwoSidedBlockTrade in ep3-mock |
| RFQ-log alignment | #2101 | Align rfq_log quantity with ClickHouse schema |
| EP3 booking on accept | #2102 | EP3 block-trade booking on AcceptQuote |
| Wire WS commands | #2103 | Wire RFQ WebSocket commands through Rome IPC |
| Docker compose | #2104 | Add Rome to local docker stack |
Track D. Targeted RFQs, protocol-level anonymity, and maker discovery. Epic A-3301 (In Review). Soft-unblocks pre-rollout hardening (E).
D1 · RFQ visibility, anonymity & targeted makers · Tin · in review
target_makers: Vec<UserId> on SubmitQuoteRequest for directed RFQs that skip the public stream. disclose_identity: bool for protocol-level anonymity — per-request pseudonym on outgoing events, real user_id in ClickHouse. Maker list API (GET /rfq/makers).
Track E. Size/rate limits, public tape, side enforcement, connection-state hardening, forward-compat legs shim. Epics A-3294 (backlog) + A-3295 (backlog). Gates rollout (H).
E1 · Size limits, rate limits, public tape, quote-side enforcement · Tin · not started
P0 pre-rollout blockers (A-3294): per-instrument RFQ minimum block size, per-user submission rate limit, public trade-tape policy (condition: "block"), and the 6-case quote-side enforcement integration matrix.
- No PRs — backlog, gated on core engine and integration landing.
E2 · Connection-state hardening & forward-compat legs shim · Tin · not started
P1/P2 polish (A-3295): close_only on MakerInfo, forward-compat legs: Vec<Leg> shim, counterparty margin re-check on AcceptQuote, cancel-on-disconnect hardening (gateway crash recovery, server-side heartbeat), and the counterparty mid-flow disconnect policy decision.
Open decision: when a counterparty disconnects between SubmitQuote and AcceptQuote, cancel the quote (mirroring order behavior) or leave it live? The decision shapes the cancel-on-disconnect implementation. See Q1.
- Superseded / closed
- #1915 feat(rome): cancel-on-disconnect (closed, will be re-done under A-3295)
- No active PRs — backlog.
Track F. Requester "Create Strategy" modal, maker selection sidebar, responder inbox, quote composer. Epic A-3211 (Backlog). 7 milestones (G1–G7).
F1 · GUI milestones G1–G7 · Tin · not started
| # | Title | Est | Depends | Ticket |
|---|---|---|---|---|
| G1 | Static modal shell + asset selector + template buttons | S | — | A-3316 |
| G2 | Editable legs table + template pre-population + client-side Greeks | M | G1 | A-3317 |
| G3 | Maker sidebar against stubbed maker list | M | G1 | A-3318 |
| G4 | Wire to order_gateway WS (single-leg via legs shim) | M | G2, G3, A-3295 | A-3319 |
| G5 | Confirm-then-send modal for submit + accept | S | G4 | A-3320 |
| G6 | Hedge leg UI, anonymity toggle, favorites | M | G4 | A-3321 |
| G7 | Responder surface: inbox, quote composer, history | L | G4 | A-3322 |
- Superseded / closed
- #1903 feat(gui): RFQ (closed, will be re-done post-integration)
- No active PRs — backlog, gated on the integration (C) and hardening (E) tracks.
Track G. Prometheus metrics, dashboards, alerts, and batteries-included test coverage. Epics A-3219 + A-3260 (Backlog). Gates rollout (H).
G1 · Observability: metrics, dashboards, alerts · Tin · not started
Prometheus metrics (rome_active_requests, rome_accept_latency_seconds, rome_ep3_book_latency_seconds, rome_log_drops_total, etc.), Grafana dashboards, and incident.io alert wiring. A-3219.
- No PRs — backlog.
G2 · Batteries-included testing · Tin · not started
Broader integration and scenario coverage: the 16 required test scenarios from the RFC (§13.1), including connection-state tests (client disconnect, gateway crash, rome restart, gateway restart) that must pass before rollout. A-3260.
- No PRs — backlog.
Track H. Last gate before GA. Feature flag on OrderGatewayRequest::Rfq, staged rollout plan, rollback playbook. Epic A-3296 (Backlog).
H1 · Feature flag + staged production rollout · Tin · not started
Feature flag on the order_gateway WS endpoints so the RFQ surface can be enabled per-environment / per-user-tier. Staged rollout plan: canary → opt-in beta → GA, with rollback playbook. Depends on all Tier 1–2 items, observability, and testing. A-3296.
- No PRs — backlog, last gate.
Notebook
Reference design — the mechanics and reasoning behind the tracker and the open questions. Drawn from the RFC and working decisions. Sections 2a–2f.
§2a Architecture & data flow
rome is an internal Rust service following the existing AX convention for service-to-service IPC (marketdata-publisher): TCP loopback listener + remoc::Connect::io + a typed RFn shipped over the connection. No .proto, no generated client crate. tonic is used only for the EP3 admin call.
requester WS ─► gateway (R) ──remoc──► rome ──tonic gRPC──► EP3 admin
│ │ │
▼ broadcast │ │
public RFQs ◄─── gateway (P) ◄── responder WS
│
▼ async insert (batched)
ClickHouse rfq_log
Three channel types out of rome:
- Public RFQ events: tokio::sync::broadcast<PublicRfqEvent>, one per rome process. Each gateway holds one subscription and rebroadcasts. Slow consumers get Lagged rather than back-pressuring.
- User RFQ events: broadcast::Sender<UserRfqEvent> per user. Directed quotes and fills. Multiple connections per user all see every event.
- Log: mpsc::Sender<RfqLogRow> to the bounded ClickHouse writer. Off the hot path; never blocks matching.
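As a rough illustration of the third channel, the sketch below shows a bounded queue whose producer never blocks. It is not rome's writer: std::sync::mpsc stands in for the tokio mpsc channel so the example has no external dependencies, the RfqLogRow fields and the LogHandle type are hypothetical, and dropping a row when the queue is full is an assumption inferred from the rome_log_drops_total metric name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the RfqLogRow type; the real fields are not shown in this plan.
#[derive(Debug, Clone, PartialEq)]
pub struct RfqLogRow {
    pub request_id: u128,
    pub event: &'static str,
}

/// Handle held by the matching path (hypothetical type).
pub struct LogHandle {
    tx: SyncSender<RfqLogRow>,
    /// Would back a counter such as rome_log_drops_total.
    drops: AtomicU64,
}

impl LogHandle {
    /// Creates the handle plus the receiving end for the writer task.
    pub fn new(capacity: usize) -> (Self, Receiver<RfqLogRow>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, drops: AtomicU64::new(0) }, rx)
    }

    /// Never blocks: if the queue is full or the writer is gone,
    /// the row is dropped and counted. Returns whether it was queued.
    pub fn log(&self, row: RfqLogRow) -> bool {
        match self.tx.try_send(row) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.drops.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }

    /// Number of rows dropped so far.
    pub fn dropped(&self) -> u64 {
        self.drops.load(Ordering::Relaxed)
    }
}
```

With a capacity of 2 and no consumer draining the queue, the third call to log returns false and dropped() reads 1. Whether dropping is acceptable for a compliance trail is a policy question this sketch does not settle.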
§2b The AcceptQuote critical path
The only flow that crosses a process boundary and the only one whose correctness depends on a specific lock-release ordering.
| Step | Actor | Operation | Lock state |
|---|---|---|---|
| 1 | Client | WS frame AcceptQuote { quote_id, side } | — |
| 2 | Gateway | Risk-check acceptor + counterparty | local margin cache (read) |
| 3 | Gateway | RomeIpcFn::call(AcceptQuote) | — |
| 4 | rome | Resolve quote_id → request_id; acquire per-entry mutex | mutex held |
| 5 | rome | Verify Active; flip → Settling; persist cross_id to Redis | mutex held |
| 6 | rome | Drop mutex | released |
| 7 | rome → EP3 | InsertTwoSidedBlockTrade { cross_id } | no locks |
| 8 | rome | Reacquire mutex; flip → Settled { trade_id } | mutex held |
| 9 | rome | Broadcast Filled on both user channels; log; return ack | clone-and-release |
Because the request is already in Settling during the gap between steps 6 and 8, a concurrent AcceptQuote or Cancel on the same request rejects cleanly with AlreadySettling.
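The lock ordering in steps 4 to 8 can be sketched as follows. This is a minimal sketch under stated assumptions: std::sync::Mutex stands in for the per-entry tokio::sync::Mutex, the book closure stands in for the EP3 call and models only the success path, the Redis persist in step 5 is omitted, and the NotActive error name is hypothetical.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Active,
    Settling { cross_id: String },
    Settled { trade_id: u64 },
}

#[derive(Debug, PartialEq)]
pub enum AcceptError {
    AlreadySettling,
    /// Hypothetical name for "request is no longer acceptable".
    NotActive,
}

pub struct RequestEntry {
    state: Mutex<State>,
}

impl RequestEntry {
    pub fn new() -> Self {
        Self { state: Mutex::new(State::Active) }
    }

    pub fn state(&self) -> State {
        self.state.lock().unwrap().clone()
    }

    /// Flip to Settling under the lock, release it, book, then reacquire and settle.
    pub fn accept(
        &self,
        cross_id: &str,
        book: impl FnOnce(&str) -> u64,
    ) -> Result<u64, AcceptError> {
        {
            let mut st = self.state.lock().unwrap(); // step 4: acquire
            match &*st {
                State::Active => {}
                State::Settling { .. } => return Err(AcceptError::AlreadySettling),
                State::Settled { .. } => return Err(AcceptError::NotActive),
            }
            // step 5: flip while still holding the lock
            *st = State::Settling { cross_id: cross_id.to_owned() };
        } // step 6: guard dropped here

        let trade_id = book(cross_id); // step 7: no locks held

        let mut st = self.state.lock().unwrap(); // step 8: reacquire
        *st = State::Settled { trade_id };
        Ok(trade_id)
    }
}
```

Calling accept again from inside the book closure models a second AcceptQuote arriving during the gap: it returns AlreadySettling rather than deadlocking, because the guard was dropped at step 6.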
§2c EP3 booking & the no-retry contract
AdminAPI.InsertTwoSidedBlockTrade is the only EP3 RPC rome calls. Connamara confirmed (2026-05-26) that EP3 does not deduplicate this RPC on cross_id or any other field. An identical retry produces two block trades with distinct trade IDs.
A rome→EP3 retry on an ambiguous outcome (5xx, DEADLINE_EXCEEDED, mid-call rome crash, lost response) double-books whenever the first attempt actually committed, and rome has no way to tell whether it did. There is no client-side trick (fresh ULID, retry tokens, etc.) that recovers safety. Anti-patterns explicitly ruled out: retrying with a fresh cross_id, retrying with the same cross_id, and auto-retrying 5xx with exponential backoff.
The contract:
- Definitive 4xx (e.g. INVALID_ARGUMENT, FAILED_PRECONDITION): EP3 did not commit. Roll back Settling → Active, return typed reject, client may retry.
- Ambiguous outcome (5xx, DEADLINE_EXCEEDED, crash, lost response): leave the Settling snapshot in Redis, log trade_book_ambiguous, return typed error. No auto-retry. Operator reconciles.
cross_id is a fresh ULID per cross, minted at accept time, persisted on the Settling snapshot before the EP3 call. It is a correlation key, not an idempotency token. It enables future SearchOrders-based reconcile if the vendor confirms propagation to order records.
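The contract above amounts to a three-way classification of the booking result. The sketch below is one way to encode it, under stated assumptions: BookOutcome and NextStep are hypothetical types invented for illustration, ServerError stands for any 5xx-class failure, and ResponseLost stands for a crash or dropped response. Note that NextStep has no retry variant, so a retry cannot be expressed by accident.

```rust
/// Hypothetical summary of what rome observed from the EP3 booking call.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BookOutcome {
    Booked { trade_id: u64 },
    InvalidArgument,
    FailedPrecondition,
    ServerError,
    DeadlineExceeded,
    ResponseLost,
}

/// What the state machine does next. Deliberately has no Retry variant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NextStep {
    Settle { trade_id: u64 },
    RollBackToActive,
    LatchForManualReconciliation,
}

pub fn next_step(outcome: BookOutcome) -> NextStep {
    match outcome {
        BookOutcome::Booked { trade_id } => NextStep::Settle { trade_id },
        // Definitive rejects: EP3 did not commit, so the request can go back to Active.
        BookOutcome::InvalidArgument | BookOutcome::FailedPrecondition => {
            NextStep::RollBackToActive
        }
        // Ambiguous: EP3 may or may not have committed. Keep the Settling
        // snapshot and hand the request to an operator.
        BookOutcome::ServerError
        | BookOutcome::DeadlineExceeded
        | BookOutcome::ResponseLost => NextStep::LatchForManualReconciliation,
    }
}
```

Listing every variant instead of using a wildcard arm means a newly added outcome fails to compile until someone decides which bucket it belongs in.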
§2d State machine & concurrency
Live RFQ state lives in-memory in rome. No new database is introduced.
| Store | What | Lifetime |
|---|---|---|
| DashMap<RequestId, Arc<RequestEntry>> | Per-request state machine | In-memory; lost on restart (v1) |
| DashMap<QuoteId, RequestId> | Quote-to-request index | In-memory |
| Redis (durable_state) | Settling snapshots with cross_id | Short-lived; survives rome crash |
| ClickHouse rfq_log | Compliance audit trail | Permanent |
RequestState transitions: Active → Settling { accepted } → Settled { trade_id } (happy path), with Cancelled, Expired, and needs_manual_reconciliation as terminal/stuck states. A single request's lifecycle is serialized by its per-entry tokio::sync::Mutex (held across await points inside the AcceptQuote critical section, but released before the EP3 call). DashMap operations take only short-lived per-shard locks, so unrelated requests do not serialize against one another.
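The transitions described here and in §2c can be written down as an explicit table. The sketch below is an assumption-laden reading of this plan, not the crate's RequestEntry: payloads such as accepted and trade_id are omitted, can_transition is a hypothetical helper, and it assumes Cancelled and Expired are reachable only from Active (consistent with Cancel being rejected during Settling).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestState {
    Active,
    Settling,
    Settled,
    Cancelled,
    Expired,
    NeedsManualReconciliation,
}

/// Returns true if the state machine may move from `from` to `to`.
pub fn can_transition(from: RequestState, to: RequestState) -> bool {
    use RequestState::*;
    matches!(
        (from, to),
        (Active, Settling)          // AcceptQuote wins the per-entry mutex
            | (Active, Cancelled)   // requester cancels
            | (Active, Expired)     // expiration heap fires
            | (Settling, Settled)   // EP3 booked the block trade
            | (Settling, Active)    // definitive 4xx: roll back
            | (Settling, NeedsManualReconciliation) // ambiguous outcome: latch
    )
}
```

Every state other than Active and Settling has no outgoing edge, which is what makes Settled, Cancelled, Expired, and the manual-reconciliation latch terminal in this reading.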
§2e Risk checks & gateway enforcement
Risk checks are enforced in the gateway, not in rome. rome never calls EP3 for margin — it trusts the gateway's attestation.
| Entry point | What's checked |
|---|---|
| SubmitQuoteRequest | Requester can honor any fill: synthetic worst-case at last mark × slippage buffer, on each requested side. |
| SubmitQuote | Responder can honor a fill at the offered price on each side they offered. Two checks if two-sided. |
| AcceptQuote | Acceptor at the locked price on the accepted side. Counterparty re-checked if reachable on the same gateway shard. |
If any check fails the gateway short-circuits with RfqReject and never calls rome. Risk-check attestation lives in each proxy struct (risk_checked_at_ns) since remoc has no headers.
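The short-circuit described above can be sketched as a gate that either produces a stamped IPC message or a reject, so a message for rome cannot exist without a passed check. All names here are illustrative assumptions: AcceptQuoteIpc and gate_accept are hypothetical, RfqReject is reduced to a single reason string, and the margin_ok closure stands in for check_margin_requirement_using_position_cache.

```rust
/// Simplified stand-in for the gateway's reject type.
#[derive(Debug, PartialEq)]
pub struct RfqReject {
    pub reason: &'static str,
}

/// Hypothetical proxy struct sent to rome. The attestation travels in the
/// payload because the IPC channel has no headers.
#[derive(Debug, PartialEq)]
pub struct AcceptQuoteIpc {
    pub quote_id: u128,
    pub risk_checked_at_ns: u64,
}

/// Runs the margin check first; only a passing check yields a message for rome.
pub fn gate_accept(
    quote_id: u128,
    now_ns: u64,
    margin_ok: impl FnOnce() -> bool,
) -> Result<AcceptQuoteIpc, RfqReject> {
    if !margin_ok() {
        // Short-circuit: rome is never called.
        return Err(RfqReject { reason: "insufficient margin" });
    }
    Ok(AcceptQuoteIpc { quote_id, risk_checked_at_ns: now_ns })
}
```

The timestamp is taken as a parameter rather than read from a clock so the gate stays deterministic in tests. Nothing here signs or verifies the attestation.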
§2f Failure modes & retry safety
Five layers where a "retry" can fire in the end-to-end booking flow. Only Layer 4 (rome→EP3) can reach EP3 unguarded; the policy is "never auto-retry on ambiguous."
| Layer | When it fires | Double-book risk? | Protection |
|---|---|---|---|
| 1. User clicks Accept twice | UI debounce miss | No | State machine: second click sees Settling/Settled |
| 2. Client auto-retry on reconnect | WS drops mid-Accept | No | After Accept, quote removed from quotes_index; retry returns QuoteNotFound |
| 3. Gateway → rome IPC retry | remoc connection drops | No | Symmetric to Layer 2 |
| 4. rome → EP3 retry | Tonic 5xx / timeout / crash | Yes — load-bearing | Latch-and-page; no auto-retry |
| 5. rome restart with Settling on disk | OOM, panic, deploy | No (no auto-retry) | durable_state::load_requests routes to manual-recon |
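Known gap: in Layers 2 and 3, a client whose Accept was processed but whose acknowledgement was lost cannot distinguish success from failure, because the retry returns only QuoteNotFound. Recovery requires a client_accept_id + GetAcceptStatus query, analogous to clord_id + GetOrderStatus on the order path. Tracked separately; does not block V1 correctness.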
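The Layer 2 and Layer 3 protection relies on the quote leaving the index once it has been accepted. A minimal sketch of that idea follows, with assumptions stated plainly: a plain HashMap stands in for the DashMap quote index, QuotesIndex, IndexError, and take_for_accept are hypothetical names, and removing the entry at resolve time is a simplification (a rollback to Active would need to restore it).

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum IndexError {
    QuoteNotFound,
}

/// Maps quote_id to request_id (both shown as u128 for illustration).
pub struct QuotesIndex {
    by_quote: HashMap<u128, u128>,
}

impl QuotesIndex {
    pub fn new() -> Self {
        Self { by_quote: HashMap::new() }
    }

    pub fn insert(&mut self, quote_id: u128, request_id: u128) {
        self.by_quote.insert(quote_id, request_id);
    }

    /// Resolves a quote for acceptance and removes it in the same step,
    /// so a replayed Accept for the same quote cannot reach booking again.
    pub fn take_for_accept(&mut self, quote_id: u128) -> Result<u128, IndexError> {
        self.by_quote
            .remove(&quote_id)
            .ok_or(IndexError::QuoteNotFound)
    }
}
```

A second take_for_accept on the same quote_id returns QuoteNotFound, which is safe against double-booking but gives the client no way to learn that the first attempt succeeded.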
§3 Open Questions
The questions the code can't answer. Each carries its current resolution status.
- Counterparty disconnect between SubmitQuote and AcceptQuote: cancel the quote, or leave it live?
  - Needs discussion. Current proposal: cancel the quote when the responder's WS drops, mirroring order behavior. Decision unblocks the cancel-on-disconnect design in A-3295 item 5.
- Two-sided requests: can a requester partially accept (accept the bid, leave the ask live)?
  - Decided: no. Accepting closes the entire request. Partial acceptance deferred to v2 (A-3297 F3).
- Does EP3 deduplicate InsertTwoSidedBlockTrade?
  - Resolved: no dedup. Connamara confirmed (2026-05-26): EP3 does not dedup on cross_id or any other field. Identical retry produces two block trades. The idempotency_key pattern from AdjustAccountBalanceRequest is not exposed on this RPC. Policy: latch-and-page on ambiguous outcomes (§2c).
- Does EP3 commit-before-respond, and does cross_id propagate to drop-copy order records?
  - Open, vendor follow-up. Sub-questions (iii) and (iv) from Q9. Affects whether drop-copy self-healing is viable (auto-transition Settling → Settled by matching Execution.order.cross_id). Does not block V1: operator reconciliation is the contract regardless.
- Multi-gateway counterparty re-check on AcceptQuote: what if the counterparty is on a different shard?
  - Open. Current behavior: if the counterparty's user replica is not reachable on the caller's gateway, rome refuses the accept. UX cliff at scale. Partially tracked under A-3295.
- Does BTP support GTC or post_only time-in-force? (for future RFQ-on-Bitnomial)
  - Not applicable to v1. Rome v1 books through EP3 InsertTwoSidedBlockTrade, not BTP. Relevant only if RFQ is extended to the Bitnomial edition.
- Fee schedule for RFQ fills?
  - Commercial gate. Same as standard fees for v1; RFQ-specific rebates are v2 (A-3297 F7). Needs commercial sign-off before scoping.
- Risk-check attestation trust: risk_checked_at_ns is unsigned, "stale" is undefined, and gateways can lie.
  - Open. Cross-cutting concern (C3 in the RFC). Not yet a blocking question, but shapes what the code looks like if rome needs to independently verify margin.
§4 Documentation
The reference material this project is built against — the RFC, Linear epics, and related AX plans.
Internal — design & specs
- RFC: ROME (Request for Orders Matching Engine), the founding design (docs/rfc/rome.md); this plan tracks its execution.
Linear — epics & tickets
| Epic | Area | Owner | Status |
|---|---|---|---|
| A-3209 | ROME — master epic | Tin | In Progress |
| A-3210 | Core RFQ logic in rome + OG WS | Tin | In Progress |
| A-3215 | ROME skeleton: protocol, IPC, state machine | Tin | In Review |
| A-3216 | ROME ↔ EP3 ↔ Order Gateway integration | Tin | In Review |
| A-3301 | RFQ visibility, anonymity, targeted makers | Tin | In Review |
| A-3294 | Pre-rollout: size/rate limits, tape, sides | Tin | Backlog |
| A-3295 | Polish & connection-state hardening | Tin | Backlog |
| A-3211 | GUI: requester + responder UX | Tin | Backlog |
| A-3219 | Observability: metrics, dashboards, alerts | Tin | Backlog |
| A-3260 | Batteries-included testing | Tin | Backlog |
| A-3296 | Feature flag + staged rollout | Tin | Backlog |
| A-3297 | V2 advanced features (post-rollout) | Tin | Backlog |
All under the Perpetuals exchange project · Features & Functionality milestone.
Feature comparison
A 25-row feature matrix (AX v1 vs Bybit RFQ vs Deribit Block RFQ) is maintained in the RFC (§17). Key v1 gaps: no multi-leg strategies, no quote aggregation, no partial fills, no restart recovery, no fee rebates — all deferred to v2 under A-3297.
Related AX plans
- Bitnomial DCM Edition (AIEX) — the parallel Bitnomial integration; Rome v1 books through EP3, not BTP.
- Multi-Accounts — the identity/ownership rekey; not directly blocking Rome but informs account-keying discussions.