ROME (RFQ) Design Proposal

Rome Request for Orders Matching Engine

Status: Implementation in progress; M1–M3 landing this sprint; M4–M7 + GUI tracked under tier tickets
Owner: @tin
Created: 2026-05-06
Last updated: 2026-05-28

0. Executive summary

ROME adds an RFQ (request-for-quote) workflow to AX so participants can negotiate block-sized trades off the lit book and have them booked atomically in EP3.

Architecture in one paragraph. Participants speak only to order_gateway over the existing WebSocket. The gateway proxies to a new internal service rome over a remoc RFn channel after running the same risk checks it already does for orders. rome owns the per-request state machine and the in-memory request/quote indexes; it is the only component that talks to EP3's AdminAPI.InsertTwoSidedBlockTrade. ClickHouse rfq_log is the audit trail. No new public network surface.

What's decided. Anonymity is a protocol-level flag, not a UI cosmetic. Targeted RFQs route through the per-user directed channel and skip the public stream. Quote prices are strategy-net (when multi-leg lands), not per-leg. Cancel-on-disconnect mirrors order behavior. Public tape policy: emit RFQ fills with condition: "block" (Bybit/Deribit parity).

What's open. Counterparty mid-flow disconnect policy (A-3295 item 5). Multi-rome sharding strategy (v2). Whether two-sided requests can be partially accepted (currently no). Fee/rebate schedule (commercial gate, A-3297 F7).

What's explicitly v2. Multi-leg strategies, hedge legs, quote aggregation across makers, quote amendment without cancel+resubmit, maker quality scoring, restart recovery, fee rebates, off-protocol OTC accept. All tracked in A-3297.


1. Scope

Add an RFQ workflow to AX. Exchange participants submit a QuoteRequest, other participants subscribe to a public stream of requests and post Quote responses, the requester accepts one quote, and the resulting trade is booked in EP3.

The matching logic and per-request state machine live in a new service rome (Request for Orders Matching Engine). rome is internal: it has no public network surface. Participants reach it over the existing order_gateway WebSocket, which proxies to rome after running the same kind of risk checks it already does for orders.

1.1 In scope (v1)

1.2 Out of scope (v1, deferred to v2 A-3297)

1.3 Non-goals


2. High-level data flow

End-to-end picture: requester on the left, rome in the middle, EP3 on the right. Every client message goes through order_gateway; rome has no public surface.

   requester WS ─► gateway (R) ──remoc──► rome ──tonic gRPC──► EP3 admin
                       │                  │              │
                       ▼ broadcast        │              │
              public RFQs ◄─── gateway (P) ◄── responder WS
                                          │
                                          ▼ async insert (batched)
                                     ClickHouse rfq_log

   directed events (QuoteReceived, Filled) flow back through per-user channels

2.1 Channel types out of rome

2.2 Step-by-step happy path

Phase 1: request creation

  1. Requester's UI sends RfqRequest::SubmitQuoteRequest{symbol, qty, req_bids, req_asks, expiration} as a JSON WebSocket frame to its connected order_gateway.
  2. order_gateway parses, validates the session, and runs check_margin_requirement_using_position_cache against a synthetic worst-case fill at the requested size on each requested side, using the local position_cache and mark_price_cache. If margin is insufficient it short-circuits with RfqReject; rome is never called.
  3. order_gateway invokes the shared RomeIpcFn with ToRomeIpc::SubmitQuoteRequest(SubmitQuoteRequestProxy{ caller_user_id, caller_session_id, risk_checked_at_ns, req }). This is a remoc RFn call over the existing TCP-loopback connection, so there is no per-call serialization beyond remoc's bincode-shaped framing.
  4. rome mints a request_id, builds RequestEntry { state: Active{quotes:[], expires_at_ns} }, inserts it into the requests DashMap, and pushes the deadline onto the expiration heap.
  5. rome broadcasts PublicRfqEvent::QuoteRequestPosted and sends RfqLogRow{event_type=request_submitted} to the bounded log channel.
  6. rome returns SubmitQuoteRequestAck { request_id }. Gateway echoes SubmitQuoteRequestResponse to the requester's WS.

Phase 2: fanout to subscribers

  7. Each order_gateway process holds one long-lived Rome::SubscribeRfqEvents { user_id: None } stream. The new event arrives on every gateway's listener task and is rebroadcast on the per-process broadcast::Sender<RfqEvent>.
  8. Every WS client subscribed via RfqRequest::SubscribeQuoteRequests receives the event as RfqEvent::QuoteRequestPosted.

Phase 3: quote submission

  9. A responder's UI sends RfqRequest::SubmitQuote{request_id, bid?, ask?, expiration}.
  10. The responder's gateway runs check_margin_requirement_using_position_cache for the offered side(s) at the offered price. On failure it short-circuits.
  11. Gateway issues Rome::SubmitQuote.
  12. rome looks up the RequestEntry in the DashMap, takes its tokio::sync::Mutex, verifies state is Active, mints quote_id, appends to the quotes SmallVec, releases the mutex, and pushes the quote's expiration onto the heap.
  13. rome looks up the requester's user_streams entry and broadcasts UserRfqEvent::QuoteReceived on it.
  14. rome logs quote_submitted and returns SubmitQuoteAck { quote_id }.
  15. The responder's gateway echoes SubmitQuoteResponse back to the responder.
  16. The requester's gateway receives the directed event via its Rome::SubscribeRfqEvents { user_id: Some(requester) } listener and forwards it to the requester's WS as RfqEvent::QuoteReceived.

Phase 4: accept & book (the only EP3-bound critical path)

  17. The requester's UI sends RfqRequest::AcceptQuote{quote_id, side}.
  18. The requester's gateway risk-checks the acceptor at the locked price for the side being accepted. If the counterparty's user replica is reachable on this gateway, it re-checks the counterparty's margin too.
  19. Gateway issues Rome::AcceptQuote.
  20. rome resolves quote_id → request_id via quotes_index, takes the RequestEntry mutex, and verifies state is Active and the quote is still present and unexpired.
  21. rome mints a fresh ULID cross_id, persists Settling { accepted, cross_id } to Redis via durable_state so that a rome crash from this point onward is reconcilable, transitions Active → Settling { accepted: quote_id }, and drops the mutex.
  22. rome calls ep3_admin.insert_two_sided_block_trade(...) with the just-minted ULID populated on TwoSidedBlockTrade.cross_id. cross_id is not a dedup token (Connamara confirmed EP3 does not deduplicate this RPC on any field; see §10), but it propagates onto the resulting Order records that every Execution on drop-copy carries (api.proto:100, 154, 157), enabling future self-healing of ambiguous outcomes (see step 28 and §12.1). This is the only RPC on the critical path that crosses a process boundary.
  23. EP3 books the trade and returns trade_id (happy path; failure branches below).
  24. rome retakes the mutex, transitions to Settled { trade_id }, removes the entry from the active maps, and removes the Settling snapshot from Redis.
  25. rome broadcasts UserRfqEvent::Filled on both parties' user channels and logs trade_booked.
  26. rome returns AcceptQuoteAck { trade_id }. The requester's gateway echoes AcceptQuoteResponse to the requester.
  27. Both parties receive RfqEvent::Filled over their WS via their respective Rome::SubscribeRfqEvents { user_id: Some(..) } listeners.
  28. Independently, EP3 emits an Execution on the drop-copy stream for each side of the booked block trade. Each Execution carries an embedded Order with block_trade_indicator = true, the propagated cross_id, and the EP3-generated trade_id. order_gateway's existing dropcopy handler (lib.rs:61) updates position_cache for both parties so the next margin check after this trade already reflects the new positions, with no additional plumbing from rome. This same drop-copy stream is the substrate for the future self-healing path described in the ambiguous-outcome branch below.

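Failure branches. The original "may retry" line is replaced by two failure outcomes: a definitive 4xx reject (EP3 did not commit; rome rolls Settling back to Active, logs trade_book_failed, and the quote stays alive) and an ambiguous outcome (5xx, DEADLINE_EXCEEDED, crash, or lost response; rome latches Settling to needs_manual_reconciliation, leaves the Settling snapshot in Redis, logs trade_book_ambiguous, and never auto-retries). See §7.4.1 for the state-machine view and §12.1 for the full layered retry-safety analysis.

Other layers where a "retry" can fire (user double-click, client WS reconnect with resubmit, gateway rome IPC retry) do not risk a double-book: rome's state machine cleans up quotes_index and requests on the Settled transition, so a retried Accept for the same quote_id returns QuoteNotFound (or AlreadySettling while the first attempt is still in flight). These retries do, however, leave the client unable to learn the trade outcome after a reconnect; see §12.1 for the recovery-UX gap and its planned mitigation (client_accept_id + GetAcceptStatus query, analogous to AX's existing clord_id + GetOrderStatus pattern on the order path).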
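Both failure branches hinge on classifying the EP3 result. The following is a minimal sketch of that classification, assuming a locally defined GrpcCode enum that borrows the names of gRPC status codes so the example stays std-only; real code would match on tonic::Code and would also treat transport errors (connection drop) and a rome-side timeout as ambiguous. classify and BookOutcome are hypothetical names, not existing rome APIs.

```rust
/// gRPC status codes relevant to the Phase 4 branch decision. The names mirror
/// tonic::Code, but the enum is defined locally so the sketch is std-only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GrpcCode {
    InvalidArgument,
    FailedPrecondition,
    NotFound,
    AlreadyExists,
    Unavailable,
    DeadlineExceeded,
    Internal,
    Unknown,
}

#[derive(Debug, PartialEq)]
pub enum BookOutcome {
    /// EP3 booked the trade and rome received the response.
    Booked { trade_id: String },
    /// EP3 definitively did not commit: roll back to Active; the quote stays alive.
    DefinitiveReject(GrpcCode),
    /// EP3 may or may not have committed: latch to needs_manual_reconciliation, never auto-retry.
    Ambiguous,
}

/// Map the EP3 call result (Ok carries trade_id) onto the three Phase 4 branches.
pub fn classify(result: Result<String, GrpcCode>) -> BookOutcome {
    match result {
        Ok(trade_id) => BookOutcome::Booked { trade_id },
        Err(
            code @ (GrpcCode::InvalidArgument
            | GrpcCode::FailedPrecondition
            | GrpcCode::NotFound
            | GrpcCode::AlreadyExists),
        ) => BookOutcome::DefinitiveReject(code),
        // Unavailable, DeadlineExceeded, Internal, Unknown: EP3 may have committed.
        Err(_) => BookOutcome::Ambiguous,
    }
}
```

The important design property is the default arm: any code not positively known to mean "did not commit" falls into Ambiguous, because EP3 does not deduplicate and a wrong rollback could lead to a double-book.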

2.3 Sequence diagram

sequenceDiagram
    autonumber
    actor R as Requester
    participant Gr as order_gateway (R)
    participant Ro as rome
    participant Rs as Redis (durable_state)
    participant CH as ClickHouse
    participant Gp as order_gateway (P)
    actor P as Provider
    participant E as EP3 admin

    %% Phase 1 — request creation
    R->>Gr: WS SubmitQuoteRequest
    Gr->>Gr: check_margin (requester)
    Gr->>Ro: remoc RFn SubmitQuoteRequest
    Ro->>Ro: insert RequestEntry(Active)
    Ro--)Rs: persist Active snapshot (async)
    Ro--)CH: log request_submitted (async)
    Ro--)Gp: public RFQ QuoteRequestPosted
    Ro-->>Gr: Ack { request_id }
    Gr-->>R: WS SubmitQuoteRequestResponse

    %% Phase 2 — fanout
    Gp-->>P: WS RfqEvent::QuoteRequestPosted

    %% Phase 3 — quote submission
    P->>Gp: WS SubmitQuote
    Gp->>Gp: check_margin (provider)
    Gp->>Ro: remoc RFn SubmitQuote
    Ro->>Ro: lock entry, append Quote, unlock
    Ro--)Rs: persist updated snapshot (async)
    Ro--)CH: log quote_submitted (async)
    Ro--)Gr: directed QuoteReceived (user channel)
    Ro-->>Gp: Ack { quote_id }
    Gp-->>P: WS SubmitQuoteResponse
    Gr-->>R: WS RfqEvent::QuoteReceived

    %% Phase 4 — accept & book (only EP3-bound critical path)
    R->>Gr: WS AcceptQuote { quote_id, side }
    Gr->>Gr: check_margin (acceptor + counterparty)
    Gr->>Ro: remoc RFn AcceptQuote
    Ro->>Ro: lock; mint cross_id (ULID); Active → Settling { cross_id }
    Ro->>Rs: persist Settling snapshot (before EP3 call)
    Ro->>Ro: unlock
    Ro->>E: InsertTwoSidedBlockTrade { cross_id }

    alt happy path (EP3 booked, response received)
        E-->>Ro: trade_id
        Ro->>Ro: lock; Settling → Settled; unlock
        Ro--)Rs: remove Settling snapshot
        Ro--)CH: log trade_booked
        Ro--)Gr: directed Filled
        Ro--)Gp: directed Filled
        Ro-->>Gr: Ack { trade_id }
        Gr-->>R: WS AcceptQuoteResponse
        Gr-->>R: WS RfqEvent::Filled
        Gp-->>P: WS RfqEvent::Filled
        E--)Gr: drop-copy Execution { order.cross_id, order.block_trade_indicator, trade_id } → position_cache
        E--)Gp: drop-copy Execution → position_cache
    else definitive 4xx reject (EP3 did not commit)
        E-->>Ro: tonic::Status (INVALID_ARGUMENT, FAILED_PRECONDITION, …)
        Ro->>Ro: lock; Settling → Active; unlock
        Ro--)Rs: remove Settling snapshot
        Ro--)CH: log trade_book_failed
        Ro-->>Gr: typed reject
        Gr-->>R: WS RfqReject (quote alive, client may retry)
    else ambiguous outcome (5xx / DEADLINE_EXCEEDED / crash / lost response)
        Note over Ro,E: EP3 may or may not have committed.<br/>NO auto-retry — Connamara confirmed EP3 does not dedup (§10).
        Ro->>Ro: lock; Settling → needs_manual_reconciliation; unlock
        Note over Rs: Settling snapshot stays in Redis (operator-visible)
        Ro--)CH: log trade_book_ambiguous
        Ro-->>Gr: typed error
        Gr-->>R: WS RfqReject
        Note over E,Ro: Planned self-healing (pending vendor cross_id propagation confirmation):<br/>rome subscribes to drop-copy, matches Execution.order.cross_id<br/>against persisted cross_id → auto-transition to Settled.<br/>Operator path is the timeout fallback.
    end

Solid arrows (->>) are synchronous request-response calls (remoc RFn for gateway↔︎rome, tonic gRPC for rome↔︎EP3, JSON-over-WS for client↔︎gateway) plus the synchronous Settling persist to Redis. Dashed arrows (-->>) are the corresponding responses. Async-fire arrows (--)) denote events that don't block the caller: remoc broadcast/mpsc subscriptions, log inserts, durable-state writes via the dedicated Redis writer task, and EP3's pre-existing drop-copy stream. The position_cache updates in the happy-path branch flow through machinery that already exists for plain orders; rome itself never pushes them.

The Phase 4 alt block enumerates the three outcomes from §2.2 step 23: happy path, definitive 4xx (safe to roll back), and ambiguous (latch to manual reconciliation; see §10 and §12.1). A rome crash anywhere between the Settling persist and the response handler is recovered on restart via durable_state::load_requests, which routes Settling snapshots to the same needs_manual_reconciliation state as in-process ambiguous outcomes.
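The restart routing and the planned drop-copy self-healing can be sketched together. This assumes a Settling snapshot carries the request_id, the accepted quote_id, and the ULID cross_id as a String; recover and apply_drop_copy are hypothetical helpers, not the durable_state API, and the drop-copy match models the planned path that is still pending vendor confirmation of cross_id propagation.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct SettlingSnapshot {
    pub request_id: u128,
    pub accepted: u128,   // quote_id
    pub cross_id: String, // ULID minted and persisted before the EP3 call
}

#[derive(Debug, Clone, PartialEq)]
pub enum RecoveredState {
    NeedsManualReconciliation { accepted: u128, cross_id: String },
    Settled { trade_id: String },
}

/// Every Settling snapshot found on restart is latched; nothing is re-sent to EP3.
pub fn recover(snaps: Vec<SettlingSnapshot>) -> HashMap<u128, RecoveredState> {
    snaps
        .into_iter()
        .map(|s| {
            let st = RecoveredState::NeedsManualReconciliation { accepted: s.accepted, cross_id: s.cross_id };
            (s.request_id, st)
        })
        .collect()
}

/// Planned self-healing: a block-trade Execution whose order.cross_id matches a
/// latched request resolves it to Settled. Returns the resolved request_id, if any.
/// (Linear scan for brevity; a real index would be keyed by cross_id.)
pub fn apply_drop_copy(
    states: &mut HashMap<u128, RecoveredState>,
    exec_cross_id: &str,
    block_trade_indicator: bool,
    trade_id: &str,
) -> Option<u128> {
    if !block_trade_indicator {
        return None;
    }
    let request_id = states.iter().find_map(|(id, st)| match st {
        RecoveredState::NeedsManualReconciliation { cross_id, .. } if cross_id == exec_cross_id => Some(*id),
        _ => None,
    })?;
    states.insert(request_id, RecoveredState::Settled { trade_id: trade_id.to_string() });
    Some(request_id)
}
```

An Execution without block_trade_indicator, or with an unknown cross_id, leaves the latched state untouched, so the operator path remains the fallback.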


3. Component layout

rome follows the existing AX convention for internal Rust↔︎Rust services (marketdata-publisher): TCP loopback listener + remoc::Connect::io + a RFn shipped over the connection. No .proto, no generated client crate. tonic is used only for the EP3 admin call, because EP3 is the third-party Connamara service.

rs/
  rome/                       (new)
    Cargo.toml                deps: remoc, ax-sdk-internal, ax-ep3, klickhouse, ...
    src/
      main.rs                 clap config + run()
      lib.rs                  bootstrap: remoc listener, EP3 admin client, ClickHouse pool
      ipc.rs                  accept loop + per-connection RFn handler (mirrors
                              marketdata-publisher/src/ipc.rs)
      handlers.rs             one async fn per ToRomeIpc variant
      state.rs                AppState — all in-memory RFQ state
      matching.rs             RequestEntry state machine
      expiration.rs           single timer-driven expiry task
      clickhouse_writer.rs    batched async-insert writer task
      id_gen.rs               node-prefixed monotonic IDs
      metrics.rs

  sdk/src/protocol/
    rfq.rs                    (new) public WS-facing types (serde_json)

  sdk-internal/src/protocol.rs
    ipc::rome                 (new submodule) ToRomeIpc / RomeIpcResponse /
                              RomeIpcError / RomeIpcFn — internal IPC types,
                              alongside the existing ipc::* MdPub types

  order-gateway/src/
    rfq.rs                    (new) WS handlers; proxies to rome via the
                              shared RomeIpcFn handle
    state.rs                  (edit) add RomeIpcClient + public RFQ receiver task
    ws_service.rs             (edit) dispatch new RfqRequest variants

  sdk-internal/src/clickhouse/
    schema.rs                 (edit) add ChRfqLog row + insert helpers
    migrations/               (edit) add `rfq_log` table

  ep3/                        (no change beyond using existing
                               AdminAPI.InsertTwoSidedBlockTrade — tonic)

rome is added to the workspace members list in rs/Cargo.toml. There is no separate rome-client crate: the gateway imports RomeIpcFn and the request/response/error types directly from ax-sdk-internal.


4. Public protocol (SDK)

rs/sdk/src/protocol/rfq.rs adds the WebSocket-facing types. They follow the existing convention from order_gateway.rs:27: a single-letter t tag for the request enum, an untagged response enum, and single-letter field names where bandwidth matters.

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "t")]
pub enum RfqRequest {
    #[serde(rename = "qr")]   SubmitQuoteRequest(SubmitQuoteRequest),
    #[serde(rename = "xqr")]  CancelQuoteRequest(CancelQuoteRequest),
    #[serde(rename = "sqr")]  SubscribeQuoteRequests(SubscribeQuoteRequests),
    #[serde(rename = "uqr")]  UnsubscribeQuoteRequests,
    #[serde(rename = "q")]    SubmitQuote(SubmitQuote),
    #[serde(rename = "xq")]   CancelQuote(CancelQuote),
    #[serde(rename = "aq")]   AcceptQuote(AcceptQuote),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum RfqResponse {
    SubmitQuoteRequestResponse(SubmitQuoteRequestResponse),
    SubmitQuoteResponse(SubmitQuoteResponse),
    AcceptQuoteResponse(AcceptQuoteResponse),
    CancelAck(CancelAck),
    Reject(RfqReject),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "t")]
pub enum RfqEvent {
    #[serde(rename = "QR")]   QuoteRequestPosted(QuoteRequestPosted),    // public RFQ stream
    #[serde(rename = "XQR")]  QuoteRequestRemoved(QuoteRequestRemoved),  // public RFQ stream
    #[serde(rename = "Q")]    QuoteReceived(QuoteReceived),              // directed to requester
    #[serde(rename = "XQ")]   QuoteRemoved(QuoteRemoved),                // directed to requester
    #[serde(rename = "F")]    Filled(Filled),                            // directed to both sides
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SubmitQuoteRequest {
    pub symbol: Symbol,
    pub quantity: Decimal,
    pub req_asks: bool,
    pub req_bids: bool,
    pub expiration: DateTime<Utc>,
    pub client_request_id: Option<u64>,   // echoed back, for dedupe by sender
    // landed in #1845 (anonymity + targeting)
    pub disclose_identity: bool,
    pub target_makers: Vec<UserId>,       // empty = public; non-empty = directed only
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SubmitQuote {
    pub request_id: RequestId,
    pub bid: Option<Decimal>,
    pub ask: Option<Decimal>,
    pub expiration: DateTime<Utc>,
    pub client_quote_id: Option<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AcceptQuote {
    pub quote_id: QuoteId,
    pub side: Side,            // disambiguates two-sided quotes
}
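Targeting and anonymity are protocol-level decisions, so they can be expressed as plain functions over the SubmitQuoteRequest fields. A minimal sketch, assuming UserId is a u64 and that anonymity is implemented by omitting the requester id from maker-facing events; route, Route, and visible_requester are hypothetical names.

```rust
/// Assumption: UserId is a plain u64 here; the real type is the SDK's UserId.
pub type UserId = u64;

#[derive(Debug, PartialEq)]
pub enum Route {
    /// Post on the public RFQ stream (QuoteRequestPosted to every subscriber).
    Public,
    /// Deliver only on these makers' per-user directed channels; skip the public stream.
    Directed(Vec<UserId>),
}

/// target_makers: empty = public; non-empty = directed only.
pub fn route(target_makers: &[UserId]) -> Route {
    if target_makers.is_empty() {
        Route::Public
    } else {
        Route::Directed(target_makers.to_vec())
    }
}

/// Requester identity as it would appear in maker-facing events:
/// omitted on the wire unless the requester opted in with disclose_identity.
pub fn visible_requester(requester: UserId, disclose_identity: bool) -> Option<UserId> {
    disclose_identity.then_some(requester)
}
```

Keeping the decision in rome (rather than in the UI) is what makes anonymity a protocol guarantee: a maker never receives the field, so no client can display it.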

RequestId and QuoteId are u128 carrying the node prefix in the high bits (see §11).
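A node-prefixed monotonic ID generator in the spirit of id_gen.rs might look like the following. The bit layout (16-bit node id, 48-bit boot timestamp in milliseconds, 64-bit per-process counter) is an assumption for illustration only; §11 is the authority on the real layout. The boot timestamp is included so that a restarted process does not reuse IDs it minted before the restart.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical layout (assumption, not taken from §11):
///   bits 112..128: node id
///   bits  64..112: process boot time in ms (48 bits)
///   bits   0..64 : per-process counter
pub struct IdGen {
    node: u16,
    boot_ms: u64,
    next: AtomicU64,
}

impl IdGen {
    pub fn new(node: u16, boot_epoch_ms: u64) -> Self {
        Self {
            node,
            boot_ms: boot_epoch_ms & ((1u64 << 48) - 1),
            next: AtomicU64::new(1),
        }
    }

    /// Lock-free; each call returns a distinct, increasing ID within one process.
    pub fn next_id(&self) -> u128 {
        let seq = self.next.fetch_add(1, Ordering::Relaxed);
        ((self.node as u128) << 112) | ((self.boot_ms as u128) << 64) | seq as u128
    }

    /// Recover the node prefix from an ID.
    pub fn node_of(id: u128) -> u16 {
        (id >> 112) as u16
    }
}
```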

4.1 Forward-compatibility note (legs shim)

A-3295 (Tier 2) introduces a forward-compat legs: Vec<Leg> representation behind the existing single-instrument shape so that the v2 multi-leg work doesn't break the wire. Single-leg v1 clients deserialize unchanged; multi-leg requests are accepted by the wire and rejected by the engine with not_implemented until A-3297 F1 lands.
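A minimal sketch of the engine-side gate, assuming a placeholder Leg shape (the real A-3295 Leg type is not specified in this section) and a hypothetical EngineReject type: exactly one leg passes, and more than one is rejected as not_implemented. Rejecting an empty leg list is an extra assumption of this sketch.

```rust
/// Placeholder leg; the real A-3295 Leg fields are not specified here.
#[derive(Debug, Clone, PartialEq)]
pub struct Leg {
    pub symbol: String,
    pub ratio: i32,
}

#[derive(Debug, PartialEq)]
pub enum EngineReject {
    /// Surfaced on the wire as not_implemented until A-3297 F1 lands.
    NotImplemented,
    /// Assumption of this sketch: an empty leg list is malformed.
    Empty,
}

/// v1 engine gate: accept exactly one leg and hand it to the single-instrument path.
pub fn single_leg(legs: &[Leg]) -> Result<&Leg, EngineReject> {
    match legs {
        [] => Err(EngineReject::Empty),
        [only] => Ok(only),
        _ => Err(EngineReject::NotImplemented),
    }
}
```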


5. Internal IPC protocol (remoc)

Internal-only. Types live in sdk-internal/src/protocol.rs under a new ipc::rome submodule, alongside the existing ipc module that already holds the marketdata-publisher IPC types (ToMdPubIpc, MdPubIpcResponse, MdPubIpcError, MdPubIpcFn).

// sdk-internal/src/protocol.rs: new submodule nested inside the existing `pub mod ipc`
// (a module declaration cannot take a path such as `ipc::rome`)

pub mod rome {
    use remoc::{rch, rfn, prelude::*};
    use serde::{Serialize, Deserialize};

    pub type RomeIpcFn =
        rfn::RFn<(ToRomeIpc,), Result<RomeIpcResponse, RomeIpcError>>;

    #[derive(Debug, Clone, Serialize, Deserialize)]
    pub enum ToRomeIpc {
        SubmitQuoteRequest(SubmitQuoteRequestProxy),
        CancelQuoteRequest(CancelQuoteRequestProxy),
        SubmitQuote(SubmitQuoteProxy),
        CancelQuote(CancelQuoteProxy),
        AcceptQuote(AcceptQuoteProxy),
        SubscribeRfqEvents { user_id: Option<UserId> },
        ListActiveQuoteRequests,
        ListMakers,                                  // landed in #1845
        CancelAllForUser { user_id: UserId },
    }

    #[derive(Debug, Serialize, Deserialize)]
    pub enum RomeIpcResponse {
        SubmitQuoteRequestAck { request_id: RequestId },
        SubmitQuoteAck        { quote_id: QuoteId },
        AcceptQuoteAck        { trade_id: String },
        Ack,
        ActiveRequests(Vec<ActiveRequestSummary>),
        Makers(Vec<MakerInfo>),

        // channel-of-channels: rome ships a receiver back over the wire
        PublicRfqEventsSubscription {
            snapshot: Vec<ActiveRequestSummary>,
            rx: rch::broadcast::Receiver<PublicRfqEvent>,
        },
        UserRfqEventsSubscription {
            rx: rch::mpsc::Receiver<UserRfqEvent>,
        },
    }

    #[derive(Debug, Serialize, Deserialize, thiserror::Error)]
    pub enum RomeIpcError {
        #[error("request not found")]                  RequestNotFound,
        #[error("quote not found")]                    QuoteNotFound,
        #[error("request already settling or settled")] AlreadySettling,
        #[error("expired")]                            Expired,
        #[error("ep3 booking failed: {0}")]            BookFailed(String),
        #[error("ipc call failed: {0}")]               Call(String),
        #[error("internal: {0}")]                      Internal(String),
    }

    // An RFn returning Result<T, E> requires E: From<rfn::CallError>, so transport
    // failures (e.g. a dropped remoc connection) surface as RomeIpcError::Call.
    impl From<rfn::CallError> for RomeIpcError {
        fn from(err: rfn::CallError) -> Self {
            RomeIpcError::Call(err.to_string())
        }
    }

    #[derive(Debug, Clone, Serialize, Deserialize)]
    pub struct SubmitQuoteRequestProxy {
        pub caller_user_id: UserId,
        pub caller_session_id: SessionId,
        pub risk_checked_at_ns: i64,   // gateway-attested risk check timestamp
        pub req: SubmitQuoteRequest,   // public type from sdk::protocol::rfq
    }
    // ...other *Proxy structs analogously
}

Server side mirrors marketdata-publisher/src/ipc.rs:66–92: bind a TCP listener on loopback, accept connections, set up remoc::Connect::io, build a RomeIpcFn (an rfn::RFn::new_1 closure that dispatches on ToRomeIpc), send it to the client over a rch::base::Sender<RomeIpcFn>.

Client side mirrors the existing connect_to_md_pub helper in sdk-internal/src/protocol.rs:194–215: connect TCP, set up remoc::Connect::io, receive the RomeIpcFn, hand it to the gateway's AppState.

5.1 Why streams are channels-of-channels, not separate RFns

SubscribeRfqEvents { user_id: None } returns RomeIpcResponse::PublicRfqEventsSubscription { snapshot, rx }, where rx: rch::broadcast::Receiver<PublicRfqEvent>. remoc transparently ships the receiver across the connection; on the gateway side it behaves like a broadcast receiver you can .recv() from. Slow consumers get a Lagged error, mirroring tokio::sync::broadcast's RecvError::Lagged. This is cleaner than a server-stream RPC because the snapshot and the subscription handle arrive together in a single response, so no separate "snapshot then resume" handshake is needed.

SubscribeRfqEvents { user_id: Some(user_id) } returns rch::mpsc::Receiver<UserRfqEvent> because directed events are point-to-point (per-user) rather than fanned out; mpsc makes back-pressure explicit so a stuck gateway can't make rome buffer unboundedly.
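Delivering snapshot and receiver in one response does not by itself close the gap between reading the snapshot and subscribing. One way to make that gap-free on the rome side is to take the snapshot and register the subscriber under the same lock that publishing uses. The sketch below models this with std::sync::mpsc in place of rch::broadcast; PublicFeed and Subscription are hypothetical names, and u128 request ids stand in for ActiveRequestSummary.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

pub struct PublicFeed {
    inner: Mutex<Inner>,
}

struct Inner {
    active: Vec<u128>,              // active request_ids (stand-in for ActiveRequestSummary)
    subscribers: Vec<Sender<u128>>, // stand-in for the broadcast fan-out
}

pub struct Subscription {
    pub snapshot: Vec<u128>,
    pub rx: Receiver<u128>,
}

impl PublicFeed {
    pub fn new() -> Self {
        Self { inner: Mutex::new(Inner { active: Vec::new(), subscribers: Vec::new() }) }
    }

    /// Snapshot and receiver are produced under one lock, so every posted request
    /// is either in the snapshot or delivered on rx: never both, never neither.
    pub fn subscribe(&self) -> Subscription {
        let mut inner = self.inner.lock().unwrap();
        let (tx, rx) = channel();
        inner.subscribers.push(tx);
        Subscription { snapshot: inner.active.clone(), rx }
    }

    pub fn post(&self, request_id: u128) {
        let mut inner = self.inner.lock().unwrap();
        inner.active.push(request_id);
        // Fan out, dropping subscribers whose receiver has gone away.
        inner.subscribers.retain(|tx| tx.send(request_id).is_ok());
    }
}
```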

Risk-check attestation lives in each proxy struct (risk_checked_at_ns) rather than as a transport header, since remoc has no headers. rome refuses requests where it's missing or stale.
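A minimal sketch of rome's attestation check. The 500 ms freshness bound, treating a zero or negative timestamp as missing, and the names check_attestation and AttestationError are all assumptions; this document does not specify the staleness threshold.

```rust
/// Assumed freshness bound; the design does not specify one.
pub const MAX_RISK_CHECK_AGE_NS: i64 = 500_000_000; // 500 ms

#[derive(Debug, PartialEq)]
pub enum AttestationError {
    /// No usable timestamp (zero or negative).
    Missing,
    /// The gateway's risk check is older than the allowed window.
    Stale { age_ns: i64 },
}

/// Reject proxies whose gateway-attested risk check is missing or too old.
pub fn check_attestation(risk_checked_at_ns: i64, now_ns: i64) -> Result<(), AttestationError> {
    if risk_checked_at_ns <= 0 {
        return Err(AttestationError::Missing);
    }
    let age_ns = now_ns.saturating_sub(risk_checked_at_ns);
    if age_ns > MAX_RISK_CHECK_AGE_NS {
        return Err(AttestationError::Stale { age_ns });
    }
    Ok(())
}
```

As written, a timestamp slightly ahead of rome's clock passes; whether to tolerate that (gateway and rome share a host over loopback) or reject it is left open here.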


6. order_gateway integration

rs/order-gateway/src/state.rs gains:

pub struct AppState {
    // ...existing fields...
    pub rome: RomeIpcFn,                          // remoc RFn handle, shared (Clone)
    pub public_rfq_events_tx: broadcast::Sender<RfqEvent>, // one per gateway process
    pub public_rfq_initial_snapshot: parking_lot::RwLock<Vec<ActiveRequestSummary>>, // latest snapshot from rome
    pub directed_event_streams: DashMap<UserId, broadcast::Sender<RfqEvent>>,
}

RomeIpcFn is Clone (an rfn::RFn is just a handle to the remoted closure), so handler tasks can clone-and-call without coordination.

Two kinds of long-lived background task run in each gateway process. The first is the public RFQ listener, started at gateway boot:

let resp = state.rome.call(ToRomeIpc::SubscribeRfqEvents { user_id: None }).await?;
let RomeIpcResponse::PublicRfqEventsSubscription { snapshot, mut rx } = resp else { bail!(...) };
*state.public_rfq_initial_snapshot.write() = snapshot;   // replaced wholesale on every (re)subscribe
while let Ok(event) = rx.recv().await {
    let _ = state.public_rfq_events_tx.send(event_into_ws(event));
}
// on Lagged or connection drop: reconnect with backoff and re-snapshot

A per-user listener spawned lazily on first RFQ subscribe for that user and torn down when their last WS connection closes:

let resp = state.rome.call(ToRomeIpc::SubscribeRfqEvents { user_id: Some(user_id) }).await?;
let RomeIpcResponse::UserRfqEventsSubscription { mut rx } = resp else { bail!(...) };
while let Ok(Some(event)) = rx.recv().await {   // rch::mpsc recv yields Result<Option<T>, _>
    if let Some(tx) = state.directed_event_streams.get(&user_id) {
        let _ = tx.send(event_into_ws(event));
    }
}
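The lazy spawn and teardown of per-user listeners reduces to a reference count per user. This sketch assumes UserId is a u64 and leaves the actual task spawn and abort to the caller; UserListeners, acquire, and release are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Tracks live RFQ-subscribed WS connections per user. The caller spawns the
/// per-user rome listener when `acquire` returns true and tears it down when
/// `release` returns true.
#[derive(Default)]
pub struct UserListeners {
    conns: Mutex<HashMap<u64, usize>>, // user_id -> live WS connections
}

impl UserListeners {
    /// Register a connection; true means this is the user's first one.
    pub fn acquire(&self, user_id: u64) -> bool {
        let mut m = self.conns.lock().unwrap();
        let n = m.entry(user_id).or_insert(0);
        *n += 1;
        *n == 1
    }

    /// Unregister a connection; true means it was the user's last one.
    pub fn release(&self, user_id: u64) -> bool {
        let mut m = self.conns.lock().unwrap();
        let Some(n) = m.get_mut(&user_id) else { return false };
        *n -= 1;
        if *n == 0 {
            m.remove(&user_id);
            true
        } else {
            false
        }
    }
}
```

Because both decisions are made under one lock, two connections for the same user racing on connect cannot both spawn a listener.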

ws_service.rs dispatches new variants in the existing match on OrderGatewayRequest. Decision (D2): nest under OrderGatewayRequest::Rfq(RfqRequest) to keep the top-level enum focused on order entry.

6.1 Risk checks

Enforced in the gateway, not in rome.

If any check fails the gateway short-circuits with an RfqReject and never calls rome.

The full counterparty re-check on AcceptQuote (M5 in the original rollout plan) is tracked as part of A-3295.


7. Rome service architecture

7.1 State

pub struct AppState {
    pub requests: DashMap<RequestId, Arc<RequestEntry>>,
    pub quotes_index: DashMap<QuoteId, RequestId>,
    pub user_streams: DashMap<UserId, rch::mpsc::Sender<UserRfqEvent>>, // per-user directed channel (mpsc, see §5.1)
    pub public_rfq_events: broadcast::Sender<PublicRfqEvent>,
    pub expirations: Arc<Mutex<BinaryHeap<Reverse<(i64, ExpiryKey)>>>>,
    pub log_tx: mpsc::Sender<RfqLogRow>,
    pub ep3_admin: ax_ep3::Ep3Client,
    pub id_gen: IdGen,
}

pub struct RequestEntry {
    pub immutable: RequestImmutable,
    pub state: tokio::sync::Mutex<RequestState>,
}

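pub enum RequestState {
    Active   { quotes: SmallVec<[Quote; 4]>, expires_at_ns: i64 },
    Settling { accepted: QuoteId, cross_id: String },   // cross_id: ULID persisted before the EP3 call
    Settled  { trade_id: String },
    NeedsManualReconciliation { accepted: QuoteId, cross_id: String },   // ambiguous EP3 outcome
    Cancelled,
    Expired,
}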

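DashMap for sharded, low-contention lookup (matches marketdata-publisher); each shard has its own lock, so access is brief rather than lock-free. tokio::sync::Mutex on RequestEntry.state because, in the AcceptQuote critical section, the lock is held across an await (the durable_state persist of the Settling snapshot), though never across the EP3 RPC. parking_lot elsewhere, per the existing convention.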
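The single expiry task in expiration.rs can drain the expirations min-heap from §7.1 roughly as sketched below. This assumes a hypothetical ExpiryKey enum and uses a HashSet as a stand-in for "is this request or quote still Active"; entries whose target has moved on (Settling, cancelled, already removed) are dropped silently, matching the Settling + expiration tick row of the transition table.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical key identifying what a deadline belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum ExpiryKey {
    Request(u128),
    Quote(u128),
}

/// Pop every deadline <= now_ns and return the keys that should actually expire.
/// Reverse turns std's max-heap into a min-heap ordered by deadline.
pub fn drain_due(
    heap: &mut BinaryHeap<Reverse<(i64, ExpiryKey)>>,
    now_ns: i64,
    still_active: &HashSet<ExpiryKey>,
) -> Vec<ExpiryKey> {
    let mut due = Vec::new();
    while let Some(Reverse((deadline, _))) = heap.peek() {
        if *deadline > now_ns {
            break; // earliest remaining deadline is in the future
        }
        let Reverse((_, key)) = heap.pop().expect("peeked entry exists");
        if still_active.contains(&key) {
            due.push(key); // caller takes the per-entry mutex and transitions to Expired
        }
        // otherwise: stale heap entry, dropped silently
    }
    due
}
```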

7.2 Concurrency model

7.3 Hot-path data flows

SubmitQuoteRequest. WS frame → gateway.parse → check_margin → RomeIpcFn::call(SubmitQuoteRequest) → rome mints request_id → DashMap insert → publish on public RFQ broadcast → push expiration → log → return SubmitQuoteRequestAck → gateway emits RfqResponse on the WS. No EP3 hop.

SubscribeQuoteRequests. Gateway already holds one process-wide public RFQ subscription returned by RomeIpcFn::call(SubscribeRfqEvents { user_id: None }). On client subscribe: gateway hands the client the cached initial snapshot, then begins forwarding from its in-process public_rfq_events_tx broadcast. No new remoc subscription per WS client; one per gateway process.

SubmitQuote. WS frame → gateway risk-check → RomeIpcFn::call(SubmitQuote) → take request mutex → verify Active → mint quote_id → push to quotes vec → release mutex → publish UserRfqEvent on requester's user mpsc → push expiration → log → SubmitQuoteAck. No EP3 hop.
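The SubmitQuote path can be sketched with std types only. This is a minimal illustration, assuming a std::sync::Mutex in place of tokio::sync::Mutex, a Vec in place of SmallVec, integer tick prices instead of Decimal, and a quote_id minted by the caller; submit_quote and SubmitError are hypothetical names, not rome's handler API.

```rust
use std::sync::Mutex;

/// Simplified request state; only the variants this path touches.
#[derive(Debug, Clone, PartialEq)]
pub enum RequestState {
    Active { quotes: Vec<Quote>, expires_at_ns: i64 },
    Settling { accepted: u128 },
    Cancelled,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Quote {
    pub quote_id: u128,
    pub bid: Option<i64>, // price in ticks (assumption; the protocol uses Decimal)
    pub ask: Option<i64>,
    pub expires_at_ns: i64,
}

#[derive(Debug, PartialEq)]
pub enum SubmitError {
    NotActive,
    Expired,
}

/// Append a quote while holding the per-request lock. The lock is released on
/// return, before the caller pushes the expiration and sends QuoteReceived.
pub fn submit_quote(
    state: &Mutex<RequestState>,
    quote_id: u128,
    bid: Option<i64>,
    ask: Option<i64>,
    quote_expires_at_ns: i64,
    now_ns: i64,
) -> Result<u128, SubmitError> {
    let mut guard = state.lock().unwrap();
    match &mut *guard {
        RequestState::Active { quotes, expires_at_ns } => {
            if now_ns >= *expires_at_ns || now_ns >= quote_expires_at_ns {
                return Err(SubmitError::Expired);
            }
            quotes.push(Quote { quote_id, bid, ask, expires_at_ns: quote_expires_at_ns });
            Ok(quote_id)
        }
        _ => Err(SubmitError::NotActive),
    }
}
```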

AcceptQuote critical path.

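  1. WS frame → gateway risk-checks acceptor (and counterparty if reachable).
  2. Rome::AcceptQuote.
  3. Take request mutex.
  4. Verify state is Active and the named quote is still present and unexpired.
  5. Mint cross_id, transition to Settling { accepted: quote_id }, and persist the Settling snapshot to Redis via durable_state. Drop the mutex.
  6. Call ep3_admin.insert_two_sided_block_trade(...) with cross_id populated. This is the unavoidable cross-process latency.
  7. Retake the mutex.
  8. On success: transition to Settled { trade_id }, broadcast Filled on both parties' directed channels, log trade_booked, return AcceptQuoteAck { trade_id }.
  9. On a definitive EP3 reject (4xx): transition back to Active, log a trade_book_failed row, and return a typed error so the gateway can reject the WS frame; the quote stays alive.
  10. On an ambiguous outcome (5xx, DEADLINE_EXCEEDED, lost response): transition to needs_manual_reconciliation, leave the Settling snapshot in Redis, log trade_book_ambiguous, and return an error. Never auto-retry, since EP3 does not deduplicate (§10).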

Why release-around-RPC is required. Holding the mutex across the EP3 call would serialize all activity on that request (including new quotes from other responders, which are valid up until we successfully book) for as long as EP3 takes to ack. Because the state is Settling during the gap, concurrent AcceptQuote and Cancel on the same request both reject cleanly.
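The release-around-RPC ordering can be sketched with a std Mutex and a closure standing in for the EP3 call. This is a simplified model, assuming u128 quote ids, no cross_id or Redis persistence, and a three-way Ep3Result; accept, State, AcceptError, and Ep3Result are hypothetical names, and every non-Active state is folded into AlreadySettling for brevity.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Active { quotes: Vec<u128> },
    Settling { accepted: u128 },
    Settled { trade_id: String },
    NeedsManualReconciliation { accepted: u128 },
}

#[derive(Debug, PartialEq)]
pub enum AcceptError {
    QuoteNotFound,
    AlreadySettling,
    BookRejected,
    Ambiguous,
}

/// What the stand-in EP3 call reported.
pub enum Ep3Result {
    Booked(String),
    Rejected,
    Ambiguous,
}

pub fn accept(state: &Mutex<State>, quote_id: u128, ep3: impl FnOnce() -> Ep3Result) -> Result<String, AcceptError> {
    let prior_quotes = {
        let mut g = state.lock().unwrap();
        let quotes = match &*g {
            State::Active { quotes } => quotes.clone(),
            _ => return Err(AcceptError::AlreadySettling),
        };
        if !quotes.contains(&quote_id) {
            return Err(AcceptError::QuoteNotFound);
        }
        *g = State::Settling { accepted: quote_id };
        quotes
    }; // mutex dropped: concurrent callers now see Settling and reject immediately

    let outcome = ep3(); // no rome lock held across the RPC

    let mut g = state.lock().unwrap();
    match outcome {
        Ep3Result::Booked(trade_id) => {
            *g = State::Settled { trade_id: trade_id.clone() };
            Ok(trade_id)
        }
        Ep3Result::Rejected => {
            *g = State::Active { quotes: prior_quotes }; // quote stays alive
            Err(AcceptError::BookRejected)
        }
        Ep3Result::Ambiguous => {
            *g = State::NeedsManualReconciliation { accepted: quote_id };
            Err(AcceptError::Ambiguous)
        }
    }
}
```

Calling accept again from inside the ep3 closure, i.e. while the RPC is in flight, returns AlreadySettling at once instead of waiting on the mutex, which is exactly the behavior the paragraph above relies on.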

Cancel (request or quote). Take the per-request mutex, transition to Cancelled (or remove the quote from the vec), broadcast QuoteRequestRemoved or QuoteRemoved, log, return.

7.4 State transition reference

The following three tables formalize the request lifecycle, the AcceptQuote critical-path ordering, and the cross-cutting concerns that span multiple transitions. They exist to make implementation review concrete and to give each open design question a specific row to attach to. Test references map to the scenarios in §13.1.

7.4.1 RequestState transitions

| From | Trigger | To | Preconditions | Side effects | Locking | Tests | Considerations |
|---|---|---|---|---|---|---|---|
| | SubmitQuoteRequest | Active | gateway risk check passed; not expired | broadcast QuoteRequestPosted; push expiration; log request_submitted | DashMap insert (brief shard lock) | #1, #2 | Risk-check trust (C3); client-retry dedupe (C7) |
| Active | SubmitQuote | Active | request in Active; quote unexpired; gateway risk-checked responder | append to quotes vec; broadcast directed QuoteReceived; push quote expiration; log quote_submitted | per-entry tokio::Mutex (brief) | #1, #3 | Quote-late-arrival reject; price moved since check |
| Active | CancelQuote | Active | quote present and owned by caller | remove from vec; broadcast QuoteRemoved; log | per-entry mutex | #8 | Race vs AcceptQuote on same quote |
| Active | CancelQuoteRequest / CancelAllForUser | Cancelled | caller is request owner | broadcast QuoteRequestRemoved; log; remove from requests DashMap | per-entry mutex | #7 | Race vs AcceptQuote; gateway-crash leak (§12 row 6) |
| Active | expiration tick | Expired | now_ns ≥ expires_at_ns | broadcast QuoteRequestRemoved; log; remove | per-entry mutex, via expiration task | | Heap wakeup race (C8); wall-clock regression (C9) |
| Active | AcceptQuote success | Settling → Settled { trade_id } | quote present, unexpired; acceptor + counterparty risk-checked | mutex → flip Settling → drop mutex → EP3 RPC → retake mutex → flip Settled → broadcast Filled to both parties → log trade_booked | per-entry mutex, released across EP3 RPC | #1, #2 | EP3 dedup contract resolved (no dedup; see §10, §12.1); commit-before-respond (Q9 iii, open); broadcast under mutex (C10) |
| Active | AcceptQuote, definitive EP3 reject (4xx) | Settling → Active | as above; EP3 returns INVALID_ARGUMENT / FAILED_PRECONDITION / NOT_FOUND / ALREADY_EXISTS / etc. | mutex → flip Settling → drop → EP3 4xx → retake → flip back to Active → log trade_book_failed → return typed reject | per-entry mutex | #6 | Quote still alive; client may retry. A concurrent Cancel rejected with AlreadySettling is now stale (C11) |
| Active | AcceptQuote, ambiguous (5xx / DEADLINE_EXCEEDED / mid-call crash) | Settling → needs_manual_reconciliation | as above; EP3 unreachable or response lost | mutex → flip Settling → drop → EP3 ambiguous → retake (if rome alive) → flip to manual-recon → log trade_book_ambiguous → return error | per-entry mutex | #6a | EP3 may have committed; recovery path needs persisted cross_id (C1); operator action |
| Settling | AcceptQuote (any quote) | (rejected) | | RomeIpcError::AlreadySettling | per-entry mutex | #5 | Exactly-one-winner depends on mutex-then-flip ordering |
| Settling | CancelQuote / CancelQuoteRequest | (rejected) | | RomeIpcError::AlreadySettling | per-entry mutex | | Stale cancel from in-flight client |
| Settling | expiration tick | (no-op) | | drop heap entry silently | | | Per §8: only acts "if still in matching state"; Settling is not |
| Settled | any | (rejected) | | (none; terminal state) | | | Late retries from disconnected clients |
| Cancelled / Expired | any | (rejected) | | RequestNotFound / Expired | | | |
| needs_manual_reconciliation | operator resolves to Settled | Settled { trade_id } | reconciliation confirmed prior EP3 commit | broadcast Filled; log trade_recovered | per-entry mutex | | Pending C1 design |
| needs_manual_reconciliation | operator resolves as not-committed | Active | reconciliation confirmed no EP3 commit | log trade_recovery_rolled_back | per-entry mutex | | Pending C1 design |
| needs_manual_reconciliation | late successful EP3 response (race-resolve) | Settled { trade_id } | response arrives after hard timeout carrying trade_id | broadcast Filled; log trade_recovered_late_response | per-entry mutex | | See §12 row 3; matches drop-copy self-healing intent |
| needs_manual_reconciliation | drop-copy Execution match (push-model self-healing) | Settled { trade_id } | execution.order.cross_id == persisted cross_id AND block_trade_indicator = true | broadcast Filled; log trade_recovered_drop_copy | per-entry mutex | | Pending Q9 follow-up on cross_id propagation reaching drop-copy Executions for both legs |

7.4.2 AcceptQuote critical-path ordering

Step numbers map to §2.2. This is the only flow that crosses a process boundary and the only one whose correctness depends on a specific lock-release ordering.

Step Actor Operation Lock state Persistence Can fail with Failure handling
17 Client WS frame AcceptQuote WS drop No client-side idempotency (C7); retry-after-reconnect can double-call rome
18 Gateway Risk-check acceptor + counterparty gateway-local margin cache (read) margin insufficient; counterparty on another gateway (C4) Reject locally; rome never called
19 Gateway RomeIpcFn::call(AcceptQuote) remoc/IPC drop IPC retry; client may not learn outcome
20 rome quotes_index lookup → request_id DashMap shard lock (brief) not found QuoteNotFound
20a rome Acquire per-entry mutex mutex acquired contention only Wait
20b rome Verify Active; quote present & unexpired mutex held state moved on AlreadySettling / Expired
21 rome Flip → Settling { accepted } mutex held (none today; C1)
21a rome Persist cross_id + Settling marker (C1; required for reconcile path) mutex held TBD (needs design) persistence failure Bail before EP3 call
21b rome Drop mutex released Concurrent Accept/Cancel now see Settling and reject
22 rome → EP3 InsertTwoSidedBlockTrade (tonic) no rome locks EP3 commits before responding (Q9 iii, open) 4xx; 5xx; DEADLINE_EXCEEDED; conn drop; rome process crash Branch (§7.4.1); never auto-retry on ambiguous (§10, §12.1)
23 EP3 → rome Returns trade_id (success) EP3 has committed response lost on wire Indistinguishable from 5xx at this layer; vendor confirmed no dedup, so latch (§12.1 Layer 4)
24 rome Reacquire mutex mutex acquired rome crashed before this → enters recovery Recovery reads persisted cross_id, runs SearchOrders(cross_id=…)
24a rome Flip → Settled { trade_id } mutex held persist? (C1)
24b rome Remove from requests, quotes_index mutex held + DashMap shard
25 rome Broadcast Filled on both user_streams mutex released before send (C10 clone-and-release) mpsc slow consumer; channel full C10 decided: clone-and-release. Clone the Filled event under the mutex, drop the mutex, then send on each per-user mpsc. Matches the order-gateway pattern. try_send-and-drop was considered but rejected: dropping a Filled has compliance implications (trader doesn't learn their trade booked), unlike dropping a transient QuoteReceived. Slow per-user consumers backpressure their own mpsc but no longer block the per-entry mutex or stall concurrent Cancel/SubmitQuote on the same request
25a rome Send trade_booked row to log mpsc mutex held bounded mpsc → CH log queue full Drop oldest (C6: compliance-relevant for trade outcomes)
26 rome → gateway AcceptQuoteAck { trade_id } gateway disconnected rome state is committed; client recovery needs a query API to learn trade_id post-reconnect (C7)
27 gateway → client WS AcceptQuoteResponse + RfqEvent::Filled client disconnected Committed in rome and EP3; client has no recovery surface today
28 EP3 → gateway(s) Existing async fill stream position_cache update timing relative to step 24 Settled may precede position_cache update for either side (C12); next risk check may see stale position
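The lock discipline of steps 20a–24a can be sketched with std::sync::Mutex standing in for rome's per-entry mutex. This is a minimal illustration under stated assumptions: Entry, Ep3Outcome, and AcceptError are hypothetical names, the EP3 call is a caller-supplied closure whose result is assumed to be already classified, step 20b's Expired branch is folded into AlreadySettling, and the persistence (21a), index removal (24b), and broadcast (25) steps are omitted.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum State {
    Active,
    Settling,
    Settled { trade_id: String },
    NeedsManualReconciliation,
}

/// Result of the EP3 booking call, assumed to be classified already.
pub enum Ep3Outcome {
    Committed(String), // carries the trade_id
    Definitive,        // 4xx: EP3 did not book
    Ambiguous,         // 5xx, DEADLINE_EXCEEDED, lost response: EP3 may have booked
}

#[derive(Debug, PartialEq, Eq)]
pub enum AcceptError {
    AlreadySettling,
    Rejected,
    Ambiguous,
}

pub struct Entry {
    pub state: Mutex<State>,
}

impl Entry {
    pub fn accept(&self, book: impl FnOnce() -> Ep3Outcome) -> Result<String, AcceptError> {
        // Steps 20a–21b: take the mutex, verify Active, flip to Settling, release.
        {
            let mut st = self.state.lock().unwrap();
            if *st != State::Active {
                return Err(AcceptError::AlreadySettling);
            }
            *st = State::Settling;
        } // guard dropped here, before any I/O

        // Steps 22–23: the EP3 call runs with no rome lock held.
        let outcome = book();

        // Steps 24–24a: reacquire and resolve. No branch calls `book` again.
        let mut st = self.state.lock().unwrap();
        match outcome {
            Ep3Outcome::Committed(trade_id) => {
                *st = State::Settled { trade_id: trade_id.clone() };
                Ok(trade_id)
            }
            Ep3Outcome::Definitive => {
                *st = State::Active; // roll back; client may retry
                Err(AcceptError::Rejected)
            }
            Ep3Outcome::Ambiguous => {
                *st = State::NeedsManualReconciliation; // latch and page
                Err(AcceptError::Ambiguous)
            }
        }
    }
}
```

Because the guard is dropped before book() runs, a second accept issued from inside the closure sees Settling and returns AlreadySettling. Holding the guard across the call instead would make std's Mutex deadlock or panic on the re-entrant lock, the single-threaded analogue of concurrent Cancel/Accept calls stalling behind an in-flight EP3 request.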

7.4.3 Cross-cutting considerations

ID Concern Affected transitions What's load-bearing Tracking
C1 Durable state for Settling success path steps 21a–24; recovery after rome crash; ambiguous-outcome path Wired via Redis (durable_state module + tasks::redis_writer). On restart, Settling snapshots are dropped from the active set and surfaced for manual reconciliation. Open sub-item: persisting cross_id on the Settling snapshot before the first EP3 call is only useful once the Q9 follow-up confirms cross_id propagates to order records (enabling SearchOrders reconcile); otherwise the operator path is sufficient. Wired (recovery); cross_id persistence depends on Q9 follow-up
C2 EP3 idempotency contract every Active → Settling → * path dedup key, retention, duplicate status code, commit ordering Q9 resolved 2026-05-26: EP3 does not dedup; latch-and-page is the V1 contract (§10, §12.1)
C3 Risk-check attestation trust every transition starting from a gateway call risk_checked_at_ns is unsigned; "stale" undefined; gateway can lie Open; not yet a §16 Q
C4 Multi-gateway counterparty re-check step 18 of AcceptQuote "rome refuses" if counterparty not reachable on caller's gateway; UX cliff at scale Open; partially under A-3295
C5 Multi-gateway directed-event fanout every Filled / QuoteReceived emission spec assumes one mpsc per user; users have multiple connections potentially across gateway processes Open; not yet a §16 Q
C6 Audit log durability step 25a across all outcomes drop-oldest policy can lose trade_booked / trade_book_failed / trade_book_ambiguous Open; not yet a §16 Q
C7 Gateway → rome client retry / AcceptQuote idempotency steps 17–19 of any flow; step 26 reconnect SubmitQuoteRequest/SubmitQuote have optional client_request_id/client_quote_id; AcceptQuote has none. No reconnect-recovery API Open; not yet a §16 Q
C8 Expiration heap wakeup expiration row of §7.4.1 new shorter-deadline pushes don't wake the timer task; Notify or select! needed Open; not yet a §16 Q
C9 Wall-clock regression ID generation; expiration ticks now_ns()-seeded counter in §11; expiration comparison Open; not yet a §16 Q
C10 Broadcast-under-mutex step 25 a stuck per-user mpsc blocks the entry mutex; cascading stalls under one slow client Decided clone-and-release (see §7.4.2 step 25 note). try_send-and-drop rejected because dropping a Filled event has compliance implications
C11 Stale Cancel after Settling → Active rollback row 7 of §7.4.1 a Cancel rejected with AlreadySettling during the Settling window is stale once rome rolls back; client gave up but the request is alive again Open; UX nice-to-have
C12 TCR / position_cache lag vs Settled steps 24a → 28 rome's Settled precedes position_cache update for either party; next risk check may see stale position Regression-guard test added (§13.1 #16); behavior remains "small but unbounded" until/unless the drop-copy hop is replaced with an in-process push
C13 Node-ID uniqueness ID generation §11 "configured per rome instance" with no enforcement two misconfigured instances collide silently Open; not yet a §16 Q
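C10's clone-and-release rule can be sketched with std's bounded sync_channel standing in for the per-user mpsc (assumed to be tokio's in rome; std is used only to keep the example dependency-free). The hypothetical Entry holds the Filled event built at step 24a: the clone happens under the mutex, the guard is dropped, and only then does the blocking send run, so a slow consumer backpressures the broadcasting task without holding the entry mutex.

```rust
use std::sync::mpsc::SyncSender;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Filled {
    pub request_id: u128,
    pub trade_id: String,
}

/// Hypothetical per-request entry; `filled` is set when the request settles.
pub struct Entry {
    pub filled: Mutex<Option<Filled>>,
}

/// Returns how many per-user streams accepted the event.
pub fn broadcast_filled(entry: &Entry, user_streams: &[SyncSender<Filled>]) -> usize {
    // Clone under the mutex; the temporary guard is dropped at the end of this statement.
    let event = entry.filled.lock().unwrap().clone();
    let Some(event) = event else { return 0 };
    let mut delivered = 0;
    for tx in user_streams {
        // Blocking send on a bounded channel: a full queue backpressures this
        // task only. Filled is never dropped (try_send-and-drop was rejected).
        if tx.send(event.clone()).is_ok() {
            delivered += 1;
        }
    }
    delivered
}
```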

7.4.4 Notes on the mapping


8. Expiration

One tokio::spawn per rome process. State is a BinaryHeap<Reverse<(deadline_ns, key)>> behind a parking_lot::Mutex. Loop: peek, sleep until the head deadline, pop, look up the entry, take its mutex, if still in the matching state transition to Expired (or remove the quote), broadcast, log. If the entry has already moved on (cancelled/settled), drop the heap entry silently. Heap pushes are O(log n) and contended only on insert; the broadcast handles fanout.

This is the same pattern trade-engine uses for GTD orders.
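The heap discipline above, together with the wakeup gap recorded as C8, can be sketched with std's BinaryHeap. In this minimal sketch (ExpiryHeap and Key are hypothetical names), push reports whether the new deadline became the earliest one, which is exactly the case where a timer task already sleeping toward a later deadline needs a wakeup; in rome that signal would presumably feed a tokio::sync::Notify or a select! arm, per C8. The entry lookup and per-entry mutex check are left to the caller.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// What expires: a whole request or a single quote (ids as in §11).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Key {
    Request(u128),
    Quote(u128),
}

/// Min-heap on deadline via `Reverse`, mirroring `BinaryHeap<Reverse<(deadline_ns, key)>>`.
#[derive(Default)]
pub struct ExpiryHeap {
    heap: BinaryHeap<Reverse<(u64, Key)>>,
}

impl ExpiryHeap {
    /// Pushes a deadline. Returns `true` when it is now the earliest entry,
    /// i.e. the timer task must be woken to re-arm its sleep (C8).
    pub fn push(&mut self, deadline_ns: u64, key: Key) -> bool {
        let previous_head = self.next_deadline();
        self.heap.push(Reverse((deadline_ns, key)));
        previous_head.map_or(true, |head| deadline_ns < head)
    }

    /// Deadline the timer task should sleep until, if any.
    pub fn next_deadline(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((deadline, _))| *deadline)
    }

    /// Pops every entry due at `now_ns`. The caller then locks each entry and
    /// expires it only if it is still in the matching state; otherwise it drops it.
    pub fn pop_due(&mut self, now_ns: u64) -> Vec<Key> {
        let mut due = Vec::new();
        while let Some(&Reverse((deadline, key))) = self.heap.peek() {
            if deadline > now_ns {
                break;
            }
            self.heap.pop();
            due.push(key);
        }
        due
    }
}
```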


9. ClickHouse logging

New table:

CREATE TABLE rfq_log (
    timestamp_ns      UInt64,
    event_type        LowCardinality(String),
    user_id           String,
    request_id        UInt128,
    quote_id          Nullable(UInt128),
    symbol            LowCardinality(String),
    quantity          Decimal128(18),
    bid               Nullable(Decimal128(18)),
    ask               Nullable(Decimal128(18)),
    accepted_side     Nullable(String),
    expiration_ns     Nullable(UInt64),
    trade_id          Nullable(String),
    reject_reason     Nullable(String),
    -- landed in #1845
    target_makers     Array(String),
    disclose_identity Bool
) ENGINE = MergeTree
ORDER BY (timestamp_ns, request_id);

event_type values: request_submitted, quote_submitted, quote_accepted, request_cancelled, quote_cancelled, request_expired, quote_expired, trade_booked, trade_book_failed, trade_book_ambiguous.
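One way to keep the LowCardinality event_type column closed is to mint its values from a Rust enum instead of string literals at each call site. The sketch below is hypothetical (RfqEventType is not a name from the codebase); it covers every value this section expects in rfq_log, including trade_book_ambiguous, which §13.1 scenario 6a and C6 expect to be logged.

```rust
/// Values of rfq_log.event_type (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RfqEventType {
    RequestSubmitted,
    QuoteSubmitted,
    QuoteAccepted,
    RequestCancelled,
    QuoteCancelled,
    RequestExpired,
    QuoteExpired,
    TradeBooked,
    TradeBookFailed,
    TradeBookAmbiguous,
}

impl RfqEventType {
    pub const ALL: [RfqEventType; 10] = [
        Self::RequestSubmitted,
        Self::QuoteSubmitted,
        Self::QuoteAccepted,
        Self::RequestCancelled,
        Self::QuoteCancelled,
        Self::RequestExpired,
        Self::QuoteExpired,
        Self::TradeBooked,
        Self::TradeBookFailed,
        Self::TradeBookAmbiguous,
    ];

    /// String stored in ClickHouse.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::RequestSubmitted => "request_submitted",
            Self::QuoteSubmitted => "quote_submitted",
            Self::QuoteAccepted => "quote_accepted",
            Self::RequestCancelled => "request_cancelled",
            Self::QuoteCancelled => "quote_cancelled",
            Self::RequestExpired => "request_expired",
            Self::QuoteExpired => "quote_expired",
            Self::TradeBooked => "trade_booked",
            Self::TradeBookFailed => "trade_book_failed",
            Self::TradeBookAmbiguous => "trade_book_ambiguous",
        }
    }

    /// Inverse of `as_str`, e.g. when reading rows back for forensics.
    pub fn parse(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|t| t.as_str() == s)
    }
}
```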

Migration file added to sdk-internal/src/clickhouse/migrations/. ChRfqLog row struct added next to ChHistoricalOrder in sdk-internal/src/clickhouse/schema.rs.

Writer task: bounded mpsc::Sender<RfqLogRow>, capacity 4096. The task batches up to 1000 rows or 50ms (whichever comes first) and issues INSERT INTO rfq_log SETTINGS async_insert=1, wait_for_async_insert=0 FORMAT NATIVE, exactly as order-gateway/src/lib.rs:65–68. Backpressure policy: if the channel ever fills, drop the oldest queued log row and increment a rome.log_drops Prometheus counter; we never block a hot path on logging.
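tokio's bounded mpsc has no built-in drop-oldest mode (try_send on a full channel fails and hands the new row back in the error), so the policy above needs a small wrapper in front of the writer. The std-only sketch below is one way to do it under stated assumptions: DropOldestQueue and should_flush are hypothetical names, the drops field stands in for the rome.log_drops counter, the 50ms window is measured from the oldest pending row, and the constants are this section's 4096 / 1000 / 50ms values. As C6 records, this policy evicts trade_booked and trade_book_* rows like any other.

```rust
use std::collections::VecDeque;
use std::time::Duration;

pub const CHANNEL_CAPACITY: usize = 4096;
pub const MAX_BATCH_ROWS: usize = 1000;
pub const MAX_BATCH_AGE: Duration = Duration::from_millis(50);

/// Bounded queue that evicts the oldest row instead of rejecting the newest.
pub struct DropOldestQueue<T> {
    rows: VecDeque<T>,
    capacity: usize,
    /// Stand-in for the rome.log_drops Prometheus counter.
    pub drops: u64,
}

impl<T> DropOldestQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { rows: VecDeque::with_capacity(capacity), capacity, drops: 0 }
    }

    /// Never blocks: when full, the oldest queued row is discarded.
    pub fn push(&mut self, row: T) {
        if self.rows.len() == self.capacity {
            self.rows.pop_front();
            self.drops += 1;
        }
        self.rows.push_back(row);
    }

    /// Removes up to `max` rows in FIFO order for one INSERT.
    pub fn drain_batch(&mut self, max: usize) -> Vec<T> {
        let n = max.min(self.rows.len());
        self.rows.drain(..n).collect()
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }
}

/// Flush when the batch is full or its oldest row has waited the full window.
pub fn should_flush(pending_rows: usize, oldest_row_age: Duration) -> bool {
    pending_rows >= MAX_BATCH_ROWS || (pending_rows > 0 && oldest_row_age >= MAX_BATCH_AGE)
}
```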

9.1 Anonymity and the audit log

disclose_identity = false is a presentation flag, not a privacy flag. ClickHouse rows always carry the real user_id for compliance. Anonymity stripping happens only at RfqEvent emission time on outgoing WS frames.
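The split can be sketched as a single conversion applied at WS-emission time. All names here (the RfqLogRow fields shown, QuoteRequestPosted's requester field, to_ws_event) are hypothetical; the point is that the log row always keeps user_id and only the outgoing frame consults disclose_identity, which is also what §13.1 scenario 15 checks.

```rust
/// Subset of an rfq_log row; user_id is always the real requester.
#[derive(Debug, Clone)]
pub struct RfqLogRow {
    pub request_id: u128,
    pub user_id: String,
    pub symbol: String,
    pub disclose_identity: bool,
}

/// Outgoing public-stream event (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QuoteRequestPosted {
    pub request_id: u128,
    pub symbol: String,
    /// None unless the requester opted to disclose identity.
    pub requester: Option<String>,
}

pub fn to_ws_event(row: &RfqLogRow) -> QuoteRequestPosted {
    QuoteRequestPosted {
        request_id: row.request_id,
        symbol: row.symbol.clone(),
        // Stripping happens here, on the outgoing frame only; the row is untouched.
        requester: row.disclose_identity.then(|| row.user_id.clone()),
    }
}
```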

9.2 Trade table marker

#1861 added trade_condition = rfq on the existing Trade row so RFQ-sourced fills are distinguishable in the trade history without joining against rfq_log. The downstream marketdata-publisher propagation to the public tape is still pending under A-3294.


10. EP3 booking

AdminAPI.InsertTwoSidedBlockTrade is the only EP3 RPC rome calls and the only place rome uses tonic: EP3 is the third-party Connamara service, not a Rust↔︎Rust hop. Pool one Ep3Client per rome process, share via Arc; tonic multiplexes over a single HTTP/2 connection. Configure the same connect/request/keepalive timeouts as order-gateway.

Idempotency. Resolved (2026-05-26 Connamara). EP3 does not deduplicate InsertTwoSidedBlockTrade on cross_id or any other field. Identical retry produces two block trades with distinct EP3-generated trade IDs and distinct order IDs. The idempotency_key pattern available on AdjustAccountBalanceRequest (admin_api.proto:1680) is not exposed on this RPC.

The implication is binary. Any rome → EP3 retry after an ambiguous outcome (5xx, DEADLINE_EXCEEDED, mid-call rome crash, lost response) double-books whenever the original call had in fact committed, and rome cannot tell from the outcome which case it is in. No client-side trick (fresh ULID cross_id, retry tokens, etc.) recovers safety, because EP3 ignores those fields for dedup purposes.
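Because a misclassified ambiguous outcome rolls the request back to Active and invites exactly the retry that double-books, a safe classifier allow-lists the definitive cases and treats everything else as ambiguous. The sketch below mirrors a subset of gRPC status codes in a local enum so it stays dependency-free; rome itself would match on tonic's status codes, and its real classifier lives at exchange_reject.rs:237. The particular definitive set shown is an assumption, since the definitive-vs-ambiguous split is still an open vendor sub-question under Q9.

```rust
/// Local mirror of the gRPC codes relevant here (subset; hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrpcCode {
    InvalidArgument,
    FailedPrecondition,
    NotFound,
    PermissionDenied,
    Unauthenticated,
    DeadlineExceeded,
    Unavailable,
    Internal,
    Unknown,
    Cancelled,
}

/// How a failed InsertTwoSidedBlockTrade call is surfaced to rome.
pub enum Ep3Failure {
    Status(GrpcCode),
    /// Connection dropped or response lost before any status arrived.
    Transport,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FailureClass {
    /// EP3 certainly did not book: roll back to Active.
    Definitive,
    /// EP3 may have booked: latch needs_manual_reconciliation, never retry.
    Ambiguous,
}

pub fn classify(failure: &Ep3Failure) -> FailureClass {
    match failure {
        // Assumed-definitive set; anything not listed defaults to ambiguous.
        Ep3Failure::Status(
            GrpcCode::InvalidArgument
            | GrpcCode::FailedPrecondition
            | GrpcCode::NotFound
            | GrpcCode::PermissionDenied
            | GrpcCode::Unauthenticated,
        ) => FailureClass::Definitive,
        Ep3Failure::Status(_) | Ep3Failure::Transport => FailureClass::Ambiguous,
    }
}
```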

Contract:

Identifier constraints (unchanged): trade_id and cross_id MUST NOT derive from request_id. The public RFQ stream exposes request_id, and cross_id is queryable via SearchOrders (admin_api.proto:694), so a deterministic mapping would let firehose subscribers predict trades. Both should be fresh ULIDs. cross_id should be persisted on the Settling snapshot before the first EP3 call so that post-crash reconciliation can find it (this enables future SearchOrders-based reconcile if the vendor follow-up confirms cross_id propagates to order records).

See §12.1 for the full layered retry analysis (when a retry can happen across the client → gateway → rome → EP3 stack, and what protects against double-book at each layer).

Open follow-ups (do not block V1):

ep3-mock previously stubbed InsertTwoSidedBlockTrade as unimplemented (ep3-mock/src/admin_service.rs:1540). #1844 implements enough of the mock to return a synthetic trade_id and emit a fill event on the EP3 fill stream so end-to-end tests work.

10.1 Multi-leg block format (forward-looking)

When A-3297 F1 lands, rome will submit all legs in one EP3 call so booking is atomic. Connamara's InsertTwoSidedBlockTrade accepts the multi-leg form (block trades on Deribit-style products are routinely multi-leg); the call site needs to be adapted but the wire shape stays compatible.


11. ID generation

RequestId and QuoteId are u128 laid out as (node_id: u32) << 96 | (kind: u32) << 64 | (counter: u64). The node id is configured per rome instance, kind is 0=request, 1=quote, counter is a per-process AtomicU64 started from now_ns() to give monotonic ordering even across restarts as long as wall clock doesn't go backwards.

Rationale. u128 is cheap; embedding the node id makes future sharding trivial; embedding the kind makes log forensics easier without a separate table lookup. Wire format on the WS is the standard Uuid-shaped hex string for human readability.
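The layout above can be sketched directly on u128 with std only. IdGen, IdKind, compose, decompose, and to_uuid_string are hypothetical names; the seed is injectable so the example is deterministic, and new() seeds from SystemTime, so it carries the same wall-clock caveat tracked as C9.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// The `kind` field of the id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum IdKind {
    Request = 0,
    Quote = 1,
}

/// (node_id: u32) << 96 | (kind: u32) << 64 | (counter: u64)
pub fn compose(node_id: u32, kind: u32, counter: u64) -> u128 {
    ((node_id as u128) << 96) | ((kind as u128) << 64) | counter as u128
}

/// Inverse of `compose`; each `as` cast keeps only the low bits after the shift.
pub fn decompose(id: u128) -> (u32, u32, u64) {
    ((id >> 96) as u32, (id >> 64) as u32, id as u64)
}

/// Uuid-shaped wire form: 32 lowercase hex digits grouped 8-4-4-4-12.
pub fn to_uuid_string(id: u128) -> String {
    let h = format!("{:032x}", id);
    format!("{}-{}-{}-{}-{}", &h[0..8], &h[8..12], &h[12..16], &h[16..20], &h[20..32])
}

/// Per-process generator; one instance per configured node id.
pub struct IdGen {
    node_id: u32,
    counter: AtomicU64,
}

impl IdGen {
    /// Seeds the counter from wall-clock nanoseconds, as §11 describes.
    pub fn new(node_id: u32) -> Self {
        let now_ns = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before Unix epoch")
            .as_nanos() as u64;
        Self::with_seed(node_id, now_ns)
    }

    /// Explicit seed, e.g. for deterministic tests.
    pub fn with_seed(node_id: u32, seed: u64) -> Self {
        Self { node_id, counter: AtomicU64::new(seed) }
    }

    /// Lock-free: fetch_add hands each caller a distinct counter value.
    pub fn next(&self, kind: IdKind) -> u128 {
        let counter = self.counter.fetch_add(1, Ordering::Relaxed);
        compose(self.node_id, kind as u32, counter)
    }
}
```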

11.1 cross_id

cross_id is a client-supplied string on EP3's TwoSidedBlockTrade (api.proto:100). Today no other AX service submits crosses to EP3, so rome is the sole minting point. Treat it as a correlation key, not an idempotency or dedup token.

Attributes

What it is not

Operational role


12. Failure modes

Scenario Behavior Status
Gateway ↔ rome connection drop RFn calls return RFnError; gateway listener tasks reconnect TCP with backoff, re-acquire RomeIpcFn, re-subscribe (atomic snapshot + new receiver). WS subscribers get a fresh snapshot before resume. Wired in #1844; explicit reconnect tests under A-3260
rome restart All in-memory state lost by design for v1. ClickHouse retains history. v2 (A-3297 F6) may persist active state to Redis or snapshot. By design
EP3 unavailable during AcceptQuote Definitive reject (4xx): transition back to Active, gateway returns typed reject, client may retry. Ambiguous outcome (5xx, DEADLINE_EXCEEDED, mid-call crash, lost response): latch to needs_manual_reconciliation per durable_state.rs; no auto-retry, because Connamara confirmed EP3 does not dedup and any retry after an ambiguous outcome double-books whenever the first call committed (see §10, §12.1). A hard timeout (5s placeholder; it should be sized against EP3's observed InsertTwoSidedBlockTrade commit-time distribution under load, folded into the same vendor follow-up as Q9) bounds the ambiguous window. Late EP3 response after the hard timeout fires: if a successful response carrying trade_id arrives at t = N+δ (slow Mongo, queue depth, partition reshuffle), rome race-resolves: it auto-transitions needs_manual_reconciliation → Settled { trade_id }, broadcasts Filled, and logs trade_recovered_late_response. The alternative (ignore the late response and force operator reconciliation) is also safe but adds operator burden; race-resolve matches the drop-copy self-healing intent below and is the chosen default. Late 4xx responses are dropped: a definitive reject after we've latched to manual reconciliation cannot retroactively un-commit a trade EP3 may have already booked on a separate path, so the operator path remains authoritative. Wired; policy confirmed by vendor answer to Q9
Client disconnect mid-flow Cancel all of their open quote requests + open quotes (mirrors cancel_session_orders). Rome::CancelAllForUser is wired at ws_service.rs:116. Happy path wired; hardening under A-3295
Counterparty disconnect between SubmitQuote and AcceptQuote Open question; current proposal: cancel the quote when the responder's WS disconnects, mirroring order behavior. Will be documented in the SDK once decided. Needs discussion (A-3295 item 5)
Slow public RFQ subscriber broadcast::Lagged → gateway listener re-snapshots and resumes; client never sees the lag. Wired
Gateway crash without sending CancelAllForUser Today the user's RFQs/quotes leak in rome until their natural deadline. A server-side TTL/heartbeat between rome and the gateway is needed. Tracked under A-3295 item 4

12.1 Retry safety by layer

Following the §10 confirmation that EP3 does not deduplicate InsertTwoSidedBlockTrade, this is the exhaustive layered analysis of where a "retry" can fire in the end-to-end booking flow, what protects against double-book at each layer, and which UX gaps remain.

   User             WS Client          order_gateway          rome             EP3
    │                  │                    │                  │                │
  Layer 1           Layer 2              Layer 3            Layer 4         (server)
 user retry      client auto-retry   gateway→rome retry   rome→EP3 retry        │
                                                                                 │
                                            Layer 5: rome process crash + restart retry
Layer When it fires Double-book risk? Protection today UX gap
1. User clicks Accept twice UI debounce miss; impatient user; double-fire bug No Rome's state machine: second click sees Settling/Settled and returns AlreadySettling / QuoteNotFound None; rejection is clear
2. Client auto-retry on WS reconnect WS drops mid-Accept; GUI's exponential-backoff reconnect; client resubmits No After successful Accept, rome's remove_request_indexes clears the quote from quotes_index, so a retried Accept returns QuoteNotFound; if first Accept is still mid-process, second sees Settling and returns AlreadySettling Yes: client cannot distinguish "trade booked, retry too late" from "quote expired" / "quote cancelled" / "wrong gateway"; all return QuoteNotFound. Recovery requires a client-set idempotency token (e.g. client_accept_id) plus a GetAcceptStatus(cid=…) query, analogous to clord_id + GetOrderStatus on the order path
3. Gateway → rome IPC retry remoc connection drops; gateway's RFn returns an error; gateway re-issues No Symmetric to Layer 2: state machine cleanup prevents a second Accept from reaching EP3 Same as Layer 2 (symmetric)
4. Rome → EP3 retry Tonic call returns DEADLINE_EXCEEDED / Unavailable / connection drop; or response lost between EP3 commit and rome's response handler Yes; this is the load-bearing case Latch-and-page only. Definitive 4xx → roll back to Active. Ambiguous → latch Settling snapshot, log trade_book_ambiguous, return error, no auto-retry. Wired via the definitive classifier at exchange_reject.rs:237 plus the needs_manual_reconciliation recovery path. M4 extends the path to in-process ambiguous outcomes Trader sees an error and doesn't know if the trade booked. Operator reconciles by checking EP3 (SearchOrders by cross_id if propagation confirmed; otherwise manual EP3 admin lookup) and updating rome state
5. Rome restart with Settling state on disk Rome process crashes (panic, OOM, deploy) between EP3 call dispatch and Settled transition No (because no auto-retry) durable_state::load_requests drops Settling snapshots from the active set on startup and warns; the latched state requires operator action just like Layer 4 ambiguous Same operator-reconciliation cost as Layer 4
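The Layer 2/3 UX gap could be closed with the client_accept_id plus GetAcceptStatus pairing the table proposes. The sketch below is hypothetical throughout (neither the field nor the query exists yet, and AcceptLedger is an illustrative name): rome records each outcome keyed by (user, client_accept_id), so a retried Accept returns the original outcome instead of an indistinguishable QuoteNotFound. Retention, eviction, and durability across a rome restart are left out.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AcceptStatus {
    InFlight,
    Booked { trade_id: String },
    Rejected { reason: String },
    NeedsManualReconciliation,
}

/// Outcomes keyed by (user_id, client_accept_id); scoping by user stops one
/// participant from probing another participant's ids.
#[derive(Default)]
pub struct AcceptLedger {
    by_client_id: HashMap<(String, String), AcceptStatus>,
}

impl AcceptLedger {
    /// First sight of the id: records InFlight and returns None, so the caller
    /// runs the real Accept. On a retry: returns the recorded status instead.
    pub fn begin(&mut self, user: &str, client_accept_id: &str) -> Option<AcceptStatus> {
        match self.by_client_id.entry((user.to_owned(), client_accept_id.to_owned())) {
            Entry::Occupied(e) => Some(e.get().clone()),
            Entry::Vacant(v) => {
                v.insert(AcceptStatus::InFlight);
                None
            }
        }
    }

    /// Records the final outcome of the Accept that `begin` admitted.
    pub fn finish(&mut self, user: &str, client_accept_id: &str, status: AcceptStatus) {
        self.by_client_id.insert((user.to_owned(), client_accept_id.to_owned()), status);
    }

    /// What a GetAcceptStatus(cid=…) query would return.
    pub fn status(&self, user: &str, client_accept_id: &str) -> Option<&AcceptStatus> {
        self.by_client_id.get(&(user.to_owned(), client_accept_id.to_owned()))
    }
}
```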

Two distinct safety properties:

Anti-patterns explicitly ruled out by this analysis:


13. Testing

Integration tests under rs/rome/tests/ and rs/order-gateway/tests/. Use ax_test_utils containers (Postgres, ClickHouse, Redis) and Ep3Mock. Each test spins order_gateway + rome + mock EP3 and drives them via real WS clients.

13.1 Required scenarios

  1. Happy-path bid: requester asks bids, two responders quote, requester accepts, EP3 book succeeds, both sides see Filled, ClickHouse has the expected sequence of rows.
  2. Happy-path two-sided: requester asks bids and asks, accepts the bid; verify ask side is correctly discarded.
  3. Quote arrives after request expires → reject.
  4. Acceptance arrives after quote expires → reject.
  5. Concurrent AcceptQuote on the same request → exactly one wins, others get request_settling reject; no double-booking in EP3.
  6. EP3 returns definitive failure (4xx, e.g. INVALID_ARGUMENT) on InsertTwoSidedBlockTrade → request returns to Active, both sides notified, ClickHouse logs trade_book_failed, client may retry.
      6a. EP3 returns ambiguous failure (5xx, DEADLINE_EXCEEDED, or rome crashes mid-call) → request latches to needs_manual_reconciliation, no auto-retry, ClickHouse logs trade_book_ambiguous, operator path is exercised. Asserts the §10 policy confirmed by Connamara.
      6b. Retry-after-transport-failure latch: simulate EP3 committing but the response being dropped (e.g. kill the EP3 mock connection mid-call); assert rome does NOT auto-retry, transitions to needs_manual_reconciliation, emits trade_book_ambiguous, and the operator-reconciliation API surfaces the stuck request. Confirms the no-auto-retry policy.
      6c. Client-retry-after-Settled returns QuoteNotFound (asserts Layer 2 / Layer 3 protection from §12.1): drive Accept to success, then re-issue the same Accept (simulating a WS-reconnect retry); assert QuoteNotFound, no second EP3 call, no duplicate Filled event.
  7. Requester WS disconnect → all their requests cancelled; subscribers see QuoteRequestRemoved.
  8. Responder WS disconnect → all their quotes cancelled.
  9. rome restart while gateway up → gateway rebuilds streams cleanly.
  10. Gateway restart while rome up → reconnects, snapshot then live.
  11. Margin moves between SubmitQuote and AcceptQuote so the responder can no longer honor the fill → AcceptQuote rejected at the gateway, rome never sees it.
  12. Snapshot tests (insta, inline) for every RfqResponse and RfqEvent serialization.
  13. Side-enforcement matrix: 6 mismatched combos covering quote-includes-disallowed-side, accept-on-missing-side, two-sided-on-one-sided, empty-quote (added under A-3294 item 4).
  14. Targeted RFQ visibility: a non-targeted maker subscribed to the public stream must not see the QuoteRequestPosted for a targeted RFQ.
  15. Anonymous RFQ: outgoing QuoteRequestPosted must not contain the requester's user_id; ClickHouse row must.
  16. position_cache lag vs Settled (C12): drive an Accept to Settled, immediately (within the same test tick) submit a tight follow-on order on the opposite side from the just-booked block on both the maker and the taker; assert that the margin check used by order-gateway already reflects the new position from the drop-copy Execution, not the pre-Fill position. Exercises the timing gap between rome's Settled transition (step 24a) and position_cache update (step 28), which today is small but unbounded; the test pins it as a regression guard so we notice if the drop-copy hop ever grows.
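Scenario 5's invariant (exactly one winner) can also be pinned at unit level, below the WS/Ep3Mock harness. A minimal std-only sketch, assuming the per-entry mutex is the only guard on the Active → Settling flip; race_accepts is a hypothetical helper, and the integration test would additionally check the EP3 mock's call count.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

#[derive(Debug, PartialEq, Eq)]
enum State {
    Active,
    Settling,
}

/// Races `n` Accepts on one request; returns (winners, AlreadySettling rejects).
pub fn race_accepts(n: usize) -> (usize, usize) {
    let state = Mutex::new(State::Active);
    let winners = AtomicUsize::new(0);
    let rejected = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                // Check-and-flip under one lock acquisition, as in steps 20a–21.
                let won = {
                    let mut st = state.lock().unwrap();
                    if *st == State::Active {
                        *st = State::Settling;
                        true
                    } else {
                        false
                    }
                };
                let counter = if won { &winners } else { &rejected };
                counter.fetch_add(1, Ordering::SeqCst);
            });
        }
    });
    (winners.into_inner(), rejected.into_inner())
}
```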

Per the project rule on connection-state testing, scenarios 7–10 must exist and pass before rollout.

13.2 Tracked under


14. Observability

Prometheus metrics from rome:

Structured logs via the existing log crate convention (lowercase messages: "failed to book trade", not "Failed to ...").

14.1 Tracked under


15. GUI

Tracked under A-3211; ticket carries the full surface spec and milestone list. Summary here.

15.1 Requester surface: "Create Strategy" modal

Reference UX is the "Create Strategy" mock: a multi-leg options RFQ builder on the left with a maker-targeting sidebar on the right.

15.2 Maker selection sidebar (required)

Search input + Select all / Favorites segmented control + per-row star (favorite toggle) + checkbox (include in RFQ). Submit blocked client-side if zero makers selected. Favorite set is per-user, persisted server-side.

15.3 Responder surface

Live inbox of incoming RFQs, quote composer (bid/ask on strategy net premium, not per leg), my-quotes view, fills view filtered to RFQ-sourced fills.

15.4 Component reuse policy

Per CLAUDE.md Code Style (GUI): check @architect-xyz/ui-components and @architect/ui before writing any new utility, hook, or component. Net-new components likely required: Greeks readout strip, leg-builder table row, maker picker sidebar.

15.5 Milestones (G1–G7)

# Title Depends
G1 Static modal shell + asset selector + template buttons
G2 Editable legs table + template pre-population + client-side Greeks G1
G3 Maker sidebar against stubbed list G1
G4 Wire to order_gateway WS (single-leg via A-3295 shim; true multi-leg blocked on A-3297 F1) G2, G3, A-3295
G5 Confirm-then-send modal for SubmitQuoteRequest + AcceptQuote G4
G6 Hedge leg UI, anonymity toggle, favorites persisted server-side G4
G7 Responder surface G4

Mapped implementation tickets:


16. Open questions register

Questions still needing a decision. Each carries a default proposal so we can ship without a decision, but a sign-off would replace the placeholder.

# Question Default Owner Tracking
Q1 Counterparty disconnect between SubmitQuote and AcceptQuote: cancel the quote, or leave it live? Cancel the quote when responder's WS drops, mirroring order behavior tin A-3295 item 5
Q2 Two-sided requests: partial acceptance (accept the bid, leave the ask offer live)? No; accepting closes the request tin this doc
Q3 Quote replacement: let the responder amend a live quote, or cancel + resubmit? Cancel + resubmit for v1; AmendQuote is v2 (A-3297 F4) tin A-3297 F4
Q4 Multi-rome sharding strategy when we outgrow a single instance Defer until needed; node-id in RequestId already supports it tin v2
Q5 How does this interact with risk-monitor's alerting? Probably emits on trade_booked events the same way orders do, but needs a pass tin this doc
Q6 Do we need a maker eligibility / application-gating program (Bybit-style IM contact) for v1? No; "open to approved participants" commercial this doc
Q7 Fee schedule for RFQ fills Same as standard fees for v1; rebates are v2 (A-3297 F7) commercial A-3297 F7
Q8 "Other" template in the strategy picker: exact preset list Hide until product specifies product A-3211
Q9 EP3 dedup contract for InsertTwoSidedBlockTrade. Resolved (2026-05-26 Connamara): EP3 does not dedup on cross_id or any other field; identical retry produces two trades with distinct EP3-generated trade IDs and distinct order IDs. The idempotency_key field is not available on this RPC. Implications and current policy documented in §10 and the §12.1 retry-safety analysis. Sub-questions (iii) and (iv), commit-before-respond ordering and the definitive-vs-ambiguous status-code split, remain open with Connamara as follow-ups; they affect operator-reconciliation tooling but not the V1 latch-and-page contract. Pursued follow-ups: vendor RFE for idempotency_key (long-term clean fix); vendor question on whether cross_id propagates to resulting order records (would enable SearchOrders-based reconcile and reduce operator burden). Latch-and-page wired (§10); follow-ups tracked vendor-side tin this doc; vendor RFE pending

17. Feature comparison AX vs Bybit RFQ vs Deribit Block RFQ

# Feature AX (v1 today) Bybit RFQ Deribit Block RFQ
1 Instruments in scope Single instrument, perp-style Spot + perp + dated future + option Option + perp + dated future
2 Multi-leg / strategies single-leg only (v2: F1) multi-leg up to 20 legs, custom ratios
3 Hedge leg (delta hedge attached) (v2: F2) partial (via multi-leg ticket) explicit hedge leg, atomic
4 Quote sides Bid / Ask / Bid+Ask one-way / two-way one-way / two-way
5 Targeted vs public RFQ target_makers all-makers or subset
6 Anonymity protocol-level disclose_identity (#1845) Anonymous flag at RFQ creation blind auction (makers see only own quotes)
7 Quote aggregation across makers one quote fills full size (v2: F3) partial (multi-maker quotes) aggregated fills + per-quote AON opt-out
8 Quote replacement / improvement cancel + resubmit (v2: F4)
9 Partial fills full quantity only (v2 via F3) via aggregation
10 Per-instrument minimum block size today; via A-3294 per-currency block minima
11 Quote / RFQ expiration caller sets, server enforces
12 Maker eligibility / gating "open to approved participants" application-gated (IM contact) maker program
13 Maker discovery API GET /rfq/makers via /rfq/config
14 Maker quality / response-rate scoring (v2: F5) partial
15 Accept off-protocol OTC quote (v2: F8) accept-other-quote
16 Public trade tape print after fill today; via A-3294 stripped of party info with block condition
17 Fee rebate / RFQ-specific fee schedule standard fees (v2: F7) 50% maker rebate on preset combos maker-program rebates
18 Per-user rate limit today; via A-3294
19 History / analytics for participants snapshot + ClickHouse rfq_log
20 Reject reasons surfaced to client RfqReject { reason, message }
21 Cancel-on-disconnect for live RFQs wired (happy path); hardening via A-3295
22 Restart recovery from log in-memory only (v2: F6) n/a n/a
23 EP3 block-trade booking pathway InsertTwoSidedBlockTrade n/a n/a
24 WS streaming of events topics rfq.open.*
25 OpenAPI / typed REST client utoipa + OrderGatewayRestClient::rfq_*

Appendix A Glossary

Term Meaning
RFQ Request for Quote: a participant asks the market for a price, makers respond, requester accepts one
rome Request for Orders Matching Engine, the new internal Rust service
EP3 The Connamara matching engine, the third-party service ROME books trades into
Maker A participant responding to RFQs with quotes
Taker / Requester The participant submitting the RFQ and accepting a quote
Block trade A privately-negotiated trade booked outside the lit book; the EP3 primitive ROME uses
AON All-or-none: a quote that must fill in full or not at all
remoc Remote channels crate; the Rust↔︎Rust IPC primitive used between gateway and rome
RFn remoc Remote Function: the typed RPC primitive built on remoc channels