Date: 2026-04-10
admin-cli test ep3-cancel-replace-discovery (rs/admin-cli/src/ep3_cancel_replace_discovery.rs) is a "space probe" that runs 14 sequential scenarios against a real EP3 environment to characterize cancel-replace semantics. It clears the market, runs a specific action pattern, and pretty-prints execution timelines for a human to inspect. It has served well as a discovery tool for one narrow surface area.
It has three limitations that keep it from being a general exchange test battery:
ax-exchange-sdk WS/REST surface that customers actually hit.
settle_ms between operations is the only sync primitive: flaky under load, wasteful under idle, and hides timing bugs.
This RFC proposes a successor: an admin-cli test battery tool that runs a scenario catalog against our public API surface, executes against ax-demo (or any environment) from a dev laptop or CI, and produces structured pass/fail plus timeline artifacts. It also proposes the minimum changes to ax-demo needed to make the battery safe to run against live demo infrastructure, and an offline coverage-report pipeline that doesn't perturb ax-demo at all.
sleep calls, kills most flakes.
New crate module rs/admin-cli/src/test_battery/:

    test_battery/
      mod.rs          - entry point, scenario registry, CLI wiring
      env.rs          - Env: holds public-SDK clients (maker/taker), admin client,
                        sandbox config (symbol, price scale, accounts)
      recorder.rs     - EventRecorder: subscribes to each client's WS stream,
                        records every event with a monotonic timestamp into a
                        per-client timeline
      scenario.rs     - Scenario trait, Outcome struct, assertion helpers
      assertions.rs   - expect_event_within, expect_order_state, expect_no_event_for,
                        expect_fill_sequence, wait_for<F>
      report.rs       - JUnit XML + per-scenario markdown timeline emission
      scenarios/
        order_lifecycle.rs
        cancel_replace.rs
        self_trade_prevention.rs
        margin_rejects.rs
        order_flags.rs
        book_priority.rs
        reconnect_resume.rs
        drop_copy_parity.rs
        ...
The Scenario trait:

    #[async_trait]
    pub trait Scenario: Send + Sync {
        fn name(&self) -> &'static str;
        fn surface_tags(&self) -> &[SurfaceTag]; // for coverage registry, below
        async fn run(&self, env: &Env) -> Result<Outcome>;
    }

Env holds two OrderGatewayWsClient instances (from ax-exchange-sdk) for maker and taker, plus an admin Ep3Client for inspection and setup. The existing rs/admin-cli/src/replace_order_test.rs already demonstrates the public-SDK WS-client shape; reuse that.
EventRecorder spawns a task per client that pulls WS events into a timeline. Assertions take the form:

    env.maker.place_order(...).await?;
    env.wait_for(Client::Maker, |ev| matches!(ev, OrderAcked { .. }),
                 Duration::from_secs(2)).await?;

wait_for blocks on the recorded stream until a predicate matches or times out. There are no sleep(settle_ms) calls in scenarios. Timeouts are per-wait, not per-scenario, so a flake gives a precise failure site.
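The per-wait timeout behaviour can be illustrated with a minimal, synchronous sketch. It assumes the recorder hands events over a std::sync::mpsc channel; the Event and WaitError types and this wait_for signature are hypothetical stand-ins, and the real harness would be async and would also consult events already in the timeline.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical event type; the SDK's real WS event enum will differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    OrderAcked { order_id: u64 },
    OrderRejected { order_id: u64 },
    Fill { order_id: u64, qty: u64 },
}

#[derive(Debug, PartialEq)]
pub enum WaitError {
    /// No matching event arrived before this wait's deadline.
    Timeout,
    /// The recorder side hung up (for example, the WS task exited).
    StreamClosed,
}

/// Block until an event satisfying `pred` arrives or `timeout` elapses.
/// Every event seen while waiting is appended to `timeline`.
pub fn wait_for<F>(
    rx: &Receiver<Event>,
    timeline: &mut Vec<Event>,
    pred: F,
    timeout: Duration,
) -> Result<Event, WaitError>
where
    F: Fn(&Event) -> bool,
{
    // One deadline per wait, so a failure points at this exact call site.
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(ev) => {
                timeline.push(ev.clone());
                if pred(&ev) {
                    return Ok(ev);
                }
            }
            Err(RecvTimeoutError::Timeout) => return Err(WaitError::Timeout),
            Err(RecvTimeoutError::Disconnected) => return Err(WaitError::StreamClosed),
        }
    }
}
```

Because the deadline is computed once and the remaining time is recomputed on every loop iteration, a stream of unrelated events cannot extend the wait past the requested timeout.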
For scenarios that assert absence of an event (e.g., "no fill should occur"), expect_no_event_for(duration) waits the full window and asserts nothing matched.
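A rough sketch of the absence check follows, again assuming a channel-backed event stream. The signature is hypothetical and generic over the event type so that it does not depend on the SDK's real enum.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Wait out `window`; return the offending event if anything matching
/// `pred` shows up, otherwise Ok(()).
pub fn expect_no_event_for<E, F>(rx: &Receiver<E>, pred: F, window: Duration) -> Result<(), E>
where
    F: Fn(&E) -> bool,
{
    let deadline = Instant::now() + window;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            // A matching event inside the window is a failure.
            Ok(ev) if pred(&ev) => return Err(ev),
            // Unrelated event: keep waiting out the rest of the window.
            Ok(_) => continue,
            // Window elapsed without a match.
            Err(RecvTimeoutError::Timeout) => return Ok(()),
            // Simplification: a closed stream can deliver nothing more, so
            // this sketch passes early. A real harness may prefer to fail.
            Err(RecvTimeoutError::Disconnected) => return Ok(()),
        }
    }
}
```

Unlike wait_for, the success path here always costs the full window, so scenarios should keep these windows short and use them only where absence is the property under test.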
Each scenario follows:
setup: sandbox ACLs guarantee empty markets, so setup is mostly price/qty parameterization
run: a sequence of actions interleaved with wait_for / expect_*
teardown: cancel_all as a defensive measure; harness verifies markets are clean before marking the scenario passed
Outcome carries events (per-client recorded timeline), assertions (pass/fail list with source locations), and timing (wall clock + per-wait latencies). The report emitter consumes Outcome to produce JUnit XML for CI and a markdown timeline per scenario for humans.
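To make the Outcome-to-JUnit step concrete, here is a minimal sketch. The Outcome and AssertionResult fields are hypothetical and trimmed down (the event timeline and per-wait latencies are omitted), and the XML uses only the widely supported testsuite / testcase / failure elements.

```rust
use std::time::Duration;

/// One assertion result; `location` would typically come from file!()/line!().
pub struct AssertionResult {
    pub description: String,
    pub location: String,
    pub passed: bool,
}

/// Hypothetical, reduced Outcome for illustration only.
pub struct Outcome {
    pub scenario: String,
    pub assertions: Vec<AssertionResult>,
    pub wall_clock: Duration,
}

/// Escape the characters that are unsafe in XML text and attributes.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Render one <testsuite> with one <testcase> per scenario.
pub fn junit_xml(suite: &str, outcomes: &[Outcome]) -> String {
    let failed = outcomes
        .iter()
        .filter(|o| o.assertions.iter().any(|a| !a.passed))
        .count();
    let mut xml = format!(
        "<testsuite name=\"{}\" tests=\"{}\" failures=\"{}\">\n",
        xml_escape(suite),
        outcomes.len(),
        failed
    );
    for o in outcomes {
        xml.push_str(&format!(
            "  <testcase name=\"{}\" time=\"{:.3}\">\n",
            xml_escape(&o.scenario),
            o.wall_clock.as_secs_f64()
        ));
        // Collapse all failed assertions into a single <failure> element.
        let failures: Vec<String> = o
            .assertions
            .iter()
            .filter(|a| !a.passed)
            .map(|a| format!("{}: {}", a.location, a.description))
            .collect();
        if !failures.is_empty() {
            xml.push_str(&format!(
                "    <failure message=\"{} assertion(s) failed\">{}</failure>\n",
                failures.len(),
                xml_escape(&failures.join("\n"))
            ));
        }
        xml.push_str("  </testcase>\n");
    }
    xml.push_str("</testsuite>\n");
    xml
}
```

Keeping the source location in the failure body means a CI failure points straight at the assertion that tripped, which matches the per-wait failure-site goal.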
The battery runs against a soft sandbox inside ax-demo: dedicated users, firm, and symbols, isolated by authz rules inside ax-demo itself. Not a separate deployment — the whole point is that the battery exercises the same binaries, config, DB, and gateway real users hit.
Namespace:
firms/BATTERY/
battery-maker-1, battery-taker-1, battery-self-1 (same user both sides, for self-trade prevention scenarios), battery-crossfirm-1 under a second firm for cross-firm scenarios.
BATTERY.*: one of each product type (perp, dated future, option if applicable). The variety matters: if the sandbox only has a perp, the battery will never hit the dated-future branches and coverage will look artificially thin exactly where it matters.
Admin helpers, gated:
force_mark_price(symbol, price): only accepts BATTERY.* symbols, only callable by admin. Needed for margin-reject scenarios.
force_settlement(symbol): same gating.
These live in the admin surface but are hard-gated on symbol prefix and caller identity. See safety model below.
The only thing protecting real demo markets from a battery bug is the ACL, so it must be fail-closed and doubly-gated:
BATTERY.* symbols (rejected at the gateway). Prevents humans from polluting battery state.
BATTERY.* symbols (rejected at the gateway). Prevents a buggy scenario from scribbling on real demo markets.
force_mark_price, etc.) check both that the caller is a battery user and that the target symbol is a battery symbol. Either check alone is a footgun.
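One reading of these rules, sketched as pure decision functions. The Caller struct, its flags, and the function names are hypothetical, and so is the interpretation that order entry must be blocked in both directions (battery users off real symbols, everyone else off BATTERY.* symbols). Requiring the admin flag and the battery flag together for helpers is likewise an assumption.

```rust
/// Hypothetical caller model; real identities come from the gateway's authz layer.
pub struct Caller {
    pub is_battery_user: bool,
    pub is_admin: bool,
}

/// Battery symbols are identified purely by the BATTERY. prefix.
pub fn is_battery_symbol(symbol: &str) -> bool {
    symbol.starts_with("BATTERY.")
}

/// Order entry: battery users may only touch battery symbols, and
/// non-battery users may only touch non-battery symbols.
pub fn may_trade(caller: &Caller, symbol: &str) -> bool {
    caller.is_battery_user == is_battery_symbol(symbol)
}

/// Admin helpers such as force_mark_price: every gate must pass.
/// Dropping either the caller check or the symbol check reopens the hole.
pub fn may_call_admin_helper(caller: &Caller, symbol: &str) -> bool {
    caller.is_admin && caller.is_battery_user && is_battery_symbol(symbol)
}
```

Keeping the rules as side-effect-free functions makes them easy to unit-test exhaustively, which matters when the ACL is the only barrier between a scenario bug and real demo markets.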
Enforcement lives at the order gateway / risk layer, wherever the existing user→symbol authz check already runs. Confirming where that check lives is a prerequisite to this RFC landing.
Preflight smoke test. Before every battery run, the harness executes two assertions and aborts on failure:
BATTERY.PERP-1 is rejected.
If either fails, the isolation the battery is relying on doesn't exist, and the run is aborted before any scenario executes. This is the single most important piece of safety machinery in the design.
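The abort-on-failure shape of the preflight might look like the sketch below. It assumes the two probes are "a battery user placing an order on a real symbol" and "a non-battery user placing an order on BATTERY.PERP-1"; the probe closures, ProbeResult, and PreflightError are hypothetical names, and the real probes would go through the public SDK clients.

```rust
/// What the harness observed for a probe order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProbeResult {
    Accepted,
    Rejected,
}

#[derive(Debug, PartialEq)]
pub enum PreflightError {
    /// A battery user's order on a non-battery symbol was not rejected.
    BatteryUserReachedRealSymbol,
    /// A non-battery user's order on a battery symbol was not rejected.
    OutsiderReachedBatterySymbol,
}

/// Run both isolation probes; anything other than a rejection aborts the run.
pub fn preflight<A, B>(
    battery_user_on_real_symbol: A,
    outsider_on_battery_symbol: B,
) -> Result<(), PreflightError>
where
    A: FnOnce() -> ProbeResult,
    B: FnOnce() -> ProbeResult,
{
    if battery_user_on_real_symbol() != ProbeResult::Rejected {
        return Err(PreflightError::BatteryUserReachedRealSymbol);
    }
    if outsider_on_battery_symbol() != ProbeResult::Rejected {
        return Err(PreflightError::OutsiderReachedBatterySymbol);
    }
    Ok(())
}
```

Taking the probes as closures keeps the decision logic testable without a live gateway; the harness would call preflight before constructing any scenario and exit non-zero on Err.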
Even with isolation, two battery runs hitting the same sandbox simultaneously will stomp on each other. Options considered:
Start with (c). Add (a) when the first human wants to run the battery ad-hoc while CI is running. (b) is overkill until the battery is large enough that runs take more than a few minutes.
Coverage is a property of (battery, git SHA, sandbox config). Computed entirely offline, without touching ax-demo:
/version (or equivalent).
git checkout <sha> and build the stack with RUSTFLAGS="-C instrument-coverage".
BATTERY.* instruments, firms/BATTERY/ users, risk params, and any other state the scenarios depend on. Without this, the local stack won't exercise the same code paths and the report will be dishonest.
grcov, emit HTML + lcov.
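The build and report steps above could be driven from Rust roughly as follows. This sketch only constructs the commands and does not run them; the helper names, the ./target/debug/ binary path, and the output locations are assumptions, and the grcov flags follow its documented basic usage. The instrumented stack would also need LLVM_PROFILE_FILE set while the battery runs so that .profraw files are written somewhere grcov can find them.

```rust
use std::process::Command;

/// `git checkout <sha>` for the SHA that ax-demo reports.
pub fn checkout(sha: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.args(["checkout", sha]);
    cmd
}

/// `cargo build` with source-based coverage instrumentation enabled.
pub fn instrumented_build() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").env("RUSTFLAGS", "-C instrument-coverage");
    cmd
}

/// One grcov invocation per output type (for example "html" or "lcov").
pub fn grcov_report(output_type: &str, out_path: &str) -> Command {
    let mut cmd = Command::new("grcov");
    cmd.args([
        ".",
        "--binary-path",
        "./target/debug/",
        "-s",
        ".",
        "-t",
        output_type,
        "--ignore-not-existing",
        "-o",
        out_path,
    ]);
    cmd
}
```

A driver would call .status() on each command in order, seed the sandbox state and run the battery between the build and report steps, and stop at the first non-zero exit.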
ax-demo is never instrumented. Live ax-demo stays fast; coverage runs happen in CI against a local throwaway stack at the same SHA.
Two-target model: the same battery binary runs in two modes:
PR-level coverage diff falls out for free: run coverage mode in CI on each PR that touches scenarios, diff against main, post "this PR adds +N lines of coverage" as a PR comment. This is the reporting that actually motivates people to add scenarios.
Separate from line coverage: a registry enumerating every public SDK message type and every documented reject reason, with a map from surface-tag to scenarios that exercise it. Each scenario declares its surface_tags(). The registry emits a report: "of N reject reasons, the battery exercises M."
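A minimal sketch of such a registry is shown below. The SurfaceTag variants are hypothetical placeholders for the real message types and reject reasons, and SurfaceRegistry is an illustrative name rather than an existing type.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical tags; the real list would enumerate every public SDK
/// message type and every documented reject reason.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SurfaceTag {
    MsgPlaceOrder,
    MsgCancelOrder,
    RejectInsufficientMargin,
    RejectSelfTrade,
    RejectUnknownSymbol,
}

impl SurfaceTag {
    pub const ALL: [SurfaceTag; 5] = [
        SurfaceTag::MsgPlaceOrder,
        SurfaceTag::MsgCancelOrder,
        SurfaceTag::RejectInsufficientMargin,
        SurfaceTag::RejectSelfTrade,
        SurfaceTag::RejectUnknownSymbol,
    ];

    pub fn is_reject_reason(self) -> bool {
        matches!(
            self,
            SurfaceTag::RejectInsufficientMargin
                | SurfaceTag::RejectSelfTrade
                | SurfaceTag::RejectUnknownSymbol
        )
    }
}

/// Map from surface tag to the scenarios that declare it.
#[derive(Default)]
pub struct SurfaceRegistry {
    by_tag: BTreeMap<SurfaceTag, BTreeSet<&'static str>>,
}

impl SurfaceRegistry {
    /// Record a scenario's declared surface_tags().
    pub fn register(&mut self, scenario: &'static str, tags: &[SurfaceTag]) {
        for tag in tags {
            self.by_tag.entry(*tag).or_default().insert(scenario);
        }
    }

    /// Tags no scenario exercises: the API-level holes.
    pub fn uncovered(&self) -> Vec<SurfaceTag> {
        SurfaceTag::ALL
            .iter()
            .copied()
            .filter(|t| !self.by_tag.contains_key(t))
            .collect()
    }

    pub fn reject_report(&self) -> String {
        let total = SurfaceTag::ALL.iter().filter(|t| t.is_reject_reason()).count();
        let hit = self.by_tag.keys().filter(|t| t.is_reject_reason()).count();
        format!("of {total} reject reasons, the battery exercises {hit}")
    }
}
```

The uncovered() list is arguably the more actionable output: it names the specific message types and reject reasons that still need a scenario.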
This is cheap to build (a day), immediately useful, and answers a question line coverage can't: "what holes does our battery have at the API level?" Ship this first; line coverage comes second.
/version endpoint.
firms/BATTERY/, BATTERY.* symbols, ACL rules) in ax-demo. Ship the preflight smoke test as an independent binary first, so we can verify isolation works before any scenarios run.
test_battery/env.rs, recorder.rs, scenario.rs, assertions.rs) with one or two reference scenarios (order lifecycle, cancel-replace) ported from the discovery tool to prove the event-driven pattern.