Date: 2026-06-09
Status: Draft. The transactions reform (§Rollout stage 1–2) is being implemented; the rest of this document is direction we may adopt incrementally, not a commitment.
In
early
June
we
shipped
a
hard
7-day
cap
on
all
history
endpoints (MAX_HISTORICAL_QUERY_WINDOW_NS,
rs/sdk/src/protocol/time_range.rs):
both time
bounds
are
required,
and
any
range
wider
than
7
days
is
rejected
with
a 400.
This
was
a
deliberate
hotfix
lineage:
historical_orders
(~272M
rows
per
query)
and
was
taking prod
ClickHouse
to
its
knees.
Fix:
bound
the
window
to
7
days,
and
add
a (timestamp_ns, user_id, order_id)
projection
so
time-range
queries
prune.
The cap stopped the bleeding, but on 2026-06-09 it produced its first customer-facing incident: a Flowdesk customer's withdrawals "disappeared" from the Admin UI because they were older than the UI's largest selectable window. "Show me all withdrawals for this customer" is now impossible to express against the API — by design.
Meanwhile the GUI has independently grown three client-side workarounds:
useWindowedCursorPagination
/
stepWindowedQuery (gui/packages/app/util/historyWindow.ts)
walks
7-day
windows
backwards, skipping
empty
windows,
capped
at
90
days
lookback
and
16
steps.
useTransactions
(gui/packages/admin/src/hooks/useTransactions.ts) loops
a
cursor
up
to
20
pages
within
one
normalized
window.
Every external API consumer (e.g. Flowdesk's integration, ops curl scripts) must rebuild the same loop. When a client-side loop must be reimplemented by every caller of an API, the loop belongs on the server.
The 7-day cap bounds the wrong variable. Query cost in ClickHouse is not driven by the width of the requested time range; it is driven by how many granules the query reads, which is a function of how well the query's predicates align with the table's primary key, partitioning, and skip indexes. Concretely:
transaction_types=withdrawal
for
one
user)
over
a single
hot
7-day
window
can
still
read
every
granule
in
the
window,
run FINAL
merge-on-read
over
all
of
it,
plus
a
separate
COUNT(*) ... FINAL for
total_count
—
on
every
page.
Expensive,
but
currently
allowed.
The cap therefore simultaneously over- and under-protects, while exporting a window-walking loop to every client and silently hiding data in UIs that don't walk.
search_after
+
point-in-time: keyset
tuple
+
tiebreaker,
snapshot
anchoring
so
concurrent
writes
don't shift
pages.
fromId
keyset
escape
hatch.
The consensus mechanism for "unbounded history without unbounded scans" is: cursor = bounded per-page work, server-side slicing for non-indexed predicates, scan budgets as the backstop.
Four
commitments
in
the
current
public
contract (docs/internal/overview/pagination.mdx)
should
be
walked
back:
{timestamp_ns}:{id})
is
documented
and
transparent;
clients
will couple
to
it,
foreclosing
evolution
(window-walking
progress,
snapshot anchoring).
The
public
contract
becomes:
pass
it
back
unchanged.
The encoding
lives
in
internal
docs
only.
complete: bool
and
a searched_until_ns
watermark.
An
empty
page
with
a
next_cursor
is valid
—
it
means
"scanned
my
budget,
found
nothing
yet,
resume
here." Termination
is
absence
of
next_cursor,
never
an
empty
page.
(The current
docs
teach
the
opposite
invariant;
this
is
a
contract
change
to state
before
third
parties
couple
to
the
wording.)
total_count
on
cursor
pages
is semantically
ambiguous
(count
of
which
scope?)
and
costs
a COUNT(*) ... FINAL
per
page.
Omit
it
for
wide/unbounded
ranges;
where provided,
it
means
"at
least
N."
Align the index with the dominant query. History queries are overwhelmingly "one entity, ordered by time."
historical_orders
already
has
the
proj_ts_user
projection
(A-3520).
transactions
is
ReplacingMergeTree,
PRIMARY KEY (timestamp, account_id, sequence_number),
queried
with
FINAL.
Projections
are
not
usable
here: ClickHouse
does
not
apply
projections
to
SELECT ... FINAL
queries
(and ALTER TABLE ... ADD PROJECTION
on
ReplacingMergeTree
requires
opting
into deduplicate_merge_projection_mode).
The
implementable
alternative
is
a bloom_filter
skip
index
on
account_id
(and
user_id
if
we
query
it directly):
granule
pruning
composes
fine
with
FINAL.
Bloom
filters
skip granules
with
zero
matches,
so
they
help
sparse/cold
accounts
most
and degrade
for
accounts
active
in
every
granule
—
acceptable,
since
hot accounts'
rows
cluster
near
the
scan
frontier
anyway.
(ts < $c) OR (ts = $c AND id < $i) relies
on
the
optimizer's
OR-analysis
for
PK
pruning.
Add
the
redundant range
conjunct
—
ts <= $c AND (...)
—
so
pruning
is
guaranteed,
not inferred.
(The
server-side
slice
bounds
below
add
ts >= slice_start AND ts < slice_end
conjuncts
naturally.)
max_execution_time
(with timeout_overflow_mode
left
at
throw)
bounds
any
single
slice
query. max_rows_to_read
/
read_overflow_mode='break'
is
available
but
returns silently-truncated
results
with
no
resume
position,
and
partial
results
can poison
the
query
cache
(ClickHouse
#67476)
—
prefer
slice-granular budgets
(the
cursor
resumes
at
an
exact
window
boundary)
over
row-granular breaks.
FINAL
+
per-page
COUNT(*):
set do_not_merge_across_partitions_select_final=1
(partitions
are
monthly; merges
need
not
cross
them),
and
stop
issuing
the
count
query
for wide/unbounded
ranges
per
model
item
4.
Accept
unbounded
ranges;
walk
windows
server-side.
The
handler
chunks
the requested
(or
open-ended)
range
into
slices
—
MAX_HISTORICAL_QUERY_WINDOW_NS becomes
the
slice
size
instead
of
a
rejection
threshold
—
and
queries slice-by-slice
in
sort
order
until
the
page
fills
or
the
per-request
slice budget
is
exhausted:
resolve range: start = requested or 0; end = requested or now (pinned)
position = cursor position, else range edge per sort direction
loop (≤ MAX_SLICES_PER_REQUEST, e.g. 13 ≈ 90 days of scanning):
query slice [max(start, pos − slice), pos) with remaining limit + 1
accumulate rows; advance pos to slice boundary
stop when limit + 1 rows collected or range exhausted
respond:
next_cursor = row position (page full) | slice boundary (budget hit)
| absent (range exhausted)
complete = range exhausted within this request chain
searched_until_ns = pos
The
GUI's
stepWindowedQuery
is
a
working
prototype
of
exactly
this algorithm,
written
in
the
wrong
tier;
the
reform
ports
it
down
the
stack.
Cursor
v2:
a
versioned,
opaque
token
(base64)
encoding
{position: row(ts, id) | boundary(ts), as_of_end_ns, …}.
Servers
accept
legacy {ts}:{id}
cursors
indefinitely
(they
decode
to
a
row
position
with
no pinned
end).
Pinning
as_of_end_ns
at
first
request
fixes
the
live-drift problem
historyWindow.ts
currently
works
around
with
its
'live'
sentinel.
Backstops,
not
gates:
per-request
slice
budget
+
per-slice max_execution_time.
A
whole-ledger
admin
scan
with
no
entity
filter
remains the
one
genuinely
unprunable
query;
option:
require
an
entity,
symbol,
or type
filter
for
open-ended
ranges
on
admin
endpoints.
Per-query
cost protection
is
strictly
stronger
than
the
range
cap
(a
7-day
window
over
a
hot month
is
still
a
heavy
query
today),
which
also
addresses
the admin-vs-trading-traffic
isolation
concern.
historyWindow.ts
windowing,
admin 20-page
loops,
16-step
empty-window
heuristics)
once
the
server
walks. Hooks
become
plain
useInfiniteQuery
with getNextPageParam: (last) => last.next_cursor.
complete: false,
render "Searched
back
to
⟨searched_until⟩
—
Keep
searching"
(explicit
button beats
auto-loop
for
admin
tools).
Show
"1,000+"
rather
than
exact
totals.
Rewrite
docs/internal/overview/pagination.mdx
(and
any
public
derivative) to
state
the
target
model:
opaque
cursor,
optional
range,
completeness signaling,
lower-bound
counts,
empty-page-with-cursor
contract.
The
public version
must
describe
look-ahead
and
budgets
behaviorally,
without
naming
the datastore.
complete/searched_until_ns, validation
relaxed
from
"reject
>7d"
to
"walk",
per-page
COUNT(*) dropped
for
wide
ranges,
bloom_filter(account_id)
skip
index
+ materialization
migration.
historical_orders
already
has
its
projection; trades
is
symbol-led
and
may
want
a
per-account
projection
or
index
of its
own.)
total_count
everywhere
(offset-paginated
admin
tables
keep
it).
read_overflow_mode='break'
row-granular
partial
results
(revisit
if slice-granular
budgets
prove
too
coarse).
complete/searched_until_ns
to
the
WS/SDK
typed
clients in
the
same
release,
or
let
REST
lead?
trades
table:
per-account
access
path
(projection
is
viable
there
—
no FINAL-blocking?
trades
is
ReplacingMergeTree
with
FINAL
too,
so likely
also
bloom-filter
territory).