vortex_torch.indexer.output_func

Classes

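topK()

Per-request top-k page selector with reserved BOS/EOS pages.

approxTopK(tolerate_ratio=0.0)

Approximate topK: faster adaptive 8-bit radix selection.

Union()

Per-row union of two (block_table, seqlens) pairs (trtllm output func).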

class topK

Bases: vOp

Per-request top-k page selector with reserved BOS/EOS pages.

The terminal op of a forward_indexer: it turns per-page scores into the sparse set of pages each request attends to.

Math:

For a request’s per-page scores \(X_p\) (\(p=0,\dots,S-1\)), with reserved prefix \(\mathcal{B}=\{0,\dots,n_{\mathrm{bos}}-1\}\) and reserved suffix \(\mathcal{E}=\{S-n_{\mathrm{eos}},\dots,S-1\}\), the selected page set is

\[\mathcal{S} = \mathcal{B}\,\cup\,\mathcal{E}\,\cup\, \operatorname*{top\text{-}k}_{\,p\,\notin\,\mathcal{B}\cup\mathcal{E}} X_p, \qquad k = \texttt{topk\_val},\]

where \(\operatorname{top\text{-}k}\) returns the indices \(p\) of the \(k\) largest scores among the non-reserved pages.

__init__:

topK(): no arguments. The budget topk_val and the reserved counts \(n_{\mathrm{bos}}\), \(n_{\mathrm{eos}}\) are read from Context at runtime.

__call__:

op(score, o, ctx=ctx): score has shape [S, 1, 1] (one scalar per page) and must be RAGGED. The selected page ids are written in place into o; the op returns nothing.

Note:

Every flow’s forward_indexer must end in this op or in approxTopK (under trtllm, Union is the alternative terminal op).
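The selection rule can be mirrored in plain Python as a reference for testing. This is a minimal sketch under stated assumptions, not the library's implementation: `select_pages` and `select_pages_ragged` are hypothetical helpers, ties are broken toward the lower page id, page ids are local to each request, and the RAGGED layout is modeled as a flat score list split by a CSR-style `indptr` (request `i` owns entries `indptr[i]` up to `indptr[i + 1]`).

```python
from typing import List, Sequence


def select_pages(scores: Sequence[float], k: int, n_bos: int, n_eos: int) -> List[int]:
    """Hypothetical reference for one request: reserved BOS/EOS pages plus top-k of the rest."""
    S = len(scores)
    # Reserved prefix {0..n_bos-1} and suffix {S-n_eos..S-1}, clipped to the request length.
    reserved = set(range(min(n_bos, S))) | set(range(max(S - n_eos, 0), S))
    # Only non-reserved pages compete for the k budget.
    middle = [p for p in range(S) if p not in reserved]
    # Largest scores first; ties broken by lower page id (an assumption of this sketch).
    best = sorted(middle, key=lambda p: (-scores[p], p))[:k]
    # Returned sorted here only for readability.
    return sorted(reserved | set(best))


def select_pages_ragged(
    flat_scores: Sequence[float],
    indptr: Sequence[int],
    k: int,
    n_bos: int,
    n_eos: int,
) -> List[List[int]]:
    """Apply select_pages per request, assuming a hypothetical CSR-style indptr."""
    return [
        select_pages(flat_scores[indptr[i]:indptr[i + 1]], k, n_bos, n_eos)
        for i in range(len(indptr) - 1)
    ]
```

For example, `select_pages([5, 1, 9, 3, 7, 2, 8], k=2, n_bos=1, n_eos=1)` keeps pages 0 and 6 unconditionally and adds the two best middle pages, returning `[0, 2, 4, 6]`.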

class approxTopK(tolerate_ratio=0.0)

Bases: topK

Approximate topK — faster adaptive 8-bit radix selection.

Math:

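The selected set \(\mathcal{S}\) has the same form as in topK, but the top-\(k\) search over the non-reserved pages may stop early. At each radix round, let \(r\) be the number of slots still owed by the threshold bin (the bin containing the \(k\)-th largest score, once all higher bins have been taken). The search stops at the first round where

\[r \;\le\; \texttt{tolerate\_ratio}\cdot k,\]

and the remaining \(r\) slots are filled from the threshold bin in arrival order. At most \(r\) of the \(k\) non-reserved pages can therefore differ from the exact topK result.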
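The early-stop rule can be illustrated with a pure-Python radix select over float32 keys. This is a sketch, not the CUDA kernel: `float_key` and `approx_topk` are hypothetical names, the stop test is assumed to run at the start of each 8-bit round, "arrival order" is modeled as ascending page index, and `scores` holds only the non-reserved pages (the BOS/EOS pages are added separately, as in topK).

```python
import struct
from typing import List, Sequence


def float_key(x: float) -> int:
    """Map a value (rounded to float32) to a uint32 whose unsigned order matches float order."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats: flip every bit; non-negative floats: set the sign bit.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000


def approx_topk(scores: Sequence[float], k: int, tolerate_ratio: float = 0.0) -> List[int]:
    """Hypothetical sketch: indices of (approximately) the k largest scores, unsorted."""
    k = min(k, len(scores))
    keys = [float_key(s) for s in scores]
    selected: List[int] = []               # indices already known to be in the top-k
    candidates = list(range(len(scores)))  # the current threshold bin
    r = k                                  # slots still owed by the threshold bin
    for shift in (24, 16, 8, 0):           # four 8-bit rounds, most significant first
        if r <= tolerate_ratio * k:
            break                          # early stop: few enough slots left to approximate
        bins: List[List[int]] = [[] for _ in range(256)]
        for i in candidates:               # bucket by the current 8-bit digit
            bins[(keys[i] >> shift) & 0xFF].append(i)
        for digit in range(255, -1, -1):   # walk from the largest digit down
            if len(bins[digit]) <= r:
                selected.extend(bins[digit])   # whole bin fits: take it
                r -= len(bins[digit])
            else:
                candidates = bins[digit]       # new threshold bin: refine next round
                break
    # Fill the slots still owed from the threshold bin, in arrival (index) order.
    selected.extend(candidates[:r])
    return selected
```

With `tolerate_ratio=0.0` the rounds continue until no slot is owed or all four digits are resolved, so the result is exact up to ties; with `tolerate_ratio=1.0` the loop stops before the first round and simply returns the first `k` pages. For `scores = [0.1, 0.9, 0.5, 0.3, 0.7]` and `k = 2`, the default call returns pages 1 and 4.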

__init__:

approxTopK(tolerate_ratio=0.0): tolerate_ratio is the approximation budget, in [0, 1]. A value of 0.0 is exact (all radix rounds run); larger values are cheaper but looser. The typical throughput sweet spot is 0.05–0.15.

__call__:

op(score, o, ctx=ctx) — same signature as topK (score [S, 1, 1], RAGGED; writes page ids into o).

Note:

Selected page ids are unsorted within each request; downstream consumers must not assume any particular order.

Parameters:

tolerate_ratio (float)
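Because the per-request order is unspecified, any comparison between approxTopK and exact topK output (for instance, when choosing tolerate_ratio) should treat each request's selection as a set. A minimal sketch with hypothetical helpers `split_ragged` and `mean_recall`, assuming both outputs share the same CSR-style `indptr` over the ragged result:

```python
from typing import List, Sequence, Set


def split_ragged(ids: Sequence[int], indptr: Sequence[int]) -> List[Set[int]]:
    """Per-request page-id sets; converting to sets discards the unspecified order."""
    return [set(ids[indptr[i]:indptr[i + 1]]) for i in range(len(indptr) - 1)]


def mean_recall(approx_ids: Sequence[int], exact_ids: Sequence[int], indptr: Sequence[int]) -> float:
    """Average fraction of exact pages that the approximate selection also picked."""
    approx = split_ragged(approx_ids, indptr)
    exact = split_ragged(exact_ids, indptr)
    per_request = [len(a & e) / len(e) if e else 1.0 for a, e in zip(approx, exact)]
    return sum(per_request) / len(per_request) if per_request else 1.0
```

For example, two requests whose exact selections are {0, 2, 4} and {3, 5} and whose approximate selections are {4, 0, 2} and {1, 3} give a mean recall of (1.0 + 0.5) / 2 = 0.75.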

class Union

Bases: vOp

Per-row union of two (block_table, seqlens) pairs (trtllm output func).

Math:

For request row \(i\), given two selected block-id sets \(\mathcal{B}_i^{0}\), \(\mathcal{B}_i^{1}\) (from two vortex_torch.indexer.TopK calls), the final sparse set is the deduplicated union with the dense trailing block \(\ell_i\) pinned to the tail:

\[\mathcal{U}_i = \big((\mathcal{B}_i^{0}\cup\mathcal{B}_i^{1})\setminus\{\ell_i\}\big) \;\Vert\; \{\ell_i\},\]

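where \(\Vert\) denotes ordered concatenation, and \(\text{seqlens}_i = u_i\cdot\text{block\_size} + \text{last\_block\_len}_i\) with \(u_i = |\mathcal{U}_i| - 1\), the number of (full) blocks ahead of \(\ell_i\). Pinning \(\ell_i\) last matters because trtllm treats the final slot of each row as the partially filled block, so its \(\text{last\_block\_len}_i\) tokens are read from the right slot.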
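The rule for a single row can be written as a short Python reference. This is a sketch, not the trtllm-side kernel: `union_row` is a hypothetical helper, and keeping the non-trailing blocks in first-seen order (bt_0 before bt_1) is an assumption, since the formula only fixes that \(\ell_i\) comes last.

```python
from typing import List, Sequence, Tuple


def union_row(
    bt0: Sequence[int],
    bt1: Sequence[int],
    last_block: int,
    last_block_len: int,
    block_size: int,
) -> Tuple[List[int], int]:
    """Hypothetical per-row reference for Union: deduplicated union, trailing block last."""
    seen = set()
    merged: List[int] = []
    for b in list(bt0) + list(bt1):
        # Drop duplicates and hold the dense trailing block back for the tail.
        if b != last_block and b not in seen:
            seen.add(b)
            merged.append(b)
    row = merged + [last_block]   # pin the trailing block to the last slot
    u = len(row) - 1              # full blocks ahead of the trailing one
    seqlen = u * block_size + last_block_len
    return row, seqlen
```

For instance, `union_row([3, 7, 12], [7, 5, 12], last_block=12, last_block_len=5, block_size=16)` returns `([3, 7, 5, 12], 53)`: the three blocks ahead of block 12 contribute 3 * 16 = 48 tokens and the trailing block adds 5.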

__init__:

Union() — no arguments.

__call__:

op((bt_0, sl_0), (bt_1, sl_1), o, ctx=ctx) — two RAGGED (block_table, seqlens) tuples → writes the union into o (ctx.metadata.sparse_block_tables) and updates ctx.metadata.sparse_seqlens as a side effect.

Note:

trtllm only: the op fails an assertion under the flashinfer backend. It is the terminal op of a trtllm forward_indexer (an alternative to topK()).