vortex_torch.indexer.output_func
Classes
- class topK
Bases: vOp
Per-request top-k page selector with reserved BOS/EOS pages.
The terminal op of a forward_indexer: it turns per-page scores into the sparse set of pages each request attends to.
- Math:
For a request’s per-page scores \(X_p\) (\(p=0,\dots,S-1\)), with reserved prefix \(\mathcal{B}=\{0,\dots,n_{\mathrm{bos}}-1\}\) and suffix \(\mathcal{E}=\{S-n_{\mathrm{eos}},\dots,S-1\}\), the selected page set is
\[\mathcal{S} = \mathcal{B}\,\cup\,\mathcal{E}\,\cup\, \operatorname*{top\text{-}k}_{\,p\,\notin\,\mathcal{B}\cup\mathcal{E}} X_p, \qquad k = \texttt{topk\_val}.\]
- __init__:
topK(): no arguments; the budget topk_val and the reserved BOS/EOS counts are read from Context at runtime.
- __call__:
op(score, o, ctx=ctx): score is [S, 1, 1] (one scalar per page; must be RAGGED); the selected page ids are written in place into o. Returns nothing.
- Note:
Every flow’s forward_indexer must end in this op (or approxTopK).
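The selection rule in the Math entry can be illustrated with a small pure-Python reference for a single request. This is only a sketch of the set semantics, not the vortex_torch kernel: the function name `select_pages`, its arguments, and the tie-breaking rule are assumptions made for illustration, and the real op reads its budget and reserved counts from Context rather than taking them as parameters.

```python
def select_pages(scores, topk_val, n_bos, n_eos):
    """Return the selected page ids for one request, sorted for readability."""
    num_pages = len(scores)
    # Reserved prefix B = {0, ..., n_bos - 1} and suffix E = {S - n_eos, ..., S - 1}.
    reserved = set(range(min(n_bos, num_pages)))
    reserved |= set(range(max(num_pages - n_eos, 0), num_pages))
    # The top-k search runs only over pages outside the reserved sets.
    candidates = [p for p in range(num_pages) if p not in reserved]
    # sorted() is stable, so tied scores keep the lower page id first
    # (a tie-break chosen for this sketch only).
    best = sorted(candidates, key=lambda p: scores[p], reverse=True)[:topk_val]
    return sorted(reserved | set(best))


scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
print(select_pages(scores, topk_val=2, n_bos=1, n_eos=1))  # [0, 1, 3, 5]
```

Pages 0 and 5 are kept because they are reserved, regardless of score; pages 1 and 3 win the top-2 search over the remaining pages.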
- class approxTopK(tolerate_ratio=0.0)
Bases: topK
Approximate topK: faster adaptive 8-bit radix selection.
- Math:
The selected set \(\mathcal{S}\) matches topK, but the top-\(k\) search over the non-reserved pages stops at the first radix round whose count \(r\) still owed by the threshold bin satisfies
\[r \;\le\; \texttt{tolerate\_ratio}\cdot k,\]
with any remaining slots filled in arrival order.
- __init__:
approxTopK(tolerate_ratio=0.0): approximation budget in [0, 1]: 0.0 = exact (all radix rounds run), higher = cheaper but looser (typical throughput sweet spot 0.05–0.15).
- __call__:
op(score, o, ctx=ctx): same signature as topK (score [S, 1, 1], RAGGED; writes page ids into o).
- Note:
Indices are unsorted within each request; downstream consumers must not assume sorted order.
- Parameters:
tolerate_ratio (float)
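The early-stopping rule can be sketched as a most-significant-digit radix select over 8-bit digits. The code below is a hypothetical pure-Python illustration, not the library's kernel: it assumes scores have already been mapped to unsigned 32-bit integer keys whose order matches the score order, and the name `approx_topk` and the `key_bits` parameter are invented for the example.

```python
def approx_topk(keys, k, tolerate_ratio=0.0, key_bits=32):
    """Return indices of (approximately) the k largest keys, in unsorted order."""
    candidates = list(range(len(keys)))  # kept in arrival order
    selected = []
    owed = min(k, len(keys))  # slots still to be filled
    for shift in range(key_bits - 8, -1, -8):
        # Bucket the remaining candidates by their current 8-bit digit.
        buckets = [[] for _ in range(256)]
        for i in candidates:
            buckets[(keys[i] >> shift) & 0xFF].append(i)
        # Walk digits from largest to smallest, taking whole buckets while they fit.
        for digit in range(255, -1, -1):
            bucket = buckets[digit]
            if len(bucket) <= owed:
                selected.extend(bucket)
                owed -= len(bucket)
            else:
                candidates = bucket  # threshold bin: only part of it is needed
                break
        else:
            candidates = []  # every bucket was taken whole
        # Stop once the threshold bin owes at most tolerate_ratio * k slots.
        if owed <= tolerate_ratio * k:
            break
    # Fill whatever is still owed from the threshold bin in arrival order.
    selected.extend(candidates[:owed])
    return selected


keys = [5, 900, 70000, 3, 70001, 12, 900, 1 << 24]
print(approx_topk(keys, k=4, tolerate_ratio=0.0))  # [7, 2, 4, 1]
print(approx_topk(keys, k=4, tolerate_ratio=0.5))  # [7, 2, 4, 0]
```

With `tolerate_ratio=0.0` all four rounds run and the result is exact. With `0.5` the search stops after the second round, when one slot is still owed (1 ≤ 0.5 · 4), and that slot goes to index 0, the first arrival in the threshold bin, rather than to the true fourth-largest key at index 1.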
- class Union
Bases: vOp
Per-row union of two (block_table, seqlens) pairs (trtllm output func).
- Math:
For request row \(i\), given two selected block-id sets \(\mathcal{B}_i^{0}\), \(\mathcal{B}_i^{1}\) (from two vortex_torch.indexer.TopK calls), the final sparse set is the deduplicated union with the dense trailing block \(\ell_i\) pinned to the tail:
\[\mathcal{U}_i = \big((\mathcal{B}_i^{0}\cup\mathcal{B}_i^{1})\setminus\{\ell_i\}\big) \;\Vert\; \{\ell_i\},\]
and \(\text{seqlens}_i = u_i\cdot\text{block\_size} + \text{last\_block\_len}_i\), where \(u_i = |\mathcal{U}_i| - 1\) counts the blocks ahead of the tail. Pinning \(\ell_i\) last lets trtllm read the partial token count from the right slot.
- __init__:
Union(): no arguments.
- __call__:
op((bt_0, sl_0), (bt_1, sl_1), o, ctx=ctx): takes two RAGGED (block_table, seqlens) tuples, writes the union into o (ctx.metadata.sparse_block_tables), and updates ctx.metadata.sparse_seqlens as a side effect.
- Note:
trtllm only; the op asserts under flashinfer. Terminal op of a trtllm forward_indexer (alternative to topK()).
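The per-row union and the seqlens formula can be checked with a short pure-Python sketch. The helper `union_row` and its arguments are hypothetical, and keeping the non-tail blocks in first-seen order is an assumption of this example; the source only specifies that the trailing block \(\ell_i\) comes last.

```python
def union_row(blocks_0, blocks_1, last_block, block_size, last_block_len):
    """Merge two block-id lists for one request, pinning last_block to the tail."""
    merged = []
    seen = {last_block}  # the trailing block is excluded here and appended once below
    for block in list(blocks_0) + list(blocks_1):
        if block not in seen:
            seen.add(block)
            merged.append(block)
    merged.append(last_block)
    # u_i = |U_i| - 1 blocks count as full; the tail contributes last_block_len tokens.
    seqlen = (len(merged) - 1) * block_size + last_block_len
    return merged, seqlen


blocks, seqlen = union_row([0, 3, 7], [3, 5, 7], last_block=7,
                           block_size=16, last_block_len=5)
print(blocks, seqlen)  # [0, 3, 5, 7] 53
```

Here three blocks precede the tail, so the sequence length is 3 · 16 + 5 = 53 tokens.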