vortex_torch.indexer.select
Selection-style indexer ops.
Ops in this file produce intermediate block-table / seqlens tensors.
Unlike the existing vortex_torch.indexer.topK (“TopKOut”), they do not
write into the framework-provided o buffer. Use a follow-up op
(forthcoming) to copy the intermediates produced here into the final
o / ctx.metadata.sparse_seqlens buffers consumed by
trtllm_batch_decode_with_kv_cache.
Classes
TopK(k) | Explicit-k block-table top-k (trtllm backend, two outputs).
class TopK(k)
Bases: vOp
Explicit-k block-table top-k (trtllm backend, two outputs).
Math:
For request row \(i\) with score \(s\) over its dense blocks, bos / eos reserved blocks, and \(L_i = \lceil \text{seqlen}_i / \text{block\_size} \rceil\):

\[
\mathcal{B}_i =
\begin{cases}
\{0,\dots,L_i-1\}, & L_i \le \text{bos}+k+\text{eos}, \\[2pt]
[0,\text{bos}) \;\cup\; \operatorname*{top\text{-}k}_{\,\text{bos}\le j<L_i-\text{eos}} s_j \;\cup\; [L_i-\text{eos}, L_i), & \text{otherwise}.
\end{cases}
\]

Rows that already fit the budget are copied dense; larger rows keep the leading bos and trailing eos blocks and fill the middle with the top-k blocks by score.
__init__:
TopK(k): k is the number of selected blocks (excluding the reserved BOS/EOS blocks); the per-row sparse block count is bos + k + eos.
__call__:
block_tables, seqlens = op(score, ctx=ctx): score is RAGGED [S, 1, 1]; the outputs are block_tables (RAGGED int32) and seqlens (BATCHED int32), both auto-allocated. Feed the pair to Union before trtllm_batch_decode_with_kv_cache.
Note:
trtllm only; asserts under flashinfer. Use topK() for flashinfer / CSR layouts.
Parameters:
k (int)
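
The selection rule under Math can be illustrated for a single request row with a small pure-Python sketch. This is not the library's implementation (the real op works on ragged tensors for the trtllm backend). The function name `select_blocks` is hypothetical, `bos`, `eos`, and `block_size` are passed explicitly here for illustration, and both the ascending output order and the tie-break toward the lower block index are assumptions.

```python
from typing import List, Sequence


def select_blocks(scores: Sequence[float], seqlen: int, block_size: int,
                  k: int, bos: int, eos: int) -> List[int]:
    """Reference sketch: selected block indices B_i for one request row."""
    # L_i = ceil(seqlen / block_size), computed with integer arithmetic.
    num_blocks = (seqlen + block_size - 1) // block_size
    if num_blocks <= bos + k + eos:
        # Row already fits the budget: keep every block (dense copy).
        return list(range(num_blocks))
    head = list(range(bos))                            # [0, bos)
    tail = list(range(num_blocks - eos, num_blocks))   # [L_i - eos, L_i)
    middle = range(bos, num_blocks - eos)
    # Top-k of the middle blocks by score.
    # Assumption: ties go to the lower block index.
    top = sorted(middle, key=lambda j: (-scores[j], j))[:k]
    # Assumption: indices are returned in ascending order.
    return head + sorted(top) + tail


scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.5]
sparse = select_blocks(scores, seqlen=128, block_size=16, k=2, bos=1, eos=1)
dense = select_blocks(scores, seqlen=50, block_size=16, k=2, bos=1, eos=1)
print(sparse, dense)
```

In the first call the row has 8 blocks against a budget of bos + k + eos = 4, so the sketch keeps block 0, the two highest-scoring middle blocks, and block 7. In the second call the row has 4 blocks, which fits the budget, so every block is kept.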