vortex_torch.indexer.select

Selection-style indexer ops.

Ops in this file produce intermediate block-table / seqlens tensors; they do not write into the framework-provided o (unlike the existing vortex_torch.indexer.topK / “TopKOut”, which does). Use a follow-up op (forthcoming) to copy the intermediates produced here into the final o / ctx.metadata.sparse_seqlens buffers consumed by trtllm_batch_decode_with_kv_cache.

Classes

TopK(k)

Explicit-k block-table top-k (trtllm backend, two outputs).

class TopK(k)

Bases: vOp

Explicit-k block-table top-k (trtllm backend, two outputs).

Math:

For request row \(i\) with per-block scores \(s_j\) over its dense blocks, \(\text{bos}\) / \(\text{eos}\) reserved leading / trailing blocks, and \(L_i = \lceil \text{seqlen}_i / \text{block\_size}\rceil\) dense blocks:

\[\begin{split}\mathcal{B}_i = \begin{cases} \{0,\dots,L_i-1\}, & L_i \le \text{bos}+k+\text{eos}, \\[2pt] [0,\text{bos}) \;\cup\; \operatorname*{top\text{-}k}_{\,\text{bos}\le j<L_i-\text{eos}} s_j \;\cup\; [L_i-\text{eos}, L_i), & \text{otherwise}. \end{cases}\end{split}\]

Rows that already fit the budget are copied dense; larger rows keep the leading bos and trailing eos blocks and fill the middle with the top-k by score.
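The selection rule above can be sketched for a single row in plain Python. This is an illustrative reference only, not the op's implementation: the function name select_blocks, its argument list, the ascending order of the returned indices, and the tie-breaking among equal scores are all assumptions made for this example.

```python
import math


def select_blocks(scores, seqlen, block_size, k, bos, eos):
    """Hypothetical single-row reference for the rule in Math.

    scores holds one score per dense block of the row.
    Returns the kept block indices in ascending order (an assumption;
    the real op may order its output differently).
    """
    num_blocks = math.ceil(seqlen / block_size)  # L_i
    assert len(scores) == num_blocks

    # Rows that already fit the budget are copied dense.
    if num_blocks <= bos + k + eos:
        return list(range(num_blocks))

    # Otherwise rank only the middle blocks, bos <= j < L_i - eos, by score.
    middle = range(bos, num_blocks - eos)
    top = sorted(middle, key=lambda j: scores[j], reverse=True)[:k]

    head = list(range(bos))                          # reserved leading blocks
    tail = list(range(num_blocks - eos, num_blocks))  # reserved trailing blocks
    return head + sorted(top) + tail


# 8 dense blocks, budget bos + k + eos = 4: sparse path.
sparse_row = select_blocks(
    [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.5, 0.4],
    seqlen=128, block_size=16, k=2, bos=1, eos=1,
)

# 3 dense blocks, same budget: dense path.
dense_row = select_blocks(
    [0.5, 0.1, 0.2], seqlen=40, block_size=16, k=2, bos=1, eos=1,
)
```

Here sparse_row is [0, 1, 3, 7] (block 0 as bos, blocks 1 and 3 as the two highest-scoring middle blocks, block 7 as eos), and dense_row is [0, 1, 2] because the row has fewer blocks than the budget.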

__init__:

TopK(k): k is the number of selected blocks (excluding the reserved BOS/EOS blocks); for rows that exceed the budget, the per-row sparse block count is bos + k + eos.
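As a worked consequence of the selection rule in Math (a derivation from that formula, not a statement about the contents of the returned seqlens tensor), the number of blocks kept for row \(i\) is:

\[|\mathcal{B}_i| = \min\bigl(L_i,\; \text{bos} + k + \text{eos}\bigr)\]

Whenever \(L_i > \text{bos} + k + \text{eos}\), the middle range \(\text{bos} \le j < L_i - \text{eos}\) contains \(L_i - \text{bos} - \text{eos} > k\) candidates, so the top-k always yields exactly \(k\) blocks.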

__call__:

block_tables, seqlens = op(score, ctx=ctx)

Input: score, RAGGED [S, 1, 1].

Output: block_tables (RAGGED int32) and seqlens (BATCHED int32), both auto-allocated. Feed the pair to Union before trtllm_batch_decode_with_kv_cache.

Note:

Supported on the trtllm backend only; the op fails an assertion under flashinfer. Use topK() for flashinfer / CSR layouts.

Parameters:

k (int)