vortex_torch.indexer.context
Classes
- Context: Static, single-instance indexer context.
- class Context
Bases: ContextBase
Static, single-instance indexer context.
Holds only configuration that’s fixed for the lifetime of the compiled indexer — shapes, page/block sizes, head counts, allocation budgets, codegen scratch (op/tensor lists), backend identity, …
Per-forward-batch buffers (winfo_*, dense/sparse_kv_indptr+indices, dense/sparse_seqlens, dense/sparse_block_tables, kv_last_page_len) and batch_size live on a separate MetaData object, pre-allocated at attention-backend __init__ and exposed as ctx.metadata. Codegen emits ctx.metadata.<field> for any value that varies between forward batches.
Build pattern (from each attention backend’s _compile):
    ctx = Context()
    ctx.create(self, model_runner)  # static fields
    ctx.metadata = MetaData.preallocate(ctx, device=…)
- property batch_size: int
Current batch size, proxied from self.metadata.
Kept as a property (not a slot) so the only writable copy lives on self.metadata; ctx.batch_size is now read-only.
- set_batch_size(n)
Compatibility shim that forwards to self.metadata.set_batch_size.
- Parameters: n (int)
- Return type: None
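The read-only property and the forwarding shim described above can be illustrated with a minimal sketch. SketchContext and SketchMetaData are hypothetical stand-ins written for this example, not the real Context and MetaData classes; the sketch only assumes that the writable batch size lives on the metadata object.

```python
class SketchMetaData:
    """Hypothetical stand-in for the per-forward-batch MetaData object."""

    def __init__(self) -> None:
        self.batch_size = 0  # the only writable copy of the batch size

    def set_batch_size(self, n: int) -> None:
        self.batch_size = n


class SketchContext:
    """Hypothetical stand-in for the static Context."""

    def __init__(self) -> None:
        self.metadata = SketchMetaData()

    @property
    def batch_size(self) -> int:
        # Read-only proxy: no setter is defined, so assigning to
        # ctx.batch_size raises AttributeError.
        return self.metadata.batch_size

    def set_batch_size(self, n: int) -> None:
        # Compatibility shim: forwards to the metadata object.
        self.metadata.set_batch_size(n)


ctx = SketchContext()
ctx.set_batch_size(8)
print(ctx.batch_size)

try:
    ctx.batch_size = 16  # direct assignment is rejected
except AttributeError:
    print("ctx.batch_size is read-only")
```

Keeping a single writable copy means code that still calls set_batch_size on the context and code that reads ctx.metadata directly always see the same value.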
- create(parent, model_runner, *, overwrite=False)
Populate the static fields. Per-batch MetaData is allocated separately by the caller via MetaData.preallocate(ctx, device=...); see this class’s docstring.
- vortex_dtype: torch.dtype
Intermediate-tensor dtype (default torch.bfloat16).
- query_arg_names
- class MetaData
Bases: object
Per-forward-batch state for the indexer. See module docstring.
- winfo_q_indices: torch.Tensor
- winfo_is_first_workload_per_batch: torch.Tensor
- winfo_kv_offsets: torch.Tensor
- winfo_kv_lens: torch.Tensor
- winfo_num_workloads: torch.Tensor
- winfo_chunk_size: torch.Tensor
- kv_last_page_len: torch.Tensor
- classmethod preallocate(ctx, *, device)
Build a backend-appropriate MetaData from a populated Context.
Picks the buffer set from ctx.vortex_attention_backend:
- "flashinfer" → allocates CSR buffers (dense/sparse_kv_indptr, dense/sparse_kv_indices); leaves block-table buffers as None.
- "trtllm" → allocates 2D block_tables + per-row seqlens; leaves CSR buffers as None.
winfo_* and kv_last_page_len are always allocated.
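The backend-dependent buffer selection above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: SketchMetaData, its simplified field names, the max_batch and max_pages_per_req parameters, the buffer sizes, and the ValueError on an unknown backend are all hypothetical, and plain Python lists stand in for torch tensors. The real method takes (ctx, *, device) and reads the backend from ctx.vortex_attention_backend.

```python
from typing import List, Optional


class SketchMetaData:
    """Hypothetical per-batch buffer holder; lists stand in for tensors."""

    def __init__(self) -> None:
        self.kv_last_page_len: List[int] = []
        # CSR-style buffers (used by the "flashinfer" branch).
        self.kv_indptr: Optional[List[int]] = None
        self.kv_indices: Optional[List[int]] = None
        # Block-table buffers (used by the "trtllm" branch).
        self.block_tables: Optional[List[List[int]]] = None
        self.seqlens: Optional[List[int]] = None

    @classmethod
    def preallocate(
        cls, backend: str, max_batch: int, max_pages_per_req: int
    ) -> "SketchMetaData":
        md = cls()
        md.kv_last_page_len = [0] * max_batch  # allocated for every backend
        if backend == "flashinfer":
            # CSR layout: one offset per request plus one, then a flat index list.
            md.kv_indptr = [0] * (max_batch + 1)
            md.kv_indices = [0] * (max_batch * max_pages_per_req)
        elif backend == "trtllm":
            # Dense layout: one row of page ids per request, plus per-row lengths.
            md.block_tables = [[0] * max_pages_per_req for _ in range(max_batch)]
            md.seqlens = [0] * max_batch
        else:
            # Assumption for this sketch: unknown backends are rejected.
            raise ValueError(f"unknown backend: {backend!r}")
        return md


fi = SketchMetaData.preallocate("flashinfer", max_batch=4, max_pages_per_req=8)
trt = SketchMetaData.preallocate("trtllm", max_batch=4, max_pages_per_req=8)
print(len(fi.kv_indptr), fi.block_tables)
print(len(trt.block_tables), trt.kv_indptr)
```

Leaving the unused buffer set as None makes it obvious at the call site which layout a given backend expects, at the cost of a None check wherever both layouts can be reached.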