vortex_torch.indexer.context

Functions

get_ctx()

Classes

Context()

Mutable, single-instance context; populate later via .create(...).

class vortex_torch.indexer.context.Context

Bases: ContextBase

Mutable, single-instance context; populate later via .create(…).

dense_kv_indices: torch.Tensor: Dense KV index tensor for mapping keys/values.

sparse_kv_indices: torch.Tensor: Sparse KV index tensor for irregular KV layout.

dense_kv_indptr: torch.Tensor: CSR-style indptr for dense KV segments.

sparse_kv_indptr: torch.Tensor: CSR-style indptr for sparse KV segments.
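As a minimal sketch of what a CSR-style indptr/indices pair encodes, the example below uses plain Python lists in place of tensors. The values and the helper `pages_for_request` are hypothetical and only illustrate the general CSR convention (request `i` owns `indices[indptr[i]:indptr[i + 1]]`); the library's exact layout may differ.

```python
# Hypothetical toy values; the real fields are torch.Tensor objects.
kv_indptr = [0, 3, 5, 9]                   # length is batch_size + 1
kv_indices = [7, 2, 4, 1, 8, 0, 3, 5, 6]   # page ids, concatenated per request


def pages_for_request(indptr, indices, i):
    """Return the slice of `indices` that belongs to request `i`."""
    start, end = indptr[i], indptr[i + 1]
    return indices[start:end]


batch_size = len(kv_indptr) - 1
per_request = [
    pages_for_request(kv_indptr, kv_indices, i) for i in range(batch_size)
]
```

Under this convention the last entry of the indptr equals the total number of stored indices, and consecutive differences give the per-request segment lengths.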

kv_last_page_len: int: Length of the last KV page.
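The sketch below shows how a last-page length is typically combined with a page count and a page size to recover the number of cached tokens. It assumes the common paged-KV convention that every page except the last is full; `kv_token_count` is a hypothetical helper, not part of this module.

```python
def kv_token_count(num_pages: int, page_size: int, last_page_len: int) -> int:
    """Number of KV tokens, assuming all pages but the last are full."""
    if num_pages == 0:
        return 0
    if not 1 <= last_page_len <= page_size:
        raise ValueError("last_page_len must be in [1, page_size]")
    # Full pages contribute page_size tokens each; the last one is partial.
    return (num_pages - 1) * page_size + last_page_len
```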

batch_size: int: Active batch size.

winfo_q_indices: torch.Tensor: Query indices used in workload scheduling.

winfo_kv_offsets: torch.Tensor: KV offsets per workload.

winfo_kv_lens: torch.Tensor: KV lengths per workload.

winfo_num_workloads: int: Number of workloads in the current batch.

winfo_chunk_size: int: Chunk size for workload partitioning.

max_num_workloads: int: Maximum number of workloads allowed.

max_chunk_size: int: Maximum allowed chunk size.

min_chunk_size: int: Minimum allowed chunk size.
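One plausible reading of the workload fields above is that each request's KV range is split into chunks, with one workload per chunk. The sketch below illustrates that idea with plain lists. Both helpers, the doubling heuristic, and the unit of the lengths are assumptions made for illustration; the library's actual scheduler is not documented here.

```python
def pick_chunk_size(kv_lens, min_chunk_size, max_chunk_size, max_num_workloads):
    """Grow the chunk size (assumed: by doubling) until the workloads fit."""
    chunk = min_chunk_size
    while chunk < max_chunk_size:
        # Ceiling division gives the number of chunks per request.
        num_workloads = sum(-(-length // chunk) for length in kv_lens)
        if num_workloads <= max_num_workloads:
            break
        chunk = min(chunk * 2, max_chunk_size)
    return chunk


def partition_workloads(kv_lens, chunk_size):
    """Split each request's KV range into (query index, offset, length)."""
    q_indices, kv_offsets, chunk_lens = [], [], []
    for q, kv_len in enumerate(kv_lens):
        for offset in range(0, kv_len, chunk_size):
            q_indices.append(q)
            kv_offsets.append(offset)
            chunk_lens.append(min(chunk_size, kv_len - offset))
    return q_indices, kv_offsets, chunk_lens


# Hypothetical batch of two requests with KV lengths 10 and 3.
chunk = pick_chunk_size([10, 3], min_chunk_size=2, max_chunk_size=8,
                        max_num_workloads=4)
q_idx, offsets, lens = partition_workloads([10, 3], chunk)
```

In this toy run the three returned lists play the roles of winfo_q_indices, winfo_kv_offsets, and winfo_kv_lens, and their common length corresponds to winfo_num_workloads.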

group_size: int: Group size for grouped attention.

num_kv_heads: int: Number of KV heads.

num_qo_heads: int: Number of query/output heads.
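In grouped-query attention, several query/output heads share one KV head. The sketch below assumes that group_size equals num_qo_heads divided by num_kv_heads, which is the usual definition but is not confirmed by this page; `kv_head_for_query_head` is a hypothetical helper.

```python
def kv_head_for_query_head(qo_head: int, num_qo_heads: int, num_kv_heads: int) -> int:
    """Map a query/output head to the KV head it shares (GQA convention)."""
    if num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a multiple of num_kv_heads")
    # Assumption: group_size is the number of query heads per KV head.
    group_size = num_qo_heads // num_kv_heads
    return qo_head // group_size
```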

head_dim: int: Dimension per attention head.

num_sms: int: Number of streaming multiprocessors (SMs).

page_size: int: Page size used for memory paging.

max_num_pages: int: Total available pages.

max_num_pages_per_request: int: Page limit per individual request.

indexer_dtype: torch.dtype: Dtype used by indexer operations.

topk_val: int: Top-K value used in pruning or selection.

page_reserved_bos: int: Reserved page count for BOS (begin-of-sequence).

page_reserved_eos: int: Reserved page count for EOS (end-of-sequence).
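The following sketch shows one way a top-K value and reserved BOS/EOS page counts could interact: the reserved leading and trailing pages are always kept, and the top-K pages by score are chosen from the rest. This interpretation, the `select_pages` helper, and the scores are all assumptions for illustration, not the library's documented behavior.

```python
def select_pages(scores, topk, reserved_bos, reserved_eos):
    """Keep reserved head/tail pages plus the top-k scored middle pages."""
    n = len(scores)
    if n <= reserved_bos + reserved_eos:
        return list(range(n))  # nothing left to prune
    head = list(range(reserved_bos))
    tail = list(range(n - reserved_eos, n))
    middle = range(reserved_bos, n - reserved_eos)
    # Rank only the non-reserved pages by score, highest first.
    best = sorted(middle, key=lambda i: scores[i], reverse=True)[:topk]
    return sorted(head + best + tail)


# Hypothetical per-page scores for a six-page request.
kept = select_pages([0.1, 0.9, 0.2, 0.8, 0.3, 0.05], topk=2,
                    reserved_bos=1, reserved_eos=1)
```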

set_batch_size(n)

Parameters: n (int)
Return type: None

create(parent, model_runner, *, overwrite=False)

Populate this instance once. Pass overwrite=True to allow re-initialization. Note: no locking is performed, so concurrent callers may race; call this method from a single thread.

Parameters:

parent (Any)
model_runner (Any)
overwrite (bool)

Return type:

Context
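The populate-once behavior described for create can be sketched with a small stand-in class. `ToyContext`, the `page_size` attribute read from `model_runner`, and the choice to raise `RuntimeError` on a repeated call are assumptions for illustration; the page does not state what the real method does when called twice without overwrite=True.

```python
from types import SimpleNamespace


class ToyContext:
    """Hypothetical stand-in for Context; not the library's implementation."""

    def __init__(self):
        self._created = False
        self.page_size = 0

    def create(self, parent, model_runner, *, overwrite=False):
        # Assumption: a second call without overwrite=True is rejected.
        if self._created and not overwrite:
            raise RuntimeError("context already created; pass overwrite=True")
        # Assumption: fields are copied from model_runner; `page_size` is a
        # hypothetical attribute used only for this illustration.
        self.page_size = model_runner.page_size
        self._created = True
        return self  # returning the instance allows call chaining


ctx = ToyContext()
ctx.create(None, SimpleNamespace(page_size=16))
ctx.create(None, SimpleNamespace(page_size=32), overwrite=True)
```

Because there is no lock around the check and the assignment, two threads calling create at the same time could both pass the check, which is why the method is meant to be called from a single thread.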

name: str: Human-readable context name.

mode: Literal['profile', 'execute']: Current operating mode.

vortex_torch.indexer.context.get_ctx()

Return type: Context
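Since Context is described as a single-instance object, an accessor like get_ctx presumably returns the same object on every call. The sketch below illustrates that pattern with a module-level instance. `ToyContext`, `_TOY_CTX`, and `get_toy_ctx` are hypothetical names, and the real accessor may be implemented differently.

```python
class ToyContext:
    """Hypothetical stand-in holding one mutable field."""

    def __init__(self):
        self.batch_size = 0

    def set_batch_size(self, n: int) -> None:
        self.batch_size = n


# Assumption: a single module-level instance backs the accessor.
_TOY_CTX = ToyContext()


def get_toy_ctx() -> ToyContext:
    """Return the shared instance; every caller sees the same object."""
    return _TOY_CTX


get_toy_ctx().set_batch_size(4)
shared_batch_size = get_toy_ctx().batch_size
```

With this pattern, a mutation made through one call site is visible to every other caller of the accessor.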