vortex_torch.indexer.context

Functions

get_ctx()

Classes

Context()

A mutable, single-instance context. Populate it later by calling .create(...).

class vortex_torch.indexer.context.Context

Bases: ContextBase

A mutable, single-instance context. Populate it later by calling .create(...).

dense_kv_indices: torch.Tensor

Dense KV index tensor for mapping keys/values.

sparse_kv_indices: torch.Tensor

Sparse KV index tensor for irregular KV layout.

dense_kv_indptr: torch.Tensor

CSR-style indptr for dense KV segments.

sparse_kv_indptr: torch.Tensor

CSR-style indptr for sparse KV segments.
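
As a rough illustration of what a CSR-style indptr means, the sketch below slices a flat index array into per-request segments. Plain Python lists stand in for the torch.Tensor fields, and the values and the helper name segment are hypothetical, not taken from vortex_torch.

```python
# Illustrative only: plain Python lists stand in for the torch.Tensor fields.
kv_indices = [7, 2, 9, 4, 5, 1]  # hypothetical page ids, concatenated over requests
kv_indptr = [0, 2, 3, 6]         # request i owns kv_indices[kv_indptr[i]:kv_indptr[i + 1]]


def segment(indices, indptr, i):
    """Return the entries that belong to request i in a CSR-style layout."""
    return indices[indptr[i]:indptr[i + 1]]


# An indptr with N + 1 entries describes N segments.
num_requests = len(kv_indptr) - 1
per_request = [segment(kv_indices, kv_indptr, i) for i in range(num_requests)]
print(per_request)  # [[7, 2], [9], [4, 5, 1]]
```

The indptr array always has one more entry than there are segments, and its last entry equals the total number of indices.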

kv_last_page_len: int

Length of the last KV page.
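
A minimal sketch of how a last-page length is commonly derived in paged KV caches, assuming a sequence fills its pages front to back and only the final page can be partially full. The function paged_layout is hypothetical and the formula is a common convention, not something this page confirms for vortex_torch.

```python
from typing import Tuple


def paged_layout(seq_len: int, page_size: int) -> Tuple[int, int]:
    """Return (num_pages, last_page_len) for a sequence of seq_len tokens.

    Assumes pages are filled front to back, so only the last page may be partial.
    """
    if seq_len <= 0 or page_size <= 0:
        raise ValueError("seq_len and page_size must be positive")
    num_pages = (seq_len + page_size - 1) // page_size  # ceiling division
    last_page_len = (seq_len - 1) % page_size + 1       # always in 1..page_size
    return num_pages, last_page_len


print(paged_layout(33, 16))  # (3, 1)
print(paged_layout(32, 16))  # (2, 16)
```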

batch_size: int

Active batch size.

winfo_q_indices: torch.Tensor

Query indices used in workload scheduling.

winfo_kv_offsets: torch.Tensor

KV offsets per workload.

winfo_kv_lens: torch.Tensor

KV lengths per workload.

winfo_num_workloads: int

Number of workloads in the current batch.

winfo_chunk_size: int

Chunk size for workload partitioning.
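
One plausible reading of the winfo_* fields is that each request's KV range is split into chunks of at most winfo_chunk_size tokens, with one workload per chunk. The sketch below illustrates that reading only; the function partition_workloads is hypothetical, plain lists stand in for tensors, and the actual scheduling logic in vortex_torch may differ.

```python
def partition_workloads(kv_lens, chunk_size):
    """Split each request's KV length into chunks of at most chunk_size tokens.

    Returns three parallel lists: the owning query index, the KV start offset,
    and the KV length of every chunk (assumed analogues of winfo_q_indices,
    winfo_kv_offsets and winfo_kv_lens).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    q_indices, kv_offsets, chunk_lens = [], [], []
    for q, kv_len in enumerate(kv_lens):
        for start in range(0, kv_len, chunk_size):
            q_indices.append(q)
            kv_offsets.append(start)
            chunk_lens.append(min(chunk_size, kv_len - start))
    return q_indices, kv_offsets, chunk_lens


q_idx, offs, lens = partition_workloads([5, 2], chunk_size=4)
num_workloads = len(q_idx)
print(q_idx, offs, lens, num_workloads)  # [0, 0, 1] [0, 4, 0] [4, 1, 2] 3
```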

max_num_workloads: int

Maximum number of workloads allowed.

max_chunk_size: int

Maximum allowed chunk size.

min_chunk_size: int

Minimum allowed chunk size.

group_size: int

Group size for grouped attention.

num_kv_heads: int

Number of KV heads.

num_qo_heads: int

Number of query/output heads.
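
In grouped-query attention, the group size is conventionally the number of query heads that share one KV head. The sketch below assumes that convention, that is, group_size == num_qo_heads // num_kv_heads; this page does not state the relation, and the helper kv_head_for is hypothetical.

```python
def kv_head_for(q_head: int, num_qo_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the KV head it shares under grouped-query attention."""
    if num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a multiple of num_kv_heads")
    if not 0 <= q_head < num_qo_heads:
        raise ValueError("q_head out of range")
    group_size = num_qo_heads // num_kv_heads  # assumed definition of group_size
    return q_head // group_size


# With 32 query heads and 8 KV heads, each group holds 4 query heads.
print(kv_head_for(5, 32, 8))   # 1
print(kv_head_for(31, 32, 8))  # 7
```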

head_dim: int

Dimension per attention head.

num_sms: int

Number of streaming multiprocessors (SMs).

page_size: int

Page size used for memory paging.

max_num_pages: int

Total available pages.

max_num_pages_per_request: int

Page limit per individual request.

indexer_dtype: torch.dtype

Dtype used by indexer operations.

topk_val: int

Top-K value used in pruning or selection.

page_reserved_bos: int

Reserved page count for BOS (begin-of-sequence).

page_reserved_eos: int

Reserved page count for EOS (end-of-sequence).

set_batch_size(n)

Parameters:
  • n (int)

Return type:

None

create(parent, model_runner, *, overwrite=False)

Populates this instance once. Pass overwrite=True to allow re-initialization. NOTE: No locking is performed, so concurrent callers may race; call this method from a single thread.

Parameters:
  • parent (Any)

  • model_runner (Any)

  • overwrite (bool)

Return type:

Context
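
The populate-once behaviour described for create can be sketched with a small stand-in class. Everything here is hypothetical: OnceContext is not the vortex_torch implementation, it takes a single page_size argument instead of parent and model_runner, and it raises on a second call without overwrite=True, which the source does not specify.

```python
class OnceContext:
    """Hypothetical stand-in that mimics a populate-once context."""

    def __init__(self):
        self._created = False
        self.page_size = None

    def create(self, page_size, *, overwrite=False):
        # No lock is taken here, mirroring the documented caveat:
        # two threads could both pass this check before either sets the flag.
        if self._created and not overwrite:
            # Assumption: the real create() may ignore the call instead of raising.
            raise RuntimeError("context already created; pass overwrite=True")
        self.page_size = page_size
        self._created = True
        return self


ctx = OnceContext().create(page_size=16)
ctx.create(page_size=32, overwrite=True)  # explicit re-initialization
print(ctx.page_size)  # 32
```

Returning self from create matches the documented return type of Context and allows the call to be chained.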

name: str

Human-readable context name.

mode: Literal['profile', 'execute']

Current operating mode.

vortex_torch.indexer.context.get_ctx()

Return type:

Context