vortex_torch.cache.elementwise

Classes

Abs([alpha, beta])

Absolute-value transform of an affine argument.

Add_Mul([alpha, beta])

Affine transformation \(y = \beta x + \alpha\).

Elementwise([alpha, beta])

Unary elementwise operator dispatcher (e.g. ReLU/Sigmoid/SiLU/Abs/Affine).

Relu([alpha, beta])

Piecewise ReLU-like activation.

Sigmoid([alpha, beta])

Sigmoid activation with configurable shift and slope.

Silu([alpha, beta])

SiLU-like activation with configurable shift and slope.

class vortex_torch.cache.elementwise.Elementwise(alpha=0.0, beta=1.0)[source]

Bases: vOp

Unary elementwise operator dispatcher (e.g. ReLU/Sigmoid/SiLU/Abs/Affine).

This class dispatches a family of unary elementwise operations on rank-3 tensors. The input is treated as

\[X \in \mathbb{R}^{B \times N \times D},\]

where:

  • \(B\) is a leading batch-like axis (for example, max_new_tokens_per_batch * head_num coming from the runtime context),

  • \(N\) is a sequence or position dimension, and

  • \(D\) is a feature/channel dimension.

The operation is applied pointwise:

\[Y[b, n, d] = f(X[b, n, d]; \alpha, \beta, \text{op_type}),\]

where the actual function \(f\) is selected by op_type, and may make use of scalar parameters alpha and beta (for example, in affine or activation variants).

Dispatch is based on the pair of tensor formats (x_format, o_format) and a registry mapping:

(x_format, o_format) -> (impl, resolved_output_format)

Policy

  • If output is None, profile() selects an implementation with o_format == FORMAT.RAGGED, allocates an internal buffer of logical shape [B, N, D], and returns a vTensor view.

  • If output is provided, profile() requires an exact (x_fmt, o_fmt) mapping in _impl_map and validates shape/device consistency.

  • The logical (N, D) axes are preserved by design; only the leading B comes from the runtime context.
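The registry-based dispatch above can be sketched with a toy table in plain Python. The `FORMAT` values and kernel names here are stand-ins, not the library's actual identifiers:

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the FORMAT enum; the real values live in vortex_torch.
RAGGED, PAGED = "RAGGED", "PAGED"

def ragged_to_ragged(x):
    # Placeholder kernel: the real implementation is an elementwise GPU kernel.
    return x

# Dispatch table keyed by (x_format, o_format), mirroring _impl_map:
# each entry resolves to (callable_impl, resolved_output_format).
impl_map: Dict[Tuple[str, str], Tuple[Callable, str]] = {
    (RAGGED, RAGGED): (ragged_to_ragged, RAGGED),
}

# Lookup, as profile() would do; a missing key means no implementation exists.
impl, out_fmt = impl_map[(RAGGED, RAGGED)]
```

An unmatched `(x_format, o_format)` pair raises `KeyError` here, which corresponds to the `AssertionError` that profile() raises when no entry is found in `_impl_map`.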

_impl_map

Dispatch table keyed by (x_format, o_format). Each entry maps to (callable_impl, resolved_output_format).

Type:

Dict[Tuple[FORMAT, FORMAT], Tuple[Callable, FORMAT]]

alpha

Scalar parameter used by certain unary ops.

Type:

float

beta

Scalar parameter used by certain unary ops.

Type:

float

op_type

Runtime-set enum/int describing the specific elementwise operation.

Type:

Optional[ElementwiseOpType]

impl

The resolved implementation selected during profile().

Type:

Optional[Callable]

output_format

The output tensor format as determined in profile().

Type:

Optional[FORMAT]

output_buffer

Internal output buffer allocated when output is None.

Type:

Optional[torch.Tensor]

profile(x, output, loc, ctx)[source]

Validate inputs, select implementation and output format, and optionally allocate an internal output buffer.

The input tensor x is expected to have logical shape [B, N, D]. The auxiliary tensor loc carries per-position metadata used by the implementation (for example, mapping positions to segments or other runtime indices); its exact shape and semantics are defined by the kernel.

There are two modes:

  • No output provided (output is None):

    • Select an implementation for (x._format, FORMAT.RAGGED).

    • Allocate an internal buffer with shape [B, N, D], where

      \[B = \text{ctx.max_new_tokens_per_batch} \times \text{ctx.head_num}.\]

    • Wrap it as a vTensor with the resolved output format.

  • Output provided (output is not None):

    • Require an exact mapping for (x._format, output._format).

    • Validate that output has rank 3 and preserves the (N, D) dimensions of x.

    • Validate device consistency between x and output.

Parameters:
  • x (vTensor) – Input tensor with logical shape [B, N, D].

  • output (Optional[vTensor]) – Optional preallocated output tensor. If None, an internal buffer is allocated; otherwise, this tensor must have shape [B_out, N, D] for some B_out and a format compatible with _impl_map.

  • loc (torch.Tensor) – Auxiliary tensor carrying per-position metadata required by the implementation (e.g., location/segment indices).

  • ctx (Context) – Execution context that provides the runtime value of B (via ctx.max_new_tokens_per_batch and ctx.head_num) and is used for auxiliary memory accounting.

Returns:

A vTensor view representing the resolved output: either the provided output or an internally allocated buffer.

Return type:

vTensor

Raises:

AssertionError – If types, ranks, formats, shapes, or devices are incompatible, or if no implementation is found in _impl_map.

execute(x, output, loc, ctx)[source]

Execute the selected unary elementwise implementation.

This method assumes that profile() has already selected an implementation and, if needed, allocated an internal output buffer.

Parameters:
  • x (torch.Tensor) – Plain input tensor with shape consistent with the vTensor validated in profile().

  • output (Optional[torch.Tensor]) – Optional preallocated output tensor. If None, the internal buffer created during profile() will be used.

  • loc (torch.Tensor) – Auxiliary tensor carrying per-position metadata required by the implementation (e.g., location/segment indices).

  • ctx (Context) – Execution context forwarded to the implementation.

Returns:

The output tensor written by the implementation: either the provided output or the internal buffer.

Return type:

torch.Tensor

Raises:

AssertionError – If profile() has not been called, leaving no resolved implementation or internal buffer available.
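The two-phase profile()/execute() contract can be modeled with a minimal mock. This is a toy illustration of the call order and buffer reuse, not the real class; the list-based "buffer" stands in for the internally allocated tensor:

```python
class MiniElementwise:
    """Toy model of the profile()/execute() contract (not the real vOp class)."""

    def __init__(self, fn):
        self.fn = fn
        self.impl = None           # resolved during profile()
        self.output_buffer = None  # allocated during profile() when output is None

    def profile(self, x):
        # Select the implementation and allocate an internal output buffer.
        self.impl = self.fn
        self.output_buffer = [0.0] * len(x)
        return self.output_buffer

    def execute(self, x, output=None):
        # Assumes profile() already ran, mirroring the documented precondition.
        assert self.impl is not None, "profile() must be called before execute()"
        out = output if output is not None else self.output_buffer
        for i, v in enumerate(x):
            out[i] = self.impl(v)
        return out

op = MiniElementwise(lambda v: max(v, 0.0))  # ReLU-like scalar function
op.profile([1.0, -2.0, 3.0])
y = op.execute([1.0, -2.0, 3.0])  # → [1.0, 0.0, 3.0]
```

Calling execute() before profile() trips the assertion, matching the Raises clause above.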

class vortex_torch.cache.elementwise.Relu(alpha=0.0, beta=0.0)[source]

Bases: Elementwise

Piecewise ReLU-like activation.

This operator applies, elementwise, the scalar function

\[\begin{split}f(x; \alpha, \beta) = \begin{cases} x, & x \ge \alpha, \\ \beta, & x < \alpha. \end{cases}\end{split}\]

Given an input tensor \(X \in \mathbb{R}^{B \times N \times D}\), the output is defined by

\[Y[b, n, d] = f\bigl(X[b, n, d]; \alpha, \beta\bigr).\]
Parameters:
  • alpha (float, optional) – Threshold value \(\alpha\). Inputs greater than or equal to this threshold are passed through unchanged. Default is 0.0.

  • beta (float, optional) – Fallback value \(\beta\) used when \(x < \alpha\). Default is 0.0.
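The piecewise definition above can be sketched as a scalar reference function in plain Python (the actual kernel applies it pointwise over rank-3 tensors; `relu_ref` is an illustrative name, not a library API):

```python
def relu_ref(x: float, alpha: float = 0.0, beta: float = 0.0) -> float:
    # Pass inputs through unchanged at or above the threshold alpha;
    # otherwise emit the constant fallback beta.
    return x if x >= alpha else beta
```

With the defaults (alpha=0.0, beta=0.0) this reduces to the standard ReLU.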

class vortex_torch.cache.elementwise.Silu(alpha=0.0, beta=0.0)[source]

Bases: Elementwise

SiLU-like activation with configurable shift and slope.

This operator applies, elementwise, the scalar function

\[\operatorname{SiLU}(x; \alpha, \beta) = \frac{x}{1 + \exp(\beta x + \alpha)}.\]

Given an input tensor \(X \in \mathbb{R}^{B \times N \times D}\), the output is

\[Y[b, n, d] = \operatorname{SiLU}\bigl(X[b, n, d]; \alpha, \beta\bigr).\]
Parameters:
  • alpha (float, optional) – Bias term \(\alpha\) added inside the exponential. Default is 0.0.

  • beta (float, optional) – Slope \(\beta\) multiplying \(x\) inside the exponential. Default is 0.0.
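A scalar reference sketch of the formula above (`silu_ref` is an illustrative name, not a library API). Note that the textbook SiLU \(x \cdot \sigma(x)\) is recovered with \(\beta = -1\), \(\alpha = 0\):

```python
import math

def silu_ref(x: float, alpha: float = 0.0, beta: float = 0.0) -> float:
    # SiLU(x; alpha, beta) = x / (1 + exp(beta*x + alpha))
    return x / (1.0 + math.exp(beta * x + alpha))
```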

class vortex_torch.cache.elementwise.Sigmoid(alpha=0.0, beta=0.0)[source]

Bases: Elementwise

Sigmoid activation with configurable shift and slope.

This operator applies, elementwise, the scalar function

\[\sigma(x; \alpha, \beta) = \frac{1}{1 + \exp(\beta x + \alpha)}.\]

Given an input tensor \(X \in \mathbb{R}^{B \times N \times D}\), the output is

\[Y[b, n, d] = \sigma\bigl(X[b, n, d]; \alpha, \beta\bigr).\]
Parameters:
  • alpha (float, optional) – Bias term \(\alpha\) added inside the exponential. Default is 0.0.

  • beta (float, optional) – Slope \(\beta\) multiplying \(x\) inside the exponential. Default is 0.0.
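A scalar reference sketch of the formula above (`sigmoid_ref` is an illustrative name, not a library API). The textbook logistic sigmoid corresponds to \(\beta = -1\), \(\alpha = 0\); with the documented defaults \(\beta = 0\), the function is constant at 0.5:

```python
import math

def sigmoid_ref(x: float, alpha: float = 0.0, beta: float = 0.0) -> float:
    # sigma(x; alpha, beta) = 1 / (1 + exp(beta*x + alpha))
    return 1.0 / (1.0 + math.exp(beta * x + alpha))
```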

class vortex_torch.cache.elementwise.Add_Mul(alpha=0.0, beta=1.0)[source]

Bases: Elementwise

Affine transformation \(y = \beta x + \alpha\).

This operator applies, elementwise, the scalar function

\[f(x; \alpha, \beta) = \beta x + \alpha.\]

For an input tensor \(X \in \mathbb{R}^{B \times N \times D}\), the output is

\[Y[b, n, d] = \beta \, X[b, n, d] + \alpha.\]
Parameters:
  • alpha (float, optional) – Additive term \(\alpha\) in the affine transform. Default is 0.0.

  • beta (float, optional) – Multiplicative term \(\beta\) in the affine transform. Default is 1.0.
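A scalar reference sketch of the affine transform above (`add_mul_ref` is an illustrative name, not a library API):

```python
def add_mul_ref(x: float, alpha: float = 0.0, beta: float = 1.0) -> float:
    # f(x; alpha, beta) = beta*x + alpha
    return beta * x + alpha
```

With the defaults (alpha=0.0, beta=1.0) the transform is the identity.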

class vortex_torch.cache.elementwise.Abs(alpha=0.0, beta=1.0)[source]

Bases: Elementwise

Absolute-value transform of an affine argument.

This operator applies, elementwise, the scalar function

\[f(x; \alpha, \beta) = \bigl|\beta x + \alpha\bigr|.\]

For an input tensor \(X \in \mathbb{R}^{B \times N \times D}\), the output is

\[Y[b, n, d] = \bigl|\beta \, X[b, n, d] + \alpha\bigr|.\]
Parameters:
  • alpha (float, optional) – Additive term \(\alpha\) inside the absolute value. Default is 0.0.

  • beta (float, optional) – Multiplicative term \(\beta\) inside the absolute value. Default is 1.0.
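A scalar reference sketch of the formula above (`abs_ref` is an illustrative name, not a library API): the affine argument is formed first, then its absolute value is taken.

```python
def abs_ref(x: float, alpha: float = 0.0, beta: float = 1.0) -> float:
    # f(x; alpha, beta) = |beta*x + alpha|
    return abs(beta * x + alpha)
```

With the defaults (alpha=0.0, beta=1.0) this reduces to the plain absolute value.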