Vortex
======

**Vortex** turns sparse-attention algorithm design into something *AI agents
can do*. You describe a sparse-attention **flow** in a few lines of high-level
Python ops, and Vortex compiles it into fused Triton/CUDA kernels that plug
straight into `SGLang `_'s decode loop, with no manual kernel writing; the
result runs (and is benchmarked) inside a real serving stack.

A flow is just three methods on a :class:`~vortex_torch.flow.flow.vFlow`:

* **create_cache**: declare the auxiliary per-page state you want to keep
  (e.g. a centroid, a min/max envelope) alongside the K/V cache.
* **forward_cache**: fill that state from the keys/values as each page
  completes (runs once per page).
* **forward_indexer**: score the cached pages against the query and emit the
  sparse set of pages to attend to (runs every decode step).

See :doc:`quickstart` for the shortest end-to-end path, :doc:`examples` for
detailed recipes (custom flows, ``VortexConfig``, a programmable budget, MLA
models, and server mode), and the :doc:`API reference ` for the full op set.

.. note::

   Looking for the big picture and benchmarks? See the `project page `_ and
   the `paper `_.

.. toctree::
   :maxdepth: 2
   :caption: Getting started

   installation
   quickstart

.. toctree::
   :maxdepth: 2
   :caption: Guides

   examples

.. toctree::
   :maxdepth: 2
   :caption: Reference

   api