RaaS

RaaS (Remote Agentic Serving, astraflow/raas/) manages inference engines and rollout generation.

Key Components

  • RaaS3Manager — Central async manager that maintains per-model inference engines, handles workflow registration, and manages pause/resume for weight sync.

  • server/routes.py — FastAPI routes exposing the RaaS HTTP API.

  • server/tcp_receiver.py — TCP weight receiver that pulls weights from the Trainer’s sender agent.

  • engine/ — Backend engine implementations (SGLang, vLLM wrappers).

  • api/ — Configuration dataclasses and engine specs.

Responsibilities

  • Launch and manage vLLM/SGLang inference servers.

  • Execute rollout workflows with registered reward functions.

  • Track weight versions; pull updated weights from Trainer when behind.

  • Pause/resume inference during weight synchronization.

How It Fits

RaaS is the inference side of the loop:

  • Receives rollout requests from Dataflow

  • Generates completions using managed inference engines

  • Accepts weight updates from Trainer via pull-based TCP transfer

For a full guide on implementing a custom RaaS, see Custom RaaS Integration.


RaaS HTTP API

All endpoints use binary pickle/cloudpickle serialization (Content-Type: application/octet-stream) except GET /status and GET /availability, which return JSON. Pickle endpoints wrap responses as {"ok": True, "result": ...} on success or {"ok": False, "error": ...} on failure (HTTP 500). Source: astraflow/raas/server/routes.py.
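The envelope convention above can be handled with a pair of small helpers. This is a minimal sketch, not code from routes.py: the names `encode_payload`, `decode_response`, and `RaaSError` are hypothetical, and it uses the standard `pickle` module only (cloudpickle output is readable by `pickle.loads`).

```python
import pickle
from typing import Any


class RaaSError(RuntimeError):
    """Raised when a pickle endpoint reports {"ok": False, ...}."""


def encode_payload(payload: Any) -> bytes:
    # Request bodies travel as raw pickle bytes (application/octet-stream).
    return pickle.dumps(payload)


def decode_response(body: bytes) -> Any:
    # Pickle endpoints wrap results in an {"ok": ..., "result"/"error": ...} envelope.
    envelope = pickle.loads(body)
    if envelope.get("ok"):
        return envelope.get("result")
    raise RaaSError(str(envelope.get("error")))
```

Because unpickling can execute arbitrary code, helpers like these are only appropriate between trusted services on a private network.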

Health & Status (JSON)

Method  Path           Purpose
------  -------------  -------
GET     /status        Readiness check ("ready" / "idle" / "starting" / "error"); also polled as heartbeat by the orchestrator
GET     /availability  Capacity for load-balanced routing ({available, inflight, ...})
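As an illustration of how a caller might combine the two JSON endpoints for routing, the sketch below picks the ready instance with the most free capacity. It rests on assumptions the table does not confirm: that only "ready" instances should receive work, that `available` is a numeric count of free slots, and that the /status response has already been reduced to its status string. The function name `pick_instance` is hypothetical.

```python
from typing import Optional


def pick_instance(instances: dict[str, dict]) -> Optional[str]:
    """Return the base URL of the ready instance with the most free capacity.

    `instances` maps a base URL to {"status": ..., "availability": {...}},
    i.e. already-parsed results of GET /status and GET /availability.
    """
    best_url, best_free = None, 0
    for url, info in instances.items():
        if info["status"] != "ready":
            continue  # assumption: "idle", "starting" and "error" take no work
        free = info["availability"].get("available", 0)
        if free > best_free:
            best_url, best_free = url, free
    return best_url  # None when no ready instance has free capacity
```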

Rollout (pickle)

Method  Path                Purpose
------  ------------------  -------
POST    /register_workflow  Register a workflow class + reward fn for rollout generation
POST    /submit             Submit one prompt for rollout → {task_id}
POST    /pull               Drain completed rollout results → list[{task_id, result}]
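The submit / pull contract is asynchronous: /submit returns a task_id immediately, and /pull later drains whatever has finished. The toy in-memory model below illustrates that drain behaviour only. `RolloutBuffer` and its `complete` method are hypothetical names, and the real manager's bookkeeping may differ.

```python
import uuid
from typing import Any


class RolloutBuffer:
    """Toy model of the submit / pull contract (not the real manager)."""

    def __init__(self) -> None:
        self._inflight: dict[str, Any] = {}
        self._done: list[dict] = []

    def submit(self, prompt: Any) -> dict:
        # Accept one prompt and hand back an identifier right away.
        task_id = uuid.uuid4().hex
        self._inflight[task_id] = prompt
        return {"task_id": task_id}

    def complete(self, task_id: str, result: Any) -> None:
        # Called when a rollout finishes; moves the task to the done list.
        self._inflight.pop(task_id)
        self._done.append({"task_id": task_id, "result": result})

    def pull(self) -> list[dict]:
        # Drain: return everything finished so far and clear the list,
        # so each result is delivered at most once.
        drained, self._done = self._done, []
        return drained
```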

Weight Sync (pickle)

Method  Path             Purpose
------  ---------------  -------
POST    /notify_version  Per-model weight update. Payload {model_id, version, sender_endpoint}.
                         RaaS pulls weights for that one model from the sender and hot-swaps
                         its inference engine.

Evaluation (pickle)

Method  Path                    Purpose
------  ----------------------  -------
POST    /reset_training_engine  Cancel in-flight training rollouts, drain engines, and prepare state for eval
POST    /eval_start             Begin eval session (reset tracking state)
POST    /eval_submit            Submit an eval sample
POST    /eval_pull              Collect eval results with progress ({items, inflight, pending, total_submitted})
POST    /eval_end               End eval session (clear tracking state)
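A client-side eval session could be driven roughly as follows. This is a sketch under stated assumptions: `call(path, payload)` is a hypothetical transport that POSTs to RaaS and returns the unwrapped result, /eval_start and /eval_end are assumed to take no meaningful payload, and every submitted sample is assumed to eventually appear in `items`.

```python
import time
from typing import Any, Callable


def run_eval(call: Callable[[str, Any], Any], samples: list,
             poll_interval: float = 0.5) -> list:
    """Run one eval session: start, submit all samples, pull until done, end."""
    call("/eval_start", None)
    try:
        for sample in samples:
            call("/eval_submit", sample)
        items: list = []
        while len(items) < len(samples):
            progress = call("/eval_pull", None)
            items.extend(progress["items"])
            if not progress["items"]:
                time.sleep(poll_interval)  # nothing new yet, back off briefly
    finally:
        call("/eval_end", None)  # always clear tracking state
    return items
```

Wrapping the loop in try/finally means the session is closed even if a submit or pull raises, so stale tracking state does not leak into the next session.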

Lifecycle (pickle)

Method  Path       Purpose
------  ---------  -------
POST    /shutdown  Graceful shutdown; destroys engines and exits


APIs That RaaS Calls (RaaS as Client)

RaaS is not only a server — it also acts as a client to two external services.

Dataflow Service

On startup, RaaS self-registers with the Dataflow orchestrator:

Method  Target                   Purpose
------  -----------------------  -------
POST    Dataflow /register_raas  Register this RaaS instance into the global pool

This is triggered at boot when --astraflow-url is provided. Dataflow then knows to route rollout requests to this instance.
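The boot-time decision can be sketched as below: register only when --astraflow-url is given, and target the /register_raas path on that URL. The body format of /register_raas is not documented here, so the caller supplies opaque bytes. The function name `build_registration` is hypothetical, and the request is built but not sent.

```python
import argparse
import urllib.request
from typing import Optional


def build_registration(argv: list[str], body: bytes) -> Optional[urllib.request.Request]:
    """Return the POST request for self-registration, or None if not configured."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--astraflow-url", default=None)
    args, _ = parser.parse_known_args(argv)  # ignore unrelated flags
    if args.astraflow_url is None:
        return None  # no Dataflow URL given, so nothing to register with
    target = args.astraflow_url.rstrip("/") + "/register_raas"
    return urllib.request.Request(target, data=body, method="POST")
```

A caller would pass the result to `urllib.request.urlopen` (or an async HTTP client) once the server is ready to accept rollout requests.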

Trainer Sender Agent (Weight Transfer)

When /notify_version is called (one call per model) and RaaS detects a version lag, it initiates a pull-based TCP weight transfer by calling the Trainer’s sender agent:

Method  Target                             Purpose
------  ---------------------------------  -------
GET     Trainer /get_buffer_info           Query tensor layout (first pull only)
POST    Trainer /register_sglang_instance  Register as a weight receiver (buffer ptr, ZMQ endpoint, handshake ports)
POST    Trainer /request_transfer          Request the actual TCP weight transfer (per pull)

The flow inside manager.notify_version(model_id, version, sender_endpoint) (astraflow/raas/server/manager.py:1648):

Dataflow  ──POST /notify_version──> RaaS (for one model_id)
  {model_id, version, sender_endpoint}          │
                                                │ acquire per-model asyncio.Lock
                                                ├─ GET  /get_buffer_info ─────────> Trainer SenderAgent (first pull only)
                                                ├─ POST /register_sglang_instance ─> Trainer SenderAgent (first pull only)
                                                ├─ POST /request_transfer ────────> Trainer SenderAgent
                                                ├─ ← TCP bulk transfer + ZMQ signal
                                                ├─ save bytes to /dev/shm as safetensors
                                                ├─ engine.pause_generation()
                                                ├─ engine.load_weights_from_path(...)
                                                └─ engine.continue_generation()

For multi-model training, Dataflow’s Python-side barrier sends one such call per model, in sequence or in parallel depending on whether eval is requested — see Multi-Agent Weight Transfer.
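The locking and pause/load/resume ordering in the flow above can be sketched with a per-model asyncio.Lock. This is an illustration, not the code in manager.py: `VersionTracker`, `FakeEngine`, and `fetch_weights` are hypothetical, the engine methods are assumed to be coroutines, and the whole TCP/ZMQ transfer is collapsed into one `fetch_weights` call that returns a file path.

```python
import asyncio
from collections import defaultdict


class FakeEngine:
    """Stand-in for an inference engine; records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def pause_generation(self) -> None:
        self.calls.append("pause")

    async def load_weights_from_path(self, path: str) -> None:
        self.calls.append(f"load:{path}")

    async def continue_generation(self) -> None:
        self.calls.append("continue")


class VersionTracker:
    def __init__(self, engines, fetch_weights) -> None:
        self.engines = engines
        self.fetch_weights = fetch_weights  # hypothetical: pulls weights, returns a path
        self.versions = defaultdict(int)
        self.locks = defaultdict(asyncio.Lock)

    async def notify_version(self, model_id, version, sender_endpoint) -> bool:
        async with self.locks[model_id]:  # serialize updates per model
            if version <= self.versions[model_id]:
                return False  # not behind, nothing to pull
            path = await self.fetch_weights(model_id, sender_endpoint)
            engine = self.engines[model_id]
            await engine.pause_generation()
            try:
                await engine.load_weights_from_path(path)
            finally:
                await engine.continue_generation()  # resume even if loading fails
            self.versions[model_id] = version  # reached only on a successful load
            return True
```

Keeping the lock per model rather than global lets updates for different models proceed concurrently while duplicate notifications for the same model are serialized and then skipped by the version check.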


Call Graph (RaaS-Centric)

All HTTP calls to and from RaaS, organized by phase.

                                    ┌─────────────────────┐
                                    │        RaaS         │
                                    │      (raas/)        │
                                    └──────────┬──────────┘
    ╔══════════════════════════════════════════════════════════════════╗
    ║  STARTUP                                                       ║
    ╚══════════════════════════════════════════════════════════════════╝
      RaaS ──POST /register_raas──────────> Dataflow      join the pool
      Dataflow ──POST /register_workflow──> RaaS          register rollout workflows
    ╔══════════════════════════════════════════════════════════════════╗
    ║  ROLLOUT (continuous, async)                                   ║
    ╚══════════════════════════════════════════════════════════════════╝
      Dataflow ──GET /availability────────> RaaS          check capacity
      Dataflow ──POST /submit─────────────> RaaS          submit prompts
      Dataflow ──POST /pull───────────────> RaaS          collect results
    ╔══════════════════════════════════════════════════════════════════╗
    ║  WEIGHT SYNC (per training step, one call per model)             ║
    ╚══════════════════════════════════════════════════════════════════╝
      Dataflow ──POST /notify_version───────> RaaS        trigger weight load
                  {model_id, version,
                   sender_endpoint}             │
        RaaS ──GET /get_buffer_info────────────> Trainer      query tensor layout (first pull only)
        RaaS ──POST /register_sglang_instance──> Trainer      register as receiver (first pull only)
        RaaS ──POST /request_transfer──────────> Trainer      pull weights via TCP
    ╔══════════════════════════════════════════════════════════════════╗
    ║  EVALUATION (triggered after weight sync)                      ║
    ╚══════════════════════════════════════════════════════════════════╝
      Dataflow ──POST /eval_start─────────> RaaS          begin eval session
      Dataflow ──POST /eval_submit────────> RaaS          submit eval samples
      Dataflow ──POST /eval_pull──────────> RaaS          collect eval results
      Dataflow ──POST /eval_end───────────> RaaS          end eval session
    ╔══════════════════════════════════════════════════════════════════╗
    ║  LIFECYCLE                                                     ║
    ╚══════════════════════════════════════════════════════════════════╝
      Dataflow ──GET /status──────────────> RaaS          health check
      Dataflow ──POST /shutdown───────────> RaaS          graceful shutdown

  • Inbound (RaaS as server): Dataflow calls RaaS for rollout, eval, weight sync, and lifecycle management.

  • Outbound (RaaS as client): RaaS calls Dataflow once at startup (registration) and calls Trainer directly during weight sync (TCP pull).

Initial Startup vs Recovery

  • Fresh start (version=0): Both RaaS and Trainer load the same model checkpoint independently. No weight transfer needed — data acquisition begins immediately.

  • Recovery (version > 0): Trainer sends recovered_version in POST /ready. Dataflow then fans out POST /notify_version to all RaaS instances (once per registered model) before starting data acquisition, ensuring every RaaS loads the recovered weights.
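The recovery fan-out can be sketched as one /notify_version call per (RaaS instance, model) pair. Several things here are assumptions rather than documented behaviour: the calls are issued concurrently with asyncio.gather, each model has its own sender endpoint given by the `senders` mapping, and `notify(url, payload)` is a hypothetical coroutine that performs the POST.

```python
import asyncio
from typing import Awaitable, Callable

Notify = Callable[[str, dict], Awaitable[None]]


async def fan_out_recovery(notify: Notify, raas_urls: list[str],
                           senders: dict[str, str], recovered_version: int) -> int:
    """Send one /notify_version per (instance, model); return the call count."""
    if recovered_version <= 0:
        return 0  # fresh start: both sides loaded the same checkpoint
    calls = [
        notify(url, {"model_id": model_id,
                     "version": recovered_version,
                     "sender_endpoint": endpoint})
        for url in raas_urls
        for model_id, endpoint in senders.items()
    ]
    await asyncio.gather(*calls)  # data acquisition starts only after all finish
    return len(calls)
```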