Search

Reinforcement learning for search-augmented agents (ASearcher) that interleave reasoning with local retrieval against a Wikipedia knowledge base.

Search recipes: examples/search/

Each recipe ships an all-in-one launch script under scripts/ and its config under yaml/.

Environment setup

The search recipes query a local FAISS retrieval server over the Wikipedia 2018 corpus. Set it up once before training.

Install the retrieval server dependencies:

conda create -n rag-retriever python=3.10 -y
conda activate rag-retriever
pip install -r astraEnv/ASearcher/requirements-rag-server.txt

Download the knowledge corpus and build the index (the build can take hours):

cd astraEnv/ASearcher
conda activate rag-retriever
export WIKI2018_WORK_DIR=data/wiki2018
mkdir -p "$WIKI2018_WORK_DIR"
huggingface-cli download inclusionAI/ASearcher-Local-Knowledge \
  --repo-type dataset --local-dir "$WIKI2018_WORK_DIR" --local-dir-use-symlinks False
bash scripts/build_index.sh

Start the retrieval server before training. The argument 6,7 places it on the two GPUs that the training launcher leaves free:

cd astraEnv/ASearcher
conda activate rag-retriever
export RAG_SERVER_ADDR_DIR=./tmp-log/rag_server_addrs
export PORT=7000
export USE_FAISS_GPU=1   # set 0 to disable GPU FAISS
bash scripts/launch_rag_server.sh 6,7
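
The GPU argument follows from the node layout: the retrieval server takes the GPUs that training does not use. As a minimal sketch, assuming the argument is a comma-separated list of GPU indices on an 8-GPU node, the hypothetical helper `free_gpus` below (not part of AstraFlow) lists the indices that remain for the training launcher.

```shell
# Hypothetical helper, not part of AstraFlow: print the GPU indices that
# remain on a node after reserving some for the retrieval server.
# Usage: free_gpus TOTAL RESERVED   (RESERVED is comma-separated, e.g. 6,7)
free_gpus() {
  local total="$1" reserved=",$2," out="" i
  i=0
  while [ "$i" -lt "$total" ]; do
    case "$reserved" in
      *",$i,"*) ;;                   # index is reserved, skip it
      *) out="${out:+$out,}$i" ;;    # otherwise append it to the result
    esac
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

free_gpus 8 6,7   # prints 0,1,2,3,4,5
```

The six remaining indices match the recipe's budget of 4 inference GPUs plus 2 training GPUs. How the launcher assigns those six between RaaS and the trainer is not shown here.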

Qwen2.5-7B-Instruct: 8 GPUs

The search recipe trains an ASearcher agent on an 8-GPU node — 4 GPUs for inference, 2 for training, with 2 GPUs left for a local retrieval (RAG) server. It comes in two variants that differ only in weight transfer mode:

  • qwen2.5-7b-instruct-m2po-full/ — full weight transfer

  • qwen2.5-7b-instruct-m2po-delta/ — delta weight transfer (only changed weights are sent)

The agent uses the async-search-access search client, which queries the local FAISS retrieval server from Environment setup above. The launch script reads the server addresses from astraEnv/ASearcher/tmp-log/rag_server_addrs and aborts if the server is not running.
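
Because the launch script aborts when no server address is found, it can help to check the address directory by hand first. The sketch below is a hypothetical pre-flight check, not the launcher's own logic. It assumes that a running server leaves at least one non-empty file in the address directory and that the command is run from the repository root. The file names and contents are not specified here, so the check only looks for any non-empty file.

```shell
# Hypothetical pre-flight check, not the launcher's actual logic.
# Succeeds only if the directory exists and holds a non-empty regular file.
rag_server_ready() {
  local addr_dir="$1"
  [ -d "$addr_dir" ] || return 1
  # find prints the first non-empty regular file it sees, then stops
  [ -n "$(find "$addr_dir" -maxdepth 1 -type f -size +0 -print -quit)" ]
}

# Path taken from the text above; assumes the repository root as working dir.
addr_dir="astraEnv/ASearcher/tmp-log/rag_server_addrs"
if rag_server_ready "$addr_dir"; then
  echo "retrieval server address found in $addr_dir"
else
  echo "no retrieval server address in $addr_dir; start the server first" >&2
fi
```

A stale file left behind by an earlier run would also pass this check, so it is a quick sanity test rather than proof that the server is reachable.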

Run

With the retrieval server already running, one script launches all three processes — the AstraFlow service, the RaaS inference server, and the trainer:

# delta weight transfer
bash examples/search/qwen2.5-7b-instruct-m2po-delta/scripts/run_qwen2.5-7b-instruct-m2po-delta.sh

# full weight transfer
bash examples/search/qwen2.5-7b-instruct-m2po-full/scripts/run_qwen2.5-7b-instruct-m2po-full.sh

Settings

| Setting | Value |
| --- | --- |
| Model | Qwen2.5-7B-Instruct |
| GPUs | 6 of 8: RaaS ×4 (SGLang, DP=4), Trainer ×2 (FSDP, DP=2) |
| Algorithm | M2PO (m2_threshold 0.004) |
| Weight transfer | TCP: full, or delta (delta_full_sync_interval 10) |
| Context length | 16384 |
| Max new tokens | 1024 |
| Rollouts per prompt | 8 (temperature 1.0) |
| Train batch size | 256 |
| Learning rate | 5e-6 (Adam, constant schedule) |
| Train steps | 1000 |
| Workflow / reward | asearcher (max_turns 32) / F1 |
| Retrieval | Local FAISS RAG server over Wikipedia 2018, async-search-access client (topk 5) |
| Train dataset | ASearcher-Base-35k |
| Eval datasets | TriviaQA, PopQA, HotpotQA, Bamboogle |
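
The delta variant sets delta_full_sync_interval to 10. One plausible reading, stated here as an assumption rather than confirmed behavior, is that every tenth weight update is sent in full and the updates in between carry only changed weights. The hypothetical `sync_mode` helper below illustrates that schedule.

```shell
# Illustrative only: assumes every INTERVAL-th weight update is a full sync
# and all other updates send deltas. AstraFlow's actual schedule may differ.
sync_mode() {
  local step="$1" interval="$2"
  if [ $((step % interval)) -eq 0 ]; then
    echo full
  else
    echo delta
  fi
}

for step in 9 10 11 20; do
  echo "step $step: $(sync_mode "$step" 10)"
done
```

Under this reading, steps 10 and 20 would be full syncs, while steps 9 and 11 would send deltas.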

2025-2026, AstraFlow Team
