
Model Zoo

| Model | Description | Link |
| --- | --- | --- |
| Qwen2.5-VL-3B-Action | Extends the Qwen2.5-VL vocabulary with FAST tokens (special vocabulary extension for discretizing continuous actions into tokens) | Hugging Face |
| Qwen3-VL-4B-Action | Extends the Qwen3-VL vocabulary with FAST tokens (same as above) | Hugging Face |
| pi-fast | pi-fast action tokenizer weights | Hugging Face |
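The FAST tokens above turn continuous robot actions into discrete vocabulary entries. As a toy illustration of the general idea only (uniform binning with made-up token names; the actual FAST tokenizer is a learned compression scheme, not this):

```python
import numpy as np

def discretize_actions(actions, low=-1.0, high=1.0, n_bins=256):
    """Map continuous action values to integer bin indices.

    Toy uniform binning for illustration; the real FAST tokenizer
    compresses action sequences with a learned scheme.
    """
    actions = np.clip(actions, low, high)
    return ((actions - low) / (high - low) * (n_bins - 1)).round().astype(int)

def to_token_strings(bins):
    # Each bin index becomes a special vocabulary token (names are made up).
    return [f"<act_{b:03d}>" for b in bins]

bins = discretize_actions(np.array([-1.0, 0.0, 1.0]))
print(to_token_strings(bins))  # ['<act_000>', '<act_128>', '<act_255>']
```

The bin tokens are what get appended to the base VLM's vocabulary, so the model can emit actions the same way it emits text.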

Bridge is a WidowX tabletop manipulation dataset; Fractal is Google’s RT-1 robot manipulation dataset.

| Model | Framework | Base VLM | Training Data | WidowX | Link |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-FAST-Bridge-RT-1 | QwenFast | Qwen2.5-VL-3B | Bridge + Fractal | 58.6 | HF |
| Qwen2.5-OFT-Bridge-RT-1 | QwenOFT | Qwen2.5-VL-3B | Bridge + Fractal | 41.8 | HF |
| Qwen2.5-PI-Bridge-RT-1 | QwenPI | Qwen2.5-VL-3B | Bridge + Fractal | 62.5 | HF |
| Qwen2.5-GR00T-Bridge-RT-1 | QwenGR00T | Qwen2.5-VL-3B | Bridge + Fractal | 63.6 | HF |
| Qwen-GR00T-Bridge | QwenGR00T | Qwen2.5-VL-3B | Bridge only | 71.4 | HF |
| Qwen3VL-OFT-Bridge-RT-1 | QwenOFT | Qwen3-VL-4B | Bridge + Fractal | 42.7 | HF |
| Qwen3VL-GR00T-Bridge-RT-1 | QwenGR00T | Qwen3-VL-4B | Bridge + Fractal | 65.3 | HF |
| Florence-GR00T-Bridge-RT-1 | QwenGR00T | Florence-2 | Bridge + Fractal (small model) | - | HF |

WidowX column: Success rate (%) on WidowX robot tasks in SimplerEnv. Higher is better.

LIBERO has four task suites (Spatial, Object, Goal, and Long Horizon) with 40 tasks in total. All checkpoints are trained jointly on all four suites. See the LIBERO evaluation docs.

| Model | Framework | Base VLM | Link |
| --- | --- | --- | --- |
| Qwen2.5-VL-FAST-LIBERO-4in1 | QwenFast | Qwen2.5-VL-3B | HF |
| Qwen2.5-VL-OFT-LIBERO-4in1 | QwenOFT | Qwen2.5-VL-3B | HF |
| Qwen2.5-VL-GR00T-LIBERO-4in1 | QwenGR00T | Qwen2.5-VL-3B | HF |
| Qwen3-VL-OFT-LIBERO-4in1 | QwenOFT | Qwen3-VL-4B | HF |
| Qwen3-VL-PI-LIBERO-4in1 | QwenPI | Qwen3-VL-4B | HF |

RoboCasa GR1 tabletop suite with 24 pick-and-place tasks. See the RoboCasa evaluation docs.

| Model | Framework | Base VLM | Link |
| --- | --- | --- | --- |
| Qwen3-VL-GR00T-Robocasa-gr1 | QwenGR00T | Qwen3-VL-4B | HF |
| Qwen3-VL-OFT-Robocasa | QwenOFT | Qwen3-VL-4B | HF |

RoboTwin 2.0 dual-arm manipulation benchmark with 50 tasks. See the RoboTwin evaluation docs.

| Model | Framework | Base VLM | Link |
| --- | --- | --- | --- |
| Qwen3-VL-OFT-Robotwin2-All | QwenOFT | Qwen3-VL-4B | HF |
| Qwen3-VL-OFT-Robotwin2 | QwenOFT | Qwen3-VL-4B | HF |

BEHAVIOR-1K household task benchmark using the R1Pro humanoid robot. See the BEHAVIOR evaluation docs.

| Model | Description | Link |
| --- | --- | --- |
| BEHAVIOR-QwenDual-taskall | Jointly trained on all 50 tasks | HF |
| BEHAVIOR-QwenDual-task1 | Single-task training | HF |
| BEHAVIOR-QwenDual-task6-40k | 6-task joint training | HF |
| BEHAVIOR-rgp-seg | Segmentation observation experiment | HF |

| Dataset | Description | Link |
| --- | --- | --- |
| LLaVA-OneVision-COCO | Image-text dataset for VLM co-training (ShareGPT4V-COCO subset) | HF |
| RoboTwin-Clean | RoboTwin 2.0 clean demonstrations (50 per task) | HF |
| RoboTwin-Randomized | RoboTwin 2.0 randomized demonstrations (500 per task) | HF |
| RoboTwin-Randomized-targz | Same as above, tar.gz packed format (for bulk download) | HF |
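The tar.gz-packed variant needs unpacking after download. A minimal sketch using Python's `tarfile` (directory and file names here are illustrative stand-ins, not the real dataset layout; the snippet fabricates a demo archive so it runs as-is):

```python
import tarfile
from pathlib import Path

# Demo fixture: fabricate one archive the way a packed download might be
# laid out (names are illustrative, not the real dataset contents).
src, dst = Path("packed_demo"), Path("unpacked_demo")
src.mkdir(exist_ok=True)
dst.mkdir(exist_ok=True)
(src / "episode_0.txt").write_text("demo episode")
with tarfile.open(src / "task_0.tar.gz", "w:gz") as tar:
    tar.add(src / "episode_0.txt", arcname="episode_0.txt")

# Bulk-unpack: extract every .tar.gz archive into the target folder.
for archive in sorted(src.glob("*.tar.gz")):
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dst)

print(sorted(p.name for p in dst.iterdir()))  # ['episode_0.txt']
```

For the real dataset, point `src` at the downloaded `-targz` folder and drop the fixture-creation lines.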

| Dataset | Description | Link |
| --- | --- | --- |
| BEHAVIOR-1K | BEHAVIOR-1K benchmark simulation configs | HF |
| BEHAVIOR-1K-datasets | BEHAVIOR-1K training datasets | HF |
| BEHAVIOR-1K-datasets-assets | BEHAVIOR-1K scene and object assets | HF |
| BEHAVIOR-1K-VISUALIZATION-DEMO | BEHAVIOR-1K visualization demos | HF |
| behavior-1k-task0 | Single-task training data sample | HF |

Download a checkpoint and run the policy server:

```sh
# Download the checkpoint (requires huggingface_hub)
huggingface-cli download StarVLA/Qwen3VL-GR00T-Bridge-RT-1 \
  --local-dir ./results/Checkpoints/Qwen3VL-GR00T-Bridge-RT-1

# Start the policy server.
# steps_XXXXX is the training step count; replace it with the actual
# checkpoint filename from your download (e.g. steps_50000_pytorch_model.pt;
# run `ls` on the checkpoints folder to see the exact name).
python deployment/model_server/server_policy.py \
  --ckpt_path ./results/Checkpoints/Qwen3VL-GR00T-Bridge-RT-1/checkpoints/steps_XXXXX_pytorch_model.pt \
  --port 5694 \
  --use_bf16
```

Then follow the evaluation guide for the benchmark you want to test on (e.g. SimplerEnv, LIBERO, RoboCasa, RoboTwin, BEHAVIOR).