# LIBERO Evaluation
LIBERO is a tabletop robotic manipulation benchmark with 4 task suites (Spatial, Object, Goal, Long Horizon), totaling 40 tasks. It tests VLA models on spatial understanding, object recognition, goal reasoning, and long-horizon manipulation using a Franka robotic arm.
This document provides instructions for reproducing our experimental results with LIBERO. The evaluation process consists of two main parts:
- Setting up the LIBERO environment and dependencies.
- Running the evaluation by launching services in both the starVLA and LIBERO environments.
We have verified that this workflow runs successfully on both NVIDIA A100 and RTX 4090 GPUs.
## 0. Download Checkpoints
We provide a collection of pretrained checkpoints on Hugging Face to make community evaluation easier: 🤗 StarVLA/bench-libero. Their corresponding results on LIBERO are summarized in the table below.
### Experimental Results

| Model | Steps | Epochs | Spatial | Object | Goal | Long | Avg |
|---|---|---|---|---|---|---|---|
| $\pi_0$+FAST | - | - | 96.4 | 96.8 | 88.6 | 60.2 | 85.5 |
| OpenVLA-OFT | 175K | 223 | 97.6 | 98.4 | 97.9 | 94.5 | 97.1 |
| $\pi_0$ | - | - | 96.8 | 98.8 | 95.8 | 85.2 | 94.1 |
| GR00T-N1.5 | 20K | 203 | 92.0 | 92.0 | 86.0 | 76.0 | 86.5 |
| Qwen2.5-VL-FAST | 30K | 9.54 | 97.3 | 97.2 | 96.1 | 90.2 | 95.2 |
| Qwen2.5-VL-OFT | 30K | 9.54 | 97.4 | 98.0 | 96.8 | 92.0 | 96.1 |
| Qwen2.5-VL-GR00T | 30K | 9.54 | 97.8 | 98.2 | 94.6 | 90.8 | 95.4 |
| Qwen3-VL-FAST | 30K | 9.54 | 97.3 | 97.4 | 96.3 | 90.6 | 95.4 |
| Qwen3-VL-OFT | 30K | 9.54 | 97.8 | 98.6 | 96.2 | 93.8 | 96.6 |
| Qwen3-VL-GR00T | 30K | 9.54 | 97.8 | 98.8 | 97.4 | 92.0 | 96.5 |
We train a single policy for all 4 task suites. Each suite's score is averaged over 500 trials (10 tasks × 50 episodes per suite).
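As a sanity check on the table, the Avg column is the unweighted mean of the four suite scores. A quick sketch, using the Qwen3-VL-OFT row as an example:

```python
# Per-suite success rates (%) from the Qwen3-VL-OFT row above.
suite_scores = {"Spatial": 97.8, "Object": 98.6, "Goal": 96.2, "Long": 93.8}

# Each suite score is itself an average over 10 tasks x 50 episodes = 500 trials.
trials_per_suite = 10 * 50

avg = sum(suite_scores.values()) / len(suite_scores)
print(f"{avg:.1f}")  # -> 96.6, matching the Avg column
```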
## 1. Environment Setup

To set up the environment, please first follow the official LIBERO repository to install the base LIBERO environment.
⚠️ Common issue: LIBERO defaults to Python 3.8, but the syntax updates between 3.8 and 3.10 are substantial. We verified that using Python 3.10 avoids many issues.
Afterwards, inside the LIBERO environment, install the following dependencies:
```bash
pip install tyro matplotlib mediapy websockets msgpack
pip install numpy==1.24.4  # Downgrade numpy for compatibility with the simulation environment
```

## 2. Evaluation Workflow
Run the evaluation from the starVLA repository root using two separate terminals, one for each environment:
- starVLA environment: runs the inference server.
- LIBERO environment: runs the simulation.
### Step 1. Start the server (starVLA environment)

In the first terminal, activate the starVLA conda environment and run:
```bash
bash examples/LIBERO/eval_files/run_policy_server.sh
```

⚠️ Note: Please ensure that you specify the correct checkpoint path in `examples/LIBERO/eval_files/run_policy_server.sh`.
### Step 2. Start the simulation (LIBERO environment)

In the second terminal, activate the LIBERO conda environment and run:
```bash
bash examples/LIBERO/eval_files/eval_libero.sh
```

⚠️ Note: Make sure you correctly set the following variables in `eval_libero.sh`:
| Variable | Meaning | Example |
|---|---|---|
| `LIBERO_HOME` | Path to your LIBERO repo clone | `/path/to/LIBERO` |
| `LIBERO_Python` | Python path from the LIBERO conda env | `$(which python)` (inside the LIBERO env) |
| `your_ckpt` | StarVLA checkpoint path | `./results/Checkpoints/.../steps_30000_pytorch_model.pt` |
| `unnorm_key` | Robot type name for loading unnormalization stats | `franka` (LIBERO uses a Franka arm) |
`unnorm_key` selects the normalization statistics (per-dimension min/max, etc.) saved during training, which are used to convert the model's normalized outputs back into actual joint angles.
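The unnormalization step itself is simple arithmetic. A minimal sketch, assuming the common convention of actions normalized to [-1, 1] with per-dimension min/max statistics (the exact keys and convention starVLA uses may differ):

```python
def unnormalize(action_norm, stats):
    """Map actions from [-1, 1] back to the original range using
    per-dimension min/max statistics saved at training time."""
    lo, hi = stats["min"], stats["max"]
    return [0.5 * (a + 1.0) * (h - l) + l
            for a, l, h in zip(action_norm, lo, hi)]

# Hypothetical stats for a 2-DoF example: one joint angle, one gripper width.
stats = {"min": [-1.57, 0.0], "max": [1.57, 0.04]}
print(unnormalize([0.0, 1.0], stats))  # midpoint and max -> [0.0, 0.04]
```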
Finally, each evaluation run also saves a video for visualization, as shown below:

# LIBERO Training

## Step 0: Download the training dataset
Download the datasets to the `playground/Datasets/LEROBOT_LIBERO_DATA` directory.
Then move `modality.json` to each `$LEROBOT_LIBERO_DATA/subset/meta/modality.json`.
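Conceptually, that copy step looks like the loop below (a self-contained illustration using a temporary stand-in layout with hypothetical subset names; the provided data_preparation.sh script is the supported path):

```shell
# Demo layout in a temp directory; replace with your real dataset root.
DEST=$(mktemp -d)
LEROBOT_LIBERO_DATA="$DEST/LEROBOT_LIBERO_DATA"

# Stand-in dataset with two hypothetical subsets.
mkdir -p "$LEROBOT_LIBERO_DATA/libero_spatial/meta" \
         "$LEROBOT_LIBERO_DATA/libero_object/meta"
echo '{}' > "$DEST/modality.json"  # stand-in for the real modality.json

# The actual step: one modality.json per subset's meta/ directory.
for subset in "$LEROBOT_LIBERO_DATA"/*/; do
    cp "$DEST/modality.json" "${subset}meta/modality.json"
done

ls "$LEROBOT_LIBERO_DATA"/*/meta/modality.json
```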
You can quickly prepare these by running:

```bash
# Set DEST to the directory where you want to store the data
export DEST=/path/to/your/data/directory
bash examples/LIBERO/data_preparation.sh
```

## Step 1: Start Training
Most of the required training files are organized in `examples/LIBERO/train_files/`.
Run the following command to start training:
```bash
bash examples/LIBERO/train_files/run_libero_train.sh
```

⚠️ Note: Please ensure that you specify the correct path in `examples/LIBERO/train_files/run_libero_train.sh`.