RoboCasa Evaluation

RoboCasa is a large-scale household simulation benchmark. Here we use the GR1 Tabletop Tasks subset, featuring 24 tabletop Pick-and-Place tasks performed by a Fourier GR1 humanoid robot (upper body, dual arms).

This document provides instructions for reproducing our experimental results.

The evaluation process consists of two main parts:

  1. Setting up the robocasa environment and dependencies.
  2. Running the evaluation by launching services in both starVLA and robocasa environments.

We have verified that this workflow runs successfully on NVIDIA A100 GPUs.


| Task | GR00T-N1.6 | Qwen3GR00T | Qwen3PI | Qwen3OFT | Qwen3FAST |
|---|---|---|---|---|---|
| PnP Bottle To Cabinet Close | 51.5 | 46.0 | 26.0 | 30.0 | 38.0 |
| PnP Can To Drawer Close | 13.0 | 80.0 | 62.0 | 76.0 | 44.0 |
| PnP Cup To Drawer Close | 8.5 | 54.0 | 42.0 | 44.0 | 56.0 |
| PnP Milk To Microwave Close | 14.0 | 48.0 | 50.0 | 44.0 | 44.0 |
| PnP Potato To Microwave Close | 41.5 | 28.0 | 42.0 | 32.0 | 14.0 |
| PnP Wine To Cabinet Close | 16.5 | 46.0 | 32.0 | 36.0 | 14.0 |
| PnP Novel From Cuttingboard To Basket | 58.0 | 48.0 | 40.0 | 50.0 | 54.0 |
| PnP Novel From Cuttingboard To Cardboardbox | 46.5 | 40.0 | 46.0 | 40.0 | 42.0 |
| PnP Novel From Cuttingboard To Pan | 68.5 | 68.0 | 60.0 | 70.0 | 58.0 |
| PnP Novel From Cuttingboard To Pot | 65.0 | 52.0 | 40.0 | 54.0 | 58.0 |
| PnP Novel From Cuttingboard To Tieredbasket | 46.5 | 56.0 | 44.0 | 38.0 | 40.0 |
| PnP Novel From Placemat To Basket | 58.5 | 42.0 | 44.0 | 32.0 | 36.0 |
| PnP Novel From Placemat To Bowl | 57.5 | 44.0 | 52.0 | 58.0 | 38.0 |
| PnP Novel From Placemat To Plate | 63.0 | 48.0 | 50.0 | 52.0 | 42.0 |
| PnP Novel From Placemat To Tieredshelf | 28.5 | 18.0 | 28.0 | 24.0 | 18.0 |
| PnP Novel From Plate To Bowl | 57.0 | 60.0 | 52.0 | 60.0 | 52.0 |
| PnP Novel From Plate To Cardboardbox | 43.5 | 50.0 | 40.0 | 50.0 | 30.0 |
| PnP Novel From Plate To Pan | 51.0 | 54.0 | 36.0 | 66.0 | 48.0 |
| PnP Novel From Plate To Plate | 78.7 | 70.0 | 48.0 | 68.0 | 50.0 |
| PnP Novel From Tray To Cardboardbox | 51.5 | 38.0 | 34.0 | 44.0 | 28.0 |
| PnP Novel From Tray To Plate | 71.0 | 56.0 | 64.0 | 56.0 | 34.0 |
| PnP Novel From Tray To Pot | 64.5 | 50.0 | 44.0 | 62.0 | 46.0 |
| PnP Novel From Tray To Tieredbasket | 57.0 | 36.0 | 50.0 | 54.0 | 36.0 |
| PnP Novel From Tray To Tieredshelf | 31.5 | 16.0 | 28.0 | 30.0 | 16.0 |
| Average | 47.6 | 47.8 | 43.9 | 48.8 | 39.0 |

Note: All values are success rates in percentage (%). A single model was trained for all 24 tasks. Results are reported over 50 rollouts per task.
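As a rough illustration of how numbers of this kind can be derived, the sketch below computes a per-task success rate in percent from boolean rollout outcomes, then takes the unweighted mean across tasks. The function names and the sample outcomes are made up for illustration and are not taken from the evaluation scripts.

```python
from statistics import mean


def success_rate(outcomes):
    """Return the success rate in percent for a list of boolean rollout outcomes."""
    if not outcomes:
        raise ValueError("at least one rollout is required")
    return 100.0 * sum(outcomes) / len(outcomes)


def average_success_rate(per_task_outcomes):
    """Unweighted mean of per-task success rates (every task counts equally)."""
    return mean(success_rate(o) for o in per_task_outcomes.values())


# Made-up outcomes for two hypothetical tasks, 50 rollouts each.
example = {
    "task_a": [True] * 23 + [False] * 27,  # 23 successes out of 50
    "task_b": [True] * 40 + [False] * 10,  # 40 successes out of 50
}
rates = {task: success_rate(o) for task, o in example.items()}
```

Averaging per-task rates, rather than pooling all rollouts, keeps each task's weight equal even if rollout counts were to differ between tasks.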


First, download the checkpoints from:

To set up the environment, please first follow the official RoboCasa installation guide to install the base robocasa-gr1-tabletop-tasks environment.

Then install socket support:

pip install tyro

Step 1. Start the server (starVLA environment)

In the first terminal, activate the starVLA conda environment and run:

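# Path to the downloaded checkpoint (same value as in Step 2; adjust to your setup)
your_ckpt=StarVLA/Qwen3-VL-OFT-Robocasa/checkpoints/steps_90000_pytorch_model.pt
python deployment/model_server/server_policy.py \
    --ckpt_path ${your_ckpt} \
    --port 5678 \
    --use_bf16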
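Because the simulation in the second terminal connects to this server, it can help to confirm that the port is accepting connections before moving on. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` and the host `127.0.0.1` are assumptions. The check only confirms that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, poll_s=1.0):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait and retry.
            time.sleep(poll_s)
    return False


# Example usage (5678 matches the --port value above; the host is an assumption):
#   if not wait_for_port("127.0.0.1", 5678):
#       raise SystemExit("policy server is not reachable")
```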

Step 2. Start the simulation (robocasa environment)

In the second terminal, activate the robocasa conda environment and run:

export PYTHONPATH=$(pwd):${PYTHONPATH}
your_ckpt=StarVLA/Qwen3-VL-OFT-Robocasa/checkpoints/steps_90000_pytorch_model.pt
# env_name and video_out_path are not defined here; set both before running.
python examples/Robocasa_tabletop/eval_files/simulation_env.py \
    --args.env_name ${env_name} \
    --args.port 5678 \
    --args.n_episodes 50 \
    --args.n_envs 1 \
    --args.max_episode_steps 720 \
    --args.n_action_steps 12 \
    --args.video_out_path ${video_out_path} \
    --args.pretrained_path ${your_ckpt}

If multiple GPUs are available, you can use the batch evaluation script:

bash examples/Robocasa_tabletop/batch_eval_args.sh

Note: Make sure the checkpoint path in batch_eval_args.sh is correct.
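For readers who want to see how evaluation might be spread over several GPUs, here is a hedged sketch of one possible scheme. It is not a description of what batch_eval_args.sh actually does. It assumes one policy server per GPU, each listening on `base_port + gpu_id`, and assigns tasks to GPUs round-robin. The helper names and the task names in the usage comment are hypothetical. The command-line flags are the ones shown in Step 2.

```python
def build_eval_command(env_name, port, video_out_path, ckpt_path):
    """Assemble the Step 2 simulation command for one task as an argument list."""
    return [
        "python", "examples/Robocasa_tabletop/eval_files/simulation_env.py",
        "--args.env_name", env_name,
        "--args.port", str(port),
        "--args.n_episodes", "50",
        "--args.n_envs", "1",
        "--args.max_episode_steps", "720",
        "--args.n_action_steps", "12",
        "--args.video_out_path", video_out_path,
        "--args.pretrained_path", ckpt_path,
    ]


def assign_round_robin(env_names, num_gpus, base_port=5678):
    """Map each task to (env_name, gpu_id, port), cycling over the GPUs.

    Assumption: the server for GPU k listens on base_port + k.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    plan = []
    for i, env_name in enumerate(env_names):
        gpu_id = i % num_gpus
        plan.append((env_name, gpu_id, base_port + gpu_id))
    return plan


# Example usage with placeholder task names (real names come from the benchmark):
#   for env_name, gpu_id, port in assign_round_robin(["task_a", "task_b", "task_c"], 2):
#       cmd = build_eval_command(env_name, port, f"videos/{env_name}", "path/to/ckpt.pt")
#       print(f"CUDA_VISIBLE_DEVICES={gpu_id}", " ".join(cmd))
```

Building each command as a list rather than a single string makes it straightforward to pass to `subprocess.run` without shell quoting issues.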


Download the PhysicalAI-Robotics-GR00T-X-Embodiment-Sim dataset from Hugging Face into the playground/Datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim directory.

To download only the relevant fine-tuning folders, see the instructions in the GR00T-N1.5 repo.

Alternatively, use the provided script to download only the *_1000 folders:

python examples/Robocasa_tabletop/download_gr00t_ft_data.py
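The script's contents are not shown here. As an independent sketch of the same idea, the snippet below restricts a Hugging Face download to the `*_1000` folders using `huggingface_hub.snapshot_download` with `allow_patterns`. The repository id is an assumption inferred from the local directory path above, and the helper names are hypothetical. The glob pattern relies on `fnmatch` semantics, in which `*` also matches path separators.

```python
from fnmatch import fnmatch

# Assumed repo id, inferred from the local path playground/Datasets/nvidia/...
REPO_ID = "nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim"
LOCAL_DIR = "playground/Datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim"
FT_PATTERN = "*_1000/*"  # any file inside a top-level folder ending in _1000


def is_ft_file(repo_path):
    """Return True if a repo-relative file path falls under a *_1000 folder."""
    return fnmatch(repo_path, FT_PATTERN)


def download_ft_folders():
    """Download only files matching FT_PATTERN (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # imported lazily; needs network access

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=[FT_PATTERN],
        local_dir=LOCAL_DIR,
    )
```

Calling `download_ft_folders()` would start the download. `is_ft_file` can be used beforehand to check which paths the pattern selects.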

To select different datasets, modify the data_mix parameter. The following script fine-tunes on the *_1000 datasets:

bash examples/Robocasa_tabletop/train_files/run_robocasa.sh