FAQ

Why not put preprocessing in the dataloader?

In our profiling, data preprocessing accounts for less than 1% of the time, so moving it into the dataloader would gain little. Keeping it inside the Framework allows model-specific handling without leaking model assumptions into the dataloader.

Can I use a backbone other than Qwen2.5-VL?

Yes. Implement new vision and language modules and compose them inside a Framework. Because the Framework receives raw action data and does its own model-specific preprocessing, the dataloader does not need to change when you swap backbones.

Why isn’t there an abstract interface for the vision tower?

We expect VLMs to be the base model and to include their own native vision tower, so an extra abstract interface is not required.

Can I override or add parameters via the terminal?

Yes. StarVLA uses OmegaConf.load(args.config_yaml) as the single configuration entry point. You can override or add values from the CLI with dotted keys:

accelerate launch \
--config_file starVLA/config/deepseeds/deepspeed_zero2.yaml \
--num_processes 8 \
starVLA/training/train_starvla.py \
--config_yaml ./starVLA/config/training/starvla_cotrain_oxe.yaml \
--framework.qwenvl.base_vlm Qwen/Qwen2.5-VL-7B-Instruct \
--framework.action_model.new_module ${module_name}

Passing --framework.action_model.new_module only adds that key to the global config. What the value does is defined by your framework code.
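The override behavior described above can be illustrated with a small stdlib-only sketch. This is not StarVLA's code (StarVLA builds its config with OmegaConf); the helper names `parse_value` and `apply_overrides` are hypothetical and only show the idea of dotted `--key value` pairs being written on top of a loaded config, with unknown keys simply added.

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Best-effort conversion of a CLI string to bool, int, float, or str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: Dict[str, Any], extra_args: List[str]) -> Dict[str, Any]:
    """Write `--a.b.c value` pairs into a nested dict, creating missing keys."""
    if len(extra_args) % 2 != 0:
        raise ValueError("overrides must come in '--key value' pairs")
    for flag, raw in zip(extra_args[0::2], extra_args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a '--dotted.key', got {flag!r}")
        *parents, leaf = flag[2:].split(".")
        node = config
        for key in parents:
            # Existing sections are reused; new sections are created on demand.
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw)
    return config


# Stand-in for the values loaded from the YAML file (hypothetical content).
config = {"framework": {"qwenvl": {"base_vlm": "Qwen/Qwen2.5-VL-3B-Instruct"}}}
overrides = [
    "--framework.qwenvl.base_vlm", "Qwen/Qwen2.5-VL-7B-Instruct",
    "--framework.action_model.new_module", "my_module",
]
config = apply_overrides(config, overrides)
print(config["framework"]["action_model"])
```

The existing key is replaced and the new key is added next to it; nothing happens with `new_module` until framework code reads it.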

Can I freeze specific modules during training?

Yes. Use a comma-separated list of module paths:

--trainer.freeze_modules "qwen_vl_interface.model.model.visual,dino_encoder"

Tip: run print(your_model) to check the exact module paths before freezing. The implementation lives in TrainerUtils.freeze_backbones.
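As a rough illustration of how a comma-separated list of module paths can select parameters, the sketch below matches parameter names by path prefix. It is an assumption about the general technique, not a copy of TrainerUtils.freeze_backbones; the function `select_frozen` and the parameter names are hypothetical.

```python
from typing import Iterable, List


def select_frozen(param_names: Iterable[str], freeze_modules: str) -> List[str]:
    """Return the parameter names that fall under any listed module path."""
    prefixes = [p.strip() for p in freeze_modules.split(",") if p.strip()]
    frozen = []
    for name in param_names:
        # Match whole path segments so "dino_encoder" does not match "dino_encoder_v2".
        if any(name == p or name.startswith(p + ".") for p in prefixes):
            frozen.append(name)
    return frozen


# Hypothetical names, in the style of what model.named_parameters() yields.
names = [
    "qwen_vl_interface.model.model.visual.blocks.0.attn.qkv.weight",
    "qwen_vl_interface.model.model.language_model.layers.0.mlp.weight",
    "dino_encoder.patch_embed.weight",
    "action_model.head.weight",
]
frozen = select_frozen(names, "qwen_vl_interface.model.model.visual,dino_encoder")
for name in frozen:
    print("frozen:", name)
```

With a PyTorch model, the same check would typically run over `model.named_parameters()` and set `param.requires_grad = False` for each match.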

Can I set different learning rates for different modules?

Yes. Use a per-module dict:

trainer:
  learning_rate:
    base: 1.0e-05
    qwen_vl_interface: 1.0e-05
    action_model: 1.0e-04

See trainer_tools.build_param_lr_groups for reference.
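The per-module dict can be thought of as a mapping from top-level module name to learning rate, with `base` as the fallback. The sketch below groups parameter names accordingly. It is a hedged illustration, not the actual trainer_tools.build_param_lr_groups; the function `build_lr_groups` and the parameter names are hypothetical, and it works on names rather than tensors so it runs without PyTorch.

```python
from typing import Any, Dict, Iterable, List


def build_lr_groups(param_names: Iterable[str], lr_cfg: Dict[str, float]) -> List[Dict[str, Any]]:
    """Group parameter names by top-level module and attach a learning rate."""
    base_lr = lr_cfg["base"]
    module_lrs = {k: v for k, v in lr_cfg.items() if k != "base"}
    groups: Dict[str, Dict[str, Any]] = {}
    for name in param_names:
        module = name.split(".", 1)[0]  # top-level module, e.g. "action_model"
        key = module if module in module_lrs else "base"
        lr = module_lrs.get(module, base_lr)
        group = groups.setdefault(key, {"name": key, "lr": lr, "params": []})
        group["params"].append(name)
    return list(groups.values())


lr_cfg = {"base": 1e-5, "qwen_vl_interface": 1e-5, "action_model": 1e-4}
names = [
    "qwen_vl_interface.model.layers.0.weight",  # hypothetical parameter names
    "action_model.head.weight",
    "action_model.head.bias",
    "state_encoder.proj.weight",                # not listed, falls back to base
]
for group in build_lr_groups(names, lr_cfg):
    print(group["name"], group["lr"], len(group["params"]))
```

In a PyTorch setting, each group's `params` would hold the parameter tensors, and the list would be passed to an optimizer, which accepts per-group `lr` values.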

Can I resume training from a checkpoint?

Yes. Set the path of the latest checkpoint in the config:

trainer:
  pretrained_checkpoint: path_to_steps_10000.pt
  reload_modules: "action_model"

The example above reloads only action_model; leaving reload_modules empty loads the full model. StarVLA uses Accelerator’s checkpoint mechanism to save and restore the optimizer state, learning rate scheduler, and other training state, so training resumes where it stopped.
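The reload_modules semantics described above (a named module loads only that module, an empty value loads everything) can be sketched as a filter over checkpoint keys. This is an assumption about the behavior, not StarVLA's loading code; `filter_state_dict` is a hypothetical helper, comma-separated input is assumed, and the checkpoint is a plain dict stand-in.

```python
from typing import Any, Dict


def filter_state_dict(state_dict: Dict[str, Any], reload_modules: str) -> Dict[str, Any]:
    """Keep only entries under the listed module paths; empty means keep all."""
    prefixes = [p.strip() for p in reload_modules.split(",") if p.strip()]
    if not prefixes:
        return dict(state_dict)
    return {
        key: value
        for key, value in state_dict.items()
        if any(key == p or key.startswith(p + ".") for p in prefixes)
    }


# Stand-in checkpoint: strings instead of tensors, hypothetical key names.
checkpoint = {
    "qwen_vl_interface.model.layers.0.weight": "w0",
    "action_model.head.weight": "w1",
    "action_model.head.bias": "b1",
}
partial = filter_state_dict(checkpoint, "action_model")
full = filter_state_dict(checkpoint, "")
print(sorted(partial))
print(len(full))
```

With PyTorch, a filtered dict like `partial` would usually be applied with `model.load_state_dict(partial, strict=False)` so that the missing keys are tolerated.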

Can I train with a smaller VLM?

Example using Florence-2:

accelerate launch \
--config_file starVLA/config/deepseeds/deepspeed_zero2.yaml \
--main_process_ip $MASTER_ADDR \
--main_process_port $MASTER_PORT \
--machine_rank $SLURM_PROCID \
--num_machines $SLURM_NNODES \
--num_processes=${TOTAL_GPUS} \
starVLA/training/train_starvla.py \
--config_yaml ./starVLA/config/training/starvla_cotrain_oxe.yaml \
--framework.name QwenGR00T \
--framework.qwenvl.base_vlm microsoft/Florence-2-large \
--run_root_dir ${run_root_dir} \
--run_id ${run_id} \
--wandb_project your_project \
--wandb_entity your_name

Note: the --framework.qwenvl key is still used here even though the base VLM is not a Qwen model. It is kept for compatibility reasons and will be unified in a future release.

Can I train on a single GPU?

Yes. Set --num_processes to 1, reduce per_device_batch_size (e.g., to 1 or 2), and increase gradient_accumulation_steps to keep the effective batch size. Single-GPU training is much slower but fully functional. We recommend starting with a smaller model (e.g., Qwen2.5-VL-3B).
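To make the trade-off concrete, the effective batch size is the product of per_device_batch_size, the number of processes, and gradient_accumulation_steps. The sketch below computes how many accumulation steps a single GPU needs to match a multi-GPU run. The reference values (8 GPUs, batch size 16 per device) are hypothetical, and the helper names are illustrative.

```python
import math


def effective_batch_size(per_device_batch_size: int, num_processes: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Samples contributing to one optimizer step."""
    return per_device_batch_size * num_processes * gradient_accumulation_steps


def accumulation_steps_to_match(target: int, per_device_batch_size: int,
                                num_processes: int = 1) -> int:
    """Smallest accumulation count that reaches at least the target batch size."""
    return math.ceil(target / (per_device_batch_size * num_processes))


# Hypothetical multi-GPU reference run: 8 processes, 16 samples per device.
reference = effective_batch_size(per_device_batch_size=16, num_processes=8)

# Single GPU with per_device_batch_size reduced to 2.
steps = accumulation_steps_to_match(reference, per_device_batch_size=2, num_processes=1)

print(reference)                          # 128
print(steps)                              # 64
print(effective_batch_size(2, 1, steps))  # 128
```

Each optimizer step then needs 64 forward and backward passes instead of one, which is where most of the single-GPU slowdown comes from.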

How long does training take?

It depends on dataset size, GPU count, and model scale. As a reference:

  • 8xA800 + Qwen2.5-VL-3B + Bridge dataset: ~10-20 hours for 50k steps.
  • 1xRTX 4090 + Qwen2.5-VL-3B + small dataset: may take several days.

We recommend running a quick sanity check with is_debug: true for a few hundred steps first, then starting full training.
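The reference figures above can be turned into a per-step time, and a step time measured during a short sanity run can be extrapolated the same way. The sketch below does that arithmetic; the 2.5 s/step measurement is a hypothetical value, not a benchmark.

```python
def seconds_per_step(total_hours: float, total_steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return total_hours * 3600.0 / total_steps


def estimated_hours(sec_per_step: float, total_steps: int) -> float:
    """Projected wall-clock hours for a full run at a constant step time."""
    return sec_per_step * total_steps / 3600.0


# 8xA800 reference from the list above: 10 to 20 hours for 50k steps.
fast = seconds_per_step(10, 50_000)
slow = seconds_per_step(20, 50_000)
print(f"{fast:.2f} to {slow:.2f} s/step")  # 0.72 to 1.44 s/step

# Hypothetical step time observed in a short debug run on your own hardware.
measured = 2.5
print(f"{estimated_hours(measured, 50_000):.1f} hours for 50k steps")
```

The projection assumes a constant step time, so treat it as a rough lower bound: evaluation, checkpointing, and data loading stalls add to it.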

How do I monitor training?

StarVLA supports two logging methods (specified in the trackers field of your YAML config):

  • jsonl: Training logs are saved as JSON Lines in a log.jsonl file in the checkpoint directory. You can parse and plot them with scripts.
  • wandb: Real-time online monitoring. Fill in wandb_entity and wandb_project in your config, and metrics (loss curves, learning rates, etc.) will be automatically uploaded to wandb.ai once training starts.

We recommend enabling both: trackers: [jsonl, wandb].
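For the jsonl tracker, a few lines of Python are enough to pull a metric out of log.jsonl. The sketch below assumes each line is a JSON object with `step` and `loss` fields; those field names are an assumption, so check a line of your own log and adjust `key` and `step_key` if they differ.

```python
import json
from typing import Iterable, List, Tuple


def read_metric(lines: Iterable[str], key: str = "loss",
                step_key: str = "step") -> List[Tuple[int, float]]:
    """Collect (step, value) pairs from JSON Lines, skipping records without the key."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if key in record and step_key in record:
            points.append((int(record[step_key]), float(record[key])))
    return points


# Inline sample standing in for the contents of log.jsonl (hypothetical values).
sample = [
    '{"step": 100, "loss": 0.91}',
    '{"step": 200, "loss": 0.74}',
    '',
    '{"step": 300, "lr": 1e-05}',
]
points = read_metric(sample)
print(points)  # [(100, 0.91), (200, 0.74)]
```

For a real run, pass an open file instead of the sample list, for example `with open("log.jsonl") as f: points = read_metric(f)`, then plot the pairs with the tool of your choice.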