StarVLA Documentation

Build, train, and evaluate Vision-Language-Action models with a modular, extensible codebase.

StarVLA is a modular and flexible codebase for developing Vision-Language Models (VLMs) into Vision-Language-Action (VLA) models. Each component (model, data, trainer, configuration, evaluation) is designed for high cohesion and low coupling, enabling plug-and-play research and fast iteration.
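The plug-and-play idea can be sketched with a minimal component registry: each part of the pipeline is registered under a string key, so swapping one implementation for another is just a config change. This is an illustrative sketch only; the class names, `register` decorator, and `build` helper here are hypothetical and are not StarVLA's actual API.

```python
# Hypothetical sketch of a plug-and-play component registry.
# Names below (register, build, DummyBackbone, DummyActionHead) are
# illustrative and do not reflect StarVLA's real interfaces.

registry = {}

def register(name):
    """Record a component class under a string key for config-driven lookup."""
    def wrap(cls):
        registry[name] = cls
        return cls
    return wrap

@register("dummy_backbone")
class DummyBackbone:
    """Stand-in for a vision-language backbone component."""
    def forward(self, obs):
        return f"features({obs})"

@register("dummy_action_head")
class DummyActionHead:
    """Stand-in for an action-prediction head component."""
    def forward(self, feats):
        return f"action({feats})"

def build(component_names):
    """Assemble a pipeline from component keys; swapping a part is a config edit."""
    return [registry[name]() for name in component_names]

# A "config" is just a list of component keys here.
pipeline = build(["dummy_backbone", "dummy_action_head"])
out = "obs"
for module in pipeline:
    out = module.forward(out)
print(out)  # prints: action(features(obs))
```

Because components only interact through the shared `forward` interface, replacing the backbone or action head does not require touching the rest of the pipeline, which is the high-cohesion, low-coupling property the paragraph above describes.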

🚀 Quick Start

Environment setup, quick checks, evaluation, and training workflows.

Get Started →

📖 Project Overview

What StarVLA is, current capabilities, and key links.

Read Overview →

🧩 Lego-like Design

The modular design principles behind StarVLA.

View Design →

📚 FAQ

Common questions about configs, backbones, and training.

Read FAQ →