Skip to content

Evaluation Framework

StarVLA standardizes the inference pipeline for real-robot or simulation evaluations by tunneling data through WebSocket (a network protocol that enables bidirectional real-time communication between client and server), enabling new models to be integrated into existing evaluation environments with minimal changes.


The StarVLA framework uses a client-server architecture to separate the evaluation/deployment environment (client) from the policy server (model inference).

  • Policy Server: Loads the model, receives observations, and outputs normalized actions.
  • Client: Interfaces with the simulator or real robot, and post-processes model outputs:
    • Unnormalize: Converts the model’s [-1, 1] normalized actions back to physical quantities (e.g., joint angles).
    • Delta-to-Absolute: If the model outputs incremental actions relative to the current position, adds them to the current state to get absolute target positions.
    • Action Ensemble: The model may predict multiple future steps at once; overlapping predictions from consecutive calls are weighted-averaged for smoother execution.

Policy Server Architecture

ComponentDescription
Sim / Real ControllerExternal to StarVLA: Contains the core loop of the evaluation environment or robot controller, handling observation collection (get_obs()) and action execution (apply_action()).
PolicyClient.py & WebSocket & PolicyServerStandard Communication Flow: Client-side wrapper responsible for data transmission (tunneling) and interfacing the environment with the server.
Framework.pyModel Infer Core: Contains the user-defined model inference function (Framework.predict_action), which is the main logic for generating actions.

Minimal pseudo-code example (evaluation-side client):

# Import path: from deployment.policy_client.policy_client import WebsocketClientPolicy
import WebsocketClientPolicy
client = WebsocketClientPolicy(
host="127.0.0.1",
port=10092
)
while True:
images = capture_multiview() # returns List[np.ndarray]
lang = get_instruction() # may come from task scripts
example = {
"image": images,
"lang": lang,
}
result = client.predict_action(example) # --> forwarded to framework.predict_action
action = result["normalized_actions"][0] # take the first item in the batch
apply_action(action)

For the Model Server, simply launch it with:

#!/bin/bash
export PYTHONPATH=$(pwd):${PYTHONPATH}
# Point to your StarVLA conda Python
# $(which python) automatically picks up the Python from your currently activated conda env
# Make sure you've run `conda activate starVLA` before executing this script
export star_vla_python=$(which python)
your_ckpt=results/Checkpoints/xxx.pt # Replace with your checkpoint path
gpu_id=0
port=5694
# export DEBUG=true
CUDA_VISIBLE_DEVICES=$gpu_id ${star_vla_python} deployment/model_server/server_policy.py \
--ckpt_path ${your_ckpt} \
--port ${port} \
--use_bf16
  • Ensure every field in example is JSON-serializable or convertible (lists, floats, ints, strings); convert custom objects explicitly.
  • Images must be sent as np.ndarray. Perform PIL.Image -> np.ndarray before transmission and convert back on the server using to_pil_preserve (from starVLA.model.utils import to_pil_preserve) if required.
  • Keep auxiliary metadata (episode IDs, timestamps, etc.) in dedicated keys so the framework can forward or log them without collisions.

Policy Interface

The *2model_interface.py interface is designed to wrap and abstract any variations originating from the simulation or real-world environment. It also supports user-defined controllers, such as converting delta actions to absolute joint positions. You can refer to the implementations for different benchmarks in examples to build your own deployment.


Q: Why do examples contain files such as model2{bench}_client.py?

A: They encapsulate benchmark-specific alignment, e.g., action ensembling, converting delta actions to absolute actions, or bridging simulator quirks, so the model server can stay generic.

Q: Why does the model expect PIL images while the transport uses ndarray?

A: WebSocket payloads do not serialize PIL objects directly. Convert to np.ndarray on the client side and restore to PIL inside the framework if the model requires it.

Feedback on environment-specific needs is welcome via issues.