Frames format (LeRobotLanceDataset)¶
One row per frame. Each row holds tabular fields (state, action, timestamps, episode / frame indices) plus the JPEG-encoded image bytes.
At read time the bytes are decoded with:
- torchvision (NVJPEG when CUDA is available —
decode_device="auto"). - PIL as a fallback if torchvision can't be imported.
Schema¶
<table>.lance/
episode_index: int32
frame_index: int32
index: int64
timestamp: float32
task_index: int32
observation.image: binary # JPEG bytes
observation.state: list<float32>[D]
action: list<float32>[A]
...other tabular features
No separate videos table — every frame stores its own JPEG.
Reader¶
from lerobot_lancedb import LeRobotLanceDataset
ds = LeRobotLanceDataset(root="./pusht_lance")
sample = ds[0]
# sample has: episode_index, frame_index, timestamp,
# observation.image (C,H,W), observation.state, action, ...
The dataset subclasses LeRobotDataset, so it plugs into:
- the upstream training factory
EpisodeAwareSampler- any code that does
isinstance(ds, LeRobotDataset)
Sources¶
Three constructor entry points:
- Local directory
- Hugging Face Hub (uses your
HF_TOKENif set): - Cloud URI (S3 / GCS / HF Buckets):
GPU NVJPEG decode¶
decode_device picks where the JPEG decode happens:
"auto"(default) —"cuda"if available, else"cpu"."cuda"— explicit GPU decode. NVJPEG is typically ~10× faster than libjpeg-turbo and tensors land on the GPU directly (no H2D copy)."cpu"— explicit CPU decode. Useful for apples-to-apples comparisons or when GPU memory is tight.
LeRobotLanceDataset(root="./pusht_lance", decode_device="cuda")
LeRobotLanceDataset(root="./pusht_lance", decode_device="cpu")
delta_timestamps¶
Same API as upstream. Multiple timestamps per camera are batched into a single decode:
ds = LeRobotLanceDataset(
root="./pusht_lance",
delta_timestamps={
"observation.image": [-0.1, -0.05, 0.0],
"observation.state": [-0.1, -0.05, 0.0],
"action": [0.0, 0.05, 0.1, 0.15, 0.2],
},
)
Quality knobs at conversion time¶
JPEG is lossy. Two knobs let you trade size for fidelity:
--jpeg-quality(default 95). Higher = larger files, fewer artifacts.--jpeg-subsampling(default 2 = 4:2:0). Set to 0 for 4:4:4 chroma (no subsampling, near-lossless when combined with--jpeg-quality=100).
See Conversion for the CLI flags, and Benchmarks for what each setting costs and buys.
If you need bit-exact pixels on a dtype=video source, prefer the video format — JPEG quality settings get close, but the video format actually matches upstream bit-for-bit.
Cloud auth¶
The reader picks up credentials from the standard environment:
- S3 —
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_REGION - GCS —
GOOGLE_APPLICATION_CREDENTIALS - HF Hub —
HF_TOKEN(orhuggingface-cli login)
Lance does byte-range fetches on demand — no full-dataset download.
Spawn-mode workers¶
Lance forces multiprocessing.set_start_method("spawn") on import (necessary for safe fork-mode behavior).
What this means in practice:
- Launch your training script from a real file, not
python -cor a REPL. - DataLoader
num_workers > 0withpersistent_workers=Trueworks as expected.