
Frames format (LeRobotLanceDataset)

One row per frame. Each row holds tabular fields (state, action, timestamps, episode / frame indices) plus the JPEG-encoded image bytes.

At read time the bytes are decoded with:

  • torchvision, which uses NVJPEG on the GPU when CUDA is available (the default, decode_device="auto").
  • PIL as a fallback if torchvision can't be imported.
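The fallback order above can be sketched with plain imports. This is a minimal illustration, not the reader's actual code: pick_jpeg_backend and decode_jpeg_bytes are hypothetical names, and the sketch ignores decode_device (it always decodes on the CPU).

```python
import io


def pick_jpeg_backend():
    """Return "torchvision", "pil", or None, trying torchvision first."""
    try:
        import torchvision.io  # noqa: F401
        return "torchvision"
    except Exception:  # any import-time failure counts as "can't be imported"
        pass
    try:
        from PIL import Image  # noqa: F401
        return "pil"
    except Exception:
        return None


def decode_jpeg_bytes(data: bytes):
    """Decode one JPEG with whichever backend is available (CPU only)."""
    backend = pick_jpeg_backend()
    if backend == "torchvision":
        import torch
        from torchvision.io import decode_jpeg
        # decode_jpeg expects a 1-D uint8 tensor; bytearray keeps the buffer writable.
        raw = torch.frombuffer(bytearray(data), dtype=torch.uint8)
        return decode_jpeg(raw)  # uint8 tensor shaped (C, H, W)
    if backend == "pil":
        from PIL import Image
        return Image.open(io.BytesIO(data)).convert("RGB")  # PIL image, not a tensor
    raise RuntimeError("neither torchvision nor Pillow is installed")
```

The two paths return different types; a reader that promises (C, H, W) tensors would also have to convert the PIL result.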

Schema

<table>.lance/
  episode_index:     int32
  frame_index:       int32
  index:             int64
  timestamp:         float32
  task_index:        int32
  observation.image: binary             # JPEG bytes
  observation.state: list<float32>[D]
  action:            list<float32>[A]
  ...other tabular features
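To make the one-row-per-frame layout concrete, here is a stdlib-only picture of a single row. FrameRow and its validate method are hypothetical and exist only for illustration; the real table is a Lance dataset whose column names contain dots (observation.image, observation.state), which Python attribute names cannot, so the sketch renames them and notes the real column in a comment.

```python
from dataclasses import dataclass
from typing import List

JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with the Start Of Image marker


@dataclass
class FrameRow:
    """One row of the frames table (illustrative only)."""
    episode_index: int
    frame_index: int       # position within the episode
    index: int             # global frame index across the whole dataset
    timestamp: float       # seconds
    task_index: int
    image_jpeg: bytes      # column "observation.image"
    state: List[float]     # column "observation.state", length D
    action: List[float]    # column "action", length A

    def validate(self, state_dim: int, action_dim: int) -> None:
        """Check the JPEG marker and the fixed D / A list lengths."""
        if not self.image_jpeg.startswith(JPEG_SOI):
            raise ValueError("observation.image does not look like JPEG data")
        if len(self.state) != state_dim:
            raise ValueError(f"expected state length {state_dim}, got {len(self.state)}")
        if len(self.action) != action_dim:
            raise ValueError(f"expected action length {action_dim}, got {len(self.action)}")
```

Because the image bytes live in the same row as the state and action, reading one frame never requires a join against another table.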

No separate videos table — every frame stores its own JPEG.

Reader

from lerobot_lancedb import LeRobotLanceDataset

ds = LeRobotLanceDataset(root="./pusht_lance")
sample = ds[0]
# sample is a dict with keys: episode_index, frame_index, timestamp,
#   observation.image (C,H,W), observation.state, action, ...

The dataset subclasses LeRobotDataset, so it plugs into:

  • the upstream training factory
  • EpisodeAwareSampler
  • any code that does isinstance(ds, LeRobotDataset)

Sources

Three constructor entry points:

  • Local directory:
    LeRobotLanceDataset(root="./pusht_lance")
    
  • Hugging Face Hub (uses your HF_TOKEN if set):
    LeRobotLanceDataset(repo_id="me/pusht_lance")
    
  • Cloud URI (S3 / GCS / HF Buckets):
    LeRobotLanceDataset(
        uri="s3://bucket/path/pusht.lance",
        meta_root="./pusht_lance",   # local meta/ sidecar
    )
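Choosing between the three entry points can be sketched as a dispatcher on a location string. source_kwargs is a hypothetical convenience helper, not part of lerobot_lancedb, and its rules are assumptions: anything containing "://" is treated as a cloud URI, paths starting with ".", "/" or "~" (or existing directories) as local roots, and a bare owner/name string as a Hub repo_id.

```python
import os
import re

# owner/name, e.g. "me/pusht_lance"; leading "." or "/" is excluded so local paths don't match
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def source_kwargs(location: str, meta_root=None) -> dict:
    """Map a location string to LeRobotLanceDataset constructor kwargs (hypothetical)."""
    if "://" in location:  # s3://, gs://, ...
        kwargs = {"uri": location}
        if meta_root is not None:
            kwargs["meta_root"] = meta_root  # local meta/ sidecar
        return kwargs
    if os.path.isdir(location) or location.startswith((".", "/", "~")):
        return {"root": location}
    if _REPO_ID.match(location):
        return {"repo_id": location}
    return {"root": location}
```

Usage would look like LeRobotLanceDataset(**source_kwargs("s3://bucket/path/pusht.lance", meta_root="./pusht_lance")).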
    

GPU NVJPEG decode

decode_device picks where the JPEG decode happens:

  • "auto" (default): "cuda" if available, else "cpu".
  • "cuda": explicit GPU decode. NVJPEG is typically ~10× faster than libjpeg-turbo, and decoded tensors land on the GPU directly (no host-to-device copy).
  • "cpu": explicit CPU decode. Useful for apples-to-apples comparisons or when GPU memory is tight.

LeRobotLanceDataset(root="./pusht_lance", decode_device="cuda")
LeRobotLanceDataset(root="./pusht_lance", decode_device="cpu")
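The "auto" rule can be written as a small resolver. This is a sketch only: resolve_decode_device is a hypothetical helper, and what the real reader does when "cuda" is requested on a machine without a GPU is not covered here.

```python
import importlib.util

VALID_DEVICES = ("auto", "cuda", "cpu")


def _cuda_available() -> bool:
    # Treat a missing torch install as "no CUDA" instead of failing.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def resolve_decode_device(decode_device: str = "auto", cuda_available=None) -> str:
    """Map decode_device to a concrete device name: "cuda" or "cpu"."""
    if decode_device not in VALID_DEVICES:
        raise ValueError(f"decode_device must be one of {VALID_DEVICES}, got {decode_device!r}")
    if decode_device != "auto":
        return decode_device  # an explicit choice always wins
    if cuda_available is None:
        cuda_available = _cuda_available()  # probe only when the caller didn't say
    return "cuda" if cuda_available else "cpu"
```

Passing cuda_available explicitly keeps the rule testable without a GPU.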

delta_timestamps

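Same API as upstream LeRobotDataset: each key maps to a list of offsets in seconds relative to the current frame (negative = past). Multiple timestamps for the same camera are batched into a single decode: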

ds = LeRobotLanceDataset(
    root="./pusht_lance",
    delta_timestamps={
        "observation.image": [-0.1, -0.05, 0.0],
        "observation.state": [-0.1, -0.05, 0.0],
        "action":            [0.0, 0.05, 0.1, 0.15, 0.2],
    },
)
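A sketch of how offsets in seconds can become frame indices, modeled on the approach upstream LeRobot takes as we understand it: multiply by fps, round, clamp to the current episode, and mark clamped positions as padding. delta_to_indices and frames_to_decode are hypothetical names, and the fps of 20 in the test is an assumed value (the real one comes from the dataset metadata). The last helper shows one way to prepare a batched decode: duplicate indices collapse, so each distinct frame is decoded once.

```python
from typing import List, Tuple


def delta_to_indices(
    deltas: List[float], fps: float, frame_idx: int, ep_start: int, ep_end: int
) -> Tuple[List[int], List[bool]]:
    """Map time offsets (seconds) to clamped global frame indices plus a pad mask.

    ep_start is inclusive and ep_end exclusive, both global frame indices.
    """
    indices, is_pad = [], []
    for d in deltas:
        target = frame_idx + round(d * fps)        # offset in whole frames
        is_pad.append(target < ep_start or target >= ep_end)
        indices.append(max(ep_start, min(ep_end - 1, target)))  # stay inside the episode
    return indices, is_pad


def frames_to_decode(indices: List[int]) -> List[int]:
    """Distinct frames to fetch and decode together, in ascending order."""
    return sorted(set(indices))
```

At the start of an episode the earliest image offset is clamped to the first frame and flagged as padding, so the decode list is shorter than the number of requested timestamps.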

Quality knobs at conversion time

JPEG is lossy. Two knobs let you trade size for fidelity:

  • --jpeg-quality (default 95). Higher = larger files, fewer artifacts.
  • --jpeg-subsampling (default 2 = 4:2:0). Set to 0 for 4:4:4 chroma (no subsampling, near-lossless when combined with --jpeg-quality=100).
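Why subsampling matters comes down to samples per pixel before compression. This sketch assumes the flag values follow Pillow's convention (0 = 4:4:4, 1 = 4:2:2, 2 = 4:2:0; the page only documents 0 and 2). encode_frame is a hypothetical illustration of how the two knobs could map onto Pillow's JPEG encoder, not the converter's actual code.

```python
# Luma + chroma samples per pixel for each subsampling code (Pillow convention, assumed).
SAMPLES_PER_PIXEL = {
    0: 3.0,  # 4:4:4, Y + full-resolution Cb and Cr
    1: 2.0,  # 4:2:2, chroma halved horizontally
    2: 1.5,  # 4:2:0, chroma halved horizontally and vertically
}


def raw_samples(height: int, width: int, subsampling: int = 2) -> int:
    """Number of 8-bit samples the encoder sees before DCT and quantization."""
    return int(height * width * SAMPLES_PER_PIXEL[subsampling])


def encode_frame(rgb_image, quality: int = 95, subsampling: int = 2) -> bytes:
    """Encode a PIL RGB image to JPEG bytes (requires Pillow)."""
    import io
    buf = io.BytesIO()
    rgb_image.save(buf, format="JPEG", quality=quality, subsampling=subsampling)
    return buf.getvalue()
```

For a 96×96 frame, raw_samples(96, 96, 0) is 27,648 samples versus 13,824 at the default 4:2:0, so 4:4:4 hands the encoder twice as much chroma-inclusive data; how much of that survives into the compressed file depends on image content.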

See Conversion for the CLI flags, and Benchmarks for what each setting costs and buys.

If you need bit-exact pixels from a dtype=video source, prefer the video format: JPEG quality settings get close, but only the video format matches upstream bit-for-bit.

Cloud auth

The reader picks up credentials from the standard environment:

  • S3: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
  • GCS: GOOGLE_APPLICATION_CREDENTIALS
  • HF Hub: HF_TOKEN (or huggingface-cli login)
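A quick preflight check can flag missing variables before a long training run. missing_credentials is a hypothetical helper that only checks the variables listed above; S3 and GCS clients can also find credentials elsewhere (for example ~/.aws/credentials or instance metadata), so a non-empty result is a hint, not proof that access will fail. HF Hub auth is skipped because a cached huggingface-cli login does not show up in the environment.

```python
import os
from typing import List, Mapping, Optional

# Variables listed on this page, keyed by URI scheme.
EXPECTED_ENV = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"),
    "gs": ("GOOGLE_APPLICATION_CREDENTIALS",),
}


def missing_credentials(uri: str, environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the expected env vars for uri's scheme that are unset or empty."""
    env = os.environ if environ is None else environ
    scheme = uri.split("://", 1)[0] if "://" in uri else ""  # local paths have no scheme
    return [var for var in EXPECTED_ENV.get(scheme, ()) if not env.get(var)]
```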

Lance does byte-range fetches on demand — no full-dataset download.

Spawn-mode workers

Lance forces multiprocessing.set_start_method("spawn") on import, because forking a process after Lance's background threads have started is not safe.

What this means in practice:

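  • Launch your training script from a real file, not python -c or a REPL. Spawned workers re-import the main module, so keep the training entry point under if __name__ == "__main__":.
  • Anything handed to workers (the dataset, collate_fn, transforms) must be picklable, since spawn pickles it into each worker.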
  • DataLoader num_workers > 0 with persistent_workers=True works as expected.
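The same constraints apply to any spawn-based pool, not only DataLoader. In this stdlib sketch, ProcessPoolExecutor with a spawn context stands in for DataLoader workers, and load_item is a hypothetical placeholder for fetching one sample; a real training script would build the dataset and DataLoader inside main() in the same way.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def load_item(idx: int) -> int:
    # Top-level function: spawned workers unpickle it by importing this module,
    # so a lambda or nested function would fail here.
    return idx * idx


def main() -> list:
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(load_item, range(4)))  # map preserves input order


if __name__ == "__main__":
    # Without this guard, each spawned worker would try to start its own
    # pool while re-importing the file, and multiprocessing raises an error.
    print(main())  # prints [0, 1, 4, 9]
```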