# lerobot-lancedb
Lance-backed datasets for LeRobot.
Two storage layouts, both subclasses of `LeRobotDataset`:

- **Frames format**: per-frame JPEG bytes (`LeRobotLanceDataset`).
- **Video format**: per-file mp4 bytes via Lance blob v2 (`LeRobotLanceVideoDataset`).
Both readers expose the same API. Pick one based on how the source dataset stores its camera streams, i.e. the feature `dtype` (`image` or `video`); see Conversion.
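
As a rough illustration of picking by source dtype, here is a minimal sketch that inspects the `features` mapping of a locally downloaded LeRobot source dataset and suggests a layout. It assumes the source uses LeRobot's `meta/info.json` metadata with camera features tagged `dtype: "video"` or `dtype: "image"`; `suggest_layout` and `suggest_layout_from_root` are hypothetical helpers, not part of `lerobot_lancedb`.

```python
import json
from pathlib import Path


# Hypothetical helper, not part of lerobot_lancedb.
def suggest_layout(features: dict) -> str:
    """Suggest a Lance layout from a LeRobot `features` mapping.

    Assumes camera streams carry dtype "video" (mp4-backed) or
    "image" (stored as individual frames).
    """
    dtypes = {spec.get("dtype") for spec in features.values()}
    if "video" in dtypes:
        # mp4 sources: the video layout stores the mp4 bytes per file.
        return "LeRobotLanceVideoDataset"
    if "image" in dtypes:
        # Image sources: the frames layout stores per-frame JPEG bytes.
        return "LeRobotLanceDataset"
    raise ValueError("no image or video features found")


def suggest_layout_from_root(root: str) -> str:
    """Read meta/info.json under a source dataset root (assumed layout)."""
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return suggest_layout(info["features"])


example_features = {
    "observation.images.cam_high": {"dtype": "video"},
    "observation.state": {"dtype": "float32"},
}
print(suggest_layout(example_features))  # LeRobotLanceVideoDataset
```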
## Install
Until the first PyPI release, install from GitHub:
For local development:
Either path pulls in:

- `lerobot[dataset]`
- `lancedb` / `pylance`
- `torchcodec` (used by the video format)
## 30-second tour
Convert a video-stored dataset to the recommended (bit-exact) layout:
```bash
lerobot-convert-to-lance-video \
  --repo-id=lerobot/aloha_static_cups_open \
  --output=./aloha_cups_open_lance_video \
  --overwrite
```
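
To script conversions (for example over several datasets), the same CLI can be driven from Python. This sketch uses only the three flags shown in the command above plus the standard-library `subprocess` module; `build_convert_cmd` and `convert` are hypothetical helpers, and it assumes `lerobot-convert-to-lance-video` is on `PATH` after installation.

```python
import subprocess

# Assumed to be on PATH once lerobot-lancedb is installed.
CLI = "lerobot-convert-to-lance-video"


def build_convert_cmd(repo_id: str, output: str, overwrite: bool = False) -> list[str]:
    """Build the argv list using only the flags shown in the tour."""
    cmd = [CLI, f"--repo-id={repo_id}", f"--output={output}"]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def convert(repo_id: str, output: str, overwrite: bool = False) -> None:
    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(build_convert_cmd(repo_id, output, overwrite), check=True)


# Usage (not executed here):
# convert("lerobot/aloha_static_cups_open", "./aloha_cups_open_lance_video", overwrite=True)
```

Passing an argv list rather than a shell string avoids quoting issues with repo ids and paths.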
Use it as a regular `LeRobotDataset`:

```python
from lerobot_lancedb import LeRobotLanceVideoDataset

ds = LeRobotLanceVideoDataset(root="./aloha_cups_open_lance_video")
```
Plug it into any code that expects a `LeRobotDataset`:

- the upstream training factory
- `EpisodeAwareSampler`
- third-party trainers that check `isinstance(ds, LeRobotDataset)`
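
The last point is why both readers subclass `LeRobotDataset` instead of merely mirroring its methods: an `isinstance` gate accepts any subclass but rejects a look-alike. The sketch below uses stand-in classes (`BaseDataset`, `LanceBackedDataset`, `DuckTypedDataset`, `check_dataset`) as hypothetical placeholders for `LeRobotDataset`, a Lance reader, and a trainer's type check; they do not reflect the real classes' internals.

```python
# Stand-ins only: BaseDataset plays the role of LeRobotDataset.
class BaseDataset:
    def __len__(self) -> int:
        raise NotImplementedError

    def __getitem__(self, idx: int) -> dict:
        raise NotImplementedError


class LanceBackedDataset(BaseDataset):
    """Hypothetical subclass: same interface, different storage."""

    def __init__(self, rows: list[dict]):
        self._rows = rows  # stands in for rows read from a Lance table

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx: int) -> dict:
        return self._rows[idx]


class DuckTypedDataset:
    """Same methods, but not a subclass of BaseDataset."""

    def __len__(self) -> int:
        return 0

    def __getitem__(self, idx: int) -> dict:
        raise IndexError(idx)


def check_dataset(ds) -> int:
    # Mimics a trainer that only accepts instances of the base class.
    if not isinstance(ds, BaseDataset):
        raise TypeError(f"expected BaseDataset, got {type(ds).__name__}")
    return len(ds)


print(check_dataset(LanceBackedDataset([{"frame_index": 0}])))  # 1
```

Passing a `DuckTypedDataset` to `check_dataset` raises `TypeError` even though it has the same methods.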
## Headline benchmark
Realistic training read pattern (`delta_timestamps` with 8 frames per sample, `batch_size=32`, `num_workers=4`, CPU decode, H100) on `lerobot/aloha_static_cups_open` (480×640, 4-camera bimanual):

| format | size (MB) | throughput (fps) | speedup vs. upstream | bit-exact? |
|---|---|---|---|---|
| upstream parquet+mp4 | 485.6 | 18.7 | 1.00× | ✓ |
| `convert_to_lance` (JPEG-95) | 3 626 | 46.0 | 2.46× | ✗ |
| `convert_to_lance --jpeg-quality=100 --jpeg-subsampling=0` | 8 735 | 32.5 | 1.74× | ✗ |
| `convert_to_lance_video` | 487.4 | 45.6 | 2.44× | ✓ |

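
The speedup column is each format's throughput divided by the upstream baseline (18.7 fps). A quick check of that arithmetic, plus each format's on-disk size relative to upstream, using only the numbers in the table:

```python
# (size in MB, throughput in fps), copied from the benchmark table.
ROWS = {
    "upstream parquet+mp4": (485.6, 18.7),
    "convert_to_lance (JPEG-95)": (3626.0, 46.0),
    "convert_to_lance q=100, subsampling=0": (8735.0, 32.5),
    "convert_to_lance_video": (487.4, 45.6),
}
BASE_SIZE, BASE_FPS = ROWS["upstream parquet+mp4"]

for name, (size_mb, fps) in ROWS.items():
    speedup = fps / BASE_FPS          # the table's "speedup" column
    size_ratio = size_mb / BASE_SIZE  # on-disk footprint relative to upstream
    print(f"{name:40s} speedup {speedup:.2f}x  size {size_ratio:.2f}x")
```

The JPEG frame layouts pay roughly 7× to 18× the upstream disk footprint for their throughput gain, while the video layout stays within 1% of upstream size at nearly the same speedup as JPEG-95.
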
Full numbers across three datasets (pusht, ALOHA, Koch): see Benchmarks.
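
The benchmark requests 8 frames per sample through `delta_timestamps`, which in LeRobot maps a feature key to a list of time offsets in seconds relative to the sampled frame. A minimal sketch of building such a mapping; the camera keys, the 50 fps rate, and the choice of the current frame plus the 7 preceding ones are assumptions for illustration, not the benchmark's exact configuration. `make_delta_timestamps` is a hypothetical helper.

```python
# Assumptions (not from the benchmark script): ALOHA-style camera keys,
# a 50 fps recording rate, and a window ending at the sampled frame.
FPS = 50
CAMERA_KEYS = [
    "observation.images.cam_high",
    "observation.images.cam_low",
    "observation.images.cam_left_wrist",
    "observation.images.cam_right_wrist",
]


def make_delta_timestamps(keys, n_frames, fps):
    """Map each key to n_frames offsets in seconds, oldest first, ending at 0.0."""
    offsets = [(i - (n_frames - 1)) / fps for i in range(n_frames)]
    # Give each key its own list so later edits to one key don't affect others.
    return {key: list(offsets) for key in keys}


delta_timestamps = make_delta_timestamps(CAMERA_KEYS, n_frames=8, fps=FPS)
# Assuming the reader forwards this argument the way LeRobotDataset does:
# ds = LeRobotLanceVideoDataset(root="./aloha_cups_open_lance_video",
#                               delta_timestamps=delta_timestamps)
```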
## Next
- Conversion: both CLIs + Python API
- Frames format: `LeRobotLanceDataset` reference
- Video format: `LeRobotLanceVideoDataset` reference
- Examples: training + benchmark scripts
- Benchmarks: size × throughput × accuracy tables