Lance ❤️ HuggingFace

The HuggingFace Hub has become the go-to place for ML practitioners to find pre-trained models and useful datasets.

HuggingFace datasets can be written directly to the Lance format with the lance.write_dataset() function. You can write the entire dataset or a single split. For example:

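# Load a slice of a HuggingFace dataset and write it in Lance format
import datasets
import lance

# "train[:10]" selects the first 10 rows of the train split
hf_dataset = datasets.load_dataset("poloclub/diffusiondb", split="train[:10]")
lance.write_dataset(hf_dataset, "diffusiondb_train.lance")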