Writing Lance Datasets

Basic Writing

Python:
(df.write
    .format("lance")
    .option("dataset_uri", "/path/to/lance/database/my_dataset")
    .save())

Scala:
df.write.
    format("lance").
    option("dataset_uri", "/path/to/lance/database/my_dataset").
    save()

Java:
df.write()
    .format("lance")
    .option("dataset_uri", "/path/to/lance/database/my_dataset")
    .save();

Alternatively, you can specify the path directly in the save() method:

Python:
(df.write
    .format("lance")
    .save("/path/to/lance/database/my_dataset"))

Scala:
df.write.
    format("lance").
    save("/path/to/lance/database/my_dataset")

Java:
df.write()
    .format("lance")
    .save("/path/to/lance/database/my_dataset");

Write Modes

Create

By default, a write creates a new dataset at the given path. If a dataset already exists there, the write fails with TableAlreadyExistsException:

Python:
# First write - succeeds
(testData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save())

# Second write - throws TableAlreadyExistsException
(testData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save())

Scala:
// First write - succeeds
testData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    save()

// Second write - throws TableAlreadyExistsException
testData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    save()

Java:
// First write - succeeds
testData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save();

// Second write - throws TableAlreadyExistsException
testData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save();
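
Spark's default save mode is errorifexists, so the create behavior can also be requested explicitly. The following is a minimal PySpark sketch, assuming the Lance connector treats an explicit errorifexists mode the same way as the default; create_dataset is a hypothetical helper name, not part of the connector:

```python
def create_dataset(df, dataset_uri):
    """Create a new Lance dataset, failing if one already exists.

    `df` is expected to be a pyspark.sql.DataFrame. "errorifexists" is
    Spark's default save mode; that the Lance connector handles the
    explicit mode identically to the default is an assumption.
    """
    (df.write
        .format("lance")
        .option("dataset_uri", dataset_uri)
        .mode("errorifexists")  # same as omitting .mode() in Spark
        .save())
```

Spelling the mode out can make the intent clearer in code that also appends or overwrites elsewhere.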

Append

Add new data to an existing dataset:

Python:
# Create initial dataset
(testData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save())

# Append more data
(moreData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .mode("append")
    .save())

Scala:
// Create initial dataset
testData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    save()

// Append more data
moreData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    mode("append").
    save()

Java:
// Create initial dataset
testData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save();

// Append more data
moreData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .mode("append")
    .save();
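
When the same job runs repeatedly, it can be convenient to create the dataset on the first run and append afterwards. Below is a rough PySpark sketch of that pattern. It assumes the dataset lives on the local filesystem so that os.path.exists is a valid existence check; create_or_append is a hypothetical helper, not a connector API:

```python
import os


def create_or_append(df, dataset_uri):
    """Append to the dataset if the path exists, otherwise create it.

    Assumption: `dataset_uri` is a local filesystem path. Object-store
    URIs (s3://, gs://, ...) would need a different existence check.
    Returns the save mode that was used.
    """
    mode = "append" if os.path.exists(dataset_uri) else "errorifexists"
    (df.write
        .format("lance")
        .option("dataset_uri", dataset_uri)
        .mode(mode)
        .save())
    return mode
```

The existence check and the write are not atomic, so two concurrent first runs could still race; in that case one of them would fail with the create error described above.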

Overwrite

Replace the entire dataset with new data:

Python:
# Create initial dataset
(initialData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save())

# Completely replace the dataset
(newData.write
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .mode("overwrite")
    .save())

Scala:
// Create initial dataset
initialData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    save()

// Completely replace the dataset
newData.write.
    format("lance").
    option("dataset_uri", "/path/to/database/users").
    mode("overwrite").
    save()

Java:
// Create initial dataset
initialData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .save();

// Completely replace the dataset
newData.write()
    .format("lance")
    .option("dataset_uri", "/path/to/database/users")
    .mode("overwrite")
    .save();
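
The three modes covered on this page differ only in the value passed to mode(), so they can be wrapped in one small function. This is an illustrative PySpark sketch; write_lance and SUPPORTED_MODES are hypothetical names, and the helper deliberately rejects any other Spark save mode (such as ignore) because this page does not say whether the connector supports it:

```python
# Modes documented on this page; "errorifexists" is Spark's default (create).
SUPPORTED_MODES = {"errorifexists", "append", "overwrite"}


def write_lance(df, dataset_uri, mode="errorifexists"):
    """Write a DataFrame to a Lance dataset using one of the documented modes.

    `df` is expected to be a pyspark.sql.DataFrame.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of {sorted(SUPPORTED_MODES)}"
        )
    (df.write
        .format("lance")
        .option("dataset_uri", dataset_uri)
        .mode(mode)
        .save())
```

For example, write_lance(newData, "/path/to/database/users", mode="overwrite") would issue the same overwrite shown above.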