lance.write_dataset(data_obj: ReaderLike, uri: str | Path | LanceDataset, schema: pa.Schema | None = None, mode: str = 'create', *, max_rows_per_file: int = 1048576, max_rows_per_group: int = 1024, max_bytes_per_file: int = 96636764160, commit_lock: CommitLock | None = None, progress: FragmentWriteProgress | None = None, storage_options: dict[str, str] | None = None, data_storage_version: str | None = None, use_legacy_format: bool | None = None, enable_v2_manifest_paths: bool = False, enable_move_stable_row_ids: bool = False) → LanceDataset

Write the given data_obj to the given uri.

Parameters:
data_obj : Reader-like

The data to be written. Acceptable types are:
- Pandas DataFrame, Pyarrow Table, Dataset, Scanner, or RecordBatchReader
- Huggingface dataset

uri : str, Path, or LanceDataset

Where to write the dataset to (directory). If a LanceDataset is passed, the session will be reused.

schema : Schema, optional

If specified and the input is a pandas DataFrame, use this schema instead of the default pandas to arrow table conversion.

mode : str

One of:
- create - create a new dataset (raises if uri already exists).
- overwrite - create a new snapshot version.
- append - create a new version that is the concatenation of the input and the latest version (raises if uri does not exist).

max_rows_per_file : int, default 1024 * 1024

The max number of rows to write before starting a new file

max_rows_per_group : int, default 1024

The max number of rows before starting a new group (in the same file)

max_bytes_per_file : int, default 90 * 1024 * 1024 * 1024

The max number of bytes to write before starting a new file. This is a soft limit: it is checked only after each group is written, so large groups may overshoot it significantly. The default is 90 GB, since there is a hard limit of 100 GB per file on object stores.
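The soft-limit behaviour can be illustrated with a toy model in plain Python. This is not Lance's writer; simulate_file_splits is a hypothetical helper that only mimics "write a whole group, then check the limit", and the byte sizes are made up.

```python
def simulate_file_splits(group_sizes, max_bytes_per_file):
    """Toy model: return the byte size of each file produced."""
    files = []
    current = 0
    for size in group_sizes:
        current += size  # the whole group is written first
        if current >= max_bytes_per_file:  # the limit is only checked afterwards
            files.append(current)
            current = 0
    if current:
        files.append(current)  # remaining data goes into a final file
    return files


default_limit = 90 * 1024 * 1024 * 1024  # the documented default, in bytes

# Small groups overshoot the limit only slightly.
small_groups = simulate_file_splits([40, 40, 40, 40], max_bytes_per_file=100)

# A single large group overshoots it by a wide margin.
large_groups = simulate_file_splits([90, 500, 10], max_bytes_per_file=100)
```

In this model, keeping groups small relative to max_bytes_per_file keeps the overshoot small, which is why the default leaves headroom below the 100 GB hard limit.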

commit_lock : CommitLock, optional

A custom commit lock. Only needed if your object store does not support atomic commits. See the user guide for more details.

progress : FragmentWriteProgress, optional

Experimental API. Progress tracking for writing the fragment. Pass a custom class that defines hooks to be called when each fragment is starting to write and finishing writing.

storage_options : optional, dict

Extra options that make sense for a particular storage connection. This is used to store connection parameters like credentials, endpoint, etc.

data_storage_version : optional, str, default None

The version of the data storage format to use. Newer versions are more efficient but require newer versions of lance to read. The default (None) will use the latest stable version. See the user guide for more details.

use_legacy_format : optional, bool, default None

Deprecated method for setting the data storage version. Use the data_storage_version parameter instead.

enable_v2_manifest_paths : bool, optional

If True, and this is a new dataset, uses the new V2 manifest paths. These paths provide more efficient opening of datasets with many versions on object stores. This parameter has no effect if the dataset already exists. To migrate an existing dataset, instead use the LanceDataset.migrate_manifest_paths_v2() method. Default is False.

enable_move_stable_row_ids : bool, optional

Experimental parameter: if set to true, the writer will use move-stable row ids. These row ids are stable after compaction operations, but not after updates. This makes compaction more efficient, since with stable row ids no secondary indices need to be updated to point to new row ids.