static lance.LanceDataset.commit_batch(dest: str | Path | LanceDataset, transactions: collections.abc.Sequence[Transaction], commit_lock: CommitLock | None = None, storage_options: dict[str, str] | None = None, enable_v2_manifest_paths: bool | None = None, detached: bool | None = False, max_retries: int = 20) -> BulkCommitResult
Create a new version of dataset with multiple transactions.
This is an advanced method that lets users describe a change that has already been made to the data files. It is not needed when using Lance to apply changes (e.g. when using LanceDataset or write_dataset()).
- Parameters:
- dest : str, Path, or LanceDataset
The base uri of the dataset, or the dataset object itself. Using the dataset object can be more efficient because it can re-use the file metadata cache.
- transactions : Iterable[Transaction]
The transactions to apply to the dataset. These will be merged into a single transaction and applied to the dataset. Note: Only append transactions are currently supported. Other transaction types will be supported in the future.
- commit_lock : CommitLock, optional
A custom commit lock. Only needed if your object store does not support atomic commits. See the user guide for more details.
- storage_options : dict, optional
Extra options that make sense for a particular storage connection. This is used to store connection parameters like credentials, endpoint, etc.
- enable_v2_manifest_paths : bool, optional
If True, and this is a new dataset, uses the new V2 manifest paths. These paths provide more efficient opening of datasets with many versions on object stores. This parameter has no effect if the dataset already exists. To migrate an existing dataset, instead use the migrate_manifest_paths_v2() method. Default is False. WARNING: turning this on will make the dataset unreadable for older versions of Lance (prior to 0.17.0).
- detached : bool, optional
If True, then the commit will not be part of the dataset lineage. It will never show up as the latest dataset and the only way to check it out in the future will be to specifically check it out by version. The version will be a random version that is only unique amongst detached commits. The caller should store this somewhere as there will be no other way to obtain it in the future.
- max_retries : int
The maximum number of retries to perform when committing the dataset.
- Returns:
- dataset: LanceDataset
A new version of Lance Dataset.
- merged: Transaction
The merged transaction that was applied to the dataset.
- Return type:
dict with keys dataset and merged