- class lance.LanceOperation.Delete(lance.LanceOperation.BaseOperation)
Remove fragments or rows from the dataset.
- updated_fragments
The fragments that have been updated with new deletion vectors.
- Type:
list[FragmentMetadata]
- deleted_fragment_ids
The ids of the fragments that have been deleted entirely. These are the fragments where LanceFragment.delete() returned None.
- Type:
list[int]
- predicate
The original SQL predicate used to select the rows to delete.
- Type:
str
Warning
This is an advanced API for distributed operations. To delete rows from a dataset on a single machine, use lance.LanceDataset.delete().
Examples
To delete rows from a dataset, call lance.fragment.LanceFragment.delete() on each of the fragments. If that returns a new fragment, add it to the updated_fragments list. If it returns None, the whole fragment was deleted, so add the fragment id to the deleted_fragment_ids list. Finally, pass the operation to the LanceDataset.commit() method to complete the deletion.

>>> import lance
>>> import pyarrow as pa
>>> table = pa.table({"a": [1, 2], "b": ["a", "b"]})
>>> dataset = lance.write_dataset(table, "example")
>>> table = pa.table({"a": [3, 4], "b": ["c", "d"]})
>>> dataset = lance.write_dataset(table, "example", mode="append")
>>> dataset.to_table().to_pandas()
   a  b
0  1  a
1  2  b
2  3  c
3  4  d
>>> predicate = "a >= 2"
>>> updated_fragments = []
>>> deleted_fragment_ids = []
>>> for fragment in dataset.get_fragments():
...     new_fragment = fragment.delete(predicate)
...     if new_fragment is not None:
...         updated_fragments.append(new_fragment)
...     else:
...         deleted_fragment_ids.append(fragment.fragment_id)
>>> operation = lance.LanceOperation.Delete(updated_fragments,
...                                         deleted_fragment_ids,
...                                         predicate)
>>> dataset = lance.LanceDataset.commit("example", operation,
...                                     read_version=dataset.version)
>>> dataset.to_table().to_pandas()
   a  b
0  1  a
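The bookkeeping in the loop above can also be illustrated without Lance. The following is a toy sketch, not the Lance API: ToyFragment, toy_delete, and collect_delete_results are hypothetical names, and a fragment is modeled as just an id plus a tuple of values. It assumes the same convention as described above (a delete that returns None means the whole fragment is gone) and uses a thread pool as a stand-in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyFragment:
    """Hypothetical stand-in for a fragment: an id plus its row values."""
    fragment_id: int
    rows: Tuple[int, ...]


def toy_delete(
    fragment: ToyFragment, should_delete: Callable[[int], bool]
) -> Optional[ToyFragment]:
    """Return the fragment without the matching rows, or None if no rows remain."""
    kept = tuple(r for r in fragment.rows if not should_delete(r))
    if not kept:
        return None
    return ToyFragment(fragment.fragment_id, kept)


def collect_delete_results(
    fragments: Sequence[ToyFragment], should_delete: Callable[[int], bool]
) -> Tuple[List[ToyFragment], List[int]]:
    """Run toy_delete on every fragment and split the results into two lists."""
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so results line up with fragments.
        results = list(pool.map(lambda f: toy_delete(f, should_delete), fragments))

    updated_fragments: List[ToyFragment] = []
    deleted_fragment_ids: List[int] = []
    for fragment, result in zip(fragments, results):
        if result is not None:
            updated_fragments.append(result)
        else:
            deleted_fragment_ids.append(fragment.fragment_id)
    return updated_fragments, deleted_fragment_ids


# Same data layout as the example above: two fragments, predicate "a >= 2".
fragments = [ToyFragment(0, (1, 2)), ToyFragment(1, (3, 4))]
updated, deleted_ids = collect_delete_results(fragments, lambda a: a >= 2)
print(updated)      # fragment 0 keeps only the row with value 1
print(deleted_ids)  # fragment 1 lost every row
```

In a real distributed setup, each worker would call LanceFragment.delete() on its own fragments and send back either the returned fragment metadata or the fragment id. A single coordinator would then build the Delete operation and commit it once.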
Public members
- Delete(updated_fragments: Iterable[FragmentMetadata], ...)
Initialize self. See help(type(self)) for accurate signature.
- __repr__()
Return repr(self).
- updated_fragments : Iterable[FragmentMetadata]
- deleted_fragment_ids : Iterable[int]
- predicate : str