lance.LanceOperation.Merge

class lance.LanceOperation.Merge(lance.LanceOperation.BaseOperation)

Operation that adds columns to a dataset. Unlike Overwrite, it should leave the structure of the fragments unchanged, so existing indices can be kept.

fragments

The fragments that make up the new dataset.

Type: iterable of FragmentMetadata

schema

The schema of the new dataset. Passing a LanceSchema is preferred; passing a pyarrow.Schema is deprecated.

Type: LanceSchema or pyarrow.Schema

Warning

This is an advanced API for distributed operations. To overwrite a dataset or create a new one on a single machine, use lance.write_dataset().

Examples

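To add new columns to a dataset, first define a function that creates the new columns from the existing ones. Then apply it to each fragment with lance.fragment.LanceFragment.merge_columns(), collect the returned fragment metadata, and commit the result with a Merge operation.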

>>> import lance
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> table = pa.table({"a": [1, 2, 3, 4], "b": ["a", "b", "c", "d"]})
>>> dataset = lance.write_dataset(table, "example")
>>> dataset.to_table().to_pandas()
   a  b
0  1  a
1  2  b
2  3  c
3  4  d
>>> def double_a(batch: pa.RecordBatch) -> pa.RecordBatch:
...     doubled = pc.multiply(batch["a"], 2)
...     return pa.record_batch([doubled], ["a_doubled"])
>>> fragments = []
>>> for fragment in dataset.get_fragments():
...     new_fragment, new_schema = fragment.merge_columns(double_a,
...                                                       columns=['a'])
...     fragments.append(new_fragment)
>>> operation = lance.LanceOperation.Merge(fragments, new_schema)
>>> dataset = lance.LanceDataset.commit("example", operation,
...                                     read_version=dataset.version)
>>> dataset.to_table().to_pandas()
   a  b  a_doubled
0  1  a          2
1  2  b          4
2  3  c          6
3  4  d          8
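
The idea that a merge adds columns without changing the structure of the fragments can be illustrated with a small conceptual model. The sketch below does not use the Lance API: plain Python dictionaries stand in for fragments, and the names Fragment, merge_columns_model, and double_a_model are hypothetical. It only assumes that "structure" means the number of fragments, the number of rows in each, and the order of those rows.

```python
# Conceptual model only: plain dictionaries stand in for Lance fragments.
# None of the names defined here are part of the Lance API.
from typing import Callable, Dict, List

Fragment = Dict[str, List]  # column name -> column values


def merge_columns_model(
    fragment: Fragment, value_func: Callable[[Fragment], Fragment]
) -> Fragment:
    """Return a copy of `fragment` with the columns from `value_func` appended."""
    new_columns = value_func(fragment)
    num_rows = len(next(iter(fragment.values())))
    for name, values in new_columns.items():
        if name in fragment:
            raise ValueError(f"column {name!r} already exists")
        # A merge may only add columns, never add or drop rows.
        if len(values) != num_rows:
            raise ValueError("new columns must have one value per existing row")
    return {**fragment, **new_columns}


def double_a_model(fragment: Fragment) -> Fragment:
    """Build the new column from the existing column `a`."""
    return {"a_doubled": [2 * x for x in fragment["a"]]}


# Two fragments holding the same rows as the example dataset.
fragments = [
    {"a": [1, 2], "b": ["a", "b"]},
    {"a": [3, 4], "b": ["c", "d"]},
]
merged = [merge_columns_model(f, double_a_model) for f in fragments]

# Fragment count and per-fragment row counts are unchanged.
rows_before = [len(f["a"]) for f in fragments]
rows_after = [len(f["a"]) for f in merged]
```

In this model every row keeps its fragment and its position within that fragment, which is the property that would let an index built before the merge keep pointing at the same rows afterwards.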

Public members

Merge(fragments: Iterable[FragmentMetadata], schema): Create a Merge operation from the fragments and schema of the new dataset.

__repr__(): Return repr(self).

__eq__(other): Return self==value.

fragments : Iterable[FragmentMetadata]

schema : LanceSchema | Schema