lance.LanceFragment.merge(data_obj: ReaderLike, left_on: str, right_on: str | None = None, schema=None) → tuple[FragmentMetadata, LanceSchema]

Merge another dataset into this fragment.

Performs a left join, where the fragment is the left side and data_obj is the right side. Rows that exist in the fragment but have no match in data_obj are filled with null values in the merged columns, unless Lance doesn’t support null values for the column's type, in which case an error is raised.

Parameters:
data_obj : Reader-like

The data to be merged. Acceptable types are: Pandas DataFrame, PyArrow Table, Dataset, Scanner, Iterator[RecordBatch], or RecordBatchReader.

left_on : str

The name of the column in the dataset to join on.

right_on : str or None

The name of the column in data_obj to join on. If None, defaults to left_on.
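The join semantics described above can be illustrated with a small plain-Python sketch. It is not Lance's implementation and does not call Lance at all: `left_join_sketch`, the list-of-dicts row representation, and the "first match wins" handling of duplicate keys are all assumptions made for illustration. It only shows how left_on, right_on (defaulting to left_on), and null filling relate to one another.

```python
from typing import Any, Optional

Row = dict[str, Any]


def left_join_sketch(
    left_rows: list[Row],
    right_rows: list[Row],
    left_on: str,
    right_on: Optional[str] = None,
) -> list[Row]:
    """Hypothetical helper: left-join semantics on lists of dicts."""
    # Mirror the documented default: right_on falls back to left_on.
    right_on = left_on if right_on is None else right_on

    # Columns contributed by the right side, excluding its join key.
    new_cols: list[str] = []
    for row in right_rows:
        for name in row:
            if name != right_on and name not in new_cols:
                new_cols.append(name)

    # Index right rows by key (assumption: the first match wins on duplicates).
    lookup: dict[Any, Row] = {}
    for row in right_rows:
        lookup.setdefault(row[right_on], row)

    joined: list[Row] = []
    for row in left_rows:
        match = lookup.get(row[left_on])
        out = dict(row)  # every left row is kept
        for name in new_cols:
            # Left rows without a match get None (null) in the new columns.
            out[name] = match.get(name) if match is not None else None
        joined.append(out)
    return joined


fragment_rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 3, "y": "c"}]
new_rows = [{"id": 1, "z": "d"}, {"id": 3, "z": "f"}]
result = left_join_sketch(fragment_rows, new_rows, left_on="x", right_on="id")
```

Here the row with x == 2 has no counterpart in new_rows, so its z value is None, while the other rows pick up their matching z values. The number of rows on the left side is unchanged.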

Examples

>>> import lance
>>> import pyarrow as pa
>>> df = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
>>> dataset = lance.write_dataset(df, "dataset")
>>> dataset.to_table().to_pandas()
   x  y
0  1  a
1  2  b
2  3  c
>>> fragments = dataset.get_fragments()
>>> new_df = pa.table({'x': [1, 2, 3], 'z': ['d', 'e', 'f']})
>>> merged = []
>>> schema = None
>>> for f in fragments:
...     f, schema = f.merge(new_df, 'x')
...     merged.append(f)
>>> merge = lance.LanceOperation.Merge(merged, schema)
>>> dataset = lance.LanceDataset.commit("dataset", merge, read_version=1)
>>> dataset.to_table().to_pandas()
   x  y  z
0  1  a  d
1  2  b  e
2  3  c  f

See also

LanceDataset.merge_columns

Add columns to this Fragment.

Returns:

A new fragment with the merged column(s) and the final schema.

Return type:

Tuple[FragmentMetadata, LanceSchema]