lance.LanceFragment.merge(data_obj: ReaderLike, left_on: str, right_on: str | None = None, schema=None) → tuple[FragmentMetadata, LanceSchema]
Merge another dataset into this fragment.
Performs a left join, where the fragment is the left side and data_obj is the right side. Rows in the fragment that have no matching row in data_obj get null values in the merged columns. If Lance does not support null values for one of those column types, an error is raised instead.
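To make the join semantics concrete, here is a minimal pure-Python model of a left join over lists of row dicts. It only illustrates the behavior described above and is not Lance's implementation. The helper name `left_join` is hypothetical, and the sketch assumes join keys are unique on the right side.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hypothetical left join: every left row is kept, in its original order."""
    # Index the right side by its join key (assumes unique right-side keys).
    index = {row[key]: row for row in right}
    # Columns contributed by the right side, excluding the join key itself.
    new_cols = [c for c in (right[0] if right else {}) if c != key]
    out = []
    for row in left:
        match = index.get(row[key])
        merged = dict(row)
        for col in new_cols:
            # Left rows without a match get None (null) in the new columns.
            merged[col] = match[col] if match is not None else None
        out.append(merged)
    return out


fragment_rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 3, "y": "c"}]
new_rows = [{"x": 1, "z": "d"}, {"x": 3, "z": "f"}, {"x": 4, "z": "g"}]
print(left_join(fragment_rows, new_rows, "x"))
# [{'x': 1, 'y': 'a', 'z': 'd'}, {'x': 2, 'y': 'b', 'z': None}, {'x': 3, 'y': 'c', 'z': 'f'}]
```

The row with `x = 2` has no match on the right, so its `z` is null, while the right-only row with `x = 4` does not appear in the result at all.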
- Parameters:
- data_obj : Reader-like
The data to be merged. Acceptable types are: pandas DataFrame, PyArrow Table, Dataset, Scanner, Iterator[RecordBatch], or RecordBatchReader.
- left_on : str
The name of the column in the dataset to join on.
- right_on : str or None
The name of the column in data_obj to join on. If None, defaults to left_on.
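The relationship between `left_on` and `right_on` can be sketched in plain Python: `right_on` falls back to `left_on` when it is `None`, so both names are needed only when the key columns are named differently. The column-dict layout and the helpers `resolve_join_keys` and `match_rows` are hypothetical; this models key matching only, not Lance's implementation, and assumes unique keys on the right side.

```python
from typing import Dict, List, Optional, Tuple

Columns = Dict[str, list]


def resolve_join_keys(left_on: str, right_on: Optional[str] = None) -> Tuple[str, str]:
    # right_on falls back to left_on, mirroring the documented default.
    return left_on, (left_on if right_on is None else right_on)


def match_rows(left: Columns, right: Columns, left_on: str,
               right_on: Optional[str] = None) -> List[Optional[int]]:
    """For each left row, the index of its matching right row, or None."""
    lkey, rkey = resolve_join_keys(left_on, right_on)
    # Map each right-side key to its row position (assumes unique keys).
    position = {k: i for i, k in enumerate(right[rkey])}
    return [position.get(k) for k in left[lkey]]


fragment = {"x": [1, 2, 3], "y": ["a", "b", "c"]}
same_name = {"x": [3, 1], "z": ["f", "d"]}
other_name = {"id": [2, 3], "z": ["e", "f"]}
print(match_rows(fragment, same_name, "x"))         # [1, None, 0]
print(match_rows(fragment, other_name, "x", "id"))  # [None, 0, 1]
```

A `None` entry marks a fragment row with no match in data_obj; under left-join semantics its merged columns would be filled with nulls.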
Examples
>>> import lance
>>> import pyarrow as pa
>>> df = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
>>> dataset = lance.write_dataset(df, "dataset")
>>> dataset.to_table().to_pandas()
   x  y
0  1  a
1  2  b
2  3  c
>>> fragments = dataset.get_fragments()
>>> new_df = pa.table({'x': [1, 2, 3], 'z': ['d', 'e', 'f']})
>>> merged = []
>>> schema = None
>>> for f in fragments:
...     f, schema = f.merge(new_df, 'x')
...     merged.append(f)
>>> merge = lance.LanceOperation.Merge(merged, schema)
>>> dataset = lance.LanceDataset.commit("dataset", merge, read_version=1)
>>> dataset.to_table().to_pandas()
   x  y  z
0  1  a  d
1  2  b  e
2  3  c  f
See also
LanceDataset.merge_columns
Add columns to this Fragment.
- Returns:
A new fragment with the merged column(s) and the final schema.
- Return type:
Tuple[FragmentMetadata, LanceSchema]