lance.LanceDataset.alter_columns - Lance documentation

lance.LanceDataset.alter_columns(*alterations: Iterable[AlterColumn])

Alter column name, data type, and nullability.

Columns that are renamed can keep any indices that are on them. If a column has an IVF_PQ index, the index can be kept when the column is cast to another type. Other index types do not support casting at this time.

Column types can be upcast (such as int32 to int64) or downcast (such as int64 to int32). However, downcasting will fail if there are any values that cannot be represented in the new type. In general, columns can be cast only within the same general type: integers to integers, floats to floats, and strings to strings. In addition, string, binary, and list columns can be cast between their size variants: string to large string, binary to large binary, and list to large list.
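Because a downcast fails when a value does not fit in the new type, it can help to check the data before altering the column. The following is a minimal sketch in plain Python of such a pre-flight check for an int64 to int32 downcast. The helper name `fits_int32` is hypothetical and is not part of the Lance API; it only illustrates the representability rule described above.

```python
# Hypothetical pre-flight check; not part of the Lance API.
# int32 can represent values in the closed range [-2**31, 2**31 - 1].
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(values):
    """Return True if every non-null value is representable as int32."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values if v is not None)


print(fits_int32([1, 2, 3]))      # True
print(fits_int32([1, 2**31]))     # False: 2**31 is one past INT32_MAX
```

If the check passes, the column can then be downcast with an alteration such as `{"path": "a", "data_type": pa.int32()}`.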

If a column is cast to a different type, any index on it that does not support casting is dropped.

Parameters:

alterations : Iterable[Dict[str, Any]]

A sequence of dictionaries, each with the following keys:

"path": str
The column path to alter. For a top-level column, this is the name. For a nested column, this is the dot-separated path, e.g. "a.b.c".
"name": str, optional
The new name of the column. If not specified, the column name is not changed.
"nullable": bool, optional
Whether the column should be nullable. If not specified, the column nullability is not changed. Only non-nullable columns can be changed to nullable. Currently, you cannot change a nullable column to non-nullable.
"data_type": pyarrow.DataType, optional
The new data type to cast the column to. If not specified, the column data type is not changed.
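Since every key except "path" is optional, an alteration dictionary only needs to contain the properties being changed. The sketch below builds such dictionaries with a small helper. The name `make_alteration` is hypothetical and not part of Lance; it simply omits any key that was not specified, matching the "if not specified, not changed" behavior described above.

```python
from typing import Any, Dict, Optional


def make_alteration(
    path: str,
    name: Optional[str] = None,
    nullable: Optional[bool] = None,
    data_type: Any = None,  # expected to be a pyarrow.DataType when given
) -> Dict[str, Any]:
    """Build one alteration dict, leaving out keys that were not specified."""
    alteration: Dict[str, Any] = {"path": path}
    if name is not None:
        alteration["name"] = name
    if nullable is not None:
        alteration["nullable"] = nullable
    if data_type is not None:
        alteration["data_type"] = data_type
    return alteration


alterations = [
    make_alteration("a", name="x"),       # rename only
    make_alteration("b", nullable=True),  # relax nullability only
]
print(alterations)
# [{'path': 'a', 'name': 'x'}, {'path': 'b', 'nullable': True}]
```

Given a dataset, the resulting list could be passed as `dataset.alter_columns(*alterations)`.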

Examples

>>> import lance
>>> import pyarrow as pa
>>> schema = pa.schema([pa.field('a', pa.int64()),
...                     pa.field('b', pa.string(), nullable=False)])
>>> table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]}, schema=schema)
>>> dataset = lance.write_dataset(table, "example")
>>> dataset.alter_columns({"path": "a", "name": "x"},
...                       {"path": "b", "nullable": True})
>>> dataset.to_table().to_pandas()
   x  b
0  1  a
1  2  b
2  3  c
>>> dataset.alter_columns({"path": "x", "data_type": pa.int32()})
>>> dataset.schema
x: int32
b: string