lance.LanceDataset.alter_columns(*alterations: Iterable[AlterColumn])

Alter column name, data type, and nullability.

Columns that are renamed can keep any indices that are on them. If a column has an IVF_PQ index, the index can be kept when the column is cast to another type. Other index types do not support casting at this time.

Column types can be upcast (such as int32 to int64) or downcast (such as int64 to int32). However, downcasting will fail if there are any values that cannot be represented in the new type. In general, columns can be cast within the same general type: integers to integers, floats to floats, and strings to strings. In addition, string, binary, and list columns can be cast between their size variants: for example, string to large string, binary to large binary, and list to large list.
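The downcasting rule can be checked ahead of time. The following is a minimal sketch in plain Python, assuming the values are available as a list; `fits_signed_int` is a hypothetical helper for illustration and is not part of the lance API.

```python
# Hypothetical helper, not part of lance: reports whether every value
# can be represented in a signed integer of the given bit width.
def fits_signed_int(values, bits):
    lo = -(2 ** (bits - 1))      # smallest representable value
    hi = 2 ** (bits - 1) - 1     # largest representable value
    # Nulls (None) carry no value, so they never block a downcast here.
    return all(v is None or lo <= v <= hi for v in values)


small = [1, 2, 3]
large = [1, 2, 2 ** 31]  # 2**31 is one past the int32 maximum

print(fits_signed_int(small, 32))  # True
print(fits_signed_int(large, 32))  # False
```

A column like `large` is the kind of data for which an int64 to int32 downcast would be expected to fail.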

Apart from the IVF_PQ case above, casting a column to a different type drops its indices.

Parameters:
alterations : Iterable[Dict[str, Any]]

A sequence of dictionaries, each with the following keys:

  • "path": str

    The column path to alter. For a top-level column, this is the name. For a nested column, this is the dot-separated path, e.g. “a.b.c”.

  • "name": str, optional

    The new name of the column. If not specified, the column name is not changed.

  • "nullable": bool, optional

    Whether the column should be nullable. If not specified, the column nullability is not changed. Only non-nullable columns can be changed to nullable. Currently, you cannot change a nullable column to non-nullable.

  • "data_type": pyarrow.DataType, optional

    The new data type to cast the column to. If not specified, the column data type is not changed.
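The dictionaries described above can be assembled with a small helper that omits any key left unspecified. This is only a sketch: `make_alteration` is a hypothetical convenience function, not part of the lance API, and the nested path `"s.inner"` is an invented example.

```python
# Hypothetical helper, not part of lance: builds one alteration dict
# with the keys documented above, leaving out anything not specified.
def make_alteration(path, name=None, nullable=None, data_type=None):
    if not isinstance(path, str) or not path:
        raise ValueError("'path' must be a non-empty string")
    alteration = {"path": path}
    if name is not None:
        alteration["name"] = name
    if nullable is not None:
        alteration["nullable"] = nullable
    if data_type is not None:
        # Expected to be a pyarrow.DataType, e.g. pa.int32()
        alteration["data_type"] = data_type
    return alteration


alterations = [
    make_alteration("a", name="x"),              # rename a top-level column
    make_alteration("s.inner", nullable=True),   # nested column, dot-separated path
]
print(alterations)
```

Because the signature takes `*alterations`, a list built this way would be unpacked at the call site, as in `dataset.alter_columns(*alterations)`.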

Examples

>>> import lance
>>> import pyarrow as pa
>>> schema = pa.schema([pa.field('a', pa.int64()),
...                     pa.field('b', pa.string(), nullable=False)])
>>> table = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]}, schema=schema)
>>> dataset = lance.write_dataset(table, "example")
>>> dataset.alter_columns({"path": "a", "name": "x"},
...                       {"path": "b", "nullable": True})
>>> dataset.to_table().to_pandas()
   x  b
0  1  a
1  2  b
2  3  c
>>> dataset.alter_columns({"path": "x", "data_type": pa.int32()})
>>> dataset.schema
x: int32
b: string