- lance.batch_udf(output_schema=None, checkpoint_file=None)
Create a user defined function (UDF) that adds columns to a dataset.
It wraps a function that takes a single argument, a RecordBatch, and returns a RecordBatch. The wrapped function is called once for each batch in the dataset. It should not modify the input batch; instead, it should create a new batch with the new columns added.
- Parameters:
- output_schema : Schema, optional
The schema of the output RecordBatch. This is used to validate the output of the function. If not provided, the schema of the first output RecordBatch will be used.
- checkpoint_file : str or Path, optional
If specified, this file is used as a cache for unsaved results of this UDF. If the process fails and you call add_columns again with the same file, it resumes from the last saved state. This is useful for long-running processes that may fail and need to be resumed. This file may get very large: it holds up to an entire data file's worth of results on disk, which can be multiple gigabytes of data.
- Return type:
AddColumnsUDF