lance.LanceDataset.to_table - Lance documentation

Read the data into memory as a pyarrow.Table.

Parameters:

columns : list of str, or dict of str to str, default None

A list of column names to fetch, or a dictionary mapping output column names to SQL expressions. All columns are fetched if None or unspecified.

filter : pa.compute.Expression or str

Expression or str that is a valid SQL where clause. See Lance filter pushdown for valid SQL expressions.

limit : int, default None

Fetch up to this many rows. All rows if None or unspecified.

offset : int, default None

Fetch starting with this row. 0 if None or unspecified.

nearest : dict, default None

Get the rows corresponding to the K most similar vectors. Example:

{
    "column": <embedding col name>,
    "q": <query vector as pa.Float32Array>,
    "k": 10,
    "metric": "cosine",
    "minimum_nprobes": 20,
    "maximum_nprobes": 50,
    "refine_factor": 1
}

batch_size : int, optional

The number of rows to read at a time.

io_buffer_size : int, default None

The size of the IO buffer. See ScannerBuilder.io_buffer_size for more information.

batch_readahead : int, optional

The number of batches to read ahead.

fragment_readahead : int, optional

The number of fragments to read ahead.

scan_in_order : bool, optional, default True

Whether to read the fragments and batches in order. If false, throughput may be higher, but batches will be returned out of order and memory use might increase.

prefilter : bool, optional, default False

If True, apply the filter before the vector search instead of after it.

late_materialization : bool or List[str], default None

Allows custom control over late materialization. See ScannerBuilder.late_materialization for more information.

use_scalar_index : bool, default True

Allows custom control over scalar index usage. See ScannerBuilder.use_scalar_index for more information.

with_row_id : bool, optional, default False

Include the row ID of each row in the result.

with_row_address : bool, optional, default False

Include the row address of each row in the result.

use_stats : bool, optional, default True

Use stats pushdown during filters.

fast_search : bool, optional, default False

full_text_query : str or dict, optional

The query string to search for. Results are ranked by BM25. For example, “hello world” matches documents that contain “hello” or “world”. Alternatively, a dictionary with the following keys:

columns: list[str]
The columns to search, currently only supports a single column in the columns list.
query: str
The query string to search for.

include_deleted_rows : bool, optional, default False

If True, then rows that have been deleted, but are still present in the fragment, will be returned. These rows will have the _rowid column set to null. All other columns will reflect the value stored on disk and may not be null.

Note: deleted rows cannot be returned if this is a search operation or a take operation (including scalar indexed scans).

Notes

If both filter and nearest are specified, then:

nearest is executed first.
The results are filtered afterward, unless prefilter is set to True.