lance.LanceDataset.to_table(columns: list[str] | dict[str, str] | None = None, filter: str | Expression | None = None, limit: int | None = None, offset: int | None = None, nearest: dict | None = None, batch_size: int | None = None, batch_readahead: int | None = None, fragment_readahead: int | None = None, scan_in_order: bool | None = None, *, prefilter: bool | None = None, with_row_id: bool | None = None, with_row_address: bool | None = None, use_stats: bool | None = None, fast_search: bool | None = None, full_text_query: str | dict | None = None, io_buffer_size: int | None = None, late_materialization: bool | list[str] | None = None, use_scalar_index: bool | None = None, include_deleted_rows: bool | None = None) → Table

Read the data into memory as a pyarrow.Table.

Parameters:
columns : list of str, or dict of str to str, default None

List of column names to fetch, or a dictionary mapping column names to SQL expressions. All columns are fetched if None or unspecified.

filter : pa.compute.Expression or str

A pyarrow compute Expression, or a str containing a valid SQL WHERE clause. See Lance filter pushdown for valid SQL expressions.

limit : int, default None

Fetch up to this many rows. All rows if None or unspecified.

offset : int, default None

Fetch starting with this row. 0 if None or unspecified.

nearest : dict, default None

Get the rows corresponding to the K most similar vectors. Example:

{
    "column": <embedding col name>,
    "q": <query vector as pa.Float32Array>,
    "k": 10,
    "metric": "cosine",
    "nprobes": 1,
    "refine_factor": 1
}

batch_size : int, optional

The number of rows to read at a time.

io_buffer_size : int, default None

The size of the IO buffer. See ScannerBuilder.io_buffer_size for more information.

batch_readahead : int, optional

The number of batches to read ahead.

fragment_readahead : int, optional

The number of fragments to read ahead.

scan_in_order : bool, optional, default True

Whether to read the fragments and batches in order. If false, throughput may be higher, but batches will be returned out of order and memory use might increase.

prefilter : bool, optional, default False

If True, apply filter before the vector search instead of after it.

late_materialization : bool or List[str], default None

Allows custom control over late materialization. See ScannerBuilder.late_materialization for more information.

use_scalar_index : bool, default True

Allows custom control over scalar index usage. See ScannerBuilder.use_scalar_index for more information.

with_row_id : bool, optional, default False

Return row ID.

with_row_address : bool, optional, default False

Return row address.

use_stats : bool, optional, default True

Use stats pushdown during filters.

full_text_query : str or dict, optional

The query string to search for; results are ranked by BM25. For example, “hello world” matches documents that contain “hello” or “world”. Alternatively, a dictionary with the following keys:

  • columns: list[str]

    The columns to search. Currently only a single column is supported in the columns list.

  • query: str

    The query string to search for.

include_deleted_rows : bool, optional, default False

If True, then rows that have been deleted, but are still present in the fragment, will be returned. These rows will have the _rowid column set to null. All other columns will reflect the value stored on disk and may not be null.

Note: if this is a search operation, or a take operation (including scalar indexed scans) then deleted rows cannot be returned.

Notes

If both filter and nearest are specified, then:

  1. nearest is executed first.

  2. The results are filtered afterward, unless prefilter is set to True.
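The practical consequence of this ordering is that post-filtering can return fewer than k rows, or none at all. The following is a conceptual model in plain Python, not Lance's implementation: knn and keep are hypothetical helpers that stand in for the vector search and the filter.

```python
def knn(rows, query, k):
    """Return the k rows whose vector is closest to query (squared L2)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row["vector"], query))
    return sorted(rows, key=dist)[:k]


def keep(row):
    """Stand-in for the filter "id >= 5"."""
    return row["id"] >= 5


# Hypothetical data: row i holds the vector [i, i, i, i].
rows = [{"id": i, "vector": [float(i)] * 4} for i in range(10)]
query = [0.0] * 4

# Default order: search first, then filter the k hits.
post_filtered = [r for r in knn(rows, query, k=3) if keep(r)]

# prefilter=True order: filter first, then search the survivors.
pre_filtered = knn([r for r in rows if keep(r)], query, k=3)

print([r["id"] for r in post_filtered])  # []
print([r["id"] for r in pre_filtered])   # [5, 6, 7]
```

In this model the three nearest rows (ids 0, 1, 2) all fail the filter, so the post-filtered result is empty, while pre-filtering still yields three rows.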