- lance.LanceDataset.to_table(
    columns: list[str] | dict[str, str] | None = None,
    filter: str | Expression | None = None,
    limit: int | None = None,
    offset: int | None = None,
    nearest: dict | None = None,
    batch_size: int | None = None,
    batch_readahead: int | None = None,
    fragment_readahead: int | None = None,
    scan_in_order: bool | None = None,
    *,
    prefilter: bool | None = None,
    with_row_id: bool | None = None,
    with_row_address: bool | None = None,
    use_stats: bool | None = None,
    fast_search: bool | None = None,
    full_text_query: str | dict | None = None,
    io_buffer_size: int | None = None,
    late_materialization: bool | list[str] | None = None,
    use_scalar_index: bool | None = None,
    include_deleted_rows: bool | None = None
  ) -> Table
Read the data into memory as a pyarrow.Table.
- Parameters:
- columns : list of str, or dict of str to str, default None
A list of column names to fetch, or a dictionary mapping output column names to SQL expressions. All columns are fetched if None or unspecified.
- filter : pa.compute.Expression or str, default None
A pa.compute.Expression, or a str containing a valid SQL WHERE clause. See Lance filter pushdown for valid SQL expressions.
- limit : int, default None
Fetch at most this many rows. All rows are fetched if None or unspecified.
- offset : int, default None
Start fetching at this row offset. Defaults to 0 if None or unspecified.
- nearest : dict, default None
Get the rows corresponding to the k most similar vectors. Example:
{ "column": <embedding col name>, "q": <query vector as pa.Float32Array>, "k": 10, "metric": "cosine", "nprobes": 1, "refine_factor": 1 }
- batch_size : int, optional
The number of rows to read at a time.
- io_buffer_size : int, default None
The size of the IO buffer. See ScannerBuilder.io_buffer_size for more information.
- batch_readahead : int, optional
The number of batches to read ahead.
- fragment_readahead : int, optional
The number of fragments to read ahead.
- scan_in_order : bool, optional, default True
Whether to read the fragments and batches in order. If False, throughput may be higher, but batches may be returned out of order and memory use may increase.
- prefilter : bool, optional, default False
Apply the filter before the vector search. If False, the filter is applied to the vector search results afterward.
- late_materialization : bool or List[str], default None
Allows custom control over late materialization. See ScannerBuilder.late_materialization for more information.
- use_scalar_index : bool, default True
Allows custom control over scalar index usage. See ScannerBuilder.use_scalar_index for more information.
- with_row_id : bool, optional, default False
Include the row ID in the output, as the _rowid column.
- with_row_address : bool, optional, default False
Include the row address in the output.
- use_stats : bool, optional, default True
Use statistics pushdown when evaluating filters.
- fast_search : bool, optional, default False
- full_text_query : str or dict, optional
A query string to search for; results are ranked by BM25. For example, “hello world” would match documents containing “hello” or “world”. Alternatively, a dictionary with the following keys:
- columns: list[str]
The columns to search. Currently only a single column is supported in the columns list.
- query: str
The query string to search for.
- include_deleted_rows : bool, optional, default False
If True, rows that have been deleted but are still present in the fragment will be returned. These rows will have the _rowid column set to null. All other columns reflect the values stored on disk and are not necessarily null.
Note: if this is a search operation or a take operation (including scalar-indexed scans), deleted rows cannot be returned.
Notes
If both filter and nearest are specified, then:
- nearest is executed first.
- The results are filtered afterward, unless prefilter is set to True.
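The order of operations in these notes matters for how many rows come back. The plain-Python sketch below models both orders on hypothetical toy rows, using exact L2 distance. It does not call Lance, and `post_filter`, `pre_filter` and `is_a` are illustrative helpers, not library functions. It shows that post-filtering can return fewer than k rows, while pre-filtering returns up to k rows that all satisfy the filter.

```python
import math

# Hypothetical toy rows: (id, category, 2-D vector). Plain Python only;
# this models the ordering described in the notes and does not call Lance.
rows = [
    (1, "a", (0.0, 0.0)),
    (2, "b", (0.1, 0.0)),
    (3, "b", (0.2, 0.0)),
    (4, "a", (5.0, 5.0)),
]


def post_filter(rows, q, k, keep):
    """Default order: take the k nearest rows, then apply the filter."""
    nearest = sorted(rows, key=lambda r: math.dist(r[2], q))[:k]
    return [r[0] for r in nearest if keep(r)]


def pre_filter(rows, q, k, keep):
    """prefilter=True order: apply the filter, then take the k nearest."""
    candidates = [r for r in rows if keep(r)]
    return [r[0] for r in sorted(candidates, key=lambda r: math.dist(r[2], q))[:k]]


def is_a(r):
    return r[1] == "a"


query = (0.0, 0.0)
post = post_filter(rows, query, 2, is_a)
pre = pre_filter(rows, query, 2, is_a)
print(post)  # [1]
print(pre)   # [1, 4]
```

With k = 2, the default order keeps ids 1 and 2 and then discards id 2, leaving a single row. Filtering first keeps ids 1 and 4, and both are returned.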