Random Access

Lance is designed for fast random access, an operation that many other table and file formats handle poorly.

LanceDataset.take(indices: List[int] | Array, columns: List[str] | Dict[str, str] | None = None) → Table

Select rows of data by index.

Parameters:
  • indices (Array or array-like) – indices of rows to select in the dataset.

  • columns (list of str, or dict of str to str, default None) – List of column names to fetch, or a dictionary mapping column names to SQL expressions. All columns are fetched if None or unspecified.

Returns:

table

Return type:

pyarrow.Table

LanceDataset.take_blobs(blob_column: str, ids: List[int] | Array | None = None, addresses: List[int] | Array | None = None, indices: List[int] | Array | None = None) → List[BlobFile]

Select blobs by row ID, row address, or row index.

Instead of loading large binary blob data into memory before processing it, this API allows you to open binary blob data as a regular Python file-like object. For more details, see lance.BlobFile.

Exactly one of ids, addresses, or indices must be specified.

Parameters:
  • blob_column (str) – The name of the blob column to select.

  • ids (Integer Array or array-like) – Row IDs to select in the dataset.

  • addresses (Integer Array or array-like) – The (unstable) row addresses to select in the dataset.

  • indices (Integer Array or array-like) – The offsets / indices of the rows in the dataset.

Returns:

blob_files

Return type:

List[BlobFile]