- lance.dataset(uri: str | Path, version: int | str | None = None, asof: ts_types | None = None, block_size: int | None = None, commit_lock: CommitLock | None = None, index_cache_size: int | None = None, storage_options: dict[str, str] | None = None, default_scan_options: dict[str, str] | None = None) -> LanceDataset
Opens the Lance dataset from the address specified.
- Parameters:
- uri : str
Address of the Lance dataset. It can be a local file path, e.g., /tmp/data.lance, or a cloud object store URI, e.g., s3://bucket/data.lance.
- version : optional, int | str
If specified, load that version of the Lance dataset. Otherwise, load the latest version. Either a version number (int) or a tag (str) can be provided.
- asof : optional, datetime or str
If specified, find the latest version created on or before the given value. If a version is already specified, this argument is ignored.
- block_size : optional, int
Block size in bytes. Provides a hint for the size of the minimal I/O request.
- commit_lock : optional, lance.commit.CommitLock
A custom commit lock. Only needed if your object store does not support atomic commits. See the user guide for more details.
- index_cache_size : optional, int
Index cache size. The index cache is an LRU cache with TTL. This number specifies the number of index pages, for example IVF partitions, to be cached in host memory. The default value is 256.
Roughly, for an IVF_PQ partition with n rows, the size of each index page equals the combination of the PQ codes (nd.array([n, pq], dtype=uint8)) and the row ids (nd.array([n], dtype=uint64)). Approximately, n = Total Rows / number of IVF partitions, and pq = number of PQ sub-vectors.
- storage_options : optional, dict
Extra options that make sense for a particular storage connection. This is used to store connection parameters like credentials, endpoint, etc.
- default_scan_options : optional, dict
Default scan options that are used when scanning the dataset. This accepts the same arguments described in lance.LanceDataset.scanner(). The arguments will be applied to any scan operation.
This can be useful to supply defaults for common parameters such as batch_size.
It can also be used to create a view of the dataset that includes meta fields such as _rowid or _rowaddr. If default_scan_options is provided then the schema returned by lance.LanceDataset.schema() will include these fields if the appropriate scan options are set.
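The page-size approximation given under index_cache_size can be turned into a rough memory estimate. The sketch below is plain Python arithmetic and does not call into Lance; the function name and the example figures (one million rows, 256 IVF partitions, 16 PQ sub-vectors) are hypothetical, and the result ignores any overhead beyond the PQ codes and row ids.

```python
def estimate_ivf_pq_page_bytes(
    total_rows: int, num_partitions: int, num_sub_vectors: int
) -> int:
    """Approximate size in bytes of one IVF_PQ index page (one partition)."""
    # n = Total Rows / number of IVF partitions (approximate rows per partition)
    n = total_rows // num_partitions
    # PQ codes: an [n, pq] array of uint8, so 1 byte per code
    pq_code_bytes = n * num_sub_vectors
    # Row ids: an [n] array of uint64, so 8 bytes per row
    row_id_bytes = n * 8
    return pq_code_bytes + row_id_bytes


# Hypothetical dataset: 1,000,000 rows, 256 IVF partitions, 16 PQ sub-vectors.
page_bytes = estimate_ivf_pq_page_bytes(1_000_000, 256, 16)

# Upper bound on cache memory if all 256 cached pages (the default) are this size.
cache_bytes = 256 * page_bytes
```

For these assumed figures, a full cache of 256 pages comes to roughly 24 MB, which gives a starting point when choosing index_cache_size for larger indices.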