lance.cuvs package

Submodules

lance.cuvs.kmeans module

class lance.cuvs.kmeans.KMeans(k: int, *, metric: Literal['l2', 'euclidean', 'cosine', 'dot'] = 'l2', init: Literal['random'] = 'random', max_iters: int = 50, tolerance: float = 0.0001, centroids: Tensor | None = None, seed: int | None = None, device: str | None = None, itopk_size: int = 10)

Bases: KMeans

K-means clustering that trains over vectors and divides them into K clusters, using cuVS as the accelerator.

This implementation is built on PyTorch and cuVS and supports NVIDIA GPUs only.
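For readers unfamiliar with the underlying algorithm, the following is a minimal CPU sketch of Lloyd's k-means in NumPy. It is not the cuVS/PyTorch implementation and does not reflect how lance.cuvs schedules work on the GPU. The function name `kmeans_lloyd` is hypothetical. Two further assumptions are labeled in the comments: the row-sampling "random" initialization, and the reading of `tolerance` as a relative Frobenius-norm shift of the centroids.

```python
import numpy as np


def kmeans_lloyd(data, k, max_iters=50, tolerance=1e-4, seed=None):
    """Hypothetical CPU sketch of Lloyd's k-means with the L2 metric.

    Not the lance.cuvs implementation; it only illustrates the algorithm.
    """
    rng = np.random.default_rng(seed)
    # Assumed "random" init: pick k distinct rows of the data as starting centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(np.float64)
    labels = np.zeros(len(data), dtype=np.int64)
    for _ in range(max_iters):
        # Assignment step: squared L2 distance from every vector to every centroid.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned vectors;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members):
                new_centroids[c] = members.mean(axis=0)
        # Assumed reading of `tolerance`: Frobenius norm of the centroid shift
        # between two consecutive iterations, relative to the previous centroids.
        shift = np.linalg.norm(new_centroids - centroids) / max(
            np.linalg.norm(centroids), 1e-12
        )
        centroids = new_centroids
        if shift < tolerance:
            break
    return centroids, labels
```

On two well-separated blobs, the sketch recovers one centroid per blob within a few iterations. The GPU implementation exists because the assignment step (a distance computation between every vector and every centroid) dominates the cost on large datasets.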

Parameters:

k (int) – The number of clusters
metric (str) – Metric type: “l2”, “cosine”, or “dot” (the signature also accepts “euclidean”).
init (str) – Initialization method. Only “random” is currently supported.
max_iters (int) – Maximum number of iterations used to train the k-means model.
tolerance (float) – Relative tolerance, with regard to the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence.
centroids (torch.Tensor, optional) – Provide existing centroids.
seed (int, optional) – Random seed.
device (str, optional) – The device on which to run the PyTorch algorithms. By default, the most performant device on the host is selected; see lance.torch.preferred_device(). For the cuVS implementation, the device is verified to be a CUDA device.
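The choice of `metric` changes which centroid counts as "nearest". The sketch below shows one common way to define the assignment for each metric, and the usage data is chosen so that all three pick differently. The helper `assign` is hypothetical, and its conventions are assumptions that this page does not confirm: “euclidean” is treated as a synonym for “l2”, and “dot” is treated as "largest inner product wins".

```python
import numpy as np


def assign(data, centroids, metric="l2"):
    """Hypothetical helper: index of the nearest centroid for each row of `data`.

    Distance conventions here are assumptions, not lance's confirmed behavior.
    """
    if metric in ("l2", "euclidean"):
        # Squared Euclidean distance; skipping the square root keeps the argmin.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity, so pick the largest
        # similarity between unit-normalized vectors and centroids.
        x = data / np.linalg.norm(data, axis=1, keepdims=True)
        c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        return (x @ c.T).argmax(axis=1)
    if metric == "dot":
        # Inner-product similarity: a larger dot product counts as closer.
        return (data @ centroids.T).argmax(axis=1)
    raise ValueError(f"unsupported metric: {metric!r}")


data = np.array([[1.0, 0.0], [1.0, 2.0]])
centroids = np.array([[10.0, 0.0], [0.0, 1.0]])
print(assign(data, centroids, "l2"))      # [1 1]
print(assign(data, centroids, "cosine"))  # [0 1]
print(assign(data, centroids, "dot"))     # [0 0]
```

The large-norm centroid `[10, 0]` is far away in L2 terms but wins every dot-product comparison, while cosine ignores magnitude entirely. This is why the metric should match how the vectors are later searched.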

fit(data: IterableDataset | ndarray | Tensor | FixedSizeListArray) → None

Train the k-means model.

Parameters: data (pa.FixedSizeListArray, np.ndarray, torch.Tensor, or IterableDataset) – 2-D vectors to train the k-means model on.
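Because `fit` also accepts an IterableDataset, training data does not have to fit in memory at once. The sketch below shows one general way a single Lloyd iteration can stream over batches, accumulating per-cluster sums and counts. It assumes, without confirmation from this page, that the implementation makes a pass over the batches per iteration. The helper `lloyd_step_streaming` is hypothetical and uses only the L2 metric.

```python
import numpy as np


def lloyd_step_streaming(batches, centroids):
    """Hypothetical sketch: one Lloyd iteration over an iterable of 2-D batches.

    Accumulates per-cluster sums and counts so the full dataset is never
    materialized at once. L2 metric only.
    """
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=np.int64)
    for batch in batches:
        # Assign every vector in this batch to its nearest centroid.
        dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # np.add.at is unbuffered, so repeated labels accumulate correctly.
        np.add.at(sums, labels, batch)
        counts += np.bincount(labels, minlength=k)
    # Clusters that received no vectors keep their previous centroid.
    new_centroids = centroids.astype(np.float64, copy=True)
    nonempty = counts > 0
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return new_centroids


data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[1.0, 1.0], [9.0, 1.0]])
# Split the data into batches of at most 3 rows to mimic a streamed dataset.
batches = (data[i:i + 3] for i in range(0, len(data), 3))
print(lloyd_step_streaming(batches, centroids))  # [[ 0.  1.] [10.  1.]]
```

The result matches a single in-memory Lloyd step on the same data. Only the per-cluster sums and counts, of size k × dim, need to be held between batches.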

rebuild_index()

Module contents