lance.cuvs package

Submodules

lance.cuvs.kmeans module

class lance.cuvs.kmeans.KMeans(k: int, *, metric: Literal['l2', 'euclidean', 'cosine', 'dot'] = 'l2', init: Literal['random'] = 'random', max_iters: int = 50, tolerance: float = 0.0001, centroids: Tensor | None = None, seed: int | None = None, device: str | None = None, itopk_size: int = 10)

Bases: KMeans

K-Means trains over a set of vectors and divides them into K clusters, using cuVS as the accelerator.

This implementation is built on PyTorch and cuVS and supports NVIDIA GPUs only.

Parameters:
  • k (int) – The number of clusters

  • metric (str) – Metric type. Supports “l2”, “euclidean”, “cosine”, or “dot”.

  • init (str) – Initialization method. Only “random” is supported for now.

  • max_iters (int) – Maximum number of iterations to train the k-means model.

  • tolerance (float) – Relative tolerance, with regard to the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence.

  • centroids (torch.Tensor, optional) – Existing centroids to start from.

  • seed (int, optional) – Random seed

  • device (str, optional) – The device on which to run the PyTorch algorithms. By default, the most performant device on the host is picked; see lance.torch.preferred_device(). For the cuVS implementation, the device is verified to be a CUDA device.
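
The tolerance description above can be read in more than one way. The following is a minimal plain-Python sketch of one possible reading, in which the Frobenius norm of the centroid change is divided by the Frobenius norm of the previous centroids. The helper names `frobenius_norm` and `has_converged` are hypothetical and are not part of lance; the actual stopping rule used by the library may differ.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def has_converged(old_centroids, new_centroids, tolerance=1e-4):
    """Hypothetical convergence check (an assumption, not lance's code).

    Returns True when the relative change between two consecutive sets
    of centroids, measured with the Frobenius norm, is below `tolerance`.
    """
    diff = [
        [n - o for n, o in zip(new_row, old_row)]
        for new_row, old_row in zip(new_centroids, old_centroids)
    ]
    scale = frobenius_norm(old_centroids)
    if scale == 0.0:
        # Avoid dividing by zero: only an exact match counts as converged.
        return frobenius_norm(diff) == 0.0
    return frobenius_norm(diff) / scale < tolerance
```

Under this reading, training would stop at the first iteration where `has_converged` returns True, or after `max_iters` iterations, whichever comes first.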

fit(data: IterableDataset | ndarray | Tensor | FixedSizeListArray) → None

Train the k-means model.

Parameters:

data (pa.FixedSizeListArray, np.ndarray, or torch.Tensor) – 2-D vectors used to train the k-means model.
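
As a usage sketch, the constructor and fit() documented above might be combined as follows. The wrapper function `train_kmeans` is hypothetical and exists only for illustration; the import is deferred into the function because it needs lance with cuVS support and a CUDA-capable GPU. The argument values are arbitrary examples, and passing device="cuda" is an assumption based on the note that the device must be a CUDA device.

```python
def train_kmeans(vectors, k=8):
    """Hypothetical helper: train a cuVS-backed KMeans on 2-D vectors.

    `vectors` is expected to be one of the types accepted by fit(),
    for example a NumPy array of shape (num_vectors, dim).
    """
    # Deferred import: requires lance built with cuVS and an NVIDIA GPU.
    from lance.cuvs.kmeans import KMeans

    model = KMeans(
        k,
        metric="l2",      # one of "l2", "euclidean", "cosine", "dot"
        max_iters=50,     # same as the documented default
        tolerance=1e-4,   # same as the documented default
        seed=42,          # arbitrary seed for reproducible initialization
        device="cuda",    # assumption: cuVS requires a CUDA device
    )
    model.fit(vectors)    # fit() returns None; the model is trained in place
    return model
```

Calling `train_kmeans(data)` on a machine without lance, cuVS, or a CUDA device is expected to fail at the import or at construction, so this sketch is only meaningful on a GPU host.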

rebuild_index()

Module contents