Vector Indices

Lance provides an extensible secondary index system for vector similarity search over large-scale vector datasets. All vector indices are stored as regular Lance files, which makes them portable and easy to manage.

Concepts

Lance splits each vector index into three parts: clustering, sub-index, and quantization.

Clustering

Clustering divides all the vectors into different disjoint clusters (a.k.a. partitions). Lance currently supports using Inverted File (IVF) as the primary clustering mechanism. IVF partitions the vectors into clusters using the k-means clustering algorithm. Each cluster contains vectors that are similar to the cluster centroid. During search, only the most relevant clusters are examined, dramatically reducing search time. IVF can be combined with any sub-index type and quantization method.
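The probing step described above can be sketched in pure Python. This is a toy illustration, not Lance's implementation: the centroids are given rather than trained with k-means, and the names `assign_partition`, `ivf_search`, and `nprobe` are chosen here for illustration.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_partition(vector, centroids):
    """Return the index of the nearest centroid (the vector's partition)."""
    return min(range(len(centroids)), key=lambda i: l2(vector, centroids[i]))

def ivf_search(query, centroids, partitions, k, nprobe):
    """Scan only the `nprobe` partitions whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = []
    for p in probe_order[:nprobe]:
        for row_id, vec in partitions[p]:
            candidates.append((l2(query, vec), row_id))
    candidates.sort()
    return [row_id for _, row_id in candidates[:k]]

# Toy data: two centroids, each vector stored in its nearest centroid's partition.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {0: [0.1, 0.2], 1: [9.8, 10.1], 2: [0.3, -0.1], 3: [10.2, 9.9]}
partitions = [[] for _ in centroids]
for row_id, vec in vectors.items():
    partitions[assign_partition(vec, centroids)].append((row_id, vec))

# With nprobe=1 only the partition around [0, 0] is scanned.
result = ivf_search([0.0, 0.0], centroids, partitions, k=2, nprobe=1)
```

Raising `nprobe` trades search time for recall, since more partitions are scanned.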

Sub-Index

The sub-index determines how vectors are organized for search. Lance currently supports:

  • FLAT: Exact search with no approximation - scans all vectors
  • HNSW: Hierarchical Navigable Small World graphs for fast approximate search

Quantization

The quantization method determines how vectors are stored and compressed. Lance currently supports:

  • Product Quantization (PQ): Compresses vectors by splitting them into smaller sub-vectors and quantizing each independently
  • Scalar Quantization (SQ): Applies scalar quantization to each dimension of the vector independently
  • RabitQ (RQ): Uses random rotation and binary quantization for extreme compression
  • FLAT: No quantization, keeps original vectors for exact search
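To make the difference between SQ and PQ concrete, here is a minimal pure-Python sketch of both encodings. The function names, the fixed `lo`/`hi` bounds, and the tiny hand-written codebooks are assumptions for illustration; real codebooks are trained, and Lance's actual encoders may differ in detail.

```python
def sq_encode(vector, lo, hi, num_bits=8):
    """Scalar quantization: map each dimension from [lo, hi] to an integer code."""
    levels = (1 << num_bits) - 1
    scale = levels / (hi - lo)
    return [min(levels, max(0, round((x - lo) * scale))) for x in vector]

def sq_decode(codes, lo, hi, num_bits=8):
    """Approximate reconstruction of the original values from SQ codes."""
    step = (hi - lo) / ((1 << num_bits) - 1)
    return [lo + c * step for c in codes]

def pq_encode(vector, codebooks):
    """Product quantization: one code per sub-vector.

    codebooks[s] is the list of codewords for sub-vector s; the code is the
    index of the codeword nearest to that sub-vector.
    """
    m = len(codebooks)
    sub_dim = len(vector) // m
    codes = []
    for s in range(m):
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        distances = [sum((a - b) ** 2 for a, b in zip(sub, cw)) for cw in codebooks[s]]
        codes.append(distances.index(min(distances)))
    return codes

sq_codes = sq_encode([-1.0, 0.5, 1.0], lo=-1.0, hi=1.0)

# Two sub-vectors of dimension 2, each with a 2-entry codebook.
toy_codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
pq_codes = pq_encode([0.0, 0.1, 1.0, 0.9], toy_codebooks)
```

SQ keeps one code per dimension, while PQ keeps one code per sub-vector, which is why PQ rows are smaller for the same dimension.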

Common Combinations

An index type name typically follows the pattern {clustering}_{sub_index}_{quantization}. When the sub-index is FLAT, it is usually omitted, giving {clustering}_{quantization}. The commonly used combinations are:

| Index Type | Name | Description |
| --- | --- | --- |
| IVF_PQ | Inverted File with Product Quantization | Combines IVF clustering with PQ compression for efficient storage and search |
| IVF_HNSW_SQ | Inverted File with HNSW and Scalar Quantization | Uses IVF for coarse clustering and HNSW for fine-grained search with scalar quantization |
| IVF_SQ | Inverted File with Scalar Quantization | Combines IVF clustering with scalar quantization for balanced compression |
| IVF_RQ | Inverted File with RabitQ | Combines IVF clustering with RabitQ for extreme compression using binary quantization |
| IVF_FLAT | Inverted File without quantization | Uses IVF clustering with exact vector storage for precise search within clusters |
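The naming rule can be written as a small helper. The function `index_type_name` is hypothetical and only illustrates the convention; it is not part of any Lance API.

```python
def index_type_name(clustering, sub_index, quantization):
    """Compose an index type name, omitting a FLAT sub-index."""
    parts = [clustering]
    if sub_index != "FLAT":
        parts.append(sub_index)
    parts.append(quantization)
    return "_".join(parts)

names = [
    index_type_name("IVF", "FLAT", "PQ"),
    index_type_name("IVF", "HNSW", "SQ"),
    index_type_name("IVF", "FLAT", "FLAT"),
]
```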

Versioning

The Lance vector index format has gone through 3 versions so far. This document covers only version 3, the latest. The version of a vector index is recorded in the index_version field of the generic index metadata.

Storage Layout (V3)

Each vector index is stored as two regular Lance files: an index file and an auxiliary file.

Index File

The index file contains the search structure (a graph or a flat layout) with an index-specific schema. It is stored as a Lance file named index.idx within the index directory.

Arrow Schema

The index file stores the search structure with graph or flat organization. The Arrow schema of the Lance file varies depending on the sub-index type used.

Note

All partitions are stored in the same file, and partitions must be written in order.

FLAT

FLAT indices perform exact search with no approximation. This is essentially an empty file with a minimal schema:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| __flat_marker | uint64 | false | Marker field for FLAT index (no actual data) |

HNSW

HNSW (Hierarchical Navigable Small World) indices provide fast approximate search through a multi-level graph structure. This stores the HNSW graph with the following schema:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| __vector_id | uint64 | false | Vector identifier |
| __neighbors | list | false | Neighbor node IDs |
| _distance | list | false | Distances to neighbors |

Note

HNSW consists of multiple levels, and all levels must be written in order starting from level 0.

Arrow Schema Metadata

The index file contains metadata in its Arrow schema metadata to describe the index configuration and structure. Here are the metadata keys and their corresponding values:

"lance:index"

Contains basic index configuration information in JSON:

| JSON Key | Type | Expected Values |
| --- | --- | --- |
| type | String | Index type (e.g., "IVF_PQ", "IVF_RQ", "IVF_HNSW", "FLAT") |
| distance_type | String | Distance metric (e.g., "l2", "cosine", "dot") |

"lance:ivf"

References the IVF metadata stored in the Lance file global buffer. This value records the global buffer index, which is currently always "1".

Note

Global buffer indices in Lance files are 1-based, so you need to subtract 1 when accessing them through code.

"lance:flat"

Contains partition-specific metadata for the FLAT sub-index structure. This is currently an empty string, since FLAT indices do not require additional metadata.

"lance:hnsw"

Contains the HNSW-specific JSON metadata for each partition, including graph structure information:

| JSON Key | Type | Expected Values |
| --- | --- | --- |
| entry_point | u32 | Starting node for graph traversal |
| params | Object | HNSW construction parameters (see below) |
| level_offsets | Array | Offset for each level in the graph |

The params object contains the following HNSW construction parameters:

| JSON Key | Type | Description | Default |
| --- | --- | --- | --- |
| max_level | u16 | Maximum level of the HNSW graph | 7 |
| m | usize | Number of connections to establish while inserting new element | 20 |
| ef_construction | usize | Size of the dynamic list for candidates | 150 |
| prefetch_distance | Option | Number of vectors ahead to prefetch while building | Some(2) |
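As a rough sketch, one partition's "lance:hnsw" JSON could be parsed as follows. The sample document is made up for illustration, the class and function names are hypothetical, and the sketch assumes that keys missing from params fall back to the defaults in the table above.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class HnswParams:
    """HNSW construction parameters, with the defaults from the table above."""
    max_level: int = 7
    m: int = 20
    ef_construction: int = 150
    prefetch_distance: Optional[int] = 2

def parse_hnsw_partition(raw: str):
    """Parse one partition's HNSW metadata into (entry_point, params, level_offsets)."""
    doc = json.loads(raw)
    params = HnswParams(**doc.get("params", {}))
    return doc["entry_point"], params, doc["level_offsets"]

# Hypothetical metadata for a single partition.
sample = json.dumps({
    "entry_point": 42,
    "params": {"m": 32},
    "level_offsets": [0, 1000, 1100, 1110],
})
entry_point, params, level_offsets = parse_hnsw_partition(sample)
```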

Lance File Global Buffer

IVF Metadata

For efficiency, Lance serializes IVF metadata to protobuf format and stores it in the Lance file global buffer:

message IVF {
  // Centroids of partitions. `dimension * num_partitions` of float32s.
  //
  // Deprecated, use centroids_tensor instead.
  repeated float centroids = 1;  // [deprecated = true];

  // File offset of each partition.
  repeated uint64 offsets = 2;

  // Number of records in the partition.
  repeated uint32 lengths = 3;

  // Tensor of centroids. `num_partitions * dimension` of float32s.
  Tensor centroids_tensor = 4;

  // KMeans loss.
  optional double loss = 5;

}
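The offsets and lengths fields together locate each partition's rows. The sketch below mirrors just those two fields in a plain dataclass (it does not parse protobuf) and assumes offsets are row offsets laid out back to back in partition order, as in the appendix examples; `IvfMeta` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IvfMeta:
    """Plain-Python stand-in for the offsets/lengths fields of the IVF message."""
    offsets: List[int]
    lengths: List[int]

def offsets_from_lengths(lengths):
    """Cumulative start offsets, assuming partitions are written contiguously in order."""
    offsets, total = [], 0
    for n in lengths:
        offsets.append(total)
        total += n
    return offsets

def partition_row_range(ivf, partition_id):
    """Rows [start, start + length) that belong to the given partition."""
    start = ivf.offsets[partition_id]
    return range(start, start + ivf.lengths[partition_id])

lengths = [3, 0, 5]
ivf = IvfMeta(offsets=offsets_from_lengths(lengths), lengths=lengths)
rows = partition_row_range(ivf, 2)
```

An empty partition simply yields an empty range, so callers need no special case for it.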

Auxiliary File

The auxiliary file stores the quantized vectors. It is a Lance file named auxiliary.idx within the index directory.

Arrow Schema

Since the auxiliary file stores the actual (quantized) vectors, the Arrow schema of the Lance file varies depending on the quantization method used.

Note

All partitions are stored in the same file, and partitions must be written in order.

FLAT

No quantization applied - stores original vectors in their full precision:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| _rowid | uint64 | false | Row identifier |
| flat | list[dimension] | false | Original vector values (list_size = vector dimension) |

PQ

Compresses vectors using product quantization for significant memory savings:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| _rowid | uint64 | false | Row identifier |
| __pq_code | list[m] | false | PQ codes (list_size = number of subvectors) |

SQ

Compresses vectors using scalar quantization for moderate memory savings:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| _rowid | uint64 | false | Row identifier |
| __sq_code | list[dimension] | false | SQ codes (list_size = vector dimension) |

RQ

Compresses vectors using RabitQ with random rotation and binary quantization for extreme compression:

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| _rowid | uint64 | false | Row identifier |
| _rabit_codes | list[dimension / 8] | false | Binary quantized codes (1 bit per dimension, packed into bytes) |
| __add_factors | float32 | false | Additive correction factors for distance computation |
| __scale_factors | float32 | false | Scale correction factors for distance computation |
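The dimension / 8 list size follows from packing one bit per dimension into bytes. The sketch below shows only that packing step on the sign of each value. It omits the random rotation and the correction factors, and the bit order within a byte (least significant bit first) is an assumption, not something this document specifies.

```python
def pack_sign_bits(vector):
    """Pack one sign bit per dimension into bytes, 8 dimensions per byte.

    Bit i of the output is 1 when vector[i] > 0. Bits are placed least
    significant first within each byte (an assumed convention).
    """
    if len(vector) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    out = bytearray(len(vector) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

# A 128-dimensional vector packs into 128 / 8 = 16 bytes.
codes = pack_sign_bits([1.0] * 128)
```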

Arrow Schema Metadata

The auxiliary file also contains metadata in its Arrow schema metadata for vector storage configuration. Here are the metadata keys and their corresponding values:

"distance_type"

The distance metric used to compute similarity between vectors (e.g., "l2", "cosine", "dot").

"lance:ivf"

Similar to the index file's "lance:ivf" but focused on vector storage layout. This doesn't contain the partitions' centroids. It's only used for tracking each partition's offset and length in the auxiliary file.

"lance:rabit"

Contains RabitQ-specific metadata in JSON format (only present for RQ quantization). This includes the rotation matrix position, number of bits, and packing information. See the RQ metadata specification in the "storage_metadata" section below.

"storage_metadata"

Contains quantizer-specific metadata as a list of JSON strings. Currently, the list always contains exactly 1 element with the quantizer metadata.

For Product Quantization (PQ):

| JSON Key | Type | Description |
| --- | --- | --- |
| codebook_position | usize | Position of the codebook in the global buffer |
| nbits | u32 | Number of bits per subvector code (e.g., 8 bits = 256 codewords) |
| num_sub_vectors | usize | Number of subvectors (m) |
| dimension | usize | Original vector dimension |
| transposed | bool | Whether the codebook is stored in transposed layout |

For Scalar Quantization (SQ):

| JSON Key | Type | Description |
| --- | --- | --- |
| dim | usize | Vector dimension |
| num_bits | u16 | Number of bits for quantization |
| bounds | Range | Min/max bounds for scalar quantization |

For RabitQ (RQ):

| JSON Key | Type | Description |
| --- | --- | --- |
| rotate_mat_position | u32 | Position of the rotation matrix in the global buffer |
| num_bits | u8 | Number of bits per dimension (currently always 1) |
| packed | bool | Whether codes are packed for optimized computation |

Lance File Global Buffer

Quantization Codebook

For product quantization, the codebook is stored in Tensor format in the auxiliary file's global buffer for efficient access:

message Tensor {
  enum DataType {
    BFLOAT16 = 0;
    FLOAT16 = 1;
    FLOAT32 = 2;
    FLOAT64 = 3;
    UINT8 = 4;
    UINT16 = 5;
    UINT32 = 6;
    UINT64 = 7;
  }

  DataType data_type = 1;

  // Data shape, [dim1, dim2, ...]
  repeated uint32 shape = 2;

  // Data buffer
  bytes data = 3;

}

Rotation Matrix

For RabitQ, the rotation matrix is stored in Tensor format in the auxiliary file's global buffer. The rotation matrix is an orthogonal matrix used to rotate vectors before binary quantization:

message Tensor {
  enum DataType {
    BFLOAT16 = 0;
    FLOAT16 = 1;
    FLOAT32 = 2;
    FLOAT64 = 3;
    UINT8 = 4;
    UINT16 = 5;
    UINT32 = 6;
    UINT64 = 7;
  }

  DataType data_type = 1;

  // Data shape, [dim1, dim2, ...]
  repeated uint32 shape = 2;

  // Data buffer
  bytes data = 3;

}

The rotation matrix has shape [code_dim, code_dim] where code_dim = dimension * num_bits.
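Once a Tensor message has been parsed, its data bytes can be interpreted using data_type and shape. The sketch below takes those three fields as plain arguments rather than parsing protobuf. It assumes little-endian values and skips BFLOAT16, which has no code in Python's struct module; `decode_tensor` is a hypothetical helper.

```python
import struct

# struct format character and byte width for each DataType enum value.
# BFLOAT16 (0) is left out because struct has no format for it.
_FORMATS = {
    1: ("e", 2),  # FLOAT16
    2: ("f", 4),  # FLOAT32
    3: ("d", 8),  # FLOAT64
    4: ("B", 1),  # UINT8
    5: ("H", 2),  # UINT16
    6: ("I", 4),  # UINT32
    7: ("Q", 8),  # UINT64
}

def decode_tensor(data_type, shape, data):
    """Return the tensor values as a flat list, checking the byte length against the shape."""
    if data_type not in _FORMATS:
        raise ValueError(f"unsupported data type: {data_type}")
    fmt, width = _FORMATS[data_type]
    count = 1
    for dim in shape:
        count *= dim
    if len(data) != count * width:
        raise ValueError("data length does not match shape")
    # "<" selects little-endian byte order (an assumption in this sketch).
    return list(struct.unpack(f"<{count}{fmt}", data))

# Toy 2 x 2 FLOAT32 identity matrix standing in for a rotation matrix.
identity_bytes = struct.pack("<4f", 1.0, 0.0, 0.0, 1.0)
values = decode_tensor(2, [2, 2], identity_bytes)
```

The same helper applies to a PQ codebook, since both are stored in Tensor format; only the shape differs.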

Appendices

Appendix 1: Example IVF_PQ Format

This example shows how an IVF_PQ index is physically laid out. Assume vectors have dimension 128, PQ uses 16 sub-vectors (num_sub_vectors = m = 16) with 8 bits per sub-vector code (nbits = 8), and the distance type is "l2".

Index File

  • Arrow Schema Metadata:

    • "lance:index" → { "type": "IVF_PQ", "distance_type": "l2" }
    • "lance:ivf" → "1" (references IVF metadata in the global buffer)
    • "lance:flat" → ["", "", ...] (one empty string per partition; IVF_PQ uses a FLAT sub-index inside each partition)
  • Lance File Global buffer (Protobuf):

    • Ivf message containing:
      • centroids_tensor: shape [num_partitions, 128] (float32)
      • offsets: start offset (row) of each partition in auxiliary.idx
      • lengths: number of vectors in each partition
      • loss: k-means loss (optional)

Auxiliary File

  • Arrow Schema Metadata:
    • "distance_type" → "l2"
    • "lance:ivf" → tracks per-partition offsets and lengths (no centroids here)
    • "storage_metadata" → [ "{"pq":{"num_sub_vectors":16,"nbits":8,"dimension":128,"transposed":true}}" ]
  • Lance File Global buffer:
    • Tensor codebook with shape [256, num_sub_vectors, dim/num_sub_vectors] = [256, 16, 8] (float32)
  • Rows with Arrow schema:
import pyarrow as pa

pa.schema([
    pa.field("_rowid", pa.uint64()),
    pa.field("__pq_code", pa.list_(pa.uint8(), list_size=16)),  # m subvector codes
])
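The sizes in this example can be checked with a little arithmetic. The byte counts below are raw, uncompressed payload sizes and ignore any Lance file encoding or metadata overhead, which is an assumption of this sketch.

```python
dimension = 128
num_sub_vectors = 16
nbits = 8

codewords = 1 << nbits                        # 2^8 = 256 codewords per sub-vector
sub_dim = dimension // num_sub_vectors        # 128 / 16 = 8 dimensions per sub-vector
codebook_shape = [codewords, num_sub_vectors, sub_dim]
codebook_bytes = codewords * num_sub_vectors * sub_dim * 4   # float32 = 4 bytes

pq_code_bytes = num_sub_vectors * nbits // 8  # one 8-bit code per sub-vector
row_bytes = 8 + pq_code_bytes                 # uint64 _rowid plus the PQ codes
raw_vector_bytes = dimension * 4              # original float32 vector
compression_ratio = raw_vector_bytes // pq_code_bytes
```

Under these assumptions each 512-byte vector is stored as a 16-byte code, and the shared codebook occupies 128 KiB.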

Appendix 2: Example IVF_RQ Format

This example shows how an IVF_RQ index is physically laid out. Assume vectors have dimension 128, RQ uses 1 bit per dimension (num_bits=1), and distance type is "l2".

Index File

  • Arrow Schema Metadata:

    • "lance:index" → { "type": "IVF_RQ", "distance_type": "l2" }
    • "lance:ivf" → "1" (references IVF metadata in the global buffer)
    • "lance:flat" → ["", "", ...] (one empty string per partition; IVF_RQ uses a FLAT sub-index inside each partition)
  • Lance File Global buffer (Protobuf):

    • Ivf message containing:
      • centroids_tensor: shape [num_partitions, 128] (float32)
      • offsets: start offset (row) of each partition in auxiliary.idx
      • lengths: number of vectors in each partition
      • loss: k-means loss (optional)

Auxiliary File

  • Arrow Schema Metadata:
    • "distance_type" → "l2"
    • "lance:ivf" → tracks per-partition offsets and lengths (no centroids here)
    • "lance:rabit" → "{"rotate_mat_position":1,"num_bits":1,"packed":true}"
  • Lance File Global buffer:
    • Tensor rotation matrix with shape [code_dim, code_dim] = [128, 128] (float32)
  • Rows with Arrow schema:
import pyarrow as pa

pa.schema([
    pa.field("_rowid", pa.uint64()),
    pa.field("_rabit_codes", pa.list_(pa.uint8(), list_size=16)),  # dimension/8 = 128/8 = 16 bytes
    pa.field("__add_factors", pa.float32()),
    pa.field("__scale_factors", pa.float32()),
])

Appendix 3: Accessing Index File with Python

The following example demonstrates how to read and parse different components in the Lance index files using Python:

import json

import lance

# Open the index file
index_reader = lance.LanceFileReader.read_file("path/to/index.idx")

# Access schema metadata
schema_metadata = index_reader.metadata().schema.metadata

# Get the IVF metadata reference from schema
ivf_ref = schema_metadata.get(b"lance:ivf")  # Returns b"1" for global buffer index

# Read the global buffer containing IVF metadata
if ivf_ref:
    buffer_index = int(ivf_ref) - 1  # Global buffer indices are 1-based
    ivf_buffer = index_reader.global_buffer(buffer_index)

    # Parse the protobuf message (requires lance protobuf definitions)
    # ivf_metadata = parse_ivf_protobuf(ivf_buffer)

# For auxiliary file with PQ codebook
aux_reader = lance.LanceFileReader.read_file("path/to/auxiliary.idx")

# Get storage metadata (a JSON list whose single element is itself a JSON string)
storage_metadata = aux_reader.metadata().schema.metadata.get(b"storage_metadata")
if storage_metadata:
    pq_metadata = json.loads(storage_metadata.decode())[0]  # First element of the list
    pq_params = json.loads(pq_metadata)

    # Access the codebook from global buffer
    codebook_position = pq_params.get("codebook_position", 1)
    if codebook_position > 0:
        # codebook_position is 1-based, like other global buffer indices
        codebook_buffer = aux_reader.global_buffer(codebook_position - 1)
        # Parse the tensor protobuf
        # codebook_tensor = parse_tensor_protobuf(codebook_buffer)