Quick start

LanceDB can be run in a number of ways:

  • Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application)
  • Directly from a client application like a Jupyter notebook for analytical workloads
  • Deployed as a remote serverless database

Installation

pip install lancedb
npm install @lancedb/lancedb

Bundling @lancedb/lancedb apps with Webpack

Since LanceDB contains a prebuilt Node binary, you must configure next.config.js to exclude it from webpack. This is required both when using Next.js and when deploying a LanceDB app on Vercel.

/** @type {import('next').NextConfig} */
module.exports = {
  webpack(config) {
    config.externals.push({ '@lancedb/lancedb': '@lancedb/lancedb' });
    return config;
  },
};

Yarn users

Unlike other package managers, Yarn does not automatically resolve peer dependencies. If you are using Yarn, you will need to manually install 'apache-arrow':

yarn add apache-arrow
npm install vectordb

Bundling vectordb apps with Webpack

Since LanceDB contains a prebuilt Node binary, you must configure next.config.js to exclude it from webpack. This is required both when using Next.js and when deploying a LanceDB app on Vercel.

/** @type {import('next').NextConfig} */
module.exports = {
  webpack(config) {
    config.externals.push({ vectordb: 'vectordb' });
    return config;
  },
};

Yarn users

Unlike other package managers, Yarn does not automatically resolve peer dependencies. If you are using Yarn, you will need to manually install 'apache-arrow':

yarn add apache-arrow
cargo add lancedb

To use the lancedb crate, you first need to install protobuf.

brew install protobuf
sudo apt install -y protobuf-compiler libssl-dev

Please also make sure you're using the same version of Arrow as the lancedb crate.

Preview releases

Stable releases are created about every 2 weeks. For the latest features and bug fixes, you can install the preview release. These releases receive the same level of testing as stable releases, but are not guaranteed to be available for more than 6 months after they are released. Once your application is stable, we recommend switching to stable releases.

pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
npm install @lancedb/lancedb@preview
npm install vectordb@preview

We don't push preview releases to crates.io, but you can reference the tag in GitHub within your Cargo dependencies:

[dependencies]
lancedb = { git = "https://github.com/lancedb/lancedb.git", tag = "vX.Y.Z-beta.N" }

Connect to a database

import lancedb
import pandas as pd
import pyarrow as pa

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

# LanceDB offers both a synchronous and an asynchronous client.  There are still a
# few operations that are only supported by the synchronous client (e.g. embedding
# functions, full text search), but both APIs should soon be equivalent.

# In this guide we will give examples of both clients.  In other guides we will
# typically only provide examples with one client or the other.
uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri)

Asynchronous Python API

The asynchronous Python API is new and has some slight differences compared to the synchronous API. Feel free to start using the asynchronous version. Once all features have migrated we will start to move the synchronous API to use the same syntax as the asynchronous API. To help with this migration we have created a migration guide detailing the differences.

import * as lancedb from "@lancedb/lancedb";
import * as arrow from "apache-arrow";

const uri = "/tmp/lancedb/";
const db = await lancedb.connect(uri);
const lancedb = require("vectordb");
const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);
#[tokio::main]
async fn main() -> Result<()> {
    let uri = "data/sample-lancedb";
    let db = connect(uri).execute().await?;
}

See examples/simple.rs for a full working example.

LanceDB will create the directory if it doesn't exist (including parent directories).

If you need a reminder of the uri, you can call db.uri().

Create a table

Create a table from initial data

If you have data to insert into the table at creation time, you can simultaneously create a table and insert the data into it. The schema of the data will be used as the schema of the table.

data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]

# Synchronous client
tbl = db.create_table("my_table", data=data)
# Asynchronous client
async_tbl = await async_db.create_table("my_table2", data=data)

If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you can pass in mode="overwrite" to the create_table method.

You can also pass in a pandas DataFrame directly:

df = pd.DataFrame(
    [
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
        {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    ]
)
# Synchronous client
tbl = db.create_table("table_from_df", data=df)
# Asynchronous client
async_tbl = await async_db.create_table("table_from_df2", df)
const tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { mode: "overwrite" },
);
const tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { writeMode: lancedb.WriteMode.Overwrite },
);

If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you can pass in mode: "overwrite" to the createTable function.

let initial_data = create_some_records()?;
let tbl = db
    .create_table("my_table", initial_data)
    .execute()
    .await
    .unwrap();

If the table already exists, LanceDB will raise an error by default. See the mode option for details on how to overwrite (or open) existing tables instead.

Providing data

The Rust SDK currently expects data to be provided as an Arrow RecordBatchReader. Support for additional formats (such as serde or polars) is on the roadmap.

Under the hood, LanceDB reads in the Apache Arrow data and persists it to disk using the Lance format.

Automatic embedding generation with Embedding API

When working with embedding models, it is recommended to use the LanceDB embedding API to automatically create vector representation of the data and queries in the background. See the quickstart example or the embedding API guide

Create an empty table

Sometimes you may not have the data to insert into the table at creation time. In this case, you can create an empty table and specify the schema, so that you can add data to the table at a later time (as long as it conforms to the schema). This is similar to a CREATE TABLE statement in SQL.

schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
# Synchronous client
tbl = db.create_table("empty_table", schema=schema)
# Asynchronous client
async_tbl = await async_db.create_table("empty_table2", schema=schema)

You can define schema in Pydantic

LanceDB comes with Pydantic support, which allows you to define the schema of your data using Pydantic models. This makes it easy to work with LanceDB tables and data. Learn more about all supported types in tables guide.

const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);

const empty_tbl = await db.createEmptyTable("empty_table", schema);
const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);

const empty_tbl = await db.createTable({ name: "empty_table", schema });
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new("item", DataType::Utf8, true),
]));
db.create_empty_table("empty_table", schema).execute().await

Open an existing table

Once created, you can open a table as follows:

# Synchronous client
tbl = db.open_table("my_table")
# Asynchronous client
async_tbl = await async_db.open_table("my_table2")
const _tbl = await db.openTable("myTable");
const tbl = await db.openTable("myTable");
let table = db.open_table("my_table").execute().await.unwrap();

If you forget the name of your table, you can always get a listing of all table names:

# Synchronous client
print(db.table_names())
# Asynchronous client
print(await async_db.table_names())
const tableNames = await db.tableNames();
console.log(tableNames);
console.log(await db.tableNames());
println!("{:?}", db.table_names().execute().await?);

Add data to a table

After a table has been created, you can always add more data to it as follows:

# Option 1: Add a list of dicts to a table
data = [
    {"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
    {"vector": [9.5, 56.2], "item": "buzz", "price": 200.0},
]
tbl.add(data)

# Option 2: Add a pandas DataFrame to a table
df = pd.DataFrame(data)
tbl.add(df)
# Asynchronous client
await async_tbl.add(data)
const data = [
  { vector: [1.3, 1.4], item: "fizz", price: 100.0 },
  { vector: [9.5, 56.2], item: "buzz", price: 200.0 },
];
await tbl.add(data);
const newData = Array.from({ length: 500 }, (_, i) => ({
  vector: [i, i + 1],
  item: "fizz",
  price: i * 0.1,
}));
await tbl.add(newData);
let new_data = create_some_records()?;
tbl.add(new_data).execute().await.unwrap();

Search for nearest neighbors

Once you've embedded the query, you can find its nearest neighbors as follows:

# Synchronous client
tbl.search([100, 100]).limit(2).to_pandas()
# Asynchronous client
await async_tbl.vector_search([100, 100]).limit(2).to_pandas()

This returns a pandas DataFrame with the results.

const _res = tbl.search([100, 100]).limit(2).toArray();
const query = await tbl.search([100, 100]).limit(2).execute();
use futures::TryStreamExt;

table
    .query()
    .limit(2)
    .nearest_to(&[1.0; 128])?
    .execute()
    .await?
    .try_collect::<Vec<_>>()
    .await

Query

Rust does not yet support automatic execution of embedding functions. You will need to calculate embeddings yourself. Support for this is on the roadmap and can be tracked at https://github.com/lancedb/lancedb/issues/994

Query vectors can be provided as Arrow arrays or a Vec/slice of Rust floats. Support for additional formats (e.g. polars::series::Series) is on the roadmap.

By default, LanceDB runs a brute-force scan over the dataset to find the K nearest neighbors (KNN). For tables with more than 50K vectors, creating an ANN index is recommended to speed up search performance. LanceDB allows you to create an ANN index on a table as follows:

# Synchronous client
tbl.create_index(num_sub_vectors=1)
# Asynchronous client (must specify column to index)
await async_tbl.create_index("vector")
await tbl.createIndex("vector");
await tbl.createIndex({
  type: "ivf_pq",
  num_partitions: 2,
  num_sub_vectors: 2,
});
table.create_index(&["vector"], Index::Auto).execute().await

Why do I need to create an index manually?

LanceDB does not automatically create the ANN index for two reasons. The first is that it's optimized for fast retrieval via a disk-based index, and the second is that data and query workloads can be very diverse, so there's no one-size-fits-all index configuration. LanceDB provides many parameters to fine-tune index size, query latency and accuracy. See the section on ANN indexes for more details.

Delete rows from a table

Use the delete() method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter.

# Synchronous client
tbl.delete('item = "fizz"')
# Asynchronous client
await async_tbl.delete('item = "fizz"')
await tbl.delete('item = "fizz"');
await tbl.delete('item = "fizz"');
tbl.delete("id > 24").await.unwrap();

The deletion predicate is a SQL expression that supports the same expressions as the where() clause (only_if() in Rust) on a search. They can be as simple or complex as needed. To see what expressions are supported, see the SQL filters section.

Drop a table

Use the drop_table() method on the database to remove a table.

# Synchronous client
db.drop_table("my_table")
# Asynchronous client
await async_db.drop_table("my_table2")

This permanently removes the table and is not recoverable, unlike deleting rows. By default, if the table does not exist an exception is raised. To suppress this, you can pass in ignore_missing=True.

await db.dropTable("myTable");
await db.dropTable("myTable");

This permanently removes the table and is not recoverable, unlike deleting rows. If the table does not exist an exception is raised.

db.drop_table("my_table").await.unwrap();

Using the Embedding API

You can use the embedding API when working with embedding models. It automatically vectorizes the data at ingestion and query time and comes with built-in integrations with popular embedding models like OpenAI, Hugging Face, Sentence Transformers, CLIP and more.

from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
func = get_registry().get("openai").create(name="text-embedding-ada-002")

class Words(LanceModel):
    text: str = func.SourceField()
    vector: Vector(func.ndims()) = func.VectorField()

table = db.create_table("words", schema=Words, mode="overwrite")
table.add([{"text": "hello world"}, {"text": "goodbye world"}])

query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
import * as lancedb from "@lancedb/lancedb";
import { LanceSchema, getRegistry, register } from "@lancedb/lancedb/embedding";
import { EmbeddingFunction } from "@lancedb/lancedb/embedding";
import { type Float, Float32, Utf8 } from "apache-arrow";

const db = await lancedb.connect("/tmp/db");
const func = getRegistry()
  .get("openai")
  ?.create({ model: "text-embedding-ada-002" }) as EmbeddingFunction;

const wordsSchema = LanceSchema({
  text: func.sourceField(new Utf8()),
  vector: func.vectorField(),
});
const tbl = await db.createEmptyTable("words", wordsSchema, {
  mode: "overwrite",
});
await tbl.add([{ text: "hello world" }, { text: "goodbye world" }]);

const query = "greetings";
const actual = (await (await tbl.search(query)).limit(1).toArray())[0];
use std::{iter::once, sync::Arc};

use arrow_array::{Float64Array, Int32Array, RecordBatch, RecordBatchIterator, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::StreamExt;
use lancedb::{
    arrow::IntoArrow,
    connect,
    embeddings::{openai::OpenAIEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
    query::{ExecutableQuery, QueryBase},
    Result,
};

#[tokio::main]
async fn main() -> Result<()> {
    let tempdir = tempfile::tempdir().unwrap();
    let tempdir = tempdir.path().to_str().unwrap();
    let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY is not set");
    let embedding = Arc::new(OpenAIEmbeddingFunction::new_with_model(
        api_key,
        "text-embedding-3-large",
    )?);

    let db = connect(tempdir).execute().await?;
    db.embedding_registry()
        .register("openai", embedding.clone())?;

    let table = db
        .create_table("vectors", make_data())
        .add_embedding(EmbeddingDefinition::new(
            "text",
            "openai",
            Some("embeddings"),
        ))?
        .execute()
        .await?;

    let query = Arc::new(StringArray::from_iter_values(once("something warm")));
    let query_vector = embedding.compute_query_embeddings(query)?;
    let mut results = table
        .vector_search(query_vector)?
        .limit(1)
        .execute()
        .await?;

    let rb = results.next().await.unwrap()?;
    let out = rb
        .column_by_name("text")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let text = out.iter().next().unwrap().unwrap();
    println!("Closest match: {}", text);
    Ok(())
}

Learn about using the existing integrations and creating custom embedding functions in the embedding API guide.

What's next

This section covered the very basics of using LanceDB. If you're learning about vector databases for the first time, you may want to read the page on indexing to get familiar with the concepts.

If you've already worked with other vector databases, you may want to read the guides to learn how to work with LanceDB in more detail.


  1. The vectordb package is deprecated in favor of @lancedb/lancedb. It will continue to receive bug fixes and security updates until September 2024. We recommend all new projects use @lancedb/lancedb. See the migration guide for more information.