Working with tables

A Table is a collection of Records in a LanceDB Database. Tables in Lance have a schema that defines the columns and their types. These schemas can include nested columns and can evolve over time.

This guide will show how to create tables, insert data into them, and update the data.

Creating a LanceDB Table

Initialize a LanceDB connection and create a table

PythonTypescript¹

Sync APIAsync API

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

import lancedb

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri)

LanceDB allows ingesting data from various sources - dict, list[dict], pd.DataFrame, pa.Table or a Iterator[pa.RecordBatch]. Let's take a look at some of the these.

@lancedb/lancedbvectordb (deprecated)

import * as lancedb from "@lancedb/lancedb";
import * as arrow from "apache-arrow";

const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);

const lancedb = require("vectordb");
const arrow = require("apache-arrow");

const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);

From list of tuples or dictionaries

PythonTypescript¹

Sync APIAsync API

data = [
    {"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
    {"vector": [0.2, 1.8], "lat": 40.1, "long": -74.1},
]
db.create_table("test_table", data)
db["test_table"].head()

data = [
    {"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
    {"vector": [0.2, 1.8], "lat": 40.1, "long": -74.1},
]
async_tbl = await async_db.create_table("test_table_async", data)
await async_tbl.head()

Note

If the table already exists, LanceDB will raise an error by default.

create_table supports an optional exist_ok parameter. When set to True and the table exists, then it simply opens the existing table. The data you passed in will NOT be appended to the table in that case.

Sync APIAsync API

db.create_table("test_table", data, exist_ok=True)

await async_db.create_table("test_table_async", data, exist_ok=True)

Sometimes you want to make sure that you start fresh. If you want to overwrite the table, you can pass in mode="overwrite" to the createTable function.

Sync APIAsync API

db.create_table("test_table", data, mode="overwrite")

await async_db.create_table("test_table_async", data, mode="overwrite")

You can create a LanceDB table in JavaScript using an array of records as follows.

@lancedb/lancedbvectordb (deprecated)

const _tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { mode: "overwrite" },
);

This will infer the schema from the provided data. If you want to explicitly provide a schema, you can use apache-arrow to declare a schema

const schema = new arrow.Schema([
  new arrow.Field(
    "vector",
    new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float32(), true),
    ),
  ),
  new arrow.Field("item", new arrow.Utf8(), true),
  new arrow.Field("price", new arrow.Float32(), true),
]);
const data = [
  { vector: [3.1, 4.1], item: "foo", price: 10.0 },
  { vector: [5.9, 26.5], item: "bar", price: 20.0 },
];
const tbl = await db.createTable("myTable", data, {
  schema,
});

Note

createTable supports an optional existsOk parameter. When set to true and the table exists, then it simply opens the existing table. The data you passed in will NOT be appended to the table in that case.

const tbl = await db.createTable("myTable", data, {
  existOk: true,
});

Sometimes you want to make sure that you start fresh. If you want to overwrite the table, you can pass in mode: "overwrite" to the createTable function.

const tbl = await db.createTable("myTable", data, {
  mode: "overwrite",
});

const tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { writeMode: lancedb.WriteMode.Overwrite },
);

This will infer the schema from the provided data. If you want to explicitly provide a schema, you can use apache-arrow to declare a schema

const schema = new arrow.Schema([
  new arrow.Field(
    "vector",
    new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float32(), true),
    ),
  ),
  new arrow.Field("item", new arrow.Utf8(), true),
  new arrow.Field("price", new arrow.Float32(), true),
]);
const data = [
  { vector: [3.1, 4.1], item: "foo", price: 10.0 },
  { vector: [5.9, 26.5], item: "bar", price: 20.0 },
];
const tbl = await db.createTable({
  name: "myTableWithSchema",
  data,
  schema,
});

Warning

existsOk is not available in vectordb

If the table already exists, vectordb will raise an error by default. You can use writeMode: WriteMode.Overwrite to overwrite the table. But this will delete the existing table and create a new one with the same name.

Sometimes you want to make sure that you start fresh.

If you want to overwrite the table, you can pass in writeMode: lancedb.WriteMode.Overwrite to the createTable function.

const table = await con.createTable(tableName, data, {
    writeMode: WriteMode.Overwrite
})

From a Pandas DataFrame

Sync APIAsync API

import pandas as pd

data = pd.DataFrame(
    {
        "vector": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],
        "lat": [45.5, 40.1],
        "long": [-122.7, -74.1],
    }
)
db.create_table("my_table_pandas", data)
db["my_table_pandas"].head()

import pandas as pd

data = pd.DataFrame(
    {
        "vector": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],
        "lat": [45.5, 40.1],
        "long": [-122.7, -74.1],
    }
)
async_tbl = await async_db.create_table("my_table_async_pd", data)
await async_tbl.head()

Note

Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.

The vector column needs to be a Vector (defined as pyarrow.FixedSizeList) type.

Sync APIAsync API

import pyarrow as pa

custom_schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 4)),
        pa.field("lat", pa.float32()),
        pa.field("long", pa.float32()),
    ]
)

tbl = db.create_table("my_table_custom_schema", data, schema=custom_schema)

import pyarrow as pa

custom_schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 4)),
        pa.field("lat", pa.float32()),
        pa.field("long", pa.float32()),
    ]
)
async_tbl = await async_db.create_table(
    "my_table_async_custom_schema", data, schema=custom_schema
)

From a Polars DataFrame

LanceDB supports Polars, a modern, fast DataFrame library written in Rust. Just like in Pandas, the Polars integration is enabled by PyArrow under the hood. A deeper integration between LanceDB Tables and Polars DataFrames is on the way.

Sync APIAsync API

import polars as pl

data = pl.DataFrame(
    {
        "vector": [[3.1, 4.1], [5.9, 26.5]],
        "item": ["foo", "bar"],
        "price": [10.0, 20.0],
    }
)
tbl = db.create_table("my_table_pl", data)

import polars as pl

data = pl.DataFrame(
    {
        "vector": [[3.1, 4.1], [5.9, 26.5]],
        "item": ["foo", "bar"],
        "price": [10.0, 20.0],
    }
)
async_tbl = await async_db.create_table("my_table_async_pl", data)

From an Arrow Table

You can also create LanceDB tables directly from Arrow tables. LanceDB supports float16 data type!

PythonTypescript¹

Sync APIAsync API

import pyarrow as pa

import numpy as np

dim = 16
total = 2
schema = pa.schema(
    [pa.field("vector", pa.list_(pa.float16(), dim)), pa.field("text", pa.string())]
)
data = pa.Table.from_arrays(
    [
        pa.array(
            [np.random.randn(dim).astype(np.float16) for _ in range(total)],
            pa.list_(pa.float16(), dim),
        ),
        pa.array(["foo", "bar"]),
    ],
    ["vector", "text"],
)
tbl = db.create_table("f16_tbl", data, schema=schema)

import polars as pl

import numpy as np

dim = 16
total = 2
schema = pa.schema(
    [pa.field("vector", pa.list_(pa.float16(), dim)), pa.field("text", pa.string())]
)
data = pa.Table.from_arrays(
    [
        pa.array(
            [np.random.randn(dim).astype(np.float16) for _ in range(total)],
            pa.list_(pa.float16(), dim),
        ),
        pa.array(["foo", "bar"]),
    ],
    ["vector", "text"],
)
async_tbl = await async_db.create_table("f16_tbl_async", data, schema=schema)

@lancedb/lancedbvectordb (deprecated)

const db = await lancedb.connect(databaseDir);
const dim = 16;
const total = 10;
const f16Schema = new Schema([
  new Field("id", new Int32()),
  new Field(
    "vector",
    new FixedSizeList(dim, new Field("item", new Float16(), true)),
    false,
  ),
]);
const data = lancedb.makeArrowTable(
  Array.from(Array(total), (_, i) => ({
    id: i,
    vector: Array.from(Array(dim), Math.random),
  })),
  { schema: f16Schema },
);
const _table = await db.createTable("f16_tbl", data);

const dim = 16;
const total = 10;
const schema = new Schema([
  new Field("id", new Int32()),
  new Field(
    "vector",
    new FixedSizeList(dim, new Field("item", new Float16(), true)),
    false,
  ),
]);
const data = lancedb.makeArrowTable(
  Array.from(Array(total), (_, i) => ({
    id: i,
    vector: Array.from(Array(dim), Math.random),
  })),
  { schema },
);
const table = await db.createTable("f16_tbl", data);

From Pydantic Models

When you create an empty table without data, you must specify the table schema. LanceDB supports creating tables by specifying a PyArrow schema or a specialized Pydantic model called LanceModel.

For example, the following Content model specifies a table with 5 columns: movie_id, vector, genres, title, and imdb_id. When you create a table, you can pass the class as the value of the schema parameter to create_table. The vector column is a Vector type, which is a specialized Pydantic type that can be configured with the vector dimensions. It is also important to note that LanceDB only understands subclasses of lancedb.pydantic.LanceModel (which itself derives from pydantic.BaseModel).

Sync APIAsync API

from lancedb.pydantic import Vector, LanceModel

import pyarrow as pa

class Content(LanceModel):
    movie_id: int
    vector: Vector(128)
    genres: str
    title: str
    imdb_id: int

    @property
    def imdb_url(self) -> str:
        return f"https://www.imdb.com/title/tt{self.imdb_id}"


tbl = db.create_table("movielens_small", schema=Content)

from lancedb.pydantic import Vector, LanceModel

import pyarrow as pa

class Content(LanceModel):
    movie_id: int
    vector: Vector(128)
    genres: str
    title: str
    imdb_id: int

    @property
    def imdb_url(self) -> str:
        return f"https://www.imdb.com/title/tt{self.imdb_id}"


async_tbl = await async_db.create_table("movielens_small_async", schema=Content)

Nested schemas

Sometimes your data model may contain nested objects. For example, you may want to store the document string and the document source name as a nested Document object:

from pydantic import BaseModel

class Document(BaseModel):
    content: str
    source: str

This can be used as the type of a LanceDB table column:

Sync APIAsync API

class NestedSchema(LanceModel):
    id: str
    vector: Vector(1536)
    document: Document


tbl = db.create_table("nested_table", schema=NestedSchema)

class NestedSchema(LanceModel):
    id: str
    vector: Vector(1536)
    document: Document


async_tbl = await async_db.create_table("nested_table_async", schema=NestedSchema)

This creates a struct column called "document" that has two subfields called "content" and "source":

In [28]: tbl.schema
Out[28]:
id: string not null
vector: fixed_size_list<item: float>[1536] not null
    child 0, item: float
document: struct<content: string not null, source: string not null> not null
    child 0, content: string not null
    child 1, source: string not null

Validators

Note that neither Pydantic nor PyArrow automatically validates that input data is of the correct timezone, but this is easy to add as a custom field validator:

from datetime import datetime
from zoneinfo import ZoneInfo

from lancedb.pydantic import LanceModel
from pydantic import Field, field_validator, ValidationError, ValidationInfo

tzname = "America/New_York"
tz = ZoneInfo(tzname)

class TestModel(LanceModel):
    dt_with_tz: datetime = Field(json_schema_extra={"tz": tzname})

    @field_validator('dt_with_tz')
    @classmethod
    def tz_must_match(cls, dt: datetime) -> datetime:
        assert dt.tzinfo == tz
        return dt

ok = TestModel(dt_with_tz=datetime.now(tz))

try:
    TestModel(dt_with_tz=datetime.now(ZoneInfo("Asia/Shanghai")))
    assert 0 == 1, "this should raise ValidationError"
except ValidationError:
    print("A ValidationError was raised.")
    pass

When you run this code it should print "A ValidationError was raised."

Pydantic custom types

LanceDB does NOT yet support converting pydantic custom types. If this is something you need, please file a feature request on the LanceDB Github repo.

Using Iterators / Writing Large Datasets

It is recommended to use iterators to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using table.add()

LanceDB additionally supports PyArrow's RecordBatch Iterators or other generators producing supported data types.

Here's an example using using RecordBatch iterator for creating tables.

Sync APIAsync API

import pyarrow as pa

def make_batches():
    for i in range(5):
        yield pa.RecordBatch.from_arrays(
            [
                pa.array(
                    [[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
                    pa.list_(pa.float32(), 4),
                ),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ],
            ["vector", "item", "price"],
        )


schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 4)),
        pa.field("item", pa.utf8()),
        pa.field("price", pa.float32()),
    ]
)
db.create_table("batched_tale", make_batches(), schema=schema)

import pyarrow as pa

def make_batches():
    for i in range(5):
        yield pa.RecordBatch.from_arrays(
            [
                pa.array(
                    [[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
                    pa.list_(pa.float32(), 4),
                ),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ],
            ["vector", "item", "price"],
        )


schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 4)),
        pa.field("item", pa.utf8()),
        pa.field("price", pa.float32()),
    ]
)
await async_db.create_table("batched_table", make_batches(), schema=schema)

You can also use iterators of other types like Pandas DataFrame or Pylists directly in the above example.

Open existing tables

PythonTypescript¹

If you forget the name of your table, you can always get a listing of all table names.

Sync APIAsync API

print(db.table_names())

print(await async_db.table_names())

Then, you can open any existing tables.

Sync APIAsync API

tbl = db.open_table("test_table")

async_tbl = await async_db.open_table("test_table_async")

If you forget the name of your table, you can always get a listing of all table names.

console.log(await db.tableNames());

Then, you can open any existing tables.

const tbl = await db.openTable("my_table");

Creating empty table

You can create an empty table for scenarios where you want to add data to the table later. An example would be when you want to collect data from a stream/external file and then add it to a table in batches.

PythonTypescript¹

An empty table can be initialized via a PyArrow schema.

Sync APIAsync API

import lancedb

import pyarrow as pa

schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 2)),
        pa.field("item", pa.string()),
        pa.field("price", pa.float32()),
    ]
)
tbl = db.create_table("test_empty_table", schema=schema)

import lancedb

import pyarrow as pa

schema = pa.schema(
    [
        pa.field("vector", pa.list_(pa.float32(), 2)),
        pa.field("item", pa.string()),
        pa.field("price", pa.float32()),
    ]
)
async_tbl = await async_db.create_table("test_empty_table_async", schema=schema)

Alternatively, you can also use Pydantic to specify the schema for the empty table. Note that we do not directly import pydantic but instead use lancedb.pydantic which is a subclass of pydantic.BaseModel that has been extended to support LanceDB specific types like Vector.

Sync APIAsync API

import lancedb

from lancedb.pydantic import Vector, LanceModel

class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float


tbl = db.create_table("test_empty_table_new", schema=Item.to_arrow_schema())

import lancedb

from lancedb.pydantic import Vector, LanceModel

class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float


async_tbl = await async_db.create_table(
    "test_empty_table_async_new", schema=Item.to_arrow_schema()
)

Once the empty table has been created, you can add data to it via the various methods listed in the Adding to a table section.

@lancedb/lancedbvectordb (deprecated)

const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);

const emptyTbl = await db.createEmptyTable("empty_table", schema);

const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);

const empty_tbl = await db.createTable({ name: "empty_table", schema });

Adding to a table

After a table has been created, you can always add more data to it using the add method

PythonTypescript¹

You can add any of the valid data structures accepted by LanceDB table, i.e, dict, list[dict], pd.DataFrame, or Iterator[pa.RecordBatch]. Below are some examples.

Add a Pandas DataFrame

Sync APIAsync API

df = pd.DataFrame(
    {
        "vector": [[1.3, 1.4], [9.5, 56.2]],
        "item": ["banana", "apple"],
        "price": [5.0, 7.0],
    }
)

tbl.add(df)

df = pd.DataFrame(
    {
        "vector": [[1.3, 1.4], [9.5, 56.2]],
        "item": ["banana", "apple"],
        "price": [5.0, 7.0],
    }
)
await async_tbl.add(df)

Add a Polars DataFrame

Sync APIAsync API

df = pl.DataFrame(
    {
        "vector": [[1.3, 1.4], [9.5, 56.2]],
        "item": ["banana", "apple"],
        "price": [5.0, 7.0],
    }
)

tbl.add(df)

df = pl.DataFrame(
    {
        "vector": [[1.3, 1.4], [9.5, 56.2]],
        "item": ["banana", "apple"],
        "price": [5.0, 7.0],
    }
)
await async_tbl.add(df)

Add an Iterator

You can also add a large dataset batch in one go using Iterator of any supported data types.

Sync APIAsync API

def make_batches_for_add():
    for i in range(5):
        yield [
            {"vector": [3.1, 4.1], "item": "peach", "price": 6.0},
            {"vector": [5.9, 26.5], "item": "pear", "price": 5.0},
        ]


tbl.add(make_batches_for_add())

def make_batches_for_add():
    for i in range(5):
        yield [
            {"vector": [3.1, 4.1], "item": "peach", "price": 6.0},
            {"vector": [5.9, 26.5], "item": "pear", "price": 5.0},
        ]


await async_tbl.add(make_batches_for_add())

Add a PyArrow table

If you have data coming in as a PyArrow table, you can add it directly to the LanceDB table.

Sync APIAsync API

pa_table = pa.Table.from_arrays(
    [
        pa.array([[9.1, 6.7], [9.9, 31.2]], pa.list_(pa.float32(), 2)),
        pa.array(["mango", "orange"]),
        pa.array([7.0, 4.0]),
    ],
    ["vector", "item", "price"],
)
tbl.add(pa_table)

pa_table = pa.Table.from_arrays(
    [
        pa.array([[9.1, 6.7], [9.9, 31.2]], pa.list_(pa.float32(), 2)),
        pa.array(["mango", "orange"]),
        pa.array([7.0, 4.0]),
    ],
    ["vector", "item", "price"],
)
await async_tbl.add(pa_table)

Add a Pydantic Model

Assuming that a table has been created with the correct schema as shown above, you can add data items that are valid Pydantic models to the table.

Sync APIAsync API

pydantic_model_items = [
    Item(vector=[8.1, 4.7], item="pineapple", price=10.0),
    Item(vector=[6.9, 9.3], item="avocado", price=9.0),
]
tbl.add(pydantic_model_items)

pydantic_model_items = [
    Item(vector=[8.1, 4.7], item="pineapple", price=10.0),
    Item(vector=[6.9, 9.3], item="avocado", price=9.0),
]
await async_tbl.add(pydantic_model_items)

Ingesting Pydantic models with LanceDB embedding API

When using LanceDB's embedding API, you can add Pydantic models directly to the table. LanceDB will automatically convert the vector field to a vector before adding it to the table. You need to specify the default value of vector field as None to allow LanceDB to automatically vectorize the data.

Sync APIAsync API

import lancedb

from lancedb.pydantic import Vector, LanceModel

from lancedb.embeddings import get_registry

embed_fcn = get_registry().get("huggingface").create(name="BAAI/bge-small-en-v1.5")

class Schema(LanceModel):
    text: str = embed_fcn.SourceField()
    vector: Vector(embed_fcn.ndims()) = embed_fcn.VectorField(default=None)

tbl = db.create_table("my_table_with_embedding", schema=Schema, mode="overwrite")
models = [Schema(text="hello"), Schema(text="world")]
tbl.add(models)

import lancedb

from lancedb.pydantic import Vector, LanceModel

from lancedb.embeddings import get_registry

embed_fcn = get_registry().get("huggingface").create(name="BAAI/bge-small-en-v1.5")

class Schema(LanceModel):
    text: str = embed_fcn.SourceField()
    vector: Vector(embed_fcn.ndims()) = embed_fcn.VectorField(default=None)

async_tbl = await async_db.create_table(
    "my_table_async_with_embedding", schema=Schema, mode="overwrite"
)
models = [Schema(text="hello"), Schema(text="world")]
await async_tbl.add(models)

await tbl.add(
    [
        {vector: [1.3, 1.4], item: "fizz", price: 100.0},
        {vector: [9.5, 56.2], item: "buzz", price: 200.0}
    ]
)

Upserting into a table

Upserting lets you insert new rows or update existing rows in a table. To upsert in LanceDB, use the merge insert API.

PythonTypescript¹

Sync APIAsync API

table = db.create_table(
    "users",
    [
        {"id": 0, "name": "Alice"},
        {"id": 1, "name": "Bob"},
    ],
)
new_users = [
    {"id": 1, "name": "Bobby"},
    {"id": 2, "name": "Charlie"},
]
res = (
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_users)
)
table.count_rows()  # 3
res  # {'num_inserted_rows': 1, 'num_updated_rows': 1, 'num_deleted_rows': 0}

API Reference: lancedb.table.Table.merge_insert

table = await db.create_table(
    "users",
    [
        {"id": 0, "name": "Alice"},
        {"id": 1, "name": "Bob"},
    ],
)
new_users = [
    {"id": 1, "name": "Bobby"},
    {"id": 2, "name": "Charlie"},
]
res = await (
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_users)
)
await table.count_rows()  # 3
res
# MergeResult(version=2, num_updated_rows=1,
# num_inserted_rows=1, num_deleted_rows=0)

API Reference: lancedb.table.AsyncTable.merge_insert

@lancedb/lancedb

const table = await db.createTable("users", [
  { id: 0, name: "Alice" },
  { id: 1, name: "Bob" },
]);

const newUsers = [
  { id: 1, name: "Bobby" },
  { id: 2, name: "Charlie" },
];
await table
  .mergeInsert("id")
  .whenMatchedUpdateAll()
  .whenNotMatchedInsertAll()
  .execute(newUsers);

await table.countRows(); // 3

API Reference: lancedb.Table.mergeInsert

Deleting from a table

Use the delete() method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter.

PythonTypescript¹

Sync APIAsync API

tbl.delete('item = "fizz"')

await async_tbl.delete('item = "fizz"')

Deleting row with specific column value

Sync APIAsync API

data = [
    {"x": 1, "vector": [1, 2]},
    {"x": 2, "vector": [3, 4]},
    {"x": 3, "vector": [5, 6]},
]
# Synchronous client
tbl = db.create_table("delete_row", data)
tbl.to_pandas()
#   x      vector
# 0  1  [1.0, 2.0]
# 1  2  [3.0, 4.0]
# 2  3  [5.0, 6.0]

tbl.delete("x = 2")
tbl.to_pandas()
#   x      vector
# 0  1  [1.0, 2.0]
# 1  3  [5.0, 6.0]

data = [
    {"x": 1, "vector": [1, 2]},
    {"x": 2, "vector": [3, 4]},
    {"x": 3, "vector": [5, 6]},
]
async_db = await lancedb.connect_async(uri)
async_tbl = await async_db.create_table("delete_row_async", data)
await async_tbl.to_pandas()
#   x      vector
# 0  1  [1.0, 2.0]
# 1  2  [3.0, 4.0]
# 2  3  [5.0, 6.0]

await async_tbl.delete("x = 2")
await async_tbl.to_pandas()
#   x      vector
# 0  1  [1.0, 2.0]
# 1  3  [5.0, 6.0]

Delete from a list of values

Sync APIAsync API

to_remove = [1, 5]
to_remove = ", ".join(str(v) for v in to_remove)

tbl.delete(f"x IN ({to_remove})")
tbl.to_pandas()
#   x      vector
# 0  3  [5.0, 6.0]

to_remove = [1, 5]
to_remove = ", ".join(str(v) for v in to_remove)

await async_tbl.delete(f"x IN ({to_remove})")
await async_tbl.to_pandas()
#   x      vector
# 0  3  [5.0, 6.0]

await tbl.delete('item = "fizz"')

Deleting row with specific column value

const con = await lancedb.connect("./.lancedb")
const data = [
  {id: 1, vector: [1, 2]},
  {id: 2, vector: [3, 4]},
  {id: 3, vector: [5, 6]},
];
const tbl = await con.createTable("my_table", data)
await tbl.delete("id = 2")
await tbl.countRows() // Returns 2

Delete from a list of values

const to_remove = [1, 5];
await tbl.delete(`id IN (${to_remove.join(",")})`)
await tbl.countRows() // Returns 1

Updating a table

This can be used to update zero to all rows depending on how many rows match the where clause. The update queries follow the form of a SQL UPDATE statement. The where parameter is a SQL filter that matches on the metadata columns. The values or values_sql parameters are used to provide the new values for the columns.

Parameter	Type	Description
`where`	`str`	The SQL where clause to use when updating rows. For example, `'x = 2'` or `'x IN (1, 2, 3)'`. The filter must not be empty, or it will error.
`values`	`dict`	The values to update. The keys are the column names and the values are the values to set.
`values_sql`	`dict`	The values to update. The keys are the column names and the values are the SQL expressions to set. For example, `{'x': 'x + 1'}` will increment the value of the `x` column by 1.

SQL syntax

See SQL filters for more information on the supported SQL syntax.

Warning

Updating nested columns is not yet supported.

PythonTypescript¹

API Reference: lancedb.table.Table.update

Sync APIAsync API

import lancedb

import pandas as pd

# Create a table from a pandas DataFrame
data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1, 2], [3, 4], [5, 6]]})

tbl = db.create_table("test_table", data, mode="overwrite")
# Update the table where x = 2
tbl.update(where="x = 2", values={"vector": [10, 10]})
# Get the updated table as a pandas DataFrame
df = tbl.to_pandas()
print(df)

import lancedb

import pandas as pd

# Create a table from a pandas DataFrame
data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1, 2], [3, 4], [5, 6]]})

async_tbl = await async_db.create_table("update_table_async", data)
# Update the table where x = 2
await async_tbl.update({"vector": [10, 10]}, where="x = 2")
# Get the updated table as a pandas DataFrame
df = await async_tbl.to_pandas()
# Print the DataFrame
print(df)

Output

    x  vector
0  1  [1.0, 2.0]
1  3  [5.0, 6.0]
2  2  [10.0, 10.0]

@lancedb/lancedbvectordb (deprecated)

API Reference: lancedb.Table.update

import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("./.lancedb");

const data = [
    {x: 1, vector: [1, 2]},
    {x: 2, vector: [3, 4]},
    {x: 3, vector: [5, 6]},
];
const tbl = await db.createTable("my_table", data)

await tbl.update({
    values: { vector: [10, 10] },
    where: "x = 2"
});

API Reference: vectordb.Table.update

const lancedb = require("vectordb");

const db = await lancedb.connect("./.lancedb");

const data = [
    {x: 1, vector: [1, 2]},
    {x: 2, vector: [3, 4]},
    {x: 3, vector: [5, 6]},
];
const tbl = await db.createTable("my_table", data)

await tbl.update({
    where: "x = 2",
    values: { vector: [10, 10] }
});

Updating using a sql query

The values parameter is used to provide the new values for the columns as literal values. You can also use the values_sql / valuesSql parameter to provide SQL expressions for the new values. For example, you can use values_sql="x + 1" to increment the value of the x column by 1.

PythonTypescript¹

Sync APIAsync API

# Update the table where x = 2
tbl.update(values_sql={"x": "x + 1"})
print(tbl.to_pandas())

# Update the table where x = 2
await async_tbl.update(updates_sql={"x": "x + 1"})
print(await async_tbl.to_pandas())

Output

    x  vector
0  2  [1.0, 2.0]
1  4  [5.0, 6.0]
2  3  [10.0, 10.0]

@lancedb/lancedbvectordb (deprecated)

Coming Soon!

await tbl.update({ valuesSql: { x: "x + 1" } })

Note

When rows are updated, they are moved out of the index. The row will still show up in ANN queries, but the query will not be as fast as it would be if the row was in the index. If you update a large proportion of rows, consider rebuilding the index afterwards.

Drop a table

Use the drop_table() method on the database to remove a table.

PythonTypeScript

Sync APIAsync API

db.drop_table("my_table")

await db.drop_table("my_table_async")

This permanently removes the table and is not recoverable, unlike deleting rows. By default, if the table does not exist an exception is raised. To suppress this, you can pass in ignore_missing=True.

await db.dropTable("myTable");

This permanently removes the table and is not recoverable, unlike deleting rows. If the table does not exist an exception is raised.

Changing schemas

While tables must have a schema specified when they are created, you can change the schema over time. There's three methods to alter the schema of a table:

add_columns: Add new columns to the table
alter_columns: Alter the name, nullability, or data type of a column
drop_columns: Drop columns from the table

Adding new columns

You can add new columns to the table with the add_columns method. New columns are filled with values based on a SQL expression. For example, you can add a new column y to the table, fill it with the value of x * 2 and set the expected data type for it.

PythonTypescript

Sync APIAsync API

tbl.add_columns({"double_price": "cast((price * 2) as float)"})

await tbl.add_columns({"double_price": "cast((price * 2) as float)"})

API Reference: lancedb.table.Table.add_columns

await tbl.addColumns([
  { name: "double_price", valueSql: "cast((price * 2) as Float)" },
]);

API Reference: lancedb.Table.addColumns

If you want to fill it with null, you can use cast(NULL as <data_type>) as the SQL expression to fill the column with nulls, while controlling the data type of the column. Available data types are base on the DataFusion data types. You can use any of the SQL types, such as BIGINT:

cast(NULL as BIGINT)

Using Arrow data types and the arrow_typeof function is not yet supported.

Altering existing columns

You can alter the name, nullability, or data type of a column with the alter_columns method.

Changing the name or nullability of a column just updates the metadata. Because of this, it's a fast operation. Changing the data type of a column requires rewriting the column, which can be a heavy operation.

PythonTypescript

Sync APIAsync API

import pyarrow as pa

tbl.alter_columns(
    {
        "path": "double_price",
        "rename": "dbl_price",
        "data_type": pa.float64(),
        "nullable": True,
    }
)

import pyarrow as pa

await tbl.alter_columns(
    {
        "path": "double_price",
        "rename": "dbl_price",
        "data_type": pa.float64(),
        "nullable": True,
    }
)

API Reference: lancedb.table.Table.alter_columns

await tbl.alterColumns([
  {
    path: "double_price",
    rename: "dbl_price",
    dataType: "float",
    nullable: true,
  },
]);

API Reference: lancedb.Table.alterColumns

Dropping columns

You can drop columns from the table with the drop_columns method. This will will remove the column from the schema.

PythonTypescript

Sync APIAsync API

tbl.drop_columns(["dbl_price"])

await tbl.drop_columns(["dbl_price"])

API Reference: lancedb.table.Table.drop_columns

await tbl.dropColumns(["dbl_price"]);

API Reference: lancedb.Table.dropColumns

Handling bad vectors

In LanceDB Python, you can use the on_bad_vectors parameter to choose how invalid vector values are handled. Invalid vectors are vectors that are not valid because:

They are the wrong dimension
They contain NaN values
They are null but are on a non-nullable field

By default, LanceDB will raise an error if it encounters a bad vector. You can also choose one of the following options:

drop: Ignore rows with bad vectors
fill: Replace bad values (NaNs) or missing values (too few dimensions) with the fill value specified in the fill_value parameter. An input like [1.0, NaN, 3.0] will be replaced with [1.0, 0.0, 3.0] if fill_value=0.0.
null: Replace bad vectors with null (only works if the column is nullable). A bad vector [1.0, NaN, 3.0] will be replaced with null if the column is nullable. If the vector column is non-nullable, then bad vectors will cause an error

Consistency

In LanceDB OSS, users can set the read_consistency_interval parameter on connections to achieve different levels of read consistency. This parameter determines how frequently the database synchronizes with the underlying storage system to check for updates made by other processes. If another process updates a table, the database will not see the changes until the next synchronization.

There are three possible settings for read_consistency_interval:

Unset (default): The database does not check for updates to tables made by other processes. This provides the best query performance, but means that clients may not see the most up-to-date data. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
Zero seconds (Strong consistency): The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS.
Custom interval (Eventual consistency): The database checks for updates at a custom interval, such as every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.

Consistency in LanceDB Cloud

This is only tune-able in LanceDB OSS. In LanceDB Cloud, readers are always eventually consistent.

PythonTypescript¹

To set strong consistency, use timedelta(0):

Sync APIAsync API

from datetime import timedelta

uri = "data/sample-lancedb"
db = lancedb.connect(uri, read_consistency_interval=timedelta(0))
tbl = db.open_table("test_table")

from datetime import timedelta

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri, read_consistency_interval=timedelta(0))
async_tbl = await async_db.open_table("test_table_async")

For eventual consistency, use a custom timedelta:

Sync APIAsync API

from datetime import timedelta

uri = "data/sample-lancedb"
db = lancedb.connect(uri, read_consistency_interval=timedelta(seconds=5))
tbl = db.open_table("test_table")

from datetime import timedelta

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(
    uri, read_consistency_interval=timedelta(seconds=5)
)
async_tbl = await async_db.open_table("test_table_async")

By default, a Table will never check for updates from other writers. To manually check for updates you can use checkout_latest:

Sync APIAsync API

tbl = db.open_table("test_table")

# (Other writes happen to my_table from another process)

# Check for updates
tbl.checkout_latest()

async_tbl = await async_db.open_table("test_table_async")

# (Other writes happen to test_table_async from another process)

# Check for updates
await async_tbl.checkout_latest()

To set strong consistency, use 0:

const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 0 });
const tbl = await db.openTable("my_table");

For eventual consistency, specify the update interval as seconds:

const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 5 });
const tbl = await db.openTable("my_table");

What's next?

Learn the best practices on creating an ANN index and getting the most out of it.

The vectordb package is a legacy package that is deprecated in favor of @lancedb/lancedb. The vectordb package will continue to receive bug fixes and security updates until September 2024. We recommend all new projects use @lancedb/lancedb. See the migration guide for more information. ↩↩↩↩↩↩↩↩↩↩↩