Getting Started with LanceDB

This is a minimal tutorial for Python users. In Basic Usage, we'll show you how to work with our TypeScript and Rust SDKs.

1. Install LanceDB

LanceDB requires Python 3.8+ and can be installed with pip. The pandas package is optional but recommended for data manipulation; without it, you can manage data as plain Python lists or dictionaries. LanceDB also integrates with popular data libraries such as pyarrow, pydantic, and polars, so you can pick whichever data format suits your workflow.

pip install lancedb pandas

2. Import Libraries

Import the libraries. LanceDB provides the core vector database functionality, while pandas helps with data handling.

import lancedb
import pandas as pd

3. Connect to LanceDB

LanceDB supports both managed and local deployments. The connection URI determines where your data is stored. We recommend LanceDB Cloud or Enterprise for production workloads because they provide managed infrastructure, security, and automatic backups.

Cloud

Sync API:

db = lancedb.connect(
    uri="db://your-project-slug",
    api_key="your-api-key",
    region="us-east-1"
)

Async API:

db = await lancedb.connect_async(
    uri="db://your-project-slug",
    api_key="your-api-key",
    region="us-east-1"
)

Enterprise

For LanceDB Enterprise, set the host override to your private cloud endpoint.

Sync API:

import os

# uri, api_key, and region are assumed to be defined as in the Cloud example.
host_override = os.environ.get("LANCEDB_HOST_OVERRIDE")

db = lancedb.connect(
    uri=uri,
    api_key=api_key,
    region=region,
    host_override=host_override
)

Async API:

import os

# uri, api_key, and region are assumed to be defined as in the Cloud example.
host_override = os.environ.get("LANCEDB_HOST_OVERRIDE")

db = await lancedb.connect_async(
    uri=uri,
    api_key=api_key,
    region=region,
    host_override=host_override
)

Open Source

Sync API:

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

Async API:

uri = "data/sample-lancedb"
db = await lancedb.connect_async(uri)
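
As a rough illustration of how the URI selects the deployment, the sketch below assembles keyword arguments for lancedb.connect from a plain dictionary of settings. The helper name build_connect_kwargs, the settings keys, and the fallback region are hypothetical choices made for this example; only the db:// prefix and the api_key, region, and host_override parameters come from the snippets above.

```python
def build_connect_kwargs(settings):
    """Return keyword arguments for lancedb.connect (hypothetical helper)."""
    uri = settings["uri"]
    kwargs = {"uri": uri}
    # In the examples above, managed deployments use a db:// URI plus credentials.
    if uri.startswith("db://"):
        kwargs["api_key"] = settings["api_key"]
        # Assumption for this sketch: fall back to the region used in the Cloud example.
        kwargs["region"] = settings.get("region", "us-east-1")
        # Enterprise only: point the client at a private endpoint when one is set.
        if settings.get("host_override"):
            kwargs["host_override"] = settings["host_override"]
    return kwargs


local = build_connect_kwargs({"uri": "data/sample-lancedb"})
cloud = build_connect_kwargs(
    {"uri": "db://your-project-slug", "api_key": "your-api-key"}
)
print(local)
print(cloud)
```

The resulting dictionary could then be passed along as lancedb.connect(**kwargs), which keeps the local and managed code paths identical apart from configuration.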

4. Add Data

Create a pandas DataFrame with your data. Each row must contain a vector field (list of floats) and can include additional metadata.

data = pd.DataFrame([
    {"id": "1", "vector": [0.9, 0.4, 0.8], "text": "knight"},    
    {"id": "2", "vector": [0.8, 0.5, 0.3], "text": "ranger"},  
    {"id": "3", "vector": [0.5, 0.9, 0.6], "text": "cleric"},    
    {"id": "4", "vector": [0.3, 0.8, 0.7], "text": "rogue"},     
    {"id": "5", "vector": [0.2, 1.0, 0.5], "text": "thief"},     
])
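
Because every row needs a vector and the query must later match its dimensionality, it can help to check the rows before ingesting them. The following is a minimal sketch in plain Python; the function name vector_dimension is hypothetical and is not part of LanceDB.

```python
rows = [
    {"id": "1", "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": "2", "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": "3", "vector": [0.5, 0.9, 0.6], "text": "cleric"},
]


def vector_dimension(rows, column="vector"):
    """Return the shared vector length, or raise ValueError on bad rows."""
    dims = set()
    for i, row in enumerate(rows):
        vec = row.get(column)
        if not isinstance(vec, list) or not vec:
            raise ValueError(f"row {i}: missing or empty {column!r} field")
        # bool is a subclass of int, so exclude it explicitly.
        if not all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec
        ):
            raise ValueError(f"row {i}: {column!r} must contain only numbers")
        dims.add(len(vec))
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(dims)}")
    return dims.pop()


print(vector_dimension(rows))
```

Running the same check on the query vector's length before searching catches dimension mismatches early.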

5. Create a Table

Create a table in the database. The table takes on the schema of your ingested data.

Sync API:

table = db.create_table("adventurers", data)

Async API:

# On an async connection, create_table is itself a coroutine and must be awaited.
table = await db.create_table("adventurers", data)

6. Vector Search

Perform a vector similarity search. The query vector must have the same dimensionality as your data vectors. The search returns the most similar vectors based on Euclidean distance.

Our query is "warrior", represented by the vector [0.8, 0.3, 0.8]. Let's find the three most similar adventurers:

Sync API:

query_vector = [0.8, 0.3, 0.8]
results = table.search(query_vector).limit(3).to_pandas()
print(results)

Async API:

query_vector = [0.8, 0.3, 0.8]
# Build the vector query with query().nearest_to(), then await the result.
results = await table.query().nearest_to(query_vector).limit(3).to_pandas()
print(results)

7. Results

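The results list the vectors closest to your query, sorted by distance in ascending order. Lower distance means higher similarity. The values shown equal the squared Euclidean distance between the query and each vector; for knight, (0.8 - 0.9)^2 + (0.3 - 0.4)^2 + (0.8 - 0.8)^2 = 0.02.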

| id | vector          | text    | distance  |
|----|-----------------|---------|-----------|
| 1  | [0.9, 0.4, 0.8] | knight  | 0.02      |
| 2  | [0.8, 0.5, 0.3] | ranger  | 0.29      |
| 3  | [0.5, 0.9, 0.6] | cleric  | 0.49      |
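
To see where these numbers come from, the table can be reproduced with a brute-force calculation in plain Python. This is only a sanity check over the five sample rows, assuming the reported value is the squared Euclidean distance (which matches the figures above); it is not a description of how LanceDB executes the search internally.

```python
adventurers = {
    "knight": [0.9, 0.4, 0.8],
    "ranger": [0.8, 0.5, 0.3],
    "cleric": [0.5, 0.9, 0.6],
    "rogue": [0.3, 0.8, 0.7],
    "thief": [0.2, 1.0, 0.5],
}
query_vector = [0.8, 0.3, 0.8]


def squared_l2(a, b):
    """Sum of squared component-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Rank every row by its distance to the query, smallest first.
ranked = sorted(
    (squared_l2(query_vector, vec), name) for name, vec in adventurers.items()
)

# Keep the three nearest rows, rounding only for display.
top3 = [(round(distance, 2), name) for distance, name in ranked[:3]]
for distance, name in top3:
    print(f"{name:<8}{distance:.2f}")
```

The three printed rows are knight, ranger, and cleric with distances 0.02, 0.29, and 0.49, matching the table.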

What's Next?

Check out some Basic Usage tips. After that, we'll teach you how to build a small app.