Polars
LanceDB supports Polars, a fast DataFrame library for Python written in Rust. As with the Pandas integration, the Polars integration is enabled by PyArrow under the hood. A deeper integration between Lance Tables and Polars DataFrames is in progress; for now, you can load a Polars DataFrame into LanceDB and return the results of a search query as a Polars DataFrame.
Create & Query LanceDB Table
From Polars DataFrame
First, we connect to a LanceDB database.
We can load a Polars DataFrame into LanceDB directly.
import polars as pl

data = pl.DataFrame({
    "vector": [[3.1, 4.1], [5.9, 26.5]],
    "item": ["foo", "bar"],
    "price": [10.0, 20.0],
})
table = db.create_table("pl_table", data=data)
We can now perform similarity search via the LanceDB Python API.
query = [3.0, 4.0]
result = table.search(query).limit(1).to_polars()
print(result)
print(type(result))
In addition to the selected columns, LanceDB also returns the vector column and a _distance column, which holds the distance between the query vector and the returned vector.
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector        ┆ item ┆ price ┆ _distance │
│ ---           ┆ ---  ┆ ---   ┆ ---       │
│ array[f32, 2] ┆ str  ┆ f64   ┆ f32       │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  ┆ 0.0       │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
Note that the type of the result from a table search is a Polars DataFrame.
From Pydantic Models
Alternatively, we can create an empty LanceDB Table from a Pydantic schema and populate it with a Polars DataFrame.
import polars as pl
from lancedb.pydantic import Vector, LanceModel

class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float

data = {
    "vector": [[3.1, 4.1]],
    "item": "foo",
    "price": 10.0,
}
table = db.create_table("test_table", schema=Item)
df = pl.DataFrame(data)
# Add Polars DataFrame to table
table.add(df)
The table can now be queried as usual.
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector        ┆ item ┆ price ┆ _distance │
│ ---           ┆ ---  ┆ ---   ┆ ---       │
│ array[f32, 2] ┆ str  ┆ f64   ┆ f32       │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  ┆ 0.02      │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
As in the previous example, the search returns a Polars DataFrame.
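The `_distance` of 0.02 above is consistent with a squared Euclidean (L2) distance, assuming the query vector was `[3.0, 4.0]` as in the earlier search and that the table uses LanceDB's default L2 metric. A quick check in plain Python, where `squared_l2` is a hypothetical helper and not a LanceDB function:

```python
def squared_l2(a, b):
    """Sum of squared component-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [3.0, 4.0]   # assumed query vector
stored = [3.1, 4.1]  # vector returned by the search

distance = squared_l2(query, stored)
print(round(distance, 6))
```

Each component differs by 0.1, so the two squared differences of 0.01 add up to roughly 0.02.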
Dump Table to LazyFrame
As you iterate on your application, you will often need to work with the data in the whole table. LanceDB tables can be converted directly into a Polars LazyFrame for further processing.
Unlike a search result, which is a DataFrame, the result of this conversion is a LazyFrame.
We can now work with the LazyFrame as we would in Polars, and collect the first result.
shape: (1, 3)
┌───────────────┬──────┬───────┐
│ vector        ┆ item ┆ price │
│ ---           ┆ ---  ┆ ---   │
│ array[f32, 2] ┆ str  ┆ f64   │
╞═══════════════╪══════╪═══════╡
│ [3.1, 4.1]    ┆ foo  ┆ 10.0  │
└───────────────┴──────┴───────┘
Keeping the LanceDB Table as a LazyFrame instead of converting it to a DataFrame is useful because the table can be much larger than memory. Polars LazyFrames let us work with such larger-than-memory datasets without loading them into memory all at once.