Pydantic
Pydantic is a data validation library in Python. LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting.
Schema
LanceDB can create an Apache Arrow Schema from a Pydantic BaseModel via the pydantic_to_schema() function.
lancedb.pydantic.pydantic_to_schema(model: Type[pydantic.BaseModel]) -> pa.Schema
Convert a Pydantic model to a PyArrow Schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Type[BaseModel] | The Pydantic BaseModel to convert to Arrow Schema. | required |
Returns:
Type | Description |
---|---|
Schema | |
Examples:
>>> from typing import List
>>> import pyarrow as pa
>>> import pydantic
>>> from lancedb.pydantic import pydantic_to_schema
>>> class FooModel(pydantic.BaseModel):
...     id: int
...     s: str
...     vec: List[float]
...     li: List[int]
...
>>> schema = pydantic_to_schema(FooModel)
>>> # Required fields map to non-nullable Arrow fields.
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("s", pa.utf8(), False),
...     pa.field("vec", pa.list_(pa.float64()), False),
...     pa.field("li", pa.list_(pa.int64()), False),
... ])
Source code in lancedb/pydantic.py
Vector Field
LanceDB provides a Vector(dim) function to define a vector field in a Pydantic model.
lancedb.pydantic.Vector(dim: int, value_type: pa.DataType = pa.float32()) -> Type[FixedSizeListMixin]
Pydantic Vector Type.
Warning
Experimental feature.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dim | int | The dimension of the vector. | required |
value_type | DataType | The value type of the vector, by default pa.float32() | float32() |
Examples:
>>> import pyarrow as pa
>>> import pydantic
>>> from lancedb.pydantic import Vector, pydantic_to_schema
>>> class MyModel(pydantic.BaseModel):
...     id: int
...     url: str
...     embeddings: Vector(768)
...
>>> schema = pydantic_to_schema(MyModel)
>>> # Vector(768) becomes a fixed-size list of 768 float32 values.
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("url", pa.utf8(), False),
...     pa.field("embeddings", pa.list_(pa.float32(), 768), False)
... ])
Type Conversion
LanceDB automatically converts Pydantic field types to Apache Arrow data types.
Currently supported type conversions:
Pydantic Field Type | PyArrow Data Type |
---|---|
int | pyarrow.int64() |
float | pyarrow.float64() |
bool | pyarrow.bool_() |
str | pyarrow.utf8() |
list | pyarrow.List |
BaseModel | pyarrow.Struct |
Vector(n) | pyarrow.FixedSizeList(float32, n) |