Example - MultiModal CLIP Embeddings¶
The Disappearing Embedding Function¶
Previously, to use a vector database you had to run the embedding step yourself and interact with the system using vectors directly. With this new release of LanceDB, embedding is handled for you, so you don't need to worry about it at all.
- LanceDB provides sentence-transformers, OpenAI, and OpenClip embedding functions that can be saved directly as table metadata
- You no longer have to generate vectors yourself, either at ingestion time or at query time
- The embedding function interface is extensible so you can create your own
- The function is persisted as table metadata so you can use it across sessions
import lancedb
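To make the "extensible" and "persisted as table metadata" points above concrete, here is a minimal sketch of the general idea using only the standard library. It is not LanceDB's implementation: the `register` decorator, `ToyClipConfig`, `to_metadata`, and `from_metadata` are all hypothetical names invented for illustration. The sketch assumes a function's configuration can be serialized to JSON, stored next to the schema, and used to rebuild the function in a later session.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical registry mapping a short name to a config class.
_REGISTRY = {}

def register(name):
    """Class decorator that records a config class under `name`."""
    def wrap(cls):
        _REGISTRY[name] = cls
        cls.registered_name = name
        return cls
    return wrap

@register("toy-clip")
@dataclass
class ToyClipConfig:
    # Illustrative fields only; a real embedding function would also load a model.
    name: str = "ViT-B-32"
    device: str = "cpu"
    batch_size: int = 64

def to_metadata(func, source_column, vector_column):
    """Serialize the function's config to bytes, as it might be stored with a schema."""
    entry = {
        "name": func.registered_name,
        "model": asdict(func),
        "source_column": source_column,
        "vector_column": vector_column,
    }
    return json.dumps([entry]).encode("utf-8")

def from_metadata(raw):
    """Rebuild the configured function from the stored bytes, e.g. in a new session."""
    entry = json.loads(raw)[0]
    return _REGISTRY[entry["name"]](**entry["model"])

stored = to_metadata(ToyClipConfig(batch_size=32), "image_uri", "vector")
restored = from_metadata(stored)
print(restored)
```

Because only the configuration is stored, not the model weights, anything that rebuilds the function this way still needs the model to be available when the table is reopened.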
Multi-modal search made easy¶
In this example we'll go over multi-modal image search using:
- Oxford Pet dataset
- OpenClip model
- LanceDB
Data¶
First, download the dataset from https://www.robots.ox.ac.uk/~vgg/data/pets/ (specifically, the images.tar.gz archive).
This notebook assumes you've downloaded it into your ~/Downloads directory.
When you extract the tarball, it will create an images directory.
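If you prefer to extract the archive from Python rather than a file manager, the following sketch uses the standard-library `tarfile` module. The helper name `extract_images` is hypothetical, and the paths in the usage comment assume the ~/Downloads location mentioned above.

```python
import tarfile
from pathlib import Path

def extract_images(archive, dest):
    """Extract `archive` into `dest` and return the extracted .jpg paths, sorted."""
    archive = Path(archive).expanduser()
    dest = Path(dest).expanduser()
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Python versions provide extraction filters that reject unsafe
        # members; use the "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted((dest / "images").glob("*.jpg"))

# Example usage, assuming the download location described above:
# jpgs = extract_images("~/Downloads/images.tar.gz", "~/Downloads")
# print(len(jpgs))
```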
Define embedding function¶
We'll use the OpenClipEmbeddingFunction here for multi-modal image search.
from lancedb.embeddings import EmbeddingFunctionRegistry
registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()
!pip install open_clip_torch
clip
OpenClipEmbeddings(name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)
The data model¶
We'll declare a new model that subclasses LanceModel (a special pydantic model) to represent the table. This table has two columns, one for the image_uri and one for the vector generated from those images. The embedding function defines the number of dimensions in its vectors so you don't need to look it up.
We use the VectorField method from the embedding function to annotate the model so that LanceDB knows to use the open-clip embedding function to generate query embeddings that correspond to the vector column.
We also use the SourceField method so that when adding data, LanceDB knows to automatically use open-clip to encode the input images.
Finally, because we're working with images, we add a convenience property image that opens the image and returns a PIL Image so it can be visualized in a Jupyter notebook.
from PIL import Image
from lancedb.pydantic import LanceModel, Vector
class Pets(LanceModel):
    vector: Vector(clip.ndims()) = clip.VectorField()
    image_uri: str = clip.SourceField()

    @property
    def image(self):
        return Image.open(self.image_uri)
Create the table¶
First, we connect to a local LanceDB directory.
db = lancedb.connect("~/.lancedb")
Next we get all of the paths for the images we downloaded and create a table. Notice that we didn't have to worry about generating the image embeddings ourselves.
import pandas as pd
from pathlib import Path
from random import sample
if "pets" in db:
    table = db["pets"]
else:
    table = db.create_table("pets", schema=Pets)
    # use a sampling of 1000 images
    p = Path("~/Downloads/images").expanduser()
    uris = [str(f) for f in p.glob("*.jpg")]
    uris = sample(uris, 1000)
    table.add(pd.DataFrame({"image_uri": uris}))
table.head().to_pandas()
| | vector | image_uri |
|---|---|---|
| 0 | [0.018789755, 0.11621179, -0.09760579, -0.0268... | /Users/changshe/Downloads/images/leonberger_14... |
| 1 | [0.021960497, 0.06073219, -0.1625527, 0.021481... | /Users/changshe/Downloads/images/havanese_63.jpg |
| 2 | [0.0074375155, 0.084355146, -0.027461205, -0.0... | /Users/changshe/Downloads/images/english_cocke... |
| 3 | [-0.01220356, 0.020815236, -0.08587208, -0.027... | /Users/changshe/Downloads/images/shiba_inu_143... |
| 4 | [-0.010112503, 0.14021927, -0.14588796, -0.046... | /Users/changshe/Downloads/images/saint_bernard... |
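One detail in the table-creation cell is worth a guard: `random.sample` raises a `ValueError` when asked for more items than the population contains, so sampling 1000 images fails if the images directory holds fewer than 1000 .jpg files. The sketch below caps the sample size; the helper name `sample_image_uris` and the optional `seed` parameter are assumptions added for illustration, not part of the notebook.

```python
import random
from pathlib import Path

def sample_image_uris(directory, k=1000, seed=None):
    """Return up to `k` .jpg paths from `directory` as strings."""
    # Sort first so that a fixed seed gives a reproducible sample.
    uris = sorted(str(f) for f in Path(directory).expanduser().glob("*.jpg"))
    rng = random.Random(seed)
    # Cap k at the number of files actually found to avoid a ValueError.
    return rng.sample(uris, min(k, len(uris)))

# Example usage, assuming the directory used in this notebook:
# uris = sample_image_uris("~/Downloads/images", k=1000, seed=0)
```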
Querying via text¶
We don't need to generate embeddings at query time either: LanceDB does that automatically, so you can query directly using text input.
The pydantic model we declared for the table schema also makes it easy to work with the search results.
rs = table.search("dog").limit(3).to_pydantic(Pets)
rs[0].image
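Conceptually, a text query like the one above is embedded into the same vector space as the images, and the rows whose vectors lie closest to it are returned. The sketch below illustrates that idea with a brute-force cosine-similarity ranking over made-up 3-dimensional vectors. It is only an illustration: it does not reflect how LanceDB executes a search, and the file names and vectors are invented.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query, rows, k=3):
    """Return the k (uri, score) pairs with the highest cosine similarity to `query`."""
    q = normalize(query)
    scored = [
        (uri, sum(a * b for a, b in zip(q, normalize(vec))))
        for uri, vec in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up vectors standing in for real CLIP image embeddings.
rows = [
    ("dog_1.jpg", [0.9, 0.1, 0.0]),
    ("cat_1.jpg", [0.1, 0.9, 0.0]),
    ("dog_2.jpg", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedding of the text "dog"
for uri, score in top_k(query_vec, rows, k=2):
    print(uri, round(score, 3))
```

The embedding function shown earlier reports normalize=True, so its vectors are already unit length; the explicit normalization here just keeps the toy example self-contained.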
Querying via images¶
The great thing about CLIP is that it's multi-modal, so you can search using not just text but images as well.
Create a query image using PIL
from PIL import Image
p = Path("~/Downloads/images/samoyed_100.jpg").expanduser()
query_image = Image.open(p)
query_image
Pass the query_image to the search API
rs = table.search(query_image).limit(3).to_pydantic(Pets)
rs[2].image
Persistence¶
Embedding functions are persisted as table metadata, so they are much easier to reuse across sessions.
For example, we can recreate the database connection and table object
db = lancedb.connect("~/.lancedb")
table = db["pets"]
We can see that the embedding function configuration is read back from the table metadata
import json
json.loads(table.schema.metadata[b"embedding_functions"])[0]
{'name': 'open-clip', 'model': {'name': 'ViT-B-32', 'pretrained': 'laion2b_s34b_b79k', 'device': 'cpu', 'batch_size': 64, 'normalize': True}, 'source_column': 'image_uri', 'vector_column': 'vector'}
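The metadata entry shown above is plain JSON, so it can be inspected with the standard library alone. The sketch below summarizes each recorded embedding function. The helper name `describe_embedding_functions` is hypothetical, and `example_metadata` is a stand-in for `table.schema.metadata` built from the output above, so the snippet runs without a LanceDB table.

```python
import json

def describe_embedding_functions(metadata):
    """Summarize each embedding function recorded in a schema metadata mapping."""
    raw = metadata.get(b"embedding_functions", b"[]")
    summaries = []
    for entry in json.loads(raw):
        model = entry.get("model", {})
        summaries.append(
            f"{entry['name']} ({model.get('name', '?')}): "
            f"{entry['source_column']} -> {entry['vector_column']}"
        )
    return summaries

# Stand-in for table.schema.metadata, copied from the output shown above.
example_metadata = {
    b"embedding_functions": json.dumps([{
        "name": "open-clip",
        "model": {"name": "ViT-B-32", "pretrained": "laion2b_s34b_b79k",
                  "device": "cpu", "batch_size": 64, "normalize": True},
        "source_column": "image_uri",
        "vector_column": "vector",
    }]).encode("utf-8")
}
print(describe_embedding_functions(example_metadata))
```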
And we can also run queries as before without having to reinstantiate the embedding function explicitly
rs = table.search("big dog").limit(3).to_pydantic(Pets)
rs[0].image
LanceDB makes multimodal AI easy¶
- LanceDB's new embedding functions feature makes life easier for builders of LLM apps
- You no longer need to manually encode the data yourself
- You no longer need to figure out how many dimensions your vector has
- You no longer need to manually encode the query
- And with the right embedding model, you can search way more than just text