Example - MultiModal CLIP Embeddings¶

The Disappearing Embedding Function¶

Previously, to use a vector database you had to run the embedding step yourself and interact with the system using raw vectors. With this release of LanceDB, embedding functions handle that step for you.

LanceDB provides sentence-transformers, OpenAI, and OpenCLIP embedding functions that can be saved directly as table metadata.
You no longer have to generate vectors yourself at either ingestion time or query time.
The embedding function interface is extensible, so you can create your own.
The function is persisted as table metadata, so you can use it across sessions.
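
To make the last two points concrete, here is a minimal, library-independent sketch of the idea: an embedding function is registered under a name, its configuration is serialized to JSON, and the function is later rebuilt from that JSON alone. Everything here (`REGISTRY`, `register`, `CharCountEmbedding`, `to_metadata`, `from_metadata`) is hypothetical and invented for illustration; it is not LanceDB's implementation.

```python
import json

# Hypothetical name -> class registry (illustration only, not LanceDB's API).
REGISTRY = {}


def register(name):
    """Class decorator that records an embedding class under `name`."""
    def decorator(cls):
        cls.registry_name = name
        REGISTRY[name] = cls
        return cls
    return decorator


@register("char-count")
class CharCountEmbedding:
    """Toy embedding: counts how often each character of `alphabet` occurs."""

    def __init__(self, alphabet="abc"):
        self.alphabet = alphabet

    def ndims(self):
        # The function itself knows its output dimensionality.
        return len(self.alphabet)

    def embed(self, text):
        return [float(text.count(ch)) for ch in self.alphabet]

    def config(self):
        return {"alphabet": self.alphabet}


def to_metadata(func, source_column, vector_column):
    """Serialize the function's name and settings, as a table might store them."""
    return json.dumps({
        "name": func.registry_name,
        "model": func.config(),
        "source_column": source_column,
        "vector_column": vector_column,
    })


def from_metadata(blob):
    """Rebuild the embedding function from stored metadata alone."""
    meta = json.loads(blob)
    return REGISTRY[meta["name"]](**meta["model"])
```

Because the stored JSON carries both the registered name and the settings, a later session only needs the registry to get an equivalent function back.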

In [1]:

import lancedb

Multi-modal search made easy¶

In this example we'll go over multi-modal image search using:

Oxford Pet dataset
OpenClip model
LanceDB

Data¶

First, download images.tar.gz from the dataset page at https://www.robots.ox.ac.uk/~vgg/data/pets/

This notebook assumes you've downloaded it into your ~/Downloads directory. When you extract the tarball, it will create an images directory.
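
If you prefer to extract the archive from Python, the following is a minimal sketch using the standard library. The helper name `extract_images` is hypothetical, and the sketch assumes the archive unpacks into an images directory of .jpg files as described above.

```python
import tarfile
from pathlib import Path


def extract_images(tarball, dest):
    """Extract `tarball` into `dest` and return the sorted .jpg paths found."""
    tarball = Path(tarball).expanduser()
    dest = Path(dest).expanduser()
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest)
    # The archive is expected to contain an images/ directory of .jpg files.
    return sorted((dest / "images").glob("*.jpg"))
```

For the layout this notebook assumes, the call would be `extract_images("~/Downloads/images.tar.gz", "~/Downloads")`.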

Define embedding function¶

We'll use the OpenClipEmbeddingFunction here for multi-modal image search.

In [7]:

from lancedb.embeddings import EmbeddingFunctionRegistry

registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()

Note that creating the embedding function downloads the pretrained model weights on first use (about 605 MB in the run recorded here).

The open-clip embedding function requires the open_clip_torch package; install it first if the cell above fails to import it.

In [6]:

!pip install open_clip_torch

Successfully installed filelock-3.12.4 fsspec-2023.9.2 ftfy-6.1.1 huggingface-hub-0.17.3 mpmath-1.3.0 networkx-3.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.2.140 nvidia-nvtx-cu12-12.1.105 open-clip-torch-2.20.0 pillow-10.0.1 protobuf-3.20.3 safetensors-0.3.3 sentencepiece-0.1.99 sympy-1.12 timm-0.9.7 torch-2.1.0 torchvision-0.16.0 triton-2.1.0

In [8]:

clip

Out[8]:

OpenClipEmbeddings(name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)
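
The normalize=True setting in the output above suggests the embeddings are scaled to unit length. As a sketch of why that is convenient, assuming plain L2 normalization (the helper names below are hypothetical): for unit-length vectors, the dot product equals the cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale `vec` to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

With these helpers, `dot(l2_normalize(a), l2_normalize(b))` and `cosine_similarity(a, b)` agree up to floating-point rounding.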

The data model¶

We'll declare a new model that subclasses LanceModel (a special pydantic model) to represent the table. This table has two columns: one for the image_uri and one for the vector generated from each image. The embedding function reports the number of dimensions in its vectors, so you don't need to look it up.

We use the VectorField method from the embedding function to annotate the model so that LanceDB knows to use the open-clip embedding function to generate query embeddings for the vector column.

We also use SourceField so that, when adding data, LanceDB knows to automatically use open-clip to encode the input images.

Finally, because we're working with images, we add a convenience property, image, that opens the file and returns a PIL Image so it can be displayed in a Jupyter notebook.

In [ ]:

from PIL import Image
from lancedb.pydantic import LanceModel, Vector

class Pets(LanceModel):
    # Vector column; clip.ndims() supplies the embedding dimensionality
    vector: Vector(clip.ndims()) = clip.VectorField()
    # Source column; LanceDB embeds the image at each URI when rows are added
    image_uri: str = clip.SourceField()

    @property
    def image(self):
        # Open the image file so Jupyter can render it inline
        return Image.open(self.image_uri)

Create the table¶

First, we connect to a local LanceDB directory.

In [ ]:

db = lancedb.connect("~/.lancedb")

Next we get all of the paths for the images we downloaded and create a table. Notice that we didn't have to worry about generating the image embeddings ourselves.

In [ ]:

import pandas as pd
from pathlib import Path
from random import sample

if "pets" in db:
    # Reuse the table if it was created in an earlier session
    table = db["pets"]
else:
    table = db.create_table("pets", schema=Pets)
    # use a sampling of up to 1000 images
    p = Path("~/Downloads/images").expanduser()
    uris = [str(f) for f in p.glob("*.jpg")]
    # min() avoids a ValueError if fewer than 1000 images are present
    uris = sample(uris, min(1000, len(uris)))
    # Only the URIs are supplied; the embedding function fills in the vector column
    table.add(pd.DataFrame({"image_uri": uris}))

In [ ]:

table.head().to_pandas()

Out[ ]:

	vector	image_uri
0	[0.018789755, 0.11621179, -0.09760579, -0.0268...	/Users/changshe/Downloads/images/leonberger_14...
1	[0.021960497, 0.06073219, -0.1625527, 0.021481...	/Users/changshe/Downloads/images/havanese_63.jpg
2	[0.0074375155, 0.084355146, -0.027461205, -0.0...	/Users/changshe/Downloads/images/english_cocke...
3	[-0.01220356, 0.020815236, -0.08587208, -0.027...	/Users/changshe/Downloads/images/shiba_inu_143...
4	[-0.010112503, 0.14021927, -0.14588796, -0.046...	/Users/changshe/Downloads/images/saint_bernard...
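
The file names in the output above appear to follow a `<breed>_<index>.jpg` pattern (for example `havanese_63.jpg`). Assuming that pattern holds across the dataset, a small hypothetical helper such as `breed_from_uri` can recover a breed label from each URI, which is handy for sanity-checking search results later.

```python
from pathlib import PurePosixPath


def breed_from_uri(uri):
    """Return the breed part of a '<breed>_<index>.jpg' file name."""
    stem = PurePosixPath(uri).stem          # e.g. "havanese_63"
    breed, _, index = stem.rpartition("_")  # split on the last underscore
    # Fall back to the whole stem if the name does not end in "_<digits>".
    return breed if breed and index.isdigit() else stem
```

Splitting on the last underscore keeps multi-word breed names intact.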

Querying via text¶

We don't need to generate the embeddings when querying, either. LanceDB does that automatically, so you can query directly using text input.

The pydantic model we declared for the table schema also makes it easy to work with the search results.

In [ ]:

rs = table.search("dog").limit(3).to_pydantic(Pets)
rs[0].image

Querying via images¶

The great thing about CLIP is that it's multi-modal, so you can search using not just text but images as well.
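
This works because CLIP maps text and images into the same vector space, so a query of either kind is answered by finding the stored vectors closest to the query vector. Below is a brute-force sketch of that lookup with made-up 2-dimensional vectors. Real CLIP vectors have hundreds of dimensions, and `nearest` is a hypothetical helper, not how LanceDB searches internally.

```python
import heapq
import math


def nearest(query, rows, k=3):
    """Return the k (uri, vector) rows closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, rows, key=lambda row: math.dist(query, row[1]))


# Made-up vectors standing in for image embeddings.
rows = [
    ("dog_1.jpg", [0.9, 0.1]),
    ("dog_2.jpg", [0.8, 0.2]),
    ("cat_1.jpg", [0.1, 0.9]),
]

# The query vector could come from encoding text or from encoding an image.
query = [1.0, 0.0]
top = [uri for uri, _ in nearest(query, rows, k=2)]
```

Here `top` holds the two entries whose vectors lie closest to the query, nearest first.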

Create a query image using PIL:

In [ ]:

from PIL import Image
p = Path("~/Downloads/images/samoyed_100.jpg").expanduser()
query_image = Image.open(p)
query_image

Pass query_image to the search API:

In [ ]:

rs = table.search(query_image).limit(3).to_pydantic(Pets)
rs[2].image

Persistence¶

Embedding functions are persisted as table metadata, so they are much easier to reuse across sessions.

For example, we can recreate the database connection and table object:

In [ ]:

db = lancedb.connect("~/.lancedb")
table = db["pets"]

We can observe that the embedding function configuration is read back from the table metadata:

In [ ]:

import json

json.loads(table.schema.metadata[b"embedding_functions"])[0]

Out[ ]:

{'name': 'open-clip',
 'model': {'name': 'ViT-B-32',
  'pretrained': 'laion2b_s34b_b79k',
  'device': 'cpu',
  'batch_size': 64,
  'normalize': True},
 'source_column': 'image_uri',
 'vector_column': 'vector'}
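
Because the schema metadata maps byte-string keys to JSON-encoded values, the configuration can be unpacked with the standard library alone. The sketch below uses a hand-written stand-in for `table.schema.metadata` that mirrors the output above; `embedding_columns` is a hypothetical helper.

```python
import json

# Stand-in for table.schema.metadata, mirroring the output shown above.
metadata = {
    b"embedding_functions": json.dumps([{
        "name": "open-clip",
        "model": {"name": "ViT-B-32", "pretrained": "laion2b_s34b_b79k",
                  "device": "cpu", "batch_size": 64, "normalize": True},
        "source_column": "image_uri",
        "vector_column": "vector",
    }]).encode("utf-8"),
}


def embedding_columns(metadata):
    """Map each vector column to (function name, source column)."""
    configs = json.loads(metadata[b"embedding_functions"])
    return {c["vector_column"]: (c["name"], c["source_column"]) for c in configs}
```

Calling `embedding_columns(metadata)` shows at a glance which column is embedded from which source.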

We can also run queries as before without having to reinstantiate the embedding function explicitly:

In [ ]:

rs = table.search("big dog").limit(3).to_pydantic(Pets)
rs[0].image

LanceDB makes multimodal AI easy¶

LanceDB's new embedding functions feature makes life easier for builders of LLM apps:
You no longer need to manually encode the data yourself.
You no longer need to figure out how many dimensions your vectors have.
You no longer need to manually encode the query.
And with the right embedding model, you can search far more than just text.
