ImageBind embeddings

We support ImageBind model embeddings. You can install our packaged version of the model with `pip install imagebind-packaged==0.1.2`.
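
Because the install command pins a specific version, it can be useful to confirm what is actually present in the environment. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the only thing taken from this page is the distribution name and version.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution name and pinned version come from the pip command above.
version = installed_version("imagebind-packaged")
if version is None:
    print("imagebind-packaged is not installed")
elif version != "0.1.2":
    print(f"found imagebind-packaged {version}, expected 0.1.2")
else:
    print("imagebind-packaged 0.1.2 is installed")
```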

This function is registered as `imagebind` and supports Audio, Video and Text modalities (extending to Thermal, Depth and IMU data). It accepts the following parameters:

Parameter	Type	Default Value	Description
`name`	`str`	`"imagebind_huge"`	Name of the model.
`device`	`str`	`"cpu"`	The device to run the model on. Can be `"cpu"` or `"gpu"`.
`normalize`	`bool`	`False`	Set to `True` to normalize your inputs before model ingestion.
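
As a rough sketch, the parameters in the table might be passed when the function is created. This assumes `create()` accepts them as keyword arguments, in the same way it is called without arguments elsewhere on this page; `IMAGEBIND_PARAMS` and `make_imagebind_func` are hypothetical names used only for illustration.

```python
# Parameter values taken from the table above; adjust them as needed.
IMAGEBIND_PARAMS = {
    "name": "imagebind_huge",  # model name (the documented default)
    "device": "cpu",           # "cpu" or "gpu", per the table
    "normalize": True,         # the documented default is False
}


def make_imagebind_func(**overrides):
    """Create the registered "imagebind" function with explicit parameters.

    Assumption: create() takes the table's parameters as keyword arguments.
    The import is lazy so this module loads even without lancedb installed.
    """
    from lancedb.embeddings import get_registry

    params = {**IMAGEBIND_PARAMS, **overrides}
    return get_registry().get("imagebind").create(**params)
```

With `lancedb` and the packaged model installed, `func = make_imagebind_func(device="gpu")` would then override a single value while keeping the others.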

Below is an example demonstrating how the API works:

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

tmp_path = "./lancedb"  # placeholder: any writable local directory for the database
db = lancedb.connect(tmp_path)
func = get_registry().get("imagebind").create()

class ImageBindModel(LanceModel):
    text: str
    image_uri: str = func.SourceField()
    audio_path: str
    vector: Vector(func.ndims()) = func.VectorField()

# Texts plus locally accessible image and audio paths
text_list = ["A dog.", "A car", "A bird"]
image_paths = ["./assets/dog_image.jpg", "./assets/car_image.jpg", "./assets/bird_image.jpg"]
audio_paths = ["./assets/dog_audio.wav", "./assets/car_audio.wav", "./assets/bird_audio.wav"]

# Load data
inputs = [
    {"text": a, "audio_path": b, "image_uri": c}
    for a, b, c in zip(text_list, audio_paths, image_paths)
]

# Create the table and add the data
table = db.create_table("img_bind", schema=ImageBindModel)
table.add(inputs)

Now, we can search using any modality:

Image search

query_image = "./assets/dog_image2.jpg"  # download an image and enter its path here
actual = table.search(query_image).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "A dog.")

Audio search

query_audio = "./assets/car_audio2.wav"  # download an audio clip and enter its path here
actual = table.search(query_audio).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "A car")

Text search

You can pass any text query and fetch the result as follows:

query = "an animal which flies and tweets"
actual = table.search(query).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "A bird")
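
The three searches above work the same way because ImageBind maps every modality into one shared vector space, so a query is embedded and then compared against the stored vectors. The toy sketch below illustrates that idea in plain Python. The 3-dimensional vectors are made-up values standing in for real embeddings, `nearest` is a hypothetical helper, and cosine similarity is an assumption chosen for illustration rather than the metric the table necessarily uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the stored row embeddings.
stored = {
    "A dog.": [0.9, 0.1, 0.0],
    "A car": [0.1, 0.9, 0.1],
    "A bird": [0.0, 0.2, 0.9],
}


def nearest(query_vector, rows, limit=1):
    """Return the texts of the `limit` rows most similar to the query."""
    ranked = sorted(
        rows.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:limit]]


# Made-up embedding of a query; it could come from text, an image or audio.
query_vector = [0.1, 0.1, 0.8]
print(nearest(query_vector, stored))
```

Since the ranking only sees vectors, the modality of the query does not matter once it has been embedded.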

If you have any questions about the embeddings API or supported models, or if a relevant model is missing, please raise an issue on GitHub.