
Available models

LanceDB provides various embedding functions out of the box that manage your embeddings implicitly. We're actively working on adding support for other popular embedding APIs and models.

Text embedding functions

Contains the text embedding functions registered by default.

  • Embedding functions have a built-in rate-limit handler that wraps source and query embedding calls and retries with exponential backoff.
  • Each EmbeddingFunction implementation automatically accepts a max_retries argument, which defaults to 7 (see the sketch below).
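A smaller retry budget can be set when creating any embedding function. A minimal sketch, assuming the OpenAI function is used (any registered function accepts the same argument):

```python
from lancedb.embeddings import get_registry

# max_retries is accepted by every EmbeddingFunction implementation;
# here the default of 7 is lowered to 3
func = get_registry().get("openai").create(max_retries=3)
```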

Sentence transformers

Allows you to set parameters when registering a sentence-transformers object.

Info

Sentence transformer embeddings are normalized by default. It is recommended to use normalized embeddings for similarity search.

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | all-MiniLM-L6-v2 | The name of the model |
| device | str | cpu | The device to run the model on (can be cpu or gpu) |
| normalize | bool | True | Whether to normalize the input text before feeding it to the model |
| trust_remote_code | bool | False | Whether to trust and execute remote code from the model's Huggingface repository |
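These parameters are passed to the create method. A minimal sketch, using the documented defaults from the table above:

```python
from lancedb.embeddings import get_registry

# all parameters are optional; the values shown are the defaults
model = get_registry().get("sentence-transformers").create(
    name="all-MiniLM-L6-v2",
    device="cpu",
    normalize=True,
)
```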
Check out the available sentence-transformers models below:
- sentence-transformers/all-MiniLM-L12-v2
- sentence-transformers/paraphrase-mpnet-base-v2
- sentence-transformers/gtr-t5-base
- sentence-transformers/LaBSE
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/bert-base-nli-max-tokens
- sentence-transformers/bert-base-nli-mean-tokens
- sentence-transformers/bert-base-nli-stsb-mean-tokens
- sentence-transformers/bert-base-wikipedia-sections-mean-tokens
- sentence-transformers/bert-large-nli-cls-token
- sentence-transformers/bert-large-nli-max-tokens
- sentence-transformers/bert-large-nli-mean-tokens
- sentence-transformers/bert-large-nli-stsb-mean-tokens
- sentence-transformers/distilbert-base-nli-max-tokens
- sentence-transformers/distilbert-base-nli-mean-tokens
- sentence-transformers/distilbert-base-nli-stsb-mean-tokens
- sentence-transformers/distilroberta-base-msmarco-v1
- sentence-transformers/distilroberta-base-msmarco-v2
- sentence-transformers/nli-bert-base-cls-pooling
- sentence-transformers/nli-bert-base-max-pooling
- sentence-transformers/nli-bert-base
- sentence-transformers/nli-bert-large-cls-pooling
- sentence-transformers/nli-bert-large-max-pooling
- sentence-transformers/nli-bert-large
- sentence-transformers/nli-distilbert-base-max-pooling
- sentence-transformers/nli-distilbert-base
- sentence-transformers/nli-roberta-base
- sentence-transformers/nli-roberta-large
- sentence-transformers/roberta-base-nli-mean-tokens
- sentence-transformers/roberta-base-nli-stsb-mean-tokens
- sentence-transformers/roberta-large-nli-mean-tokens
- sentence-transformers/roberta-large-nli-stsb-mean-tokens
- sentence-transformers/stsb-bert-base
- sentence-transformers/stsb-bert-large
- sentence-transformers/stsb-distilbert-base
- sentence-transformers/stsb-roberta-base
- sentence-transformers/stsb-roberta-large
- sentence-transformers/xlm-r-100langs-bert-base-nli-mean-tokens
- sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens
- sentence-transformers/xlm-r-base-en-ko-nli-ststb
- sentence-transformers/xlm-r-bert-base-nli-mean-tokens
- sentence-transformers/xlm-r-bert-base-nli-stsb-mean-tokens
- sentence-transformers/xlm-r-large-en-ko-nli-ststb
- sentence-transformers/bert-base-nli-cls-token
- sentence-transformers/all-distilroberta-v1
- sentence-transformers/multi-qa-MiniLM-L6-dot-v1
- sentence-transformers/multi-qa-distilbert-cos-v1
- sentence-transformers/multi-qa-distilbert-dot-v1
- sentence-transformers/multi-qa-mpnet-base-cos-v1
- sentence-transformers/multi-qa-mpnet-base-dot-v1
- sentence-transformers/nli-distilroberta-base-v2
- sentence-transformers/all-MiniLM-L6-v1
- sentence-transformers/all-mpnet-base-v1
- sentence-transformers/all-mpnet-base-v2
- sentence-transformers/all-roberta-large-v1
- sentence-transformers/allenai-specter
- sentence-transformers/average_word_embeddings_glove.6B.300d
- sentence-transformers/average_word_embeddings_glove.840B.300d
- sentence-transformers/average_word_embeddings_komninos
- sentence-transformers/average_word_embeddings_levy_dependency
- sentence-transformers/clip-ViT-B-32-multilingual-v1
- sentence-transformers/clip-ViT-B-32
- sentence-transformers/distilbert-base-nli-stsb-quora-ranking
- sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking
- sentence-transformers/distilroberta-base-paraphrase-v1
- sentence-transformers/distiluse-base-multilingual-cased-v1
- sentence-transformers/distiluse-base-multilingual-cased-v2
- sentence-transformers/distiluse-base-multilingual-cased
- sentence-transformers/facebook-dpr-ctx_encoder-multiset-base
- sentence-transformers/facebook-dpr-ctx_encoder-single-nq-base
- sentence-transformers/facebook-dpr-question_encoder-multiset-base
- sentence-transformers/facebook-dpr-question_encoder-single-nq-base
- sentence-transformers/gtr-t5-large
- sentence-transformers/gtr-t5-xl
- sentence-transformers/gtr-t5-xxl
- sentence-transformers/msmarco-MiniLM-L-12-v3
- sentence-transformers/msmarco-MiniLM-L-6-v3
- sentence-transformers/msmarco-MiniLM-L12-cos-v5
- sentence-transformers/msmarco-MiniLM-L6-cos-v5
- sentence-transformers/msmarco-bert-base-dot-v5
- sentence-transformers/msmarco-bert-co-condensor
- sentence-transformers/msmarco-distilbert-base-dot-prod-v3
- sentence-transformers/msmarco-distilbert-base-tas-b
- sentence-transformers/msmarco-distilbert-base-v2
- sentence-transformers/msmarco-distilbert-base-v3
- sentence-transformers/msmarco-distilbert-base-v4
- sentence-transformers/msmarco-distilbert-cos-v5
- sentence-transformers/msmarco-distilbert-dot-v5
- sentence-transformers/msmarco-distilbert-multilingual-en-de-v2-tmp-lng-aligned
- sentence-transformers/msmarco-distilbert-multilingual-en-de-v2-tmp-trained-scratch
- sentence-transformers/msmarco-distilroberta-base-v2
- sentence-transformers/msmarco-roberta-base-ance-firstp
- sentence-transformers/msmarco-roberta-base-v2
- sentence-transformers/msmarco-roberta-base-v3
- sentence-transformers/multi-qa-MiniLM-L6-cos-v1
- sentence-transformers/nli-mpnet-base-v2
- sentence-transformers/nli-roberta-base-v2
- sentence-transformers/nq-distilbert-base-v1
- sentence-transformers/paraphrase-MiniLM-L12-v2
- sentence-transformers/paraphrase-MiniLM-L3-v2
- sentence-transformers/paraphrase-MiniLM-L6-v2
- sentence-transformers/paraphrase-TinyBERT-L6-v2
- sentence-transformers/paraphrase-albert-base-v2
- sentence-transformers/paraphrase-albert-small-v2
- sentence-transformers/paraphrase-distilroberta-base-v1
- sentence-transformers/paraphrase-distilroberta-base-v2
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
- sentence-transformers/paraphrase-xlm-r-multilingual-v1
- sentence-transformers/quora-distilbert-base
- sentence-transformers/quora-distilbert-multilingual
- sentence-transformers/sentence-t5-base
- sentence-transformers/sentence-t5-large
- sentence-transformers/sentence-t5-xxl
- sentence-transformers/sentence-t5-xl
- sentence-transformers/stsb-distilroberta-base-v2
- sentence-transformers/stsb-mpnet-base-v2
- sentence-transformers/stsb-roberta-base-v2
- sentence-transformers/stsb-xlm-r-multilingual
- sentence-transformers/xlm-r-distilroberta-base-paraphrase-v1
- sentence-transformers/clip-ViT-L-14
- sentence-transformers/clip-ViT-B-16
- sentence-transformers/use-cmlm-multilingual
- sentence-transformers/all-MiniLM-L12-v1

Info

You can also load many other model architectures supported by the library, for example models from sources such as BAAI, Nomic, and Salesforce Research. See the Hugging Face Hub for all supported models.

BAAI Embeddings example

Here is an example that uses a BAAI embedding model from the Hugging Face Hub:

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5", device="cpu")

class Words(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

table = db.create_table("words", schema=Words)
table.add(
    [
        {"text": "hello world"},
        {"text": "goodbye world"}
    ]
)

query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)

Visit the sentence-transformers Hugging Face Hub page for more information on the available models.

Huggingface embedding models

We support all Hugging Face models that can be loaded via the transformers library. The default model is colbert-ir/colbertv2.0, which also has its own alias in the registry: registry.get("colbert").

Example usage:

import lancedb
import pandas as pd

from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

db = lancedb.connect("/tmp/db")
model = get_registry().get("huggingface").create(name="facebook/bart-base")

class Words(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

df = pd.DataFrame({"text": ["hi hello sayonara", "goodbye world"]})
table = db.create_table("greets", schema=Words)
table.add(df)
query = "old greeting"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)

Ollama embeddings

Generate embeddings via the ollama Python library. Supported parameters (to be passed in the create method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | nomic-embed-text | The name of the model. |
| host | str | http://localhost:11434 | The Ollama host to connect to. |
| options | ollama.Options or dict | None | Additional model parameters listed in the documentation for the Modelfile, such as temperature. |
| keep_alive | float or str | "5m" | Controls how long the model stays loaded in memory following the request. |
| ollama_client_kwargs | dict | {} | kwargs that can be passed to the ollama.Client. |

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
func = get_registry().get("ollama").create(name="nomic-embed-text")

class Words(LanceModel):
    text: str = func.SourceField()
    vector: Vector(func.ndims()) = func.VectorField()

table = db.create_table("words", schema=Words, mode="overwrite")
table.add([
    {"text": "hello world"},
    {"text": "goodbye world"}
])

query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
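
The options and keep_alive parameters are forwarded to Ollama. A minimal sketch (the values here are illustrative only):

```python
from lancedb.embeddings import get_registry

# keep the model in memory for 10 minutes and pass extra model parameters through
func = get_registry().get("ollama").create(
    name="nomic-embed-text",
    keep_alive="10m",
    options={"temperature": 0.0},
)
```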

OpenAI embeddings

LanceDB registers the OpenAI embeddings function in the registry by default, as openai. Below are the parameters that you can customize when creating an instance:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "text-embedding-ada-002" | The name of the model. |
| dim | int | Model default | For OpenAI's newer text-embedding-3 models, a dimensionality smaller than the default 1536 can be requested. |

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
func = get_registry().get("openai").create(name="text-embedding-ada-002")

class Words(LanceModel):
    text: str = func.SourceField()
    vector: Vector(func.ndims()) = func.VectorField()

table = db.create_table("words", schema=Words, mode="overwrite")
table.add(
    [
        {"text": "hello world"},
        {"text": "goodbye world"}
    ]
    )

query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
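
For example, the dim parameter can shorten embeddings from the newer text-embedding-3 family. A sketch (the model name and size here are illustrative):

```python
from lancedb.embeddings import get_registry

# request 256-dimensional embeddings instead of the model's default size
func = get_registry().get("openai").create(name="text-embedding-3-small", dim=256)
```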

Instructor Embeddings

Instructor is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g. classification, retrieval, clustering, text evaluation) and domain (e.g. science, finance) simply by providing the task instruction, without any finetuning.

If you want to calculate customized embeddings for specific sentences, you can follow the unified template to write instructions.

Info

Represent the domain text_type for task_objective:

  • domain is optional, and it specifies the domain of the text, e.g. science, finance, medicine, etc.
  • text_type is required, and it specifies the encoding unit, e.g. sentence, document, paragraph, etc.
  • task_objective is optional, and it specifies the objective of embedding, e.g. retrieve a document, classify the sentence, etc.
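
For example, following this template, a domain-specific pair of instructions could be composed and passed to the create method. A sketch (the instruction strings are illustrative):

```python
from lancedb.embeddings import get_registry

# domain = "Financial", text_type = "document", task_objective = "retrieval"
instructor = get_registry().get("instructor").create(
    source_instruction="Represent the Financial document for retrieval:",
    query_instruction="Represent the Financial question for retrieving supporting documents:",
)
```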

More information about the model can be found at the source URL.

| Argument | Type | Default | Description |
|---|---|---|---|
| name | str | "hkunlp/instructor-base" | The name of the model to use |
| batch_size | int | 32 | The batch size to use when generating embeddings |
| device | str | "cpu" | The device to use when generating embeddings |
| show_progress_bar | bool | True | Whether to show a progress bar when generating embeddings |
| normalize_embeddings | bool | True | Whether to normalize the embeddings |
| quantize | bool | False | Whether to quantize the model |
| source_instruction | str | "represent the document for retrieval" | The instruction for the source column |
| query_instruction | str | "represent the document for retrieving the most similar documents" | The instruction for the query |

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

instructor = get_registry().get("instructor").create(
    source_instruction="represent the document for retrieval",
    query_instruction="represent the document for retrieving the most similar documents",
)

class Schema(LanceModel):
    vector: Vector(instructor.ndims()) = instructor.VectorField()
    text: str = instructor.SourceField()

db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=Schema, mode="overwrite")

texts = [{"text": "Capitalism has been dominant in the Western world since the end of feudalism, but most feel[who?] that..."},
        {"text": "The disparate impact theory is especially controversial under the Fair Housing Act because the Act..."},
        {"text": "Disparate impact in United States labor law refers to practices in employment, housing, and other areas that.."}]

tbl.add(texts)
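
The table can then be searched as in the other examples. A usage sketch:

```python
result = tbl.search("What is disparate impact?").limit(1).to_pydantic(Schema)[0]
print(result.text)
```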

Gemini Embeddings

With Google's Gemini, you can represent text (words, sentences, and blocks of text) in a vectorized form, making it easier to compare and contrast embeddings. For example, two texts that share a similar subject matter or sentiment should have similar embeddings, which can be identified through mathematical comparison techniques such as cosine similarity. For more on how and why you should use embeddings, refer to the Embeddings guide. The Gemini Embedding Model API supports various task types:

| Task Type | Description |
|---|---|
| "retrieval_query" | Specifies the given text is a query in a search/retrieval setting. |
| "retrieval_document" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title, but it is automatically provided by the Embeddings API. |
| "semantic_similarity" | Specifies the given text will be used for Semantic Textual Similarity (STS). |
| "classification" | Specifies that the embeddings will be used for classification. |
| "clustering" | Specifies that the embeddings will be used for clustering. |

Usage Example:

import lancedb
import pandas as pd
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry


model = get_registry().get("gemini-text").create()

class TextModel(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=TextModel, mode="overwrite")

tbl.add(df)
rs = tbl.search("hello").limit(1).to_pandas()

Cohere Embeddings

Using the Cohere API requires the cohere package, which can be installed with pip install cohere. Cohere embeddings are used to generate embeddings for text data. The embeddings can be used for various tasks like semantic search, clustering, and classification. You also need to set the COHERE_API_KEY environment variable to use the Cohere API.

Supported models are:

    * embed-english-v3.0
    * embed-multilingual-v3.0
    * embed-english-light-v3.0
    * embed-multilingual-light-v3.0
    * embed-english-v2.0
    * embed-english-light-v2.0
    * embed-multilingual-v2.0

Supported parameters (to be passed in the create method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "embed-english-v2.0" | The model ID of the Cohere model to use. Supported base models for Text Embeddings: embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0 |
| source_input_type | str | "search_document" | The type of input data to be used for the source column. |
| query_input_type | str | "search_query" | The type of input data to be used for the query. |

Cohere supports the following input types:

| Input Type | Description |
|---|---|
| "search_document" | Used for embeddings stored in a vector database for search use-cases. |
| "search_query" | Used for embeddings of search queries run against a vector DB. |
| "semantic_similarity" | Specifies the given text will be used for Semantic Textual Similarity (STS). |
| "classification" | Used for embeddings passed through a text classifier. |
| "clustering" | Used for the embeddings run through a clustering algorithm. |

Usage Example:

```python
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import EmbeddingFunctionRegistry

cohere = (
    EmbeddingFunctionRegistry
    .get_instance()
    .get("cohere")
    .create(name="embed-multilingual-v2.0")
)

class TextModel(LanceModel):
    text: str = cohere.SourceField()
    vector: Vector(cohere.ndims()) = cohere.VectorField()

data = [{"text": "hello world"},
        {"text": "goodbye world"}]

db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=TextModel, mode="overwrite")

tbl.add(data)
```
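
As with the other text embedding functions, the table can then be queried with plain text. A usage sketch:

```python
result = tbl.search("hola mundo").limit(1).to_pydantic(TextModel)[0]
print(result.text)  # expected to print "hello world"
```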

Jina Embeddings

Jina embeddings are used to generate embeddings for text and image data. You also need to set the JINA_API_KEY environment variable to use the Jina API.

You can find the list of supported models at https://jina.ai/embeddings/

Supported parameters (to be passed in the create method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "jina-clip-v1" | The model ID of the Jina model to use |

Usage Example:

    import os
    import lancedb
    from lancedb.pydantic import LanceModel, Vector
    from lancedb.embeddings import EmbeddingFunctionRegistry

    os.environ['JINA_API_KEY'] = 'jina_*'

    jina_embed = EmbeddingFunctionRegistry.get_instance().get("jina").create(name="jina-embeddings-v2-base-en")


    class TextModel(LanceModel):
        text: str = jina_embed.SourceField()
        vector: Vector(jina_embed.ndims()) = jina_embed.VectorField()


    data = [{"text": "hello world"},
            {"text": "goodbye world"}]

    db = lancedb.connect("~/.lancedb-2")
    tbl = db.create_table("test", schema=TextModel, mode="overwrite")

    tbl.add(data)
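
A query sketch, mirroring the other examples:

```python
result = tbl.search("greetings").limit(1).to_pydantic(TextModel)[0]
print(result.text)
```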

AWS Bedrock Text Embedding Functions

AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. You can do so with awscli, also adding your session token:

```shell
aws configure
aws configure set aws_session_token "<your_session_token>"
```

To ensure that the credentials are set up correctly, you can run the following command:

```shell
aws sts get-caller-identity
```

Supported embedding model IDs are:

    * amazon.titan-embed-text-v1
    * cohere.embed-english-v3
    * cohere.embed-multilingual-v3

Supported parameters (to be passed in the create method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "amazon.titan-embed-text-v1" | The model ID of the Bedrock model to use. Supported base models for Text Embeddings: amazon.titan-embed-text-v1, cohere.embed-english-v3, cohere.embed-multilingual-v3 |
| region | str | "us-east-1" | Optional name of the AWS Region in which the service should be called (e.g., "us-east-1"). |
| profile_name | str | None | Optional name of the AWS profile to use for calling the Bedrock service. If not specified, the default profile will be used. |
| assumed_role | str | None | Optional ARN of an AWS IAM role to assume for calling the Bedrock service. If not specified, the current active credentials will be used. |
| role_session_name | str | "lancedb-embeddings" | Optional name of the AWS IAM role session to use for calling the Bedrock service. If not specified, the name "lancedb-embeddings" will be used. |
| runtime | bool | True | Optional choice of getting a different client to perform operations with the Amazon Bedrock service. |
| max_retries | int | 7 | Optional number of retries to perform when a request fails. |

Usage Example:

import lancedb
import pandas as pd
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

model = get_registry().get("bedrock-text").create()

class TextModel(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect("tmp_path")
tbl = db.create_table("test", schema=TextModel, mode="overwrite")

tbl.add(df)
rs = tbl.search("hello").limit(1).to_pandas()
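
Other parameters from the table above, such as region or profile_name, can be set the same way. A sketch (the values here are illustrative):

```python
from lancedb.embeddings import get_registry

model = get_registry().get("bedrock-text").create(
    region="us-west-2",         # AWS Region to call
    profile_name="my-profile",  # a named AWS CLI profile (illustrative)
)
```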

Multi-modal embedding functions

Multi-modal embedding functions allow you to query your table using both images and text.

OpenClip embeddings

We support CLIP model embeddings via the open-source alternative open-clip. It is registered as open-clip and supports the following customizations:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "ViT-B-32" | The name of the model. |
| pretrained | str | "laion2b_s34b_b79k" | The name of the pretrained model to load. |
| device | str | "cpu" | The device to run the model on. Can be "cpu" or "gpu". |
| batch_size | int | 64 | The number of images to process in a batch. |
| normalize | bool | True | Whether to normalize the input images before feeding them to the model. |

This embedding function supports ingesting images as both bytes and URLs. You can query them using both text and other images.

Info

LanceDB supports ingesting images directly from accessible links.

import io

import lancedb
import pandas as pd
import requests
from PIL import Image  # pillow is used for the image query below
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
func = get_registry().get("open-clip").create()

class Images(LanceModel):
    label: str
    image_uri: str = func.SourceField() # image uri as the source
    image_bytes: bytes = func.SourceField() # image bytes as the source
    vector: Vector(func.ndims()) = func.VectorField() # vector column 
    vec_from_bytes: Vector(func.ndims()) = func.VectorField() # Another vector column 

table = db.create_table("images", schema=Images)
labels = ["cat", "cat", "dog", "dog", "horse", "horse"]
uris = [
    "http://farm1.staticflickr.com/53/167798175_7c7845bbbd_z.jpg",
    "http://farm1.staticflickr.com/134/332220238_da527d8140_z.jpg",
    "http://farm9.staticflickr.com/8387/8602747737_2e5c2a45d4_z.jpg",
    "http://farm5.staticflickr.com/4092/5017326486_1f46057f5f_z.jpg",
    "http://farm9.staticflickr.com/8216/8434969557_d37882c42d_z.jpg",
    "http://farm6.staticflickr.com/5142/5835678453_4f3a4edb45_z.jpg",
]
# get each uri as bytes
image_bytes = [requests.get(uri).content for uri in uris]
table.add(
    pd.DataFrame({"label": labels, "image_uri": uris, "image_bytes": image_bytes})
)

Now we can search using text from both the default vector column and the custom vector column:

# text search
actual = table.search("man's best friend").limit(1).to_pydantic(Images)[0]
print(actual.label) # prints "dog"

frombytes = (
    table.search("man's best friend", vector_column_name="vec_from_bytes")
    .limit(1)
    .to_pydantic(Images)[0]
)
print(frombytes.label)

Because we're using a multi-modal embedding function, we can also search using images:

# image search
query_image_uri = "http://farm1.staticflickr.com/200/467715466_ed4a31801f_z.jpg"
image_bytes = requests.get(query_image_uri).content
query_image = Image.open(io.BytesIO(image_bytes))
actual = table.search(query_image).limit(1).to_pydantic(Images)[0]
print(actual.label == "dog")

# image search using a custom vector column
other = (
    table.search(query_image, vector_column_name="vec_from_bytes")
    .limit(1)
    .to_pydantic(Images)[0]
)
print(other.label)

Imagebind embeddings

We have support for ImageBind model embeddings. You can download our version of the packaged model via pip install imagebind-packaged==0.1.2.

This function is registered as imagebind and supports the Audio, Video and Text modalities (extending to Thermal, Depth and IMU data):

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "imagebind_huge" | Name of the model. |
| device | str | "cpu" | The device to run the model on. Can be "cpu" or "gpu". |
| normalize | bool | False | Set to True to normalize your inputs before model ingestion. |

Below is an example demonstrating how the API works:

import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
func = get_registry().get("imagebind").create()

class ImageBindModel(LanceModel):
    text: str
    image_uri: str = func.SourceField()
    audio_path: str
    vector: Vector(func.ndims()) = func.VectorField()

# add locally accessible image paths
text_list=["A dog.", "A car", "A bird"]
image_paths=[".assets/dog_image.jpg", ".assets/car_image.jpg", ".assets/bird_image.jpg"]
audio_paths=[".assets/dog_audio.wav", ".assets/car_audio.wav", ".assets/bird_audio.wav"]

# Load data
inputs = [
    {"text": a, "audio_path": b, "image_uri": c}
    for a, b, c in zip(text_list, audio_paths, image_paths)
]

# create table and add data
table = db.create_table("img_bind", schema=ImageBindModel)
table.add(inputs)

Now, we can search using any modality:

query_image = "./assets/dog_image2.jpg" #download an image and enter that path here
actual = table.search(query_image).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "dog")
query_audio = "./assets/car_audio2.wav" #download an audio clip and enter path here
actual = table.search(query_audio).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "car")

You can add any input query and fetch the result as follows:

query = "an animal which flies and tweets" 
actual = table.search(query).limit(1).to_pydantic(ImageBindModel)[0]
print(actual.text == "bird")

If you have any questions about the embeddings API, supported models, or see a relevant model missing, please raise an issue on GitHub.

Jina Embeddings

Jina embeddings can also be used to embed both text and image data. Only some of the models support image data; you can check the list at https://jina.ai/embeddings/

Supported parameters (to be passed in the create method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "jina-clip-v1" | The model ID of the Jina model to use |

Usage Example:

    import os
    import requests
    import lancedb
    from lancedb.pydantic import LanceModel, Vector
    from lancedb.embeddings import get_registry
    import pandas as pd

    os.environ['JINA_API_KEY'] = 'jina_*'

    db = lancedb.connect("~/.lancedb")
    func = get_registry().get("jina").create()


    class Images(LanceModel):
        label: str
        image_uri: str = func.SourceField()  # image uri as the source
        image_bytes: bytes = func.SourceField()  # image bytes as the source
        vector: Vector(func.ndims()) = func.VectorField()  # vector column
        vec_from_bytes: Vector(func.ndims()) = func.VectorField()  # Another vector column


    table = db.create_table("images", schema=Images)
    labels = ["cat", "cat", "dog", "dog", "horse", "horse"]
    uris = [
        "http://farm1.staticflickr.com/53/167798175_7c7845bbbd_z.jpg",
        "http://farm1.staticflickr.com/134/332220238_da527d8140_z.jpg",
        "http://farm9.staticflickr.com/8387/8602747737_2e5c2a45d4_z.jpg",
        "http://farm5.staticflickr.com/4092/5017326486_1f46057f5f_z.jpg",
        "http://farm9.staticflickr.com/8216/8434969557_d37882c42d_z.jpg",
        "http://farm6.staticflickr.com/5142/5835678453_4f3a4edb45_z.jpg",
    ]
    # get each uri as bytes
    image_bytes = [requests.get(uri).content for uri in uris]
    table.add(
      pd.DataFrame({"label": labels, "image_uri": uris, "image_bytes": image_bytes})
    )