Available Embedding Models

LanceDB ships with a variety of embedding functions out of the box that manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.

Before jumping to the list of available models, let's see how to initialize and configure an embedding model in code:

Example usage

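from lancedb.embeddings import get_registry

# Look up the "openai" embedding function and create a configured instance.
model = get_registry().get("openai").create(name="text-embedding-ada-002")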

Now let's understand the above syntax:

model = get_registry().get("model_id").create(...params)
The line above creates a configured, ready-to-use instance of an embedding function backed by the model of your choice.

  • get_registry() : Returns an instance of an EmbeddingFunctionRegistry object. This registry manages the registration and retrieval of embedding functions.

  • .get("model_id") : Called on the registry object, this method retrieves the embedding function associated with the given "model_id" (1).

    1. Hover over the names in the tables below to find the model_id of each embedding function.
  • .create(...params) : Called on the object returned by .get(), this method instantiates the embedding function with the specified parameters.
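
To make the three-step chain concrete, here is a minimal, self-contained sketch of the registry pattern itself. It is not LanceDB's implementation: ToyRegistry, ToyFactory, ToyEmbedder, get_toy_registry, and the "toy" model_id are all hypothetical names invented for illustration, and the "embeddings" are fake numbers derived from text length.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyEmbedder:
    """Hypothetical stand-in for a configured embedding function."""
    name: str
    dim: int = 4

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Not a real model: build a deterministic fake vector from the text length.
        return [[float((len(t) + i) % 7) for i in range(self.dim)] for t in texts]


class ToyFactory:
    """What .get("model_id") hands back here: an object exposing .create(**params)."""

    def __init__(self, cls):
        self._cls = cls

    def create(self, **params):
        # Instantiate the embedding function with the caller's parameters.
        return self._cls(**params)


class ToyRegistry:
    """Maps a model_id string to a factory for that embedding function."""

    def __init__(self):
        self._factories: Dict[str, ToyFactory] = {}

    def register(self, model_id: str, cls) -> None:
        self._factories[model_id] = ToyFactory(cls)

    def get(self, model_id: str) -> ToyFactory:
        if model_id not in self._factories:
            raise KeyError(f"no embedding function registered as {model_id!r}")
        return self._factories[model_id]


_REGISTRY = ToyRegistry()
_REGISTRY.register("toy", ToyEmbedder)


def get_toy_registry() -> ToyRegistry:
    # Always returns the same shared registry instance.
    return _REGISTRY


# Same shape as get_registry().get("model_id").create(...params)
model = get_toy_registry().get("toy").create(name="toy-small", dim=3)
vectors = model.embed(["hello", "hi"])
```

The point of the pattern is that callers only need a string identifier plus keyword parameters; the registry decides which class to build.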

What parameters does the .create(...params) method accept?

Check out the documentation of each embedding model (linked in the tables below) to see which parameters it accepts.

Moving on

Now that we know how to get the desired embedding model in our code, let's explore the embedding models supported by LanceDB, listed in the tables below.

Text Embedding Functions

These functions are registered by default to handle text embeddings.

  • Embedding functions have a built-in rate-limit handler that wraps source and query embedding calls and retries them with exponential backoff.

  • Each EmbeddingFunction implementation automatically accepts a max_retries argument, which defaults to 7.
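
The retry behaviour described above can be sketched as follows. This is a generic illustration of retrying with exponential backoff, not LanceDB's actual wrapper: retry_with_backoff, initial_delay, base, and the injectable sleep parameter are assumptions made for the example, and treating max_retries as "retries after the first attempt" is also an assumption. Only the default of 7 comes from the text above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 7,          # default taken from the text above
    initial_delay: float = 1.0,    # assumed starting wait, in seconds
    base: float = 2.0,             # assumed growth factor per retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait, multiply the wait by `base`, and try again."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay)
            delay *= base
    raise AssertionError("unreachable")


# Demo: a fake embedding call that is "rate limited" twice before succeeding.
calls = {"n": 0}


def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return [0.1, 0.2, 0.3]


waits = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_embed, sleep=waits.append)
```

Passing sleep as a parameter keeps the demo instant; in real use the default time.sleep would pause between attempts, with each wait twice as long as the previous one.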

Available Text Embeddings

| Embedding | Description | Documentation |
| --- | --- | --- |
| Sentence Transformers | SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. | Sentence Transformers Icon |
| Huggingface Models | We offer support for all Huggingface models. The default model is colbert-ir/colbertv2.0. | Huggingface Icon |
| Ollama Embeddings | Generate embeddings via the Ollama Python library. Ollama supports embedding models, making it possible to build RAG apps. | Ollama Icon |
| OpenAI Embeddings | OpenAI's text embeddings measure the relatedness of text strings. LanceDB supports state-of-the-art embeddings from OpenAI. | OpenAI Icon |
| Instructor Embeddings | Instructor is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task and domain by simply providing the task instruction, without any finetuning. | Instructor Embedding Icon |
| Gemini Embeddings | Google's Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | Gemini Icon |
| Cohere Embeddings | Get started with Cohere embedding models in LanceDB. Using the Cohere API requires the cohere package, which you can install via pip. | Cohere Icon |
| Jina Embeddings | World-class embedding models to improve your search and RAG systems. You will need a Jina API key. | Jina Icon |
| AWS Bedrock Functions | AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. | AWS Bedrock Icon |
| IBM Watsonx.ai | Generate text embeddings using IBM's watsonx.ai platform. Note: the watsonx.ai library is an optional dependency. | Watsonx Icon |
| VoyageAI Embeddings | Voyage AI provides cutting-edge embedding models and rerankers. Get started with VoyageAI embedding models in LanceDB. Using the VoyageAI API requires the voyageai package, which you can install via pip. | VoyageAI Icon |

Multi-modal Embedding Functions

Multi-modal embedding functions allow you to query your table using both images and text.

Available Multi-modal Embeddings

| Embedding | Description | Documentation |
| --- | --- | --- |
| OpenClip Embeddings | We support CLIP model embeddings using the open source alternative, open-clip, which supports various customizations. | openclip Icon |
| Imagebind Embeddings | We support imagebind model embeddings. You can download our packaged version of the model via pip install imagebind-packaged==0.1.2. | imagebind Icon |
| Jina Multi-modal Embeddings | Jina embeddings can also be used to embed both text and image data. Only some of the models support image data; check the detailed documentation for specifics. | jina Icon |

Note

If you'd like to request support for additional embedding functions, please open an issue on the LanceDB GitHub issues page.