Available Embedding Models
LanceDB ships with a variety of embedding functions out of the box that manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
Before jumping to the list of available models, let's look at how an embedding model is initialized and configured in code. The call takes the form `get_registry().get("model_id").create(...params)`.
Let's break this syntax down:
This line creates a configured instance of an embedding function, backed by the model of your choice, that is ready for use.
- `get_registry()`: This function call returns an instance of an `EmbeddingFunctionRegistry` object. This registry manages the registration and retrieval of embedding functions.
- `.get("model_id")`: This method is called on the registry object and retrieves the embedding function associated with the given `"model_id"`.
  - Hover over the names in the tables below to find the `model_id` of each embedding function.
- `.create(...params)`: This method is called on the object returned by the `get` method. It instantiates an embedding function using the specified parameters.
What parameters does the `.create(...params)` method accept?
Check the documentation of the specific embedding model (links in the tables below) to see which parameters it takes.
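To make the call chain concrete, here is a minimal, self-contained sketch of the registry pattern described above. It does not use LanceDB itself: `ToyRegistry`, `ToyEmbedder`, `get_toy_registry`, and the `"toy"` model id are hypothetical names invented for illustration, and LanceDB's real `EmbeddingFunctionRegistry` may differ in its details.

```python
from typing import Dict, List, Type


class ToyEmbedder:
    """Hypothetical embedding function (not a LanceDB class)."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    @classmethod
    def create(cls, **params) -> "ToyEmbedder":
        # Instantiate a configured embedding function from keyword parameters.
        return cls(**params)

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Deterministic stand-in for a real model: sum character codes
        # into `dim` buckets so that each text maps to a fixed-size vector.
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for i, ch in enumerate(text):
                vec[i % self.dim] += ord(ch)
            vectors.append(vec)
        return vectors


class ToyRegistry:
    """Hypothetical registry mapping a model_id to an embedding function class."""

    def __init__(self) -> None:
        self._functions: Dict[str, Type[ToyEmbedder]] = {}

    def register(self, model_id: str, cls: Type[ToyEmbedder]) -> None:
        self._functions[model_id] = cls

    def get(self, model_id: str) -> Type[ToyEmbedder]:
        # Retrieve the class registered under this id, or fail loudly.
        if model_id not in self._functions:
            raise KeyError(f"No embedding function registered as {model_id!r}")
        return self._functions[model_id]


_REGISTRY = ToyRegistry()
_REGISTRY.register("toy", ToyEmbedder)


def get_toy_registry() -> ToyRegistry:
    # Return the shared registry instance.
    return _REGISTRY


# Same three-step shape as the chain explained above:
# registry -> lookup by model id -> create with parameters.
model = get_toy_registry().get("toy").create(dim=3)
vectors = model.embed(["hello"])
print(len(vectors[0]))
```

With LanceDB installed, the same chain starts from `get_registry()`, which is imported from `lancedb.embeddings`, and the parameters passed to `.create()` depend on the model you pick.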
Moving on
Now that we know how to get the desired embedding model and use it in our code, let's explore the comprehensive list of embedding models supported by LanceDB, in the tables below.
Text Embedding Functions
These functions are registered by default to handle text embeddings.
- Embedding functions have a built-in rate-limit handler that wraps source and query embedding function calls and retries them with exponential backoff.
- Each `EmbeddingFunction` implementation automatically accepts `max_retries` as an argument, with a default value of 7.
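The retry behavior described above can be sketched in plain Python. This is only an illustration of retrying with exponential backoff, not LanceDB's actual handler: the function name `retry_with_backoff`, the `initial_delay` and `factor` parameters, and the injectable `sleep` argument are all assumptions made for this example. Only the default of 7 retries comes from the text above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 7,        # default taken from the text above
    initial_delay: float = 1.0,  # assumed starting delay, in seconds
    factor: float = 2.0,         # assumed multiplier applied after each failure
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with growing delays."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= factor
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


# Usage: a hypothetical embedding call that is rate limited twice, then succeeds.
calls = {"count": 0}


def flaky_embed() -> List[float]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return [0.1, 0.2, 0.3]


waits: List[float] = []
# Record the delays instead of actually sleeping so the demo runs instantly.
result = retry_with_backoff(flaky_embed, sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the sketch easy to test. A production handler would typically also add jitter and retry only on rate-limit errors rather than on every exception.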
Available Text Embeddings
Embedding | Description | Documentation |
---|---|---|
Sentence Transformers | SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. | |
Huggingface Models | We offer support for all Huggingface models. The default model is `colbert-ir/colbertv2.0`. | |
Ollama Embeddings | Generate embeddings via the Ollama Python library. Ollama supports embedding models, making it possible to build RAG apps. | |
OpenAI Embeddings | OpenAI's text embeddings measure the relatedness of text strings. LanceDB supports state-of-the-art embeddings from OpenAI. | |
Instructor Embeddings | Instructor is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task and domain simply by providing the task instruction, without any finetuning. | |
Gemini Embeddings | Google's Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | |
Cohere Embeddings | Get started with Cohere embedding models using LanceDB. Using the Cohere API requires the `cohere` package, which you can install via `pip`. | |
Jina Embeddings | World-class embedding models to improve your search and RAG systems. You will need a Jina API key. | |
AWS Bedrock Functions | AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. | |
IBM Watsonx.ai | Generate text embeddings using IBM's watsonx.ai platform. Note: the watsonx.ai library is an optional dependency. | |
VoyageAI Embeddings | Voyage AI provides cutting-edge embeddings and rerankers. Get started with VoyageAI embedding models using LanceDB. Using the VoyageAI API requires the `voyageai` package, which you can install via `pip`. | |
Multi-modal Embedding Functions
Multi-modal embedding functions allow you to query your table using both images and text.
Available Multi-modal Embeddings
Embedding | Description | Documentation |
---|---|---|
OpenClip Embeddings | We support CLIP model embeddings using the open-source alternative, open-clip, which supports various customizations. | |
Imagebind Embeddings | We support ImageBind model embeddings. You can download our packaged version of the model via `pip install imagebind-packaged==0.1.2`. | |
Jina Multi-modal Embeddings | Jina embeddings can also be used to embed both text and image data. Only some of the models support image data, so check the detailed documentation. | |
Note
If you'd like to request support for additional embedding functions, please feel free to open an issue on our LanceDB GitHub issue page.