
Cohere Embeddings

Using the Cohere API requires the cohere package, which can be installed with pip install cohere. You also need to set the COHERE_API_KEY environment variable. Cohere embeddings are vector representations of text data that can be used for tasks such as semantic search, clustering, and classification.
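
Because a missing API key is a common cause of failures, it can help to check for it before creating the embedding function. The following is a minimal sketch; the helper name require_cohere_api_key is hypothetical and not part of lancedb or cohere. It only reads the COHERE_API_KEY variable mentioned above.

```python
import os


def require_cohere_api_key(env=None):
    """Return the Cohere API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("COHERE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "COHERE_API_KEY is not set; export it before using the Cohere API"
        )
    return key
```

Calling require_cohere_api_key() once at startup surfaces a configuration problem early, rather than at the first embedding request.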

Supported models are:

  • embed-english-v3.0
  • embed-multilingual-v3.0
  • embed-english-light-v3.0
  • embed-multilingual-light-v3.0
  • embed-english-v2.0
  • embed-english-light-v2.0
  • embed-multilingual-v2.0

Supported parameters (passed to the create method):

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| name | str | "embed-english-v2.0" | The ID of the Cohere model to use. Must be one of the supported models listed above. |
| source_input_type | str | "search_document" | The input type used when embedding the source column. |
| query_input_type | str | "search_query" | The input type used when embedding the query. |

Cohere supports the following input types:

| Input Type | Description |
|---|---|
| "search_document" | Used for embeddings stored in a vector database for search use cases. |
| "search_query" | Used for embeddings of search queries run against a vector database. |
| "semantic_similarity" | Specifies that the given text will be used for Semantic Textual Similarity (STS). |
| "classification" | Used for embeddings passed through a text classifier. |
| "clustering" | Used for embeddings run through a clustering algorithm. |
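
The parameters and input types above can be checked before they are passed to create. The following is a minimal sketch, assuming only the model names, defaults, and input types listed in this section; the helper cohere_create_kwargs is hypothetical and not part of lancedb.

```python
# Values copied from the lists in this section.
SUPPORTED_MODELS = {
    "embed-english-v3.0",
    "embed-multilingual-v3.0",
    "embed-english-light-v3.0",
    "embed-multilingual-light-v3.0",
    "embed-english-v2.0",
    "embed-english-light-v2.0",
    "embed-multilingual-v2.0",
}
INPUT_TYPES = {
    "search_document",
    "search_query",
    "semantic_similarity",
    "classification",
    "clustering",
}


def cohere_create_kwargs(
    name="embed-english-v2.0",
    source_input_type="search_document",
    query_input_type="search_query",
):
    """Validate the arguments and return them as keyword arguments for create.

    Hypothetical helper for illustration only.
    """
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {name!r}")
    for label, value in (
        ("source_input_type", source_input_type),
        ("query_input_type", query_input_type),
    ):
        if value not in INPUT_TYPES:
            raise ValueError(f"unsupported {label}: {value!r}")
    return {
        "name": name,
        "source_input_type": source_input_type,
        "query_input_type": query_input_type,
    }
```

The returned dictionary can then be unpacked into the create call, for example .create(**cohere_create_kwargs(name="embed-english-v3.0")), so that a typo in a model name or input type fails locally before any request is made.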

Usage Example:

    import lancedb
    from lancedb.pydantic import LanceModel, Vector
    from lancedb.embeddings import EmbeddingFunctionRegistry

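    # Look up the registered Cohere embedding function and choose a model.
    # The parentheses let the method chain span several lines.
    cohere = (
        EmbeddingFunctionRegistry.get_instance()
        .get("cohere")
        .create(name="embed-multilingual-v2.0")
    )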

    # "text" is the source column; "vector" holds its embedding.
    class TextModel(LanceModel):
        text: str = cohere.SourceField()
        vector: Vector(cohere.ndims()) = cohere.VectorField()

    data = [
        {"text": "hello world"},
        {"text": "goodbye world"},
    ]

    db = lancedb.connect("~/.lancedb")
    tbl = db.create_table("test", schema=TextModel, mode="overwrite")

    # The "text" values are embedded with Cohere as the rows are added.
    tbl.add(data)
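
Once the table is populated, it can be queried with plain text. The following is a minimal sketch, assuming the table was created with a schema like TextModel above so that a string query is embedded by the same Cohere function; the helper top_texts is hypothetical, and it relies on the table's search, limit, and to_pydantic methods.

```python
def top_texts(tbl, model, query, k=1):
    """Return the `text` field of the k rows closest to `query`.

    Hypothetical helper for illustration. `tbl` is a LanceDB table whose
    schema defines a source field and a vector field, and `model` is the
    LanceModel class used as that schema.
    """
    rows = tbl.search(query).limit(k).to_pydantic(model)
    return [row.text for row in rows]
```

With the objects from the usage example, top_texts(tbl, TextModel, "greetings", k=1) would return a one-element list containing the closest stored text. Running it requires a valid COHERE_API_KEY, because the query string is embedded through the Cohere API.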