Cohere Embeddings
Using the Cohere API requires the `cohere` package, which can be installed with `pip install cohere`. Cohere embeddings generate vector representations of text data, which can be used for tasks such as semantic search, clustering, and classification.
You also need to set the `COHERE_API_KEY` environment variable to use the Cohere API.
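Because a missing key typically only surfaces when the first embedding request is made, it can help to check for it up front. The following is a minimal sketch using only the standard library; `has_cohere_api_key` is a hypothetical helper name, not part of LanceDB or the Cohere SDK.

```python
import os


def has_cohere_api_key(env=None):
    """Return True if COHERE_API_KEY is present and non-empty.

    `env` defaults to the process environment; passing a dict is an
    assumption made here purely to keep the check easy to test.
    """
    env = os.environ if env is None else env
    return bool(env.get("COHERE_API_KEY", "").strip())


if not has_cohere_api_key():
    print("COHERE_API_KEY is not set; export it before creating the embedding function.")
```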
Supported models are:
- embed-english-v3.0
- embed-multilingual-v3.0
- embed-english-light-v3.0
- embed-multilingual-light-v3.0
- embed-english-v2.0
- embed-english-light-v2.0
- embed-multilingual-v2.0
Supported parameters (to be passed to the `create` method) are:
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| `name` | `str` | `"embed-english-v2.0"` | The model ID of the Cohere model to use. Supported base models for text embeddings: embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0 |
| `source_input_type` | `str` | `"search_document"` | The type of input data to be used for the source column. |
| `query_input_type` | `str` | `"search_query"` | The type of input data to be used for the query. |
Cohere supports the following input types:
| Input Type | Description |
|---|---|
| `"search_document"` | Used for embeddings stored in a vector database for search use-cases. |
| `"search_query"` | Used for embeddings of search queries run against a vector DB. |
| `"semantic_similarity"` | Specifies the given text will be used for Semantic Textual Similarity (STS). |
| `"classification"` | Used for embeddings passed through a text classifier. |
| `"clustering"` | Used for the embeddings run through a clustering algorithm. |
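Since the model ID and both input types are plain strings, a typo (such as a trailing space) is easy to miss. As an illustration only, the sketch below validates the three parameters against the values listed above before they are handed to `create`. The helper `cohere_create_kwargs` and the two constant sets are hypothetical names introduced here and are not part of LanceDB.

```python
# Model IDs and input types copied from the list and tables above.
SUPPORTED_MODELS = {
    "embed-english-v3.0",
    "embed-multilingual-v3.0",
    "embed-english-light-v3.0",
    "embed-multilingual-light-v3.0",
    "embed-english-v2.0",
    "embed-english-light-v2.0",
    "embed-multilingual-v2.0",
}
INPUT_TYPES = {
    "search_document",
    "search_query",
    "semantic_similarity",
    "classification",
    "clustering",
}


def cohere_create_kwargs(
    name="embed-english-v2.0",
    source_input_type="search_document",
    query_input_type="search_query",
):
    """Hypothetical helper: validate the parameters and return them as kwargs."""
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {name!r}")
    for label, value in (
        ("source_input_type", source_input_type),
        ("query_input_type", query_input_type),
    ):
        if value not in INPUT_TYPES:
            raise ValueError(f"unsupported {label}: {value!r}")
    return {
        "name": name,
        "source_input_type": source_input_type,
        "query_input_type": query_input_type,
    }


# Example: use the same input type on both sides for a clustering workload.
kwargs = cohere_create_kwargs(
    name="embed-english-v3.0",
    source_input_type="clustering",
    query_input_type="clustering",
)
# The result could then be forwarded to the registry, for example:
# EmbeddingFunctionRegistry.get_instance().get("cohere").create(**kwargs)
```

Keeping validation separate from the `create` call means the check runs without network access or an API key.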
Usage Example:
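import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import EmbeddingFunctionRegistry

# Look up the Cohere embedding function in the registry and configure the model.
# Requires the COHERE_API_KEY environment variable to be set.
cohere = (
    EmbeddingFunctionRegistry.get_instance()
    .get("cohere")
    .create(name="embed-multilingual-v2.0")
)

# "text" is the source column; "vector" is filled in by the embedding function.
class TextModel(LanceModel):
    text: str = cohere.SourceField()
    vector: Vector(cohere.ndims()) = cohere.VectorField()

data = [
    {"text": "hello world"},
    {"text": "goodbye world"},
]

db = lancedb.connect("~/.lancedb")
tbl = db.create_table("test", schema=TextModel, mode="overwrite")

# Embeddings for the "text" column are generated automatically on insert.
tbl.add(data)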
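Once the table is populated, a natural next step is to query it. The sketch below assumes that LanceDB embeds a plain-string query with the embedding function registered in the table schema (which is where `query_input_type` would apply), and that the query builder exposes `limit()` and `to_list()`. `search_texts` is a hypothetical helper name, and running it against a real table requires network access and a valid `COHERE_API_KEY`.

```python
def search_texts(tbl, query, k=1):
    """Hypothetical helper: return the `text` of the k rows nearest to `query`.

    `tbl` is expected to be a table like the one created in the usage example,
    whose schema carries the Cohere embedding function.
    """
    rows = tbl.search(query).limit(k).to_list()
    return [row["text"] for row in rows]


# Usage with the table from the example above:
# print(search_texts(tbl, "greetings", k=1))
```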