Vector embedding search using TransformersJS

Embed and query data from LanceDB using TransformersJS

This example shows how to use the transformers.js library to perform vector embedding search with LanceDB's JavaScript API.

Setting up

First, install the dependencies:

npm install vectordb @xenova/transformers

We will use the Xenova/all-MiniLM-L6-v2 model, a version of all-MiniLM-L6-v2 that has been converted to run with Transformers.js.

Within our index.js file we will import the necessary libraries and define our model and database:

const lancedb = require('vectordb')

// @xenova/transformers is an ES module, so it is loaded with a dynamic
// import. In a CommonJS file, run this (and the rest of the example) inside
// an async function, since await is not available at the top level there.
const { pipeline } = await import('@xenova/transformers')

// Load the feature-extraction pipeline with the all-MiniLM-L6-v2 model
const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

Creating the embedding function

Next, we will create an embedding function that takes a batch of strings and returns a vector embedding for each one. It uses the pipe function we defined earlier to compute the embeddings.

// Define the function. `sourceColumn` is required for LanceDB to know
// which column to use as input.
const embed_fun = {}
embed_fun.sourceColumn = 'text'
embed_fun.embed = async function (batch) {
    let result = []
    // Given a batch of strings, we will use the `pipe` function to get
    // the vector embedding of each string.
    for (let text of batch) {
        // Mean pooling collapses the per-token outputs into a single
        // fixed-size vector; normalizing scales it to unit length.
        const res = await pipe(text, { pooling: 'mean', normalize: true })
        result.push(Array.from(res['data']))
    }
    return (result)
}
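What the { pooling: 'mean', normalize: true } options do can be sketched in plain JavaScript: the model's per-token embeddings are averaged into one fixed-size vector, which is then scaled to unit length so every embedding has the same magnitude. The function names below are illustrative, not part of any library:

```javascript
// Average a list of token embeddings (arrays of equal length) into one vector.
function meanPool(tokenEmbeddings) {
    const dim = tokenEmbeddings[0].length
    const pooled = new Array(dim).fill(0)
    for (const token of tokenEmbeddings) {
        for (let i = 0; i < dim; i++) {
            pooled[i] += token[i] / tokenEmbeddings.length
        }
    }
    return pooled
}

// Scale a vector to unit (L2) length.
function normalize(vec) {
    const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0))
    return vec.map(x => x / norm)
}

// Two toy "token embeddings" of dimension 2 become one unit-length vector.
const pooled = meanPool([[1, 0], [0, 1]])   // [0.5, 0.5]
const unit = normalize(pooled)              // [0.7071..., 0.7071...]
```

This is why texts of different lengths all produce 384-dimensional vectors: pooling fixes the dimension, and normalization makes distances comparable.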

Creating the database

Now, we will create the LanceDB database and add the embedding function we defined earlier.

// Link a folder and create a table with data
const db = await lancedb.connect('data/sample-lancedb')

// You can also import any other data, but make sure that you have a column
// for the embedding function to use.
const data = [
    { id: 1, text: 'Cherry', type: 'fruit' },
    { id: 2, text: 'Carrot', type: 'vegetable' },
    { id: 3, text: 'Potato', type: 'vegetable' },
    { id: 4, text: 'Apple', type: 'fruit' },
    { id: 5, text: 'Banana', type: 'fruit' }
]

// Create the table with the embedding function
const table = await db.createTable('food_table', data, "create", embed_fun)

Now, we can perform the search using the search function. LanceDB automatically uses the embedding function we defined earlier to get the vector embedding of the query string.

// Query the table
const results = await table
    .search("a sweet fruit to eat")
    .metricType("cosine")
    .limit(2)
    .execute()
console.log(results.map(r => r.text)) // [ 'Banana', 'Cherry' ]

Output of results:

[
  {
    vector: Float32Array(384) [
      -0.057455405592918396,
      0.03617725893855095,
      -0.0367760956287384,
      ... 381 more items
    ],
    id: 5,
    text: 'Banana',
    type: 'fruit',
    _distance: 0.4919965863227844
  },
  {
    vector: Float32Array(384) [
      0.0009714411571621895,
      0.008223623037338257,
      0.009571489877998829,
      ... 381 more items
    ],
    id: 1,
    text: 'Cherry',
    type: 'fruit',
    _distance: 0.5540297031402588
  }
]
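The _distance field in each result comes from the metric selected with metricType("cosine"): cosine distance, i.e. 1 minus the cosine similarity of the query vector and the stored vector. A plain-JavaScript sketch of that metric (the function name is illustrative):

```javascript
// Cosine distance: 1 - (a . b) / (|a| * |b|).
// 0 means the vectors point the same way; higher means less similar.
function cosineDistance(a, b) {
    let dot = 0, normA = 0, normB = 0
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

cosineDistance([1, 0], [1, 0])  // 0: identical direction
cosineDistance([1, 0], [0, 1])  // 1: orthogonal
```

Since the embeddings are normalized to unit length, cosine distance depends only on the angle between vectors, which is why Banana (distance ~0.49) ranks above Cherry (~0.55) for the query.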

Wrapping it up

In this example, we showed how to use the transformers.js library to perform vector embedding search with LanceDB's JavaScript API. You can find the full code for this example on GitHub!