
LanceDB - Open Source Multimodal Database

LanceDB OSS is an open-source, batteries-included embedded multimodal database that you can run on your own infrastructure. "Embedded" means that it runs in-process, making it incredibly simple to self-host your own AI retrieval workflows for RAG and more. No servers, no hassle.

It is a multimodal vector database for AI, designed to store, manage, query and retrieve embeddings on large-scale data across different modalities. Most existing vector databases store and query only the embeddings and their metadata; the actual data lives elsewhere, requiring you to manage its storage and versioning separately.

LanceDB can be run in a number of ways:

  • Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application), as sketched below
  • Directly from a client application like a Jupyter notebook for analytical workloads
  • Deployed as a remote serverless database
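
The embedded mode is the simplest to picture. Below is a minimal sketch using the LanceDB Python client; the table name, the `./data/lancedb` path and the toy two-dimensional vectors are illustrative assumptions, not required values.

```python
# Minimal sketch of embedded (in-process) usage. Assumes the `lancedb`
# Python package is installed; the local path is just a directory on disk.
import lancedb

# Connect in-process: no server to run, the database lives in this folder.
db = lancedb.connect("./data/lancedb")

# Create a table from plain Python records; the vector column is inferred.
table = db.create_table(
    "quickstart",  # hypothetical table name
    data=[
        {"vector": [0.1, 0.2], "text": "hello world"},
        {"vector": [0.9, 0.8], "text": "goodbye world"},
    ],
)

# Nearest-neighbour search against the stored vectors.
results = table.search([0.1, 0.3]).limit(1).to_list()
print(results)
```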

LanceDB supports storage of the actual data itself, alongside the embeddings and metadata. You can persist your images, videos, text documents, audio files and more in the Lance format, which provides automatic data versioning and blazing-fast retrieval and filtering via LanceDB.
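
As a hedged illustration of keeping raw content next to its embedding, the sketch below stores image bytes and a caption in the same table as the vector. The column names (`vector`, `image`, `caption`), the file `cat.png` and the four-dimensional embeddings are assumptions made for the example.

```python
# Sketch: raw bytes, metadata and the embedding live in one Lance-backed table.
import lancedb

db = lancedb.connect("./data/lancedb")

records = [
    {
        "vector": [0.12, 0.45, 0.33, 0.08],      # embedding of the image (toy values)
        "image": open("cat.png", "rb").read(),   # raw bytes persisted alongside it
        "caption": "a cat on a sofa",
    },
]

table = db.create_table("multimodal_demo", data=records)  # hypothetical table name

# Later: vector search plus a SQL-style metadata filter, returning the original bytes.
hit = (
    table.search([0.1, 0.4, 0.3, 0.1])
    .where("caption = 'a cat on a sofa'")
    .limit(1)
    .to_list()[0]
)
image_bytes = hit["image"]
```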

The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.

Both the database and the underlying data format are designed from the ground up to be easy-to-use, scalable and cost-effective.