Full-Text Search Index
LanceDB Cloud and Enterprise provide performant full-text search based on BM25, allowing you to incorporate keyword-based search in your retrieval solutions.
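To give a feel for what BM25 ranking computes, here is a minimal, self-contained sketch of the textbook Okapi BM25 formula. It is not LanceDB's implementation: the whitespace tokenizer, the function name `bm25Scores`, and the values `k1 = 1.2` and `b = 0.75` are illustrative assumptions.

```typescript
// Illustrative only: textbook BM25, not LanceDB's internal implementation.
// k1 and b are commonly used values, assumed here for the sketch.
function tokenizeForBm25(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenizeForBm25);
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  // De-duplicate query terms so each term contributes once.
  const queryTerms = tokenizeForBm25(query).filter((t, i, all) => all.indexOf(t) === i);

  return tokenized.map((doc) => {
    let score = 0;
    queryTerms.forEach((term) => {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) return;
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length; // document frequency
      // Smoothed IDF variant that never goes negative.
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      // Length normalization: longer-than-average documents are penalized.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += idf * ((tf * (k1 + 1)) / norm);
    });
    return score;
  });
}

const docs = [
  "the quick brown fox",
  "the lazy dog sleeps",
  "quick quick fox jumps over the dog",
];
const scores = bm25Scores(docs, "quick fox");
console.log(scores);
```

Documents that contain none of the query terms score zero; among the rest, rarer terms and shorter documents push the score up.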
Note
The create_fts_index API (createIndex with Index.fts() in the TypeScript example below) returns immediately, but the FTS index is built asynchronously.
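import * as lancedb from "@lancedb/lancedb";

// Connect to LanceDB Cloud (replace the placeholders with your own values)
const db = await lancedb.connect({
  uri: "db://your-project-slug",
  apiKey: "your-api-key",
  region: "us-east-1",
});

// Open the table and create an FTS index on the "text" column
const tableName = "lancedb-cloud-quickstart";
const table = await db.openTable(tableName);
await table.createIndex("text", {
  config: lancedb.Index.fts(),
});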
Check FTS index status using the methods above.
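Because the index is built asynchronously, client code often polls until the index shows up before issuing full-text queries. The helper below is a minimal sketch of such a loop. `pollUntilIndexed`, the `IndexLister` interface, `makeFakeTable`, and the index name `text_idx` are hypothetical names for illustration; the interface assumes a table object with a `listIndices()` method that returns entries carrying a `name` field, so check the SDK reference for the exact shape in your version.

```typescript
// Hypothetical minimal shape of a table that can list its indices.
// This is an assumption for the sketch, not a definitive SDK type.
interface IndexLister {
  listIndices(): Promise<{ name: string }[]>;
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Polls until an index with the given name is listed, or the attempts run out.
// Resolves to true if the index was found, false otherwise.
async function pollUntilIndexed(
  table: IndexLister,
  indexName: string,
  maxAttempts = 30,
  intervalMs = 1000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const indices = await table.listIndices();
    if (indices.some((idx) => idx.name === indexName)) {
      return true;
    }
    await sleep(intervalMs);
  }
  return false;
}

// Stand-in table for demonstration: the index "appears" after a few polls.
function makeFakeTable(readyAfterCalls: number, indexName: string): IndexLister {
  let calls = 0;
  return {
    async listIndices() {
      calls += 1;
      return calls >= readyAfterCalls ? [{ name: indexName }] : [];
    },
  };
}

// "text_idx" is an assumed index name used only for this demo.
pollUntilIndexed(makeFakeTable(3, "text_idx"), "text_idx", 5, 10).then((ready) =>
  console.log("index ready:", ready),
);
```

With a real connection you would pass the opened table instead of the fake one and choose an interval and attempt limit that suit your data volume.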
FTS Configuration Parameters
LanceDB supports the following configurable parameters for full-text search:
Parameter | Type | Default | Description |
---|---|---|---|
with_position | bool | True | Store token positions (required for phrase queries) |
base_tokenizer | str | "simple" | Text splitting method: "simple" (split by whitespace/punctuation), "whitespace" (split by whitespace only), or "raw" (treat as single token) |
language | str | "English" | Language for tokenization (stemming/stop words) |
max_token_length | int | 40 | Maximum token size in bytes; tokens exceeding this length are omitted from the index |
lower_case | bool | True | Convert tokens to lowercase |
stem | bool | False | Apply stemming (e.g., "running" → "run") |
remove_stop_words | bool | False | Remove common stop words |
ascii_folding | bool | False | Normalize accented characters |
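The sketch below illustrates how several of these parameters could shape the tokens that reach the index. It is a simplified model written for this page, not LanceDB's tokenizer: the `TokenizerOptions` type and the `analyze` function are hypothetical, the "simple" mode only splits on whitespace and ASCII punctuation, and stemming and stop-word removal are left out.

```typescript
// Hypothetical options mirroring a subset of the parameters in the table above.
interface TokenizerOptions {
  baseTokenizer: "simple" | "whitespace" | "raw";
  maxTokenLength: number; // in bytes
  lowerCase: boolean;
  asciiFolding: boolean;
}

const tokenizerDefaults: TokenizerOptions = {
  baseTokenizer: "simple",
  maxTokenLength: 40,
  lowerCase: true,
  asciiFolding: false,
};

// Whitespace plus the ASCII punctuation ranges (a simplification).
const SIMPLE_SPLIT = /[\s!-\/:-@\[-\x60{-~]+/;

// UTF-8 byte length, computed by hand to avoid runtime-specific APIs.
function utf8ByteLength(token: string): number {
  let bytes = 0;
  for (let i = 0; i < token.length; i++) {
    const code = token.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Strip combining accents after canonical decomposition ("é" becomes "e").
function foldAccents(token: string): string {
  return token.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

function analyze(text: string, overrides: Partial<TokenizerOptions> = {}): string[] {
  const opts: TokenizerOptions = { ...tokenizerDefaults, ...overrides };
  let tokens: string[];
  if (opts.baseTokenizer === "raw") tokens = [text];
  else if (opts.baseTokenizer === "whitespace") tokens = text.split(/\s+/);
  else tokens = text.split(SIMPLE_SPLIT);

  return tokens
    .filter((t) => t.length > 0)
    .filter((t) => utf8ByteLength(t) <= opts.maxTokenLength) // over-long tokens are omitted
    .map((t) => (opts.lowerCase ? t.toLowerCase() : t))
    .map((t) => (opts.asciiFolding ? foldAccents(t) : t));
}

const sample = "Café Menu, served DAILY";
console.log(analyze(sample));
console.log(analyze(sample, { asciiFolding: true }));
console.log(analyze(sample, { baseTokenizer: "whitespace", lowerCase: false }));
```

Comparing the three calls shows the effect of each switch in isolation: folding changes only the accented token, while the whitespace tokenizer keeps the trailing comma attached to its word.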
Note
- The max_token_length parameter helps optimize indexing performance by filtering out non-linguistic content like base64 data and long URLs.
- When with_position is disabled, phrase queries will not work, but index size is reduced and indexing is faster.
- ascii_folding is useful for handling international text (e.g., "café" → "cafe").
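To see why phrase queries depend on `with_position`, consider a toy inverted index. The sketch below is an illustration under simplified assumptions, not LanceDB's index format: `buildPositionalIndex`, `phraseMatch`, and `allTermsMatch` are hypothetical helpers, and tokenization is a plain lowercase whitespace split.

```typescript
// term -> docId -> positions of the term within that document
type PositionalIndex = Map<string, Map<number, number[]>>;

function splitTerms(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function buildPositionalIndex(docs: string[]): PositionalIndex {
  const index: PositionalIndex = new Map();
  docs.forEach((doc, docId) => {
    splitTerms(doc).forEach((term, pos) => {
      let postings = index.get(term);
      if (!postings) {
        postings = new Map<number, number[]>();
        index.set(term, postings);
      }
      const positions = postings.get(docId) || [];
      positions.push(pos);
      postings.set(docId, positions);
    });
  });
  return index;
}

// Phrase query: every term must appear at consecutive positions.
function phraseMatch(index: PositionalIndex, phrase: string): number[] {
  const terms = splitTerms(phrase);
  const first = terms.length > 0 ? index.get(terms[0]) : undefined;
  if (!first) return [];
  const matches: number[] = [];
  first.forEach((startPositions, docId) => {
    const found = startPositions.some((start) =>
      terms.every((term, offset) => {
        const postings = index.get(term);
        const positions = postings ? postings.get(docId) : undefined;
        return positions !== undefined && positions.indexOf(start + offset) !== -1;
      }),
    );
    if (found) matches.push(docId);
  });
  return matches;
}

// Without positions, the best available check is "all terms occur somewhere".
function allTermsMatch(index: PositionalIndex, query: string): number[] {
  const terms = splitTerms(query);
  const first = terms.length > 0 ? index.get(terms[0]) : undefined;
  if (!first) return [];
  const matches: number[] = [];
  first.forEach((_positions, docId) => {
    const inAll = terms.every((term) => {
      const postings = index.get(term);
      return postings !== undefined && postings.has(docId);
    });
    if (inAll) matches.push(docId);
  });
  return matches;
}

const phraseDocs = [
  "new york city guide",
  "york has a new museum",
  "a guide to new york",
];
const positional = buildPositionalIndex(phraseDocs);
console.log(phraseMatch(positional, "new york"));
console.log(allTermsMatch(positional, "new york"));
```

The second document contains both words but not next to each other, so only the positional check can exclude it. The price is the per-term position lists, which is the extra storage and indexing work that disabling positions saves.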