@lancedb/lancedb • Docs
Interface: FtsOptions
Options to create a full text search index
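As a rough illustration of how these options fit together, the sketch below builds an options object using the property names documented on this page. `FtsOptionsSketch` is a hypothetical local stand-in for the real interface, so the block runs without the library installed. The values `"English"` and `40` are illustrative assumptions, not documented defaults.

```typescript
// Hypothetical local stand-in that mirrors the properties listed on this page.
// In real code the type would come from "@lancedb/lancedb" instead.
interface FtsOptionsSketch {
  asciiFolding?: boolean;
  baseTokenizer?: "simple" | "whitespace" | "raw";
  language?: string;
  lowercase?: boolean;
  maxTokenLength?: number;
  removeStopWords?: boolean;
  stem?: boolean;
  withPosition?: boolean;
}

const ftsOptions: FtsOptionsSketch = {
  baseTokenizer: "simple", // documented default
  language: "English",     // assumed value; only consulted when stem or removeStopWords is true
  lowercase: true,
  stem: true,
  removeStopWords: true,
  asciiFolding: true,
  maxTokenLength: 40,      // illustrative limit, not a documented default
  withPosition: true,      // documented default; needed for phrase queries
};

// Assumed usage with the library (not executed here):
//   await table.createIndex("text", { config: lancedb.Index.fts(ftsOptions) });
console.log(ftsOptions);
```

Every property is optional, so an object that sets only the options you care about is also valid.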
Properties
asciiFolding?
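Whether to apply ASCII folding, that is, to convert accented characters to their closest ASCII equivalent (for example, "é" becomes "e").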
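In text analysis, ASCII folding usually means mapping accented characters to their unaccented ASCII counterparts so that "cafe" and "café" index to the same token. The sketch below approximates that behaviour with Unicode normalization. `foldToAscii` is a hypothetical helper written for illustration and is not the library's implementation.

```typescript
// Approximate ASCII folding: decompose each character into base letter plus
// combining marks (NFD), then drop the combining marks (U+0300 to U+036F).
// Letters with no decomposition, such as "ß" or "ø", are left unchanged.
function foldToAscii(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(foldToAscii("café crème")); // prints "cafe creme"
```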
baseTokenizer?
The tokenizer to use when building the index. The default is "simple".
The following tokenizers are available:
"simple" - Simple tokenizer. This tokenizer splits the text into tokens using whitespace and punctuation as delimiters.
"whitespace" - Whitespace tokenizer. This tokenizer splits the text into tokens using whitespace as a delimiter.
"raw" - Raw tokenizer. This tokenizer does not split the text into tokens and indexes the entire text as a single token.
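To make the difference between the three tokenizers concrete, here is a minimal sketch that imitates their splitting rules. `tokenize` is a hypothetical function and only approximates the behaviour described above. In particular, it treats every non-alphanumeric ASCII character as a delimiter for "simple", which is an assumption of this sketch.

```typescript
type BaseTokenizer = "simple" | "whitespace" | "raw";

// Illustrative approximation of the three tokenizers; not the library's code.
function tokenize(text: string, tokenizer: BaseTokenizer): string[] {
  if (tokenizer === "raw") {
    // The entire text becomes a single token.
    return [text];
  }
  // "whitespace" splits on runs of whitespace only;
  // "simple" also splits on punctuation (ASCII-only simplification here).
  const delimiter = tokenizer === "whitespace" ? /\s+/ : /[^A-Za-z0-9]+/;
  return text.split(delimiter).filter((token) => token.length > 0);
}

const sample = "Hello, world! It's 2024.";
console.log(tokenize(sample, "simple"));     // [ 'Hello', 'world', 'It', 's', '2024' ]
console.log(tokenize(sample, "whitespace")); // [ 'Hello,', 'world!', "It's", '2024.' ]
console.log(tokenize(sample, "raw"));        // [ "Hello, world! It's 2024." ]
```

With "whitespace", punctuation stays attached to the tokens, so a search for "world" would not match the token "world!" unless further processing strips it.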
language?
The language to use for stemming and stop words. This is only used when stem or removeStopWords is true.
lowercase?
Whether to lowercase tokens.
maxTokenLength?
The maximum token length. Tokens longer than this length will be ignored.
removeStopWords?
Whether to remove stop words.
stem?
Whether to stem tokens.
withPosition?
Whether to build the index with positions. True by default. If set to false, the index will not store the positions of the tokens in the text, which will make the index smaller and faster to build, but will not support phrase queries.
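The reason phrase queries depend on positions can be shown with a toy inverted index. The sketch below is a simplified illustration under its own assumptions (whitespace tokenization, lowercasing, in-memory maps). `buildIndex` and `phraseMatch` are hypothetical names and do not reflect how the library stores its index.

```typescript
// token -> document id -> positions of the token in that document
type PositionalIndex = Map<string, Map<number, number[]>>;

function buildIndex(docs: string[]): PositionalIndex {
  const index: PositionalIndex = new Map();
  docs.forEach((doc, docId) => {
    const tokens = doc.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
    tokens.forEach((token, position) => {
      const postings = index.get(token) ?? new Map<number, number[]>();
      const positions = postings.get(docId) ?? [];
      positions.push(position);
      postings.set(docId, positions);
      index.set(token, postings);
    });
  });
  return index;
}

// Returns ids of documents in which the phrase tokens occur consecutively.
function phraseMatch(index: PositionalIndex, phrase: string[]): number[] {
  const hits: number[] = [];
  if (phrase.length === 0) return hits;
  const firstPostings = index.get(phrase[0]);
  if (!firstPostings) return hits;
  firstPostings.forEach((starts, docId) => {
    // A match needs token i of the phrase at position start + i.
    const found = starts.some((start) =>
      phrase.every((token, offset) => {
        const positions = index.get(token)?.get(docId) ?? [];
        return positions.indexOf(start + offset) >= 0;
      })
    );
    if (found) hits.push(docId);
  });
  return hits;
}

const index = buildIndex(["the quick brown fox", "brown quick fox the"]);
console.log(phraseMatch(index, ["quick", "brown"])); // [ 0 ]
```

Both documents contain "quick" and "brown", so an index without positions could only report that both match the individual terms. The stored positions are what allow the second document to be rejected for the phrase.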