@lancedb/lancedb • Docs
@lancedb/lancedb / Scannable
Class: Scannable¶
A data source that can be scanned as a stream of Arrow RecordBatches.
Scannable wraps the schema + optional row count + rescannable flag and
a callback that yields batches one at a time. It is passed to consumers
(e.g. Table.add, createTable, mergeInsert — follow-up work) that
need to pull data without materializing the full dataset in JS memory.
Batches cross the JS↔Rust boundary as Arrow IPC Stream messages; a fresh
writer serializes each batch, and the Rust side decodes it with
arrow_ipc::reader::StreamReader. One batch is in flight at a time.
Properties¶
numRows¶
rescannable¶
schema¶
Methods¶
fromFactory()¶
Build a Scannable from an explicit schema and a factory that returns a fresh batch iterator on each call.
The factory is invoked once per scan. Each iterator yields
RecordBatches matching the declared schema. Use this when you need
direct control over the pull loop — for example, to wrap a streaming
source whose batches are produced lazily.
Parameters¶
-
schema:
Schema<any> The Arrow schema of the produced batches. -
factory Called at the start of each scan to produce a batch iterator. Must be idempotent when
rescannableis true. -
opts:
ScannableOptions={}Optional hints.rescannabledefaults totrue; set tofalseif callingfactory()twice would not reproduce the same data.
Returns¶
Promise<Scannable>
fromIterable()¶
Build a Scannable from an iterable of RecordBatches. rescannable
defaults to false. Pass an explicit schema so the consumer can
validate before any batch is pulled.
opts.rescannable: true is honest for replayable iterables (Arrays,
Sets, or custom iterables whose [Symbol.iterator]() returns a fresh
iterator each call). It is rejected for one-shot iterables (generators,
async generators, or already-an-iterator inputs) because their
[Symbol.iterator]() returns the same exhausted object on the second
scan. For replayable sources outside this shape, use
fromFactory(schema, () => createIter(), { rescannable: true }).
Note: when opts.rescannable is true, the constructor calls
[Symbol.iterator]() once on the input to perform the structural check.
Parameters¶
-
schema:
Schema<any> -
iter:
Iterable<RecordBatch<any>> |AsyncIterable<RecordBatch<any>> -
opts:
ScannableOptions={}
Returns¶
Promise<Scannable>
fromRecordBatchReader()¶
Build a Scannable from an Arrow RecordBatchReader. A reader can only
be consumed once; rescannable defaults to false.
The reader must already be opened (via .open()) so its .schema is
populated. RecordBatchReader.from(...) returns an unopened reader.
opts.rescannable: true is rejected because RecordBatchReader is a
self-iterator (its [Symbol.iterator]() returns itself), and this
constructor does not call reader.reset() between scans, so a second
scan would always see an exhausted reader. For genuinely replayable
sources, use
fromFactory(schema, () => openReader(), { rescannable: true }),
which mints a fresh reader on each scan.
Parameters¶
-
reader:
RecordBatchReader<any> -
opts:
ScannableOptions={}
Returns¶
Promise<Scannable>
fromTable()¶
Build a Scannable from an in-memory Arrow Table. Always rescannable;
the table's batches are replayed on each scan.
The table's row count is authoritative: opts.numRows must either be
omitted or equal to table.numRows. opts.rescannable of false is
rejected because in-memory Tables are always rescannable.
Parameters¶
-
table:
Table<any> -
opts:
ScannableOptions={}
Returns¶
Promise<Scannable>