MemWAL Index
The MemTable and Write-Ahead Log (MemWAL) Index is used for fast upserts into the Lance table.
The index is used as the centralized synchronization system for a log-structured merge tree (LSM-tree), leaving the actual implementation of the MemTable and WAL up to the specific implementer of the spec.
The index tracks MemWALs by region. Each region has a single writer that writes to both a MemTable and a WAL, and a region goes through increasing generations of MemWALs. Every time data is written into a WAL, the index is updated with the latest watermark. If the writer of a region dies, a new writer can read that region's information from the index and replay the WAL.
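The write and replay cycle described above can be sketched as follows. This is a minimal in-process illustration, not the Lance implementation: `ToyMemWal`, `append`, and `replay` are hypothetical names, the WAL and MemTable are plain dictionaries, and the sketch assumes that replay only covers WAL entries at or below the watermark recorded in the index.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyMemWal:
    """Hypothetical in-process stand-in for one MemWAL of a region."""
    region: str
    generation: int
    # WAL entry ID -> batch of rows, keyed by primary key.
    wal: Dict[int, Dict[str, str]] = field(default_factory=dict)
    # Highest WAL entry ID recorded in the index; None means no entry yet.
    watermark: Optional[int] = None


def append(mem_wal: ToyMemWal, mem_table: Dict[str, str], rows: Dict[str, str]) -> int:
    """Write one batch to the WAL and the MemTable, then advance the watermark."""
    entry_id = 0 if mem_wal.watermark is None else mem_wal.watermark + 1
    mem_wal.wal[entry_id] = dict(rows)  # the durable WAL write comes first
    mem_table.update(rows)              # upsert into the MemTable
    mem_wal.watermark = entry_id        # finally record the entry in the index
    return entry_id


def replay(mem_wal: ToyMemWal) -> Dict[str, str]:
    """Rebuild a MemTable from the WAL entries the index knows about."""
    mem_table: Dict[str, str] = {}
    if mem_wal.watermark is None:
        return mem_table
    for entry_id in sorted(mem_wal.wal):
        # Assumption: entries above the watermark were never committed.
        if entry_id <= mem_wal.watermark:
            mem_table.update(mem_wal.wal[entry_id])
    return mem_table
```

If the writer's process dies, its MemTable is lost, but a replacement writer can call `replay` on the same `ToyMemWal` and obtain the same upserted rows before it accepts new writes.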
Index Details
message MemWalIndexDetails {

  repeated MemWal mem_wal_list = 1;

  message MemWalId {
    // The name of the region that this specific MemWAL is responsible for.
    string region = 1;

    // The generation of the MemWAL.
    // Every time a new MemWAL is created and an old one is sealed,
    // the generation number of the next MemWAL is incremented.
    // At any given point in time, among all MemWALs with the same region name,
    // there must be only 1 generation that is not sealed.
    uint64 generation = 2;
  }

  // A combination of MemTable and WAL for fast upsert.
  message MemWal {

    enum State {
      // MemWAL is open and accepting new entries.
      OPEN = 0;
      // When a MemTable is considered full, the writer should update this MemWAL as sealed
      // and create a new MemWAL to write to atomically.
      SEALED = 1;
      // When a MemTable is sealed, it can be flushed asynchronously to disk.
      // This state indicates the data has been persisted to disk but not yet merged
      // into the source table.
      FLUSHED = 2;
      // When the flushed data has been merged into the source table.
      // After a MemWAL is merged, the cleanup process can delete the WAL.
      MERGED = 3;
    }

    MemWalId id = 1;

    // The MemTable location, which is likely an in-memory address starting with memory://.
    // The actual details of how the MemTable is stored are outside the concern of Lance.
    string mem_table_location = 2;

    // The root location of the WAL.
    // The WAL storage durability determines the data durability.
    // This location is immutable once set at MemWAL creation time.
    string wal_location = 3;

    // All entries in the WAL, serialized as U64Segment.
    // Each entry in the WAL has a uint64 sequence ID starting from 0.
    // The actual details of how the WAL entry is stored are outside the concern of Lance.
    // In most cases this U64Segment should be a simple range.
    // Every time the writer starts writing, it must always try to atomically write to the last entry ID + 1.
    // If that fails due to a concurrent writer, it then tries entry IDs +2, +3, +4, etc. until it succeeds.
    // However, if 2 writers accidentally write to the same WAL concurrently,
    // one writer will fail to update this index at commit time
    // even though its WAL entry is already written,
    // leaving holes within the U64Segment range.
    bytes wal_entries = 4;

    // The current state of the MemWAL, indicating its lifecycle phase.
    // States progress: OPEN -> SEALED -> FLUSHED -> MERGED
    // OPEN: MemWAL is accepting new WAL entries
    // SEALED: MemWAL has been sealed and no longer accepts new WAL entries
    // FLUSHED: MemWAL data has been persisted to disk but not yet merged into the source Lance table
    // MERGED: MemWAL data has been merged into the source Lance table and the WAL can be cleaned up
    State state = 5;

    // The owner identifier for this MemWAL, used for compare-and-swap operations.
    // When a writer wants to perform any operation on this MemWAL, it must provide
    // the expected owner_id. This serves as an optimistic lock to prevent concurrent
    // writers from interfering with each other. When a new writer starts replay,
    // it must first atomically update this owner_id to claim ownership.
    // All subsequent operations will fail if the owner_id has changed.
    string owner_id = 6;

    // The dataset version that last updated this MemWAL.
    // This is set to the new dataset version whenever the MemWAL is created or modified.
    uint64 last_updated_dataset_version = 7;
  }
}
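The comments on `wal_entries` and `owner_id` describe two cooperating mechanisms: probing for the next free WAL entry ID, and an optimistic ownership check when committing to the index. The sketch below illustrates how the two together can leave a hole in the entry range. It is only an illustration under stated assumptions: `ToyWalStore`, `ToyMemWalEntry`, `write_wal_entry`, and `commit_wal_entry` are hypothetical names, the WAL store is assumed to offer an atomic put-if-absent, and a plain list stands in for the serialized U64Segment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ToyWalStore:
    """Hypothetical WAL storage with an atomic put-if-absent per entry ID."""

    def __init__(self) -> None:
        self.entries: Dict[int, bytes] = {}

    def put_if_absent(self, entry_id: int, payload: bytes) -> bool:
        if entry_id in self.entries:
            return False  # another writer already took this entry ID
        self.entries[entry_id] = payload
        return True


@dataclass
class ToyMemWalEntry:
    """Stand-in for the index record; wal_entries mimics the U64Segment."""
    owner_id: str
    wal_entries: List[int] = field(default_factory=list)


def write_wal_entry(store: ToyWalStore, last_entry_id: int, payload: bytes) -> int:
    """Try last_entry_id + 1, then +2, +3, ... until a free entry ID is found."""
    candidate = last_entry_id + 1
    while not store.put_if_absent(candidate, payload):
        candidate += 1
    return candidate


def commit_wal_entry(index: ToyMemWalEntry, expected_owner_id: str, entry_id: int) -> bool:
    """Record the entry in the index only if the caller still owns the MemWAL."""
    if index.owner_id != expected_owner_id:
        return False  # ownership changed; the written entry stays uncommitted
    index.wal_entries.append(entry_id)
    return True
```

For example, suppose `writer-a` commits entry 0 and a new writer then sets `owner_id` to `writer-b`. If the stale `writer-a` still writes entry 1, its commit is rejected. When `writer-b` calls `write_wal_entry(store, 0, ...)`, it finds entry 1 taken, writes entry 2, and commits it, so the index records entries 0 and 2 with a hole at 1. Pass `-1` as `last_entry_id` for an empty WAL so that the first entry gets ID 0.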
Expected Use Pattern
It is expected that:
- there is exactly one writer for each region, guaranteed by optimistic update of the owner_id
- each writer updates the MemWAL index after a successful write to WAL and MemTable
- a new writer always finds unsealed MemWALs and performs replay before accepting new writes
- background processes merge flushed MemWALs into the main Lance table and keep the index up to date
- a MemWAL-aware reader is able to merge results of MemTables in the MemWALs with results in the base Lance table
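The last point, a reader merging MemTable results with base table results, might look like the sketch below. It is not a Lance API: `merge_read` is a hypothetical helper, rows are plain dictionaries, and the sketch assumes upsert semantics on a single key column, that all MemTables belong to one region, and that rows from a higher generation override rows from a lower generation and from the base table.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def merge_read(
    base_rows: Iterable[Row],
    mem_tables: List[Tuple[int, Iterable[Row]]],
    key: str,
) -> List[Row]:
    """Overlay MemTable rows on base table rows, newest generation last.

    mem_tables holds (generation, rows) pairs for MemWALs whose data is
    assumed not to be merged into the base table yet.
    """
    merged = {row[key]: row for row in base_rows}
    # Apply generations in increasing order so newer rows win.
    for _generation, rows in sorted(mem_tables, key=lambda item: item[0]):
        for row in rows:
            merged[row[key]] = row  # later rows within a MemTable also win
    # Sorted by key for deterministic output; assumes keys are comparable.
    return [merged[k] for k in sorted(merged)]
```

With a base table containing keys 1 and 2, a generation 1 MemTable that updates key 1 and adds key 3, and a generation 2 MemTable that updates key 1 again, the reader sees the generation 2 value for key 1, the base value for key 2, and the generation 1 value for key 3.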