MemWAL Index

The MemTable and Write-Ahead Log (MemWAL) Index is used for fast upserts into the Lance table.

The index serves as the centralized synchronization point for a log-structured merge tree (LSM-tree). The actual implementation of the MemTable and WAL is left to the implementer of the spec.

Each region represents a single writer that writes to both a MemTable and a WAL, and a region can have increasing generations of MemWALs. Every time data is written into a WAL, the index is updated with the latest watermark. If a specific writer of a region dies, a new writer is able to read the information in the specific region and replay the WAL.
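The relationship between a region, its generations, and the watermark can be sketched in a few lines of Python. This is a minimal in-memory model, not the Lance implementation: the names `MemWalRecord`, `Region`, `record_write`, `seal_and_advance`, and `entries_to_replay` are hypothetical, the first generation is assumed to be 0, and the WAL entries are assumed to form a simple range with no holes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemWalRecord:
    """Hypothetical index record for one generation of a region's MemWAL."""
    generation: int
    sealed: bool = False
    # Highest WAL entry ID recorded in the index; -1 means nothing written yet.
    watermark: int = -1


@dataclass
class Region:
    """Hypothetical in-memory view of the index entries for one region."""
    name: str
    mem_wals: List[MemWalRecord] = field(default_factory=list)

    def unsealed(self) -> MemWalRecord:
        """Return the single unsealed MemWAL, creating generation 0 if needed."""
        if not self.mem_wals:
            # Assumption: generations start at 0.
            self.mem_wals.append(MemWalRecord(generation=0))
        open_ones = [m for m in self.mem_wals if not m.sealed]
        assert len(open_ones) == 1, "exactly one generation may be unsealed"
        return open_ones[0]

    def record_write(self, entry_id: int) -> None:
        """Advance the watermark after a WAL write has succeeded."""
        current = self.unsealed()
        current.watermark = max(current.watermark, entry_id)

    def seal_and_advance(self) -> MemWalRecord:
        """Seal the current generation and open the next one in a single step."""
        current = self.unsealed()
        current.sealed = True
        successor = MemWalRecord(generation=current.generation + 1)
        self.mem_wals.append(successor)
        return successor

    def entries_to_replay(self) -> range:
        """WAL entry IDs a replacement writer would replay (simple-range case)."""
        return range(0, self.unsealed().watermark + 1)


region = Region("region-a")
region.record_write(0)
region.record_write(1)
region.seal_and_advance()   # generation 0 is sealed, generation 1 is opened
region.record_write(0)
```

In a real deployment the seal-and-advance step would have to be one atomic index update; here it is atomic only because the model is single-threaded.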

Index Details

message MemWalIndexDetails {

  repeated MemWal mem_wal_list = 1;

  message MemWalId {
    // The name of the region that this specific MemWAL is responsible for.
    string region = 1;

    // The generation of the MemWAL.
    // Every time a new MemWAL is created and an old one is sealed,
    // the generation number of the next MemWAL is incremented.
    // At any given point in time, among all MemWALs of the same region,
    // there must be only 1 generation that is not sealed.
    uint64 generation = 2;
  }

  // A combination of MemTable and WAL for fast upsert.
  message MemWal {

    enum State {
      // MemWAL is open and accepting new entries
      OPEN = 0;
      // When a MemTable is considered full, the writer should atomically mark this MemWAL
      // as sealed and create a new MemWAL to write to.
      SEALED = 1;
      // When a MemTable is sealed, it can be flushed asynchronously to disk.
      // This state indicates the data has been persisted to disk but not yet merged
      // into the source table.
      FLUSHED = 2;
      // When the flushed data has been merged into the source table.
      // After a MemWAL is merged, the cleanup process can delete the WAL.
      MERGED = 3;
    }

    MemWalId id = 1;

    // The MemTable location, which is likely an in-memory address starting with memory://.
    // The actual details of how the MemTable is stored are outside the concern of Lance.
    string mem_table_location = 2;

    // The root location of the WAL.
    // The WAL storage durability determines the data durability.
    // This location is immutable once set at MemWAL creation time.
    string wal_location = 3;

    // All entries in the WAL, serialized as U64Segment.
    // Each entry in the WAL has a uint64 sequence ID starting from 0.
    // The actual details of how a WAL entry is stored are outside the concern of Lance.
    // In most cases this U64Segment should be a simple range.
    // Every time the writer starts writing, it must first try to atomically write to the last entry ID + 1.
    // If that fails because of a concurrent writer, it tries entry IDs +2, +3, +4, etc. until one succeeds.
    // However, if 2 writers accidentally write to the same WAL concurrently,
    // one of them will fail to update this index at commit time,
    // but its WAL entry is already written,
    // which leaves holes within the U64Segment range.
    bytes wal_entries = 4;

    // The current state of the MemWAL, indicating its lifecycle phase.
    // States progress: OPEN -> SEALED -> FLUSHED -> MERGED
    // OPEN: MemWAL is accepting new WAL entries
    // SEALED: MemWAL has been sealed and no longer accepts new WAL entries
    // FLUSHED: MemWAL has been persisted to disk but not yet merged into the source Lance table
    // MERGED: MemWAL has been merged into the source Lance table and can be cleaned up
    State state = 5;

    // The owner identifier for this MemWAL, used for compare-and-swap operations.
    // When a writer wants to perform any operation on this MemWAL, it must provide
    // the expected owner_id. This serves as an optimistic lock to prevent concurrent
    // writers from interfering with each other. When a new writer starts replay,
    // it must first atomically update this owner_id to claim ownership.
    // All subsequent operations will fail if the owner_id has changed.
    string owner_id = 6;

    // The dataset version that last updated this MemWAL.
    // This is set to the new dataset version whenever the MemWAL is created or modified.
    uint64 last_updated_dataset_version = 7;
  }

}
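The `state` and `owner_id` fields together describe a small state machine guarded by an optimistic lock. The Python sketch below models both under stated assumptions: the class `MemWal`, the methods `claim` and `advance`, and the exception `OwnerMismatch` are hypothetical names, the transitions are assumed to move strictly one step at a time in the order of the `State` enum, and the compare-and-swap is simulated in a single process rather than through an atomic index commit.

```python
from enum import IntEnum


class State(IntEnum):
    """Mirrors the State enum values in the message above."""
    OPEN = 0
    SEALED = 1
    FLUSHED = 2
    MERGED = 3


# Assumption: each state has exactly one legal successor; MERGED is terminal.
NEXT_STATE = {
    State.OPEN: State.SEALED,
    State.SEALED: State.FLUSHED,
    State.FLUSHED: State.MERGED,
}


class OwnerMismatch(Exception):
    """Raised when the caller's expected owner_id is stale."""


class MemWal:
    """Hypothetical in-memory stand-in for one MemWal index record."""

    def __init__(self, region: str, generation: int, owner_id: str) -> None:
        self.region = region
        self.generation = generation
        self.owner_id = owner_id
        self.state = State.OPEN

    def _check_owner(self, expected_owner_id: str) -> None:
        if self.owner_id != expected_owner_id:
            raise OwnerMismatch(
                f"expected {expected_owner_id!r}, found {self.owner_id!r}"
            )

    def claim(self, expected_owner_id: str, new_owner_id: str) -> None:
        """Compare-and-swap on owner_id, as a new writer would do before replay."""
        self._check_owner(expected_owner_id)
        self.owner_id = new_owner_id

    def advance(self, expected_owner_id: str, new_state: State) -> None:
        """Move to the next lifecycle state if the caller still owns the MemWAL."""
        self._check_owner(expected_owner_id)
        if NEXT_STATE.get(self.state) != new_state:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state


mem_wal = MemWal("region-a", 0, owner_id="writer-1")
mem_wal.claim("writer-1", "writer-2")     # writer-2 takes over the region
mem_wal.advance("writer-2", State.SEALED)
```

After the `claim` call, any operation that still presents `"writer-1"` as the expected owner raises `OwnerMismatch`, which is how a stale writer learns that it has been replaced.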

Expected Use Pattern

It is expected that:

  1. there is exactly one writer for each region, guaranteed by optimistic update of the owner_id
  2. each writer updates the MemWAL index after a successful write to WAL and MemTable
  3. a new writer always finds unsealed MemWALs and performs replay before accepting new writes
  4. background processes are responsible for merging flushed MemWALs into the main Lance table and keeping the index up to date.
  5. a MemWAL-aware reader is able to merge results of MemTables in the MemWALs with results in the base Lance table.
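
The last point, merging MemTable results with the base table, can be illustrated with a small Python sketch. It is one possible reading of upsert semantics, not the behavior of any particular reader: the function `merge_read` and the key column `id` are hypothetical, rows are plain dictionaries, newer generations are assumed to override older ones, and only MemWALs that have not yet been merged are assumed to be passed in.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def merge_read(
    base_rows: Iterable[Row],
    mem_tables: Iterable[Tuple[int, List[Row]]],
    key: str = "id",
) -> List[Row]:
    """Overlay MemTable rows on base-table rows, newest generation last.

    `mem_tables` holds (generation, rows) pairs for the MemWALs that are
    not yet merged into the base table (assumption of this sketch).
    """
    merged: Dict[object, Row] = {row[key]: row for row in base_rows}
    # Apply generations in ascending order so later generations win.
    for _generation, rows in sorted(mem_tables, key=lambda item: item[0]):
        for row in rows:
            merged[row[key]] = row   # upsert: replace or insert by key
    return list(merged.values())


base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
pending = [
    (1, [{"id": 2, "v": "c2"}]),
    (0, [{"id": 2, "v": "c1"}, {"id": 3, "v": "d"}]),
]
result = merge_read(base, pending)
```

Here `result` contains the base row for `id` 1, the generation-1 value for `id` 2, and the new row for `id` 3. A reader spanning several regions would also need a rule for keys that appear in more than one region, which this sketch does not cover.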