Async API
We demonstrate the following functionality supported by LanceDB using our asynchronous APIs:
- Automatic versioning
- Instant rollback
- Appends, updates, deletions
- Schema evolution
Let's first prepare the data. We will be using a CSV file with a bunch of quotes from Rick and Morty.
!wget http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv
!head rick_and_morty_quotes.csv
--2024-12-17 15:58:31-- http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv
Resolving vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)... 3.5.84.162, 3.5.76.76, 52.92.228.138, ...
Connecting to vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)|3.5.84.162|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8236 (8.0K) [text/csv]
Saving to: ‘rick_and_morty_quotes.csv.3’
rick_and_morty_quot 100%[===================>] 8.04K --.-KB/s in 0s
2024-12-17 15:58:31 (160 MB/s) - ‘rick_and_morty_quotes.csv.3’ saved [8236/8236]
id,author,quote
1,Rick," Morty, you got to come on. You got to come with me."
2,Morty," Rick, what’s going on?"
3,Rick," I got a surprise for you, Morty."
4,Morty," It’s the middle of the night. What are you talking about?"
5,Rick," I got a surprise for you."
6,Morty," Ow! Ow! You’re tugging me too hard."
7,Rick," I got a surprise for you, Morty."
8,Rick," What do you think of this flying vehicle, Morty? I built it out of stuff I found in the garage."
9,Morty," Yeah, Rick, it’s great. Is this the surprise?"
Let's load this into a pandas dataframe.
It has three columns: a quote id, the first name of the quote's author, and the quote string:
import pandas as pd
df = pd.read_csv("rick_and_morty_quotes.csv")
df.head()
| | id | author | quote |
---|---|---|---|
0 | 1 | Rick | Morty, you got to come on. You got to come wi... |
1 | 2 | Morty | Rick, what’s going on? |
2 | 3 | Rick | I got a surprise for you, Morty. |
3 | 4 | Morty | It’s the middle of the night. What are you ta... |
4 | 5 | Rick | I got a surprise for you. |
Creating a LanceDB table from a pandas dataframe is straightforward using create_table.
We'll start with a local LanceDB connection:
!pip install lancedb -q
import lancedb
async_db = await lancedb.connect_async("~/.lancedb")
await async_db.drop_table("rick_and_morty")
async_table = await async_db.create_table("rick_and_morty", df, mode="overwrite")
await async_table.to_pandas()
[2024-12-17T23:58:46Z WARN lance::dataset::write::insert] No existing dataset at ~/.lancedb/rick_and_morty.lance, it will be created
| | id | author | quote |
---|---|---|---|
0 | 1 | Rick | Morty, you got to come on. You got to come wi... |
1 | 2 | Morty | Rick, what’s going on? |
2 | 3 | Rick | I got a surprise for you, Morty. |
3 | 4 | Morty | It’s the middle of the night. What are you ta... |
4 | 5 | Rick | I got a surprise for you. |
5 | 6 | Morty | Ow! Ow! You’re tugging me too hard. |
6 | 7 | Rick | I got a surprise for you, Morty. |
7 | 8 | Rick | What do you think of this flying vehicle, Mor... |
8 | 9 | Morty | Yeah, Rick, it’s great. Is this the surprise? |
9 | 10 | Rick | Morty, I had to I had to I had to I had to ma... |
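The cells above use top-level `await`, which notebooks and IPython provide. In a plain Python script, the same calls would need to run inside a coroutine driven by `asyncio.run`. The following is a minimal sketch under a few assumptions: the two sample rows and the temporary directory (used in place of `~/.lancedb`) are illustrative, and the `ImportError` guard is only there so the sketch still runs where lancedb is not installed.

```python
import asyncio
import tempfile

try:
    import lancedb  # third-party; installed with `pip install lancedb`
except ImportError:  # let the sketch run even where lancedb is absent
    lancedb = None


async def main():
    """Create a tiny table and return its row count (None if lancedb is missing)."""
    if lancedb is None:
        return None
    # Assumption: a throwaway directory stands in for "~/.lancedb".
    db = await lancedb.connect_async(tempfile.mkdtemp())
    rows = [
        {"id": 1, "author": "Rick", "quote": "I got a surprise for you, Morty."},
        {"id": 2, "author": "Morty", "quote": "Rick, what's going on?"},
    ]
    table = await db.create_table("rick_and_morty", rows, mode="overwrite")
    return await table.count_rows()


if __name__ == "__main__":
    # asyncio.run starts an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```

Keeping every `await` inside one `main()` coroutine means the script starts a single event loop, which is usually the simplest arrangement outside a notebook.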
Updates
Now, since Rick is the smartest man in the multiverse, he deserves to have his quotes attributed to his full name: Richard Daniel Sanchez.
This can be done via LanceTable.update. It needs two arguments:
- A where string filter (SQL syntax) to determine the rows to update
- A dict of updates where the keys are the column names to update and the values are the new values
await async_table.update(where="author='Morty'", updates={"author": "Richard Daniel Sanchez"})
await async_table.to_pandas()
| | id | author | quote |
---|---|---|---|
0 | 1 | Rick | Morty, you got to come on. You got to come wi... |
1 | 3 | Rick | I got a surprise for you, Morty. |
2 | 5 | Rick | I got a surprise for you. |
3 | 7 | Rick | I got a surprise for you, Morty. |
4 | 8 | Rick | What do you think of this flying vehicle, Mor... |
5 | 10 | Rick | Morty, I had to I had to I had to I had to ma... |
6 | 12 | Rick | We’re gonna drop it down there just get a who... |
7 | 14 | Rick | Come on, Morty. Just take it easy, Morty. It’... |
8 | 16 | Rick | When I drop the bomb you know, I want you to ... |
9 | 18 | Rick | And Jessica’s gonna be Eve,… |
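To make the roles of the two arguments concrete, here is a small stand-in that uses Python's built-in `sqlite3` module rather than LanceDB. It is only an analogy: the `quotes` table and its three rows are made up, and LanceDB does not run SQLite internally. It shows a SQL-style `where` predicate selecting rows while an `updates` dict supplies the new column values.

```python
import sqlite3

# In-memory SQLite table with made-up rows (a stand-in, not a LanceDB table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER, author TEXT, quote TEXT)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [
        (1, "Rick", "I got a surprise for you, Morty."),
        (2, "Morty", "Rick, what's going on?"),
        (3, "Rick", "I got a surprise for you."),
    ],
)

where = "author = 'Rick'"                        # SQL predicate: which rows to touch
updates = {"author": "Richard Daniel Sanchez"}   # column name -> new value

# Build "SET col = ?" pairs from the dict and bind the values as parameters.
set_clause = ", ".join(f"{column} = ?" for column in updates)
conn.execute(f"UPDATE quotes SET {set_clause} WHERE {where}", list(updates.values()))

authors = [row[0] for row in conn.execute("SELECT author FROM quotes ORDER BY id")]
print(authors)  # ['Richard Daniel Sanchez', 'Morty', 'Richard Daniel Sanchez']
```

Only the rows matched by the predicate change; the row with author Morty is left as it was.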
Schema evolution
Let's add a new_id column to the table, where each value is the original id plus 1.
await async_table.add_columns({"new_id": "id + 1"})
await async_table.to_pandas()
| | id | author | quote | new_id |
---|---|---|---|---|
0 | 1 | Rick | Morty, you got to come on. You got to come wi... | 2 |
1 | 3 | Rick | I got a surprise for you, Morty. | 4 |
2 | 5 | Rick | I got a surprise for you. | 6 |
3 | 7 | Rick | I got a surprise for you, Morty. | 8 |
4 | 8 | Rick | What do you think of this flying vehicle, Mor... | 9 |
5 | 10 | Rick | Morty, I had to I had to I had to I had to ma... | 11 |
6 | 12 | Rick | We’re gonna drop it down there just get a who... | 13 |
7 | 14 | Rick | Come on, Morty. Just take it easy, Morty. It’... | 15 |
8 | 16 | Rick | When I drop the bomb you know, I want you to ... | 17 |
9 | 18 | Rick | And Jessica’s gonna be Eve,… | 19 |
If we look at the schema, we see that a new int64 column was added:
await async_table.schema()
id: int64
author: string
quote: string
new_id: int64
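The dict passed to add_columns maps a new column name to a SQL expression that is evaluated for every row. As a rough analogy only, the sketch below evaluates the same `{"new_id": "id + 1"}` mapping with Python's built-in `sqlite3` module; the `quotes` table and its rows are made up, and this is not how LanceDB computes the column internally.

```python
import sqlite3

# Made-up rows in an in-memory SQLite table (a stand-in, not a LanceDB table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER, author TEXT)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?)",
    [(1, "Rick"), (3, "Rick"), (5, "Rick")],
)

new_columns = {"new_id": "id + 1"}  # new column name -> SQL expression

# Turn each entry into "<expression> AS <name>" and evaluate it per row.
select_list = ", ".join(f"{expr} AS {name}" for name, expr in new_columns.items())
derived = conn.execute(f"SELECT id, {select_list} FROM quotes ORDER BY id").fetchall()
print(derived)  # [(1, 2), (3, 4), (5, 6)]
```

Because the expression is integer arithmetic on an integer column, the derived values are integers as well, which matches the int64 type reported in the schema above.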
Rollback
Suppose we used the table and found that the new column should hold a different value. How do we swap in another new column without losing the change history?
First, major operations are automatically versioned in LanceDB. Version 1 is the table creation, with the initial insertion of data. Version 2 is the update. Version 3 is adding the new column.
await async_table.checkout_latest()
await async_table.list_versions()
[{'version': 1, 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259), 'metadata': {}}, {'version': 2, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948), 'metadata': {}}, {'version': 3, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165), 'metadata': {}}]
We can restore version 2, the state before we added the new_id column:
await async_table.checkout(2)
await async_table.restore()
await async_table.to_pandas()
| | id | author | quote |
---|---|---|---|
0 | 1 | Rick | Morty, you got to come on. You got to come wi... |
1 | 3 | Rick | I got a surprise for you, Morty. |
2 | 5 | Rick | I got a surprise for you. |
3 | 7 | Rick | I got a surprise for you, Morty. |
4 | 8 | Rick | What do you think of this flying vehicle, Mor... |
5 | 10 | Rick | Morty, I had to I had to I had to I had to ma... |
6 | 12 | Rick | We’re gonna drop it down there just get a who... |
7 | 14 | Rick | Come on, Morty. Just take it easy, Morty. It’... |
8 | 16 | Rick | When I drop the bomb you know, I want you to ... |
9 | 18 | Rick | And Jessica’s gonna be Eve,… |
Notice that we now have one more version, not one fewer. Restoring an old version does not delete the version history; it creates a new version whose schema and data are equivalent to the restored one. In this way, we can keep track of all changes and always roll back to a previous state.
await async_table.list_versions()
[{'version': 1, 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259), 'metadata': {}}, {'version': 2, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948), 'metadata': {}}, {'version': 3, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165), 'metadata': {}}, {'version': 4, 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 22, 800694), 'metadata': {}}]
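The "restore adds a version" behaviour can be modelled with a toy append-only log. This is a conceptual sketch only: the `VersionedRows` class is hypothetical, and it stores a full copy of the rows per version for simplicity, which is not how LanceDB stores data.

```python
import copy


class VersionedRows:
    """Toy append-only version log (hypothetical; not LanceDB's storage format)."""

    def __init__(self, rows):
        self._versions = [copy.deepcopy(rows)]  # version 1 = table creation

    @property
    def version(self):
        """Latest version number (versions are numbered from 1)."""
        return len(self._versions)

    def rows(self, version=None):
        """Return a copy of the rows at `version` (latest if omitted)."""
        version = self.version if version is None else version
        return copy.deepcopy(self._versions[version - 1])

    def commit(self, rows):
        """Record a new state and return its version number."""
        self._versions.append(copy.deepcopy(rows))
        return self.version

    def restore(self, version):
        """Re-commit an old state as a NEW version; history is never truncated."""
        return self.commit(self.rows(version))


table = VersionedRows([{"id": 1, "author": "Rick"}])                       # version 1
table.commit([{"id": 1, "author": "Richard Daniel Sanchez"}])              # version 2: update
table.commit([{**row, "new_id": row["id"] + 1} for row in table.rows()])   # version 3: new column
restored = table.restore(2)                                                # version 4: copy of version 2

print(restored, table.rows() == table.rows(2))  # 4 True
```

Version 3, with its new_id column, is still reachable after the restore, so rolling back never loses the ability to inspect or return to a later state.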
Add another new column
Now we'll change the value of the new_id column and add it to the restored dataset again:
await async_table.add_columns({"new_id": "id + 10"})
await async_table.schema()
id: int64
author: string
quote: string
new_id: int64
Deletion
What if the whole show was just Rick-isms? Let's delete any quote not said by Rick.
await async_table.delete("author != 'Richard Daniel Sanchez'")
We can see that the number of rows has been reduced to 34:
await async_table.count_rows()
34
OK, we had our fun. Let's get back to the full quote set:
await async_table.checkout(5)
await async_table.restore()
await async_table.count_rows()
99
History
We now have 7 versions of the data. The operation that corresponds to each version is listed below:
await async_table.version()
6
Versions:
- 1 - Create
- 2 - Update
- 3 - Add a new column
- 4 - Restore (2)
- 5 - Add a new column
- 6 - Delete
- 7 - Restore (5)
Summary
We never had to manage versioning explicitly, and we never had to create expensive and slow snapshots. LanceDB automatically tracks the full history of operations and supports fast rollbacks. In production, this is critical for debugging issues and for minimizing downtime by rolling back to a previously successful state in seconds.