Guide for New Contributors¶
This is a guide for new contributors to the Lance project. Even if you have no previous experience with Python, Rust, or open source, you can still make a non-trivial impact by helping us improve documentation, examples, and more. For experienced developers, the issues you can work on run the gamut from warm-ups to serious challenges in Python and Rust.
If you have any questions, please join our Discord for real-time support. Your feedback is always welcome!
Getting Started¶
- Join our Discord and say hi
- Set up your development environment
- Pick an issue to work on. See https://github.com/lancedb/lance/contribute for good first issues.
- Have fun!
Development Environment¶
Lance is currently implemented in Rust and comes with a Python wrapper, so you'll want to set up both.
- Install Rust: https://www.rust-lang.org/tools/install
- Install Python 3.9+: https://www.python.org/downloads/
- Install Protocol Buffers: https://grpc.io/docs/protoc-installation/ (make sure you have version 3.20 or higher)
- Install commit hooks:
a. Install pre-commit: https://pre-commit.com/#install
b. Run `pre-commit install` in the root of the repo
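The prerequisites above can be sanity-checked with a short script. This is a minimal sketch and not part of the Lance repo: the helper names are hypothetical, the executable names (`cargo`, `protoc`, `pre-commit`) are assumed to be the usual ones on your PATH, and the version parsing assumes `protoc --version` prints something like `libprotoc 3.21.12`.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)    # from the guide: Python 3.9+
MIN_PROTOC = (3, 20)   # from the guide: protoc 3.20 or higher


def parse_protoc_version(output):
    """Extract (major, minor) from text such as 'libprotoc 3.21.12'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def protoc_ok(output):
    """True when the reported protoc version is at least MIN_PROTOC."""
    version = parse_protoc_version(output)
    return version is not None and version >= MIN_PROTOC


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.9+ is required")
    # Assumed executable names; adjust if your installation differs.
    for tool in ("cargo", "protoc", "pre-commit"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("protoc") is not None:
        result = subprocess.run(
            ["protoc", "--version"], capture_output=True, text=True
        )
        if not protoc_ok(result.stdout):
            problems.append("protoc 3.20 or higher is required")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("problem:", problem)
```

Running the script prints one line per problem it finds and nothing when every check passes.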
Sample Workflow¶
1. Fork the repo
2. Pick a GitHub issue
3. Create a branch for the issue
4. Make your changes
5. Create a pull request from your fork to lancedb/lance
6. Get feedback and iterate
7. Merge!
8. Go back to step 2
Python Development¶
The Python integration is done via PyO3 plus custom Python code:
- The Rust code that directly supports the Python bindings is under `python/src`, while the pure Python code lives under `python/python`.
- We make wrapper classes in Rust for Dataset/Scanner/RecordBatchReader that are exposed to Python.
- These are then used by the LanceDataset / LanceScanner implementations, which extend pyarrow Dataset/Scanner for DuckDB compatibility.
- Data is delivered via the Arrow C Data Interface
To build the Python bindings, first install requirements:
To make a dev install:
After installing, you can run `import lance` in a Python shell within the virtual environment.
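If the import fails, it helps to know whether the package is missing from the active virtual environment or was picked up from somewhere unexpected. The following is a small sketch using only the standard library; the helper names are hypothetical and not part of Lance.

```python
import importlib
import importlib.util


def is_importable(module_name):
    """Return True if the top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


def describe_install(module_name="lance"):
    """Import the module and report the file it was loaded from."""
    if not is_importable(module_name):
        return f"{module_name} is not installed in this environment"
    module = importlib.import_module(module_name)
    location = getattr(module, "__file__", "unknown location")
    return f"{module_name} imported from {location}"


if __name__ == "__main__":
    print(describe_install())
```

For a dev install, the reported location would normally point into your checkout or the virtual environment's site-packages rather than a system-wide Python.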
To run tests and integration tests:
To run the tests on macOS, you may need to increase the default limit on the number of open files:
ulimit -n 2048
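The same limit can also be inspected and raised from inside a Python process with the standard `resource` module (Unix only). This is a sketch under the assumption that you want the change to apply only to the current process and its children, for example from a local test helper; the function name is hypothetical and this is not how the Lance test suite is configured.

```python
import resource


def raise_open_file_limit(wanted=2048):
    """Raise the soft open-file limit to `wanted` when the hard limit allows it.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # An unprivileged process cannot exceed the hard limit, so cap the request.
    target = wanted if hard == unlimited else min(wanted, hard)
    if soft != unlimited and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-file limit:", raise_open_file_limit())
```

Unlike `ulimit -n` in a shell, this does not affect the parent shell or other processes already running.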
Rust Development¶
To format and lint Rust code:
Core Format¶
The core format is implemented in Rust under the `rust` directory. Once you've set up Rust, you can build the core format with:
This builds the debug build. For the optimized release build:
To run the Rust unit tests:
If you're working on a performance-related feature, benchmarks can be run via:
Documentation¶
Main website¶
The main documentation website is built using mkdocs-material. To build the docs, first install requirements:
Then build and start the docs server:
Python Generated Doc¶
Python code documentation is built using Sphinx in lance-python-doc and published through GitHub Pages in ReadTheDocs style.
Rust Generated Doc¶
Rust code documentation is built and published to the Rust official docs website as a part of the release process.
Example Notebooks¶
Example notebooks are under `examples`.
These are standalone notebooks you should be able to download and run.
Benchmarks¶
Our Rust benchmarks are run multiple times a day and the history can be found here.
Separately, we have vector index benchmarks that test against the sift1m dataset, as well as benchmarks for tpch.
These live under `benchmarks`.
Code of Conduct¶
We follow the Codes of Conduct of the Python Software Foundation and the Rust Foundation.