# Reading Lance Datasets

## Basic Reading

**Python**

```python
df = (spark.read
    .format("lance")
    .option("db", "/path/to/lance/database")
    .option("dataset", "my_dataset")
    .load())
```

**Scala**

```scala
val df = spark.read
  .format("lance")
  .option("db", "/path/to/lance/database")
  .option("dataset", "my_dataset")
  .load()
```

**Java**

```java
Dataset<Row> df = spark.read()
    .format("lance")
    .option("db", "/path/to/lance/database")
    .option("dataset", "my_dataset")
    .load();
```

## Column Selection

Lance is a columnar format, so selecting only the columns you need reduces the amount of data read (see Verifying Pushdown below for how to confirm this):

**Python**

```python
df = (spark.read
    .format("lance")
    .option("db", "/path/to/lance/database")
    .option("dataset", "my_dataset")
    .load()
    .select("id", "name", "age"))
```

**Scala**

```scala
val df = spark.read
  .format("lance")
  .option("db", "/path/to/lance/database")
  .option("dataset", "my_dataset")
  .load()
  .select("id", "name", "age")
```

**Java**

```java
Dataset<Row> df = spark.read()
    .format("lance")
    .option("db", "/path/to/lance/database")
    .option("dataset", "my_dataset")
    .load()
    .select("id", "name", "age");
```

## Filters

Filters applied to a read are pushed down to the scan, reducing the amount of data read:

**Python**

```python
from pyspark.sql.functions import col

filtered = (spark.read
    .format("lance")
    .option("db", "/path/to/database")
    .option("dataset", "users")
    .load()
    .filter(
        col("age").between(25, 65)
        & (col("department") == "Engineering")  # parentheses are required here:
        & (col("is_active") == True)            # & binds tighter than == in Python
    ))
```

**Scala**

```scala
import org.apache.spark.sql.functions.col

val filtered = spark.read
  .format("lance")
  .option("db", "/path/to/database")
  .option("dataset", "users")
  .load()
  .filter(
    col("age").between(25, 65) &&
      col("department") === "Engineering" &&
      col("is_active") === true
  )
```

**Java**

```java
Dataset<Row> filtered = spark.read()
    .format("lance")
    .option("db", "/path/to/database")
    .option("dataset", "users")
    .load()
    .filter("age BETWEEN 25 AND 65 AND department = 'Engineering' AND is_active = true");
```
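## Verifying Pushdown

Column pruning happens at the scan, and you can confirm it by inspecting the physical plan with `explain()`. The sketch below is Python-only and reuses the `my_dataset` table from the examples above; the exact plan text varies with the Spark and connector versions, but the scan node's read schema should list only the projected columns:

```python
df = (spark.read
    .format("lance")
    .option("db", "/path/to/lance/database")
    .option("dataset", "my_dataset")
    .load()
    .select("id", "name", "age"))

# The scan node in the printed physical plan should report a read
# schema of just id, name, and age, confirming the projection was
# pushed down to the Lance scan rather than applied afterward.
df.explain()
```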
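Filter pushdown can be checked the same way. Predicates the connector accepts are evaluated inside the scan; anything it cannot handle is still applied by Spark in a separate `Filter` operator above the scan, so the query stays correct but reads more data. A minimal sketch against the `users` table from the filter examples:

```python
from pyspark.sql.functions import col

filtered = (spark.read
    .format("lance")
    .option("db", "/path/to/database")
    .option("dataset", "users")
    .load()
    .filter(col("age").between(25, 65)))

# Pushed predicates appear in the scan node's description; a residual
# Filter operator above the scan means Spark is evaluating that
# predicate itself. Exact formatting varies by version.
filtered.explain()
```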