Query

geneva.query.GenevaQuery

Bases: BaseModel

base

base: Query

shuffle

shuffle: bool | None = None

shuffle_seed

shuffle_seed: int | None = None

fragment_ids

fragment_ids: list[int] | None = None

with_row_address

with_row_address: bool | None = None

column_udfs

column_udfs: list[ColumnUDF] | None = None

extract_column_udfs

extract_column_udfs(
    packager: UDFPackager,
) -> list[ExtractedTransform]

Loads a set of transforms that reflect the column_udfs and map_batches_udfs of the query.

geneva.query.GenevaQueryBuilder

Bases: LanceEmptyQueryBuilder

A proxy that wraps LanceQueryBuilder and adds geneva-specific functionality.

schema

schema: Schema

select

select(
    columns: list[str] | Mapping[str, str | UDF],
) -> Self

Select the output columns of the query.

Parameters:

  • columns (list[str] | Mapping[str, str | UDF]) –

    The columns to select.

    If a list of strings, each string is the name of a column to select.

    If a dictionary, each key is the output name of a column and each value is either an SQL expression (str) or a UDF.
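The two accepted shapes of `columns` can be illustrated with a small stand-alone sketch. The helper `normalize_columns` is hypothetical and not part of geneva; it only shows one way to view the list form as a mapping in which each column name serves as its own SQL expression. How geneva handles this internally is not stated here.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Union

# Stand-in for "str | UDF": a UDF is modelled here as any callable (assumption).
Expr = Union[str, Callable]


def normalize_columns(columns: list[str] | Mapping[str, Expr]) -> dict[str, Expr]:
    """Hypothetical helper: turn either accepted form into {output_name: expression}."""
    if isinstance(columns, Mapping):
        # Dictionary form: keys are output names, values are SQL strings or UDFs.
        return dict(columns)
    # List form: each string names an existing column and keeps that name.
    return {name: name for name in columns}


as_list = normalize_columns(["id", "title"])
as_map = normalize_columns({"title_upper": "upper(title)", "id": "id"})
```

In this sketch both calls yield a mapping from output name to expression, which is the shape the dictionary form of `select` describes.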

shuffle

shuffle(seed: int | None = None) -> Self

Shuffle the rows of the table, optionally using the given seed.
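As a rough illustration of what a seed conventionally provides, the sketch below shuffles row indices with Python's standard `random` module. It assumes `seed` behaves like an ordinary RNG seed (same seed, same order). The function `shuffled_indices` is hypothetical and is not how geneva implements shuffling.

```python
import random


def shuffled_indices(num_rows, seed=None):
    """Hypothetical helper: return row indices in a shuffled order."""
    rng = random.Random(seed)  # seed=None falls back to a non-deterministic seed
    indices = list(range(num_rows))
    rng.shuffle(indices)
    return indices


# With the same seed, the two orders are identical.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
```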

enable_internal_api

enable_internal_api() -> Self

Enable internal APIs.

WARNING: Internal APIs are subject to change.

with_fragments

with_fragments(fragments: list[int] | int) -> Self

Filter the rows of the table to only include the specified fragments.
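The signature accepts either a single fragment ID or a list of them. A minimal sketch of that behaviour on plain Python data follows. The helpers `normalize_fragments` and `rows_in_fragments`, and the `fragment_id` key on each row, are hypothetical and used only for illustration.

```python
def normalize_fragments(fragments):
    """Hypothetical helper: accept an int or a list of ints, return a list."""
    if isinstance(fragments, int):
        return [fragments]
    return list(fragments)


def rows_in_fragments(rows, fragments):
    """Keep only the rows whose fragment ID is in the requested set."""
    wanted = set(normalize_fragments(fragments))
    return [row for row in rows if row["fragment_id"] in wanted]


rows = [
    {"fragment_id": 0, "value": "a"},
    {"fragment_id": 1, "value": "b"},
    {"fragment_id": 2, "value": "c"},
]
single = rows_in_fragments(rows, 1)
several = rows_in_fragments(rows, [0, 2])
```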

with_row_address

with_row_address() -> Self

Include the physical row address in the result.

WARNING: This is an internal API detail.
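For orientation, the sketch below decodes a row address under the assumption that it follows the underlying Lance convention: a 64-bit integer with the fragment ID in the upper 32 bits and the row offset within that fragment in the lower 32 bits. This layout is an assumption about an internal detail and may change. Both helper functions are hypothetical.

```python
def make_row_address(fragment_id, offset):
    """Pack a fragment ID and a row offset into one 64-bit address (assumed layout)."""
    return (fragment_id << 32) | offset


def split_row_address(address):
    """Unpack an address into (fragment_id, offset) under the same assumption."""
    return address >> 32, address & 0xFFFFFFFF


address = make_row_address(3, 17)
fragment_id, offset = split_row_address(address)
```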

with_where_as_bool_column

with_where_as_bool_column() -> Self

Include the filter result as a boolean column in the output, instead of returning only the rows that match the filter.
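The difference between the two output shapes can be sketched on plain Python data. This is an illustration only: the column name `selected` and the `predicate` function are hypothetical, and the name geneva gives the boolean column is not stated here.

```python
rows = [
    {"id": 1, "score": 0.9},
    {"id": 2, "score": 0.2},
    {"id": 3, "score": 0.7},
]


def predicate(row):
    """Stand-in for a `where` filter such as "score > 0.5"."""
    return row["score"] > 0.5


# Default behaviour: only the rows that match the filter are returned.
filtered = [row for row in rows if predicate(row)]

# Bool-column behaviour: every row is returned, plus a column holding the filter result.
flagged = [{**row, "selected": predicate(row)} for row in rows]
```

Keeping every row and marking matches can help when downstream code needs to preserve row alignment with the source table.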

to_query_object

to_query_object() -> GenevaQuery

from_query_object

from_query_object(
    table: Table, query: GenevaQuery
) -> GenevaQueryBuilder

take_rows

take_rows(rows: list[int]) -> Table

to_batches

to_batches(
    batch_size: int | None = None,
    *,
    timeout: timedelta | None = None,
) -> RecordBatchReader

to_arrow

to_arrow(*args, timeout: timedelta | None = None) -> Table

rerank

rerank(reranker: Reranker) -> Self

create_materialized_view

create_materialized_view(
    conn: Connection, view_name: str
) -> Table

Creates a materialized view of the table.

The materialized view will be a table that contains the result of the query. The view will be populated via a pipeline job.

Parameters:

  • conn (Connection) –

    A connection to the database to create the view in.

  • view_name (str) –

    The name of the view to create.

Raises:

  • UserWarning

    If the source table does not have stable row IDs enabled. Without stable row IDs, incremental refresh is only supported when refreshing to the same source version. Attempting to refresh to a different version will fail.