Connection
geneva.db.Connection
Bases: DBConnection
Geneva Connection.
system_table_connection
system_table_connection: Connection
Connection used to access system tables.
resolve_system_table_location
resolve_system_table_location(
namespace: list[str] | None = None,
) -> tuple[Connection, list[str]]
Resolve the target connection/namespace for system table operations.
alter_or_create_system_table
alter_or_create_system_table(
table_name: str,
model: Any,
namespace: list[str] | None = None,
) -> tuple[Connection, list[str], Table]
Open or create a Geneva internal table in its resolved location.
namespace_client
Returns namespace client from the underlying LanceDB connection.
table_names
table_names(
page_token: str | None = None,
limit: int | None = None,
*args,
**kwargs,
) -> Iterable[str]
List all available tables and views.
list_tables
list_tables(
namespace_path: list[str] | None = None,
page_token: str | None = None,
limit: int | None = None,
) -> ListTablesResponse
List all tables in this database with pagination support.
Parameters:
-
namespace_path(list[str], default:None) –The namespace to list tables in. None or empty list represents root namespace.
-
page_token(str, default:None) –Token for pagination. Use the token from a previous response to get the next page of results.
-
limit(int, default:None) –The maximum number of results to return.
Returns:
-
ListTablesResponse–Response containing table names and optional page_token for pagination.
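The pagination contract described above can be sketched as a loop that feeds each response's page_token back into the next call. This is only an illustration: `iter_table_names` is a hypothetical helper, not part of Geneva, and the `tables` attribute on the response is an assumption (the reference only states that the response contains table names and an optional page_token).

```python
from typing import Any, Iterator, Optional


def iter_table_names(
    db: Any,
    namespace_path: Optional[list] = None,
    page_size: int = 100,
) -> Iterator[str]:
    """Yield every table name by following page tokens until none is returned."""
    token = None
    while True:
        resp = db.list_tables(
            namespace_path=namespace_path,
            page_token=token,
            limit=page_size,
        )
        # Assumption: the table names are exposed as `resp.tables`.
        yield from resp.tables
        token = resp.page_token
        if not token:  # None or empty string means there are no further pages
            break
```

Called as `list(iter_table_names(db))`, this would collect the names from the root namespace; pass `namespace_path=["workspace"]` to list a nested namespace instead.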
open_table
open_table(
name: str,
storage_options: dict[str, str] | None = None,
index_cache_size: int | None = None,
version: int | None = None,
namespace: list[str] | None = None,
*args,
**kwargs,
) -> Table
Open a Lance Table.
Parameters:
-
name(str) –Name of the table.
-
storage_options(dict[str, str] | None, default:None) –Additional options for the storage backend. Options already set on the connection will be inherited by the table, but can be overridden here. See available options at https://lancedb.github.io/lancedb/guides/storage/
-
namespace(list[str] | None, default:None) –Namespace path for the table (e.g., ["workspace"])
create_table
create_table(
name: str,
data: DATA | None = None,
schema: Schema | LanceModel | None = None,
mode: str = "create",
exist_ok: bool = False,
on_bad_vectors: str = "error",
fill_value: float = 0.0,
storage_options: dict[str, str] | None = None,
*args,
**kwargs,
) -> Table
Create a table in the lake.
Parameters:
-
name(str) –The name of the table
-
data(DATA | None, default:None) –User must provide at least one of data or schema. Acceptable types are:
- list-of-dict
- pandas.DataFrame
- pyarrow.Table or pyarrow.RecordBatch
-
schema(Schema | LanceModel | None, default:None) –Acceptable types are:
- pyarrow.Schema
- lancedb.pydantic.LanceModel
-
mode(str, default:'create') –The mode to use when creating the table. Can be either "create" or "overwrite". By default, if the table already exists, an exception is raised. If you want to overwrite the table, use mode="overwrite".
-
exist_ok(bool, default:False) –If a table by the same name already exists, then raise an exception if exist_ok=False. If exist_ok=True, then open the existing table; it will not add the provided data but will validate against any schema that's specified.
-
on_bad_vectors(str, default:'error') –What to do if any of the vectors are not the same size or contain NaNs. One of "error", "drop", "fill".
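The interaction between mode and exist_ok is easy to get wrong, so here is a minimal sketch of the two common patterns. `open_or_create` and `replace_table` are hypothetical wrapper names introduced for illustration; they only use the create_table parameters documented above, and `db` is assumed to be an already-opened Connection.

```python
from typing import Any


def open_or_create(db: Any, name: str, rows: list) -> Any:
    """Create `name` from `rows`, or open it unchanged if it already exists."""
    # exist_ok=True opens an existing table instead of raising; per the
    # parameter description, the provided rows are NOT added in that case.
    return db.create_table(name, data=rows, mode="create", exist_ok=True)


def replace_table(db: Any, name: str, rows: list) -> Any:
    """Recreate `name` from `rows`, overwriting any existing table."""
    return db.create_table(name, data=rows, mode="overwrite")
```

The first helper is idempotent and suits setup code that may run more than once; the second is destructive and should be reserved for cases where losing the old table is intended.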
create_view
create_view(
name: str, query: str, materialized: bool = False
) -> Table
Create a View from a Query.
Parameters:
-
name(str) –Name of the view.
-
query(str) –SQL query to create the view.
-
materialized(bool, default:False) –If True, the view is materialized.
create_materialized_view
create_materialized_view(
name: str,
query: GenevaQueryBuilder,
with_no_data: bool = True,
) -> Table
Create a materialized view.
Parameters:
-
name(str) –Name of the materialized view.
-
query(GenevaQueryBuilder) –Query to create the view.
-
with_no_data(bool, default:True) –If True, the view is materialized; if False, it is ready for refresh.
create_udtf_view
create_udtf_view(
name: str, source: GenevaQueryBuilder, udtf: UDTF
) -> Table
Create a UDTF-backed materialized view.
Warning
This API is in beta and may change in future releases.
The view is created empty; call view.refresh() to populate it.
Parameters:
-
name(str) –Name for the new view table.
-
source(GenevaQueryBuilder) –Query defining the source data.
-
udtf(UDTF) –The UDTF to execute on refresh.
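Because the view is created empty, creation and population are two separate steps. The sketch below pairs them in one hypothetical helper, `build_udtf_view`; it assumes `db` is an open Connection and that `source` and `udtf` are a GenevaQueryBuilder and a UDTF built elsewhere.

```python
from typing import Any


def build_udtf_view(db: Any, name: str, source: Any, udtf: Any) -> Any:
    """Create a UDTF-backed view and populate it in one step."""
    view = db.create_udtf_view(name, source, udtf)  # the view starts out empty
    view.refresh()  # executes the UDTF to fill the view
    return view
```

Keeping the refresh explicit in your own code may be preferable when population is expensive and should run inside a provisioned cluster context.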
create_scalar_udtf_view
create_scalar_udtf_view(
name: str,
source: GenevaQueryBuilder,
scalar_udtf: ScalarUDTF,
) -> Table
Create a scalar UDTF-backed materialized view (1:N row expansion).
Warning
This API is in beta and may change in future releases.
The view is created with placeholder rows (one per source row) that are
populated on view.refresh(). Each source row expands to zero or more
output rows via the scalar UDTF.
Parameters:
-
name(str) –Name for the new view table.
-
source(GenevaQueryBuilder) –Query defining the source data.
-
scalar_udtf(ScalarUDTF) –The scalar UDTF to execute on refresh.
define_cluster
define_cluster(name: str, cluster: GenevaCluster) -> None
Define a persistent Geneva cluster. This will upsert the cluster definition by
name. The cluster can then be provisioned using context(cluster=name).
Parameters:
-
name(str) –Name of the cluster. This will be used as the key when upserting and provisioning the cluster. The cluster name must comply with RFC 1123.
-
cluster(GenevaCluster) –The cluster definition to store.
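Since cluster names are constrained by an RFC naming rule, it can help to check a name before calling define_cluster. The sketch below assumes the rule is the RFC 1123 DNS label form as used by Kubernetes (lowercase alphanumerics and "-", starting and ending with an alphanumeric, at most 63 characters). That reading is an assumption, `is_valid_cluster_name` is a hypothetical helper, and Geneva's own validation may be stricter or looser.

```python
import re

# Assumed rule: RFC 1123 DNS label (the Kubernetes interpretation).
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if `name` looks like an RFC 1123 DNS label."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None
```

Under this assumption, a name such as `my-cluster` passes, while `My_Cluster`, a name with a leading or trailing hyphen, and the empty string are rejected.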
list_clusters
list_clusters() -> list[GenevaCluster]
List the cluster definitions. These can be defined using define_cluster().
Returns:
-
list[GenevaCluster]–List of Geneva cluster definitions
delete_cluster
Delete a Geneva cluster definition.
Parameters:
-
name(str) –Name of the cluster to delete.
define_manifest
define_manifest(
name: str,
manifest: GenevaManifest,
uploader: Uploader | None = None,
) -> None
Define a persistent Geneva Manifest that represents the files and dependencies
used in the execution environment. This will upsert the manifest definition by
name and upload the required artifacts. The manifest can then be used with
context(manifest=name).
Parameters:
-
name(str) –Name of the manifest. This will be used as the key when upserting and loading the manifest.
-
manifest(GenevaManifest) –The manifest definition to use.
-
uploader(Uploader | None, default:None) –An optional, custom Uploader to use. If not provided, the uploader will be auto-detected based on the environment configuration.
list_manifests
list_manifests() -> list[GenevaManifest]
List the manifest definitions. These can be defined using define_manifest().
Returns:
-
list[GenevaManifest]–List of Geneva manifest definitions
delete_manifest
Delete a Geneva manifest definition.
Parameters:
-
name(str) –Name of the manifest to delete.
capture_local_environment
capture_local_environment(
name: str | None = None,
*,
skip_site_packages: bool = False,
) -> GenevaManifest
Capture and upload the caller's local environment.
Zips the workspace (and, by default, site-packages)
and uploads the resulting archives through this connection's
namespace-vended Uploader before returning. The returned
GenevaManifest carries the uploaded
zip URIs and is ready to be passed to @udf(manifest=...).
Note: the query node deployment must be configured for credential vending
via the vend_input_storage_options configuration.
Typical usage::

    db = geneva.connect(...)
    manifest = db.capture_local_environment(skip_site_packages=True)

    @udf(manifest=manifest)
    def embed(text: str) -> list[float]: ...

Use create_pip for declarative manifests when you do not want any
local-environment upload to happen.
Parameters:
-
name(str | None, default:None) –Optional manifest name. If omitted, an auto-generated name is assigned.
-
skip_site_packages(bool, default:False) –If True, capture only the workspace source and rely on the worker image's pre-installed dependencies.
Returns:
-
GenevaManifest–A manifest with zips populated. No further resolution step is required.
Raises:
-
RuntimeError–If this connection cannot vend an Uploader (e.g. a db:// connection without namespace_client_impl configured).
local_ray_context
Context manager for a local Ray instance. This will provision a local Ray instance and return a context manager. This is useful for development or small jobs.
context
context(
cluster: str,
manifest: str | None = None,
on_exit=None,
wait_timeout: float | None = None,
log_to_driver: bool = True,
logging_level=INFO,
) -> AbstractContextManager[None]
Context manager for a Geneva Execution Environment. This will provision a cluster based on the cluster definition and the manifest provided. By default, the context manager will delete the cluster on exit. This can be configured with the on_exit parameter.
Parameters:
-
cluster(str) –Name of the persisted cluster definition to use. Required. This will raise an exception if the cluster definition was not defined via define_cluster().
-
manifest(str | None, default:None) –Optional name of the persisted manifest to use. This will raise an exception if the manifest definition was not defined via define_manifest(). If manifest is not provided, the local environment will be uploaded.
-
on_exit–Exit mode for the cluster. By default, the cluster waits for all running jobs to complete before deleting. To retain the cluster when any job fails or the context body raises an exception, use ExitMode.RETAIN_ON_FAILURE. To always retain the cluster, use ExitMode.RETAIN.
-
wait_timeout(float | None, default:None) –Internal/experimental. Maximum seconds to wait for tracked jobs during context exit. Only applies with DELETE or RETAIN_ON_FAILURE. None means wait indefinitely. For RETAIN_ON_FAILURE, a timeout is treated as a failure and the cluster is retained.
-
log_to_driver(bool, default:True) –Whether to send Ray worker logs to the driver. Defaults to True for better visibility in tests and debugging.
-
logging_level–The logging level for Ray workers. Use logging.DEBUG for detailed logs.
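The define-then-provision flow described for define_cluster() and context() can be sketched as follows. `run_on_cluster` is a hypothetical helper introduced for illustration; it assumes `db` is an open Connection, `cluster_def` is a GenevaCluster built elsewhere, and `job` is any zero-argument callable that submits work.

```python
from __future__ import annotations

from typing import Any, Callable


def run_on_cluster(
    db: Any,
    cluster_name: str,
    cluster_def: Any,
    job: Callable[[], Any],
    manifest_name: str | None = None,
) -> Any:
    """Upsert a cluster definition, then run `job` inside its context."""
    # Persist (or update) the definition so context() can look it up by name.
    db.define_cluster(cluster_name, cluster_def)
    # With the default on_exit, the cluster is deleted when the block exits.
    with db.context(cluster=cluster_name, manifest=manifest_name):
        return job()
```

Leaving `manifest_name` as None relies on the documented default, in which the local environment is uploaded; pass the name of a manifest stored with define_manifest() to reuse a persisted one.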
use_remote_dispatch
Whether table operations on this connection should dispatch
through phalanx's geneva v2 REST API (alter_table_backfill_columns etc.)
rather than running natively in-process.
True for client-side remote (db://) connections; False for
local/native connections and for executor-side connections (those
opened by geneva_driver with executor_mode=True) — executors
must run natively to avoid infinite dispatch recursion.
get_job
Get a job record by ID.
Reads from the _geneva_jobs system table via JobStateManager.
Works for both native and remote connections.