Gemini
geneva.udfs.gemini.gemini_udf
gemini_udf(
column: str,
prompt: str,
model: str = "gemini-2.5-flash",
mime_type: str | None = None,
api_key_env: str = "GEMINI_API_KEY",
version: str | None = None,
) -> UDF
Return a Gemini UDF with the API key captured from the local environment.
The API key is read from os.environ[api_key_env] when gemini_udf() is
called and is serialized with the UDF, so remote workers can use the key
without cluster-level env_vars configuration.
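The capture-at-construction pattern described above can be illustrated with a minimal sketch. This is not Geneva's implementation: make_keyed_fn is a hypothetical helper, the key value is a placeholder, and no request is sent. It only shows that a key read once in the factory stays available to the returned callable after the environment variable is gone.

```python
import os
from typing import Callable


def make_keyed_fn(api_key_env: str = "GEMINI_API_KEY") -> Callable[[str], str]:
    """Hypothetical factory: captures the key when it is called, not per row."""
    # Read the key once, on the machine running the factory.
    # A missing variable raises KeyError here instead of later on a worker.
    api_key = os.environ[api_key_env]

    def fn(value: str) -> str:
        # A real UDF would call the model; this only shows the key travelled along.
        return f"{value} [key ...{api_key[-4:]}]"

    return fn


os.environ["GEMINI_API_KEY"] = "demo-key-1234"  # placeholder value, not a real key
fn = make_keyed_fn()
del os.environ["GEMINI_API_KEY"]  # simulate an environment without the variable
print(fn("hello"))  # hello [key ...1234]
```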
Supports both text and binary (e.g. image) columns. For text columns
the prompt is prepended to each value. For binary columns the raw bytes
are sent as inline data with the given mime_type alongside the prompt.
The column type is detected at runtime from the Arrow array; pass
mime_type when the column contains binary data.
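The text and binary handling described above can be sketched roughly as follows. Everything here is an assumption made for illustration: build_parts is a hypothetical helper, the dictionary layout is not the Gemini SDK's payload format, the newline separator between prompt and value is a guess, and the 20 MB inline limit noted under mime_type is taken to mean 20 * 1024 * 1024 bytes.

```python
from typing import Any, Dict, List, Optional, Union

# Assumed byte count for the 20 MB inline-data limit.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def build_parts(
    prompt: str,
    value: Union[str, bytes],
    mime_type: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: shape one row's value into request parts."""
    if isinstance(value, str):
        # Text column: the prompt is prepended to the value; mime_type is ignored.
        return [{"text": f"{prompt}\n{value}"}]
    if mime_type is None:
        raise ValueError("mime_type is required for binary columns")
    if len(value) > MAX_INLINE_BYTES:
        raise ValueError("inline data exceeds the 20 MB per-request limit")
    # Binary column: raw bytes travel as inline data alongside the prompt.
    return [
        {"text": prompt},
        {"inline_data": {"mime_type": mime_type, "data": value}},
    ]


print(build_parts("Summarise:", "A short note."))
print(build_parts("Describe the scene", b"\xff\xd8\xff", mime_type="image/jpeg"))
```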
Parameters:
- column (str) – Name of the input column.
- prompt (str) – Instruction sent to Gemini alongside each row's value.
- model (str, default: 'gemini-2.5-flash') – Gemini model identifier.
- mime_type (str | None, default: None) – MIME type for binary columns. Required when the input column contains binary data; ignored for text columns.
  Supported types:
  - Image – image/jpeg, image/png, image/webp, image/heic, image/heif (docs: https://ai.google.dev/gemini-api/docs/vision)
  - Audio – audio/wav, audio/mp3, audio/aac, audio/flac, audio/aiff, audio/ogg (docs: https://ai.google.dev/gemini-api/docs/audio)
  - Video – video/mp4, video/mpeg, video/webm, video/mov, video/avi, video/x-flv, video/wmv, video/mpg, video/3gpp (docs: https://ai.google.dev/gemini-api/docs/video-understanding)
  - Document – application/pdf, text/plain
  Note: inline data is limited to 20 MB per request.
- api_key_env (str, default: 'GEMINI_API_KEY') – Environment variable that holds the API key.
- version (str | None, default: None) – Explicit version string for the UDF so that key rotation does not change the UDF hash and trigger a re-backfill.
Requires: pip install 'geneva[udf-text-gemini]'
Returns:
- UDF – A UDF instance ready to be registered with a Geneva dataset.
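Why an explicit version helps with key rotation can be seen in a toy model. Geneva's actual hashing scheme is not described here, so the sketch below is purely an assumption: udf_fingerprint is a hypothetical function that hashes the captured state unless a version string is given.

```python
import hashlib
import json
from typing import Optional


def udf_fingerprint(state: dict, version: Optional[str] = None) -> str:
    """Hypothetical identity function for a UDF with captured state."""
    # An explicit version wins, so captured state (such as the key) is ignored.
    if version is not None:
        return version
    # Otherwise derive the identity from everything captured, key included.
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


old = {"model": "gemini-2.5-flash", "api_key": "key-A"}  # placeholder keys
new = {"model": "gemini-2.5-flash", "api_key": "key-B"}

# Without a version, rotating the key changes the fingerprint.
print(udf_fingerprint(old) == udf_fingerprint(new))  # False
# With a pinned version, the fingerprint stays the same across rotation.
print(udf_fingerprint(old, "v1") == udf_fingerprint(new, "v1"))  # True
```

Under this model, a changed fingerprint is what would cause already-computed rows to be treated as stale, which is the re-backfill the version parameter is meant to avoid.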
Examples:
Caption images with a one-sentence description:
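>>> from geneva.udfs.gemini import gemini_udf
>>> udf = gemini_udf(
...     column="image",
...     prompt="Provide a 1 sentence description of the scene",
...     mime_type="image/jpeg",
... )
>>> # `table` is assumed to be an existing Geneva table with an "image" column.
>>> table.add_columns({"caption": udf})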
Summarise text documents: