a week ago
I am currently exploring the possibility of using Databricks AI Genie to allow layman users to ask questions and retrieve data on their own.
We would like to keep the data in our warehouse (e.g., Snowflake or local). I read the documentation, but it seems the data must be uploaded to Databricks to use Genie. Rather than uploading the data to Databricks, is there a way for Genie to read data that lives on another platform, such as through a Snowflake connector or even from a local host? Also, how secure is Genie AI? Thank you!
a week ago
Hi, @ChrisChan
You’re absolutely right that the data used with Genie needs to be managed under Unity Catalog. However, if you want Genie to query data in Snowflake, you can use lakehouse federation. I’ve personally tried this method, and it worked successfully for me.
Additionally, this documentation might be helpful regarding Genie's security features. Apologies if you’re already familiar with it:
https://docs.databricks.com/en/genie/index.html#privacy-and-security
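For anyone landing here later, a rough sketch of what the lakehouse federation setup can look like: one statement to create a Snowflake connection and one to create a foreign catalog on top of it, so the Snowflake tables appear under Unity Catalog where Genie can use them. Every name below (connection, catalog, host, warehouse, user, database, secret scope and key) is a made-up placeholder, and the password-via-secret option is only one possible authentication choice, so please check the current federation docs for your workspace before relying on it.

```python
def snowflake_federation_sql(connection, catalog, host, warehouse,
                             user, database, secret_scope, secret_key):
    """Build the two SQL statements that expose a Snowflake database
    as a foreign catalog in Unity Catalog (all arguments are placeholders)."""
    create_connection = (
        f"CREATE CONNECTION IF NOT EXISTS {connection} TYPE snowflake\n"
        f"OPTIONS (\n"
        f"  host '{host}',\n"
        f"  port '443',\n"
        f"  sfWarehouse '{warehouse}',\n"
        f"  user '{user}',\n"
        # Read the password from a Databricks secret instead of hard-coding it.
        f"  password secret('{secret_scope}', '{secret_key}')\n"
        f")"
    )
    create_catalog = (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (database '{database}')"
    )
    return [create_connection, create_catalog]


# Hypothetical values, for illustration only.
statements = snowflake_federation_sql(
    connection="snowflake_conn",
    catalog="snowflake_sales",
    host="myaccount.snowflakecomputing.com",
    warehouse="ANALYTICS_WH",
    user="genie_reader",
    database="SALES",
    secret_scope="snowflake",
    secret_key="genie_reader_password",
)

for stmt in statements:
    print(stmt + ";\n")
    # In a Databricks notebook you would run: spark.sql(stmt)
```

Once the foreign catalog exists, its tables can be added to a Genie space like any other Unity Catalog table, while the data itself stays in Snowflake.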
a week ago
Many thanks for your advice! It works well with lakehouse federation.
a week ago
Sorry, one more question. I can successfully use Genie to query data via lakehouse federation, but I also see a limitation: Single user access mode is only available for users who own the connection. From your experience, does that mean the user must have ownership of the connection (with rights to edit, remove, etc.)? Thanks
a week ago
If this is about access permissions to data within Genie, the following documentation might be helpful:
https://docs.databricks.com/en/genie/index.html#required-permissions
Data access permissions: Any user who interacts with the space needs at least SELECT privileges on the data used in a space.
Genie space permissions: Users need CAN RUN permissions on the Genie space to interact with Genie and the data used in the space. See Genie space ACLs for a complete mapping of privileges and abilities for a Genie space.
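To make the data access part concrete, here is a minimal sketch that generates the Unity Catalog GRANT statements for a user or group on the tables used in a space. The principal and table names are hypothetical. The docs quoted above only mention SELECT; the USE CATALOG and USE SCHEMA grants are included on the assumption that the principal does not already have them, since Unity Catalog needs those to reach a table. The CAN RUN permission on the Genie space itself is set in the space's sharing settings, not through these statements.

```python
def genie_data_grants(principal, tables):
    """Return GRANT statements giving `principal` read access to each
    fully qualified table name (catalog.schema.table)."""
    grants = []
    seen = set()
    for full_name in tables:
        catalog, schema, table = full_name.split(".")
        candidates = [
            f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
            f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
        ]
        for stmt in candidates:
            # Skip catalog/schema grants already emitted for an earlier table.
            if stmt not in seen:
                seen.add(stmt)
                grants.append(stmt)
    return grants


# Hypothetical group and tables, for illustration only.
grants = genie_data_grants(
    "genie_users",
    ["snowflake_sales.public.orders", "snowflake_sales.public.customers"],
)

for stmt in grants:
    print(stmt + ";")
    # In a Databricks notebook you would run: spark.sql(stmt)
```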
Friday
I'll take a stab at "Also, how secure is Genie AI?" since I've dug into this for our own uses. There aren't many moving parts to Genie; it's really just a fine-tuned LLM, and the rest is the same stuff you use in your notebooks.
The most insecure part of Genie I could find is that it uses serverless compute, and serverless compute is hosted in Databricks' tenant, not yours. This means for a brief period of time, the prompt and metadata exist in the memory of a VM hosted outside your realm. Per the docs, serverless compute nodes are isolated from one another but to me there is an "ick factor" when I make the statement "our data never leaves our environment" to the business and then I have to explain this technicality to InfoSec.
Genie itself does not access any of your data directly. The prompt and your metadata are sent to the Genie model, which then generates a SQL statement. This SQL is then executed on the serverless compute engine against the data stored in your tenant, same as if you were using a notebook or DLT job with serverless compute.
The LLM behind Genie is currently an Azure OpenAI model, which is Microsoft's hosted version of the LLM behind ChatGPT, and Databricks opted in to "exemption from abuse monitoring and human review program, under which Microsoft does not store any prompts and completions sent to the Azure OpenAI service" (see Work with an AI/BI Genie space - Azure Databricks | Microsoft Learn). If you're on AWS or GCP I'd expect the models are different, but I didn't check.
My reply here is what I understand at this time, but security is fight club so solid answers are difficult to come by.