Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Genie Agent integration issue

yj940525
New Contributor III

Hi, is anyone from the development team for the Genie Agent integration here? I had an issue using the sample code for the Genie Agent integration. The underlying code (databricks_ai_bridge/genie.py) cannot connect to the URL openaipublic.blob.core.windows.net, which it uses for OpenAI tokenization to calculate the total token count of the query result string. Our Databricks workspace sits in an enclosed AWS VPC that doesn't have access to the public internet (I guess there are other Databricks users like us). Just wondering whether this could be implemented another way, without making a call to the public internet.

3 REPLIES

lingareddy_Alva
Honored Contributor II

@yj940525 

This is a common challenge when working with Databricks in air-gapped or restricted network environments. The issue you're experiencing with databricks_ai_bridge/genie.py attempting to connect to openaipublic.blob.core.windows.net is related to the tokenization process.
There are a few potential approaches to address this:

VPC Endpoints: Consider setting up AWS PrivateLink/VPC Endpoints to allow specific traffic to the required Azure Blob Storage endpoint without exposing your entire VPC to the public internet.
Local Tokenization: You could modify the code to use a tokenizer that works fully offline instead of relying on the remote download. Libraries like tiktoken tokenize locally once they are installed, but they fetch their encoding files from that endpoint on first use, so those files would need to be pre-cached within your Databricks environment.
Proxy Configuration: If your organization has an approved outbound proxy, you might be able to configure the Genie Agent to route its requests through this proxy.
Custom Implementation: Fork the library and modify the tokenization logic to either skip token counting or implement an alternative method that works within your network constraints.
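
As a rough illustration of the local tokenization option, here is a minimal sketch of staging a pre-downloaded encoding file so that tiktoken can find it without going to the internet. It assumes the tokenizer behind genie.py is tiktoken, which names cached files after the SHA-1 hex digest of the source URL and reads the cache location from the TIKTOKEN_CACHE_DIR environment variable. The specific encoding URL below (o200k_base) is an assumption, so check the URL in your own error message. The helper names cache_key and stage_encoding are made up for this example.

```python
import hashlib
import os
import shutil
from pathlib import Path

# Assumption: the encoding file that the library requests. Replace this with
# the exact URL shown in your connection error or proxy logs.
ENCODING_URL = "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken"


def cache_key(url: str) -> str:
    """Return the file name tiktoken uses in its cache: SHA-1 hex digest of the URL."""
    return hashlib.sha1(url.encode()).hexdigest()


def stage_encoding(downloaded_file: str, cache_dir: str, url: str = ENCODING_URL) -> Path:
    """Copy a pre-downloaded, unmodified encoding file into cache_dir under the expected name."""
    target_dir = Path(cache_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / cache_key(url)
    shutil.copyfile(downloaded_file, target)
    # Must be set before the tokenizer is first loaded in this process.
    os.environ["TIKTOKEN_CACHE_DIR"] = str(target_dir)
    return target
```

The encoding file itself would be downloaded once on a machine with internet access and uploaded to a location the cluster can read (for example a volume or workspace path). After stage_encoding has run, the tokenizer should load the file from the cache instead of attempting the download.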

 

LR

Thanks for the reply. Yes, we are thinking of commenting out the token counting logic if Databricks is not willing to change the implementation. I just want to raise the point that an implementation like this should take restricted network scenarios into consideration.

lingareddy_Alva
Honored Contributor II

You're absolutely right. The implementation should definitely take restricted network scenarios into consideration, as many enterprise Databricks deployments operate in air-gapped or network-restricted environments.

Commenting out the token counting logic is a pragmatic short-term solution.
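
If you would rather keep some size check than remove it entirely, one option is to fall back to an approximate count when the real tokenizer is not usable. This is only a sketch: the function names are hypothetical and not part of databricks_ai_bridge, the 4-characters-per-token ratio is a rough rule of thumb for English text, and the o200k_base encoding name is an assumption.

```python
import math

# Assumption: about 4 characters per token, a rough heuristic for English text.
CHARS_PER_TOKEN = 4


def approx_token_count(text: str) -> int:
    """Estimate the token count from the character length, with no network access."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def count_tokens(text: str) -> int:
    """Use tiktoken if it is installed and its encoding loads, else estimate."""
    try:
        import tiktoken  # optional dependency

        # Assumption: the encoding name. Loading it may try to download the
        # encoding file and can be slow to fail on a blocked network.
        return len(tiktoken.get_encoding("o200k_base").encode(text))
    except Exception:
        return approx_token_count(text)
```

In a fully blocked network you may prefer to call approx_token_count directly, so the code never waits on a connection attempt.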

 

LR
