lingareddy_Alva
Esteemed Contributor

@yj940525 

This is a common challenge when running Databricks in air-gapped or restricted network environments. The connection that databricks_ai_bridge/genie.py attempts to openaipublic.blob.core.windows.net most likely comes from token counting: the tiktoken library downloads its encoding file from that host the first time an encoding is loaded, then caches it on disk.
There are a few potential approaches to address this:

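VPC Endpoints / Restricted Egress: AWS PrivateLink and VPC endpoints reach AWS services and PrivateLink-enabled partner services, so on their own they will not give you a route to this Azure Blob Storage host. If your security team permits it, a narrower option is to allow outbound HTTPS only to openaipublic.blob.core.windows.net through an egress firewall or domain-filtering proxy, without exposing your entire VPC to the public internet.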
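Local Tokenization: tiktoken already tokenizes locally; the only network call is the one-time download of its encoding file. You can download that file on a machine with internet access, copy it into your Databricks environment, and point the TIKTOKEN_CACHE_DIR environment variable at the directory that holds it, so tiktoken loads the encoding from disk instead of the network.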
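Proxy Configuration: If your organization has an approved outbound proxy, you may be able to route the download through it. tiktoken fetches the file with the Python requests library, which honors the standard HTTPS_PROXY environment variable, so setting that variable on the cluster may be enough.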
Custom Implementation: Fork the library and modify the tokenization logic to either skip token counting or implement an alternative method that works within your network constraints.
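
For the local tokenization route, here is a minimal sketch of seeding the tiktoken cache by hand. It assumes current tiktoken behavior as I understand it: cached files live in the directory named by TIKTOKEN_CACHE_DIR and are named after the SHA-1 hex digest of the source URL. The cl100k_base URL is only an example (check which encoding your version of the library actually requests), and seed_cache and cache_file_name are hypothetical helper names, not part of any library.

```python
import hashlib
import os
import shutil
from pathlib import Path

# Assumption: this is the encoding file your library version requests.
# Replace it with the URL that appears in your connection error or logs.
ENCODING_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_file_name(url: str) -> str:
    """Return the file name tiktoken is assumed to use for a cached URL:
    the SHA-1 hex digest of the URL string."""
    return hashlib.sha1(url.encode()).hexdigest()


def seed_cache(downloaded_file: str, cache_dir: str, url: str = ENCODING_URL) -> Path:
    """Copy a pre-downloaded encoding file into cache_dir under the name
    tiktoken looks up, and point TIKTOKEN_CACHE_DIR at that directory."""
    target_dir = Path(cache_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / cache_file_name(url)
    shutil.copyfile(downloaded_file, target)
    # Must be set before tiktoken loads the encoding in this process.
    os.environ["TIKTOKEN_CACHE_DIR"] = str(target_dir)
    return target
```

You would call seed_cache with the path of the file you copied into the workspace and a cache directory of your choice, before anything loads the encoding. To cover every Python process on the cluster, set TIKTOKEN_CACHE_DIR in the cluster's environment variables instead of in code.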

 

LR