@yj940525
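This is a common challenge when working with Databricks in air-gapped or restricted network environments. The connection attempt from databricks_ai_bridge/genie.py to openaipublic.blob.core.windows.net is related to tokenization: that host is where the tiktoken library downloads its encoding files on first use, so the request most likely comes from the token-counting step rather than from Genie itself.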
There are a few potential approaches to address this:
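VPC Endpoints / Egress Allowlisting: AWS PrivateLink and VPC endpoints connect to AWS services and PrivateLink-enabled endpoint services, so they are unlikely to reach an Azure Blob Storage host directly. A narrower alternative is an egress firewall or NAT rule that allowlists only openaipublic.blob.core.windows.net, which permits this one destination without exposing your entire VPC to the public internet.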
Local Tokenization: tiktoken already tokenizes locally; the only network call is the one-time download of its encoding file. You can download that file on a machine with internet access, copy it into your Databricks environment, and point tiktoken at it through the TIKTOKEN_CACHE_DIR environment variable so that no outbound request is needed.
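Proxy Configuration: If your organization has an approved outbound proxy, you might be able to route the download through it. tiktoken fetches the file with the requests library, which honors the standard HTTPS_PROXY environment variable, so setting that on the cluster may be enough.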
Custom Implementation: Fork the library and modify the tokenization logic to either skip token counting or implement an alternative method that works within your network constraints.
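For the local tokenization route, here is a minimal sketch of pre-seeding the tiktoken cache. It relies on my understanding that tiktoken looks in TIKTOKEN_CACHE_DIR for a file named after the SHA-1 hash of the download URL; please verify that against the tiktoken version you have installed. The helper names (cache_file_name, seed_cache) and the URL template constant are my own, not part of any library, and which encoding genie.py actually requests is something you would need to confirm from the failing URL in your logs.

```python
import hashlib
import os
from pathlib import Path

# Assumption: URL pattern tiktoken uses for its public encoding files.
ENCODING_URL_TEMPLATE = (
    "https://openaipublic.blob.core.windows.net/encodings/{name}.tiktoken"
)


def cache_file_name(encoding_name: str) -> str:
    """Return the file name tiktoken is expected to look up in its cache dir.

    Assumption: the cache key is the SHA-1 hex digest of the download URL.
    """
    url = ENCODING_URL_TEMPLATE.format(name=encoding_name)
    return hashlib.sha1(url.encode()).hexdigest()


def seed_cache(source_file: str, cache_dir: str, encoding_name: str) -> Path:
    """Copy a pre-downloaded encoding file into cache_dir under the hashed name.

    Hypothetical helper: source_file is a .tiktoken file fetched on a machine
    that has internet access and then copied into the restricted environment.
    """
    target_dir = Path(cache_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / cache_file_name(encoding_name)
    target.write_bytes(Path(source_file).read_bytes())
    # This must be set before tiktoken loads the encoding in the same process.
    os.environ["TIKTOKEN_CACHE_DIR"] = str(target_dir)
    return target
```

You would call something like seed_cache("<path to the copied file>", "<a directory the cluster can read>", "o200k_base") before the Genie agent is created, using whichever encoding name matches the URL in your error. Setting TIKTOKEN_CACHE_DIR in the cluster's environment variables instead of in code is probably more robust, since it then applies to every process on the cluster.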
LR