yesterday
Execution and result streaming generally happen over gRPC. You can force the gRPC channel to send periodic keepalive frames so that the connection looks active to the AKS network infrastructure.
Set the following variables for the AKS Pod (in the Pod manifest or in the application code) before initializing the Databricks session.
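import os

os.environ["GRPC_KEEPALIVE_TIME_MS"] = "30000"  # send a keepalive ping every 30 seconds
os.environ["GRPC_KEEPALIVE_TIMEOUT_MS"] = "10000"  # wait up to 10 seconds for the ping acknowledgement
os.environ["GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS"] = "1"  # allow pings even when no call is active
os.environ["GRPC_HTTP2_MAX_PINGS_WITHOUT_DATA"] = "0"  # no limit on pings sent without data

Depending on the specific builder implementation, you can also pass them as headers during session creation.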
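As a rough sketch of the keepalive settings above, the snippet below applies the same four values as environment variables without overwriting anything already set in the Pod manifest, and also expresses them as gRPC channel arguments. The `grpc.*` keys are the standard gRPC channel argument names for these settings. The helper names `apply_keepalive_env` and `keepalive_channel_options` are hypothetical, and whether your session builder accepts channel options at all is an assumption to verify against the version you run.

```python
import os

# Values copied from the settings above.
KEEPALIVE_ENV = {
    "GRPC_KEEPALIVE_TIME_MS": "30000",
    "GRPC_KEEPALIVE_TIMEOUT_MS": "10000",
    "GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS": "1",
    "GRPC_HTTP2_MAX_PINGS_WITHOUT_DATA": "0",
}


def apply_keepalive_env(env=os.environ):
    """Set the keepalive variables, keeping any value that is already present.

    Hypothetical helper: values injected by the Pod manifest win over these defaults.
    """
    for name, value in KEEPALIVE_ENV.items():
        env.setdefault(name, value)
    return env


def keepalive_channel_options():
    """Return the same settings as (name, value) gRPC channel arguments.

    Hypothetical helper: it only builds the list and does not open a channel.
    """
    return [
        ("grpc.keepalive_time_ms", 30000),
        ("grpc.keepalive_timeout_ms", 10000),
        ("grpc.keepalive_permit_without_calls", 1),
        ("grpc.http2.max_pings_without_data", 0),
    ]


# Call this before the session (and therefore the gRPC channel) is created.
apply_keepalive_env()
```

If the channel is created directly with the `grpc` package, a list like this can be passed through the `options=` argument of `grpc.secure_channel`.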
You can also check the following:
- AKS Timeouts: If possible, increase the idle timeout of the Azure NAT Gateway from its 4-minute default to 15 minutes to give queries more time.
- Enable gRPC Logging: Check the logs for connection resets, stream closures, or EOF errors.
- Application-Level Timeouts: Implement application-level timeouts in the code (concurrent.futures or asyncio). This ensures the pipeline fails gracefully and can trigger a retry mechanism rather than hanging an AKS pod indefinitely.
- Cluster Configuration: Set spark.databricks.service.server.enabled and spark.sql.execution.arrow.pyspark.enabled to true.
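The application-level timeout suggested in the list above could look roughly like this with `concurrent.futures`. This is a minimal sketch, not part of any Databricks API: the helper name `run_with_timeout`, the retry count, and the backoff are illustrative assumptions. Note that a Python thread cannot be killed, so a timed-out call keeps running in the background; the timeout only lets the caller stop waiting and decide what to do next.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s, retries=2, backoff_s=1.0):
    """Run fn() in a worker thread and retry if it exceeds timeout_s.

    Hypothetical helper. Only timeouts are retried; any other exception
    raised by fn propagates to the caller immediately.
    """
    last_error = None
    for attempt in range(retries + 1):
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError as exc:
            last_error = exc
        finally:
            # Do not block on the (possibly still running) worker thread.
            executor.shutdown(wait=False)
        if attempt < retries:
            # Linear backoff before the next attempt.
            time.sleep(backoff_s * (attempt + 1))
    raise TimeoutError(
        f"call did not finish within {timeout_s}s after {retries + 1} attempts"
    ) from last_error
```

In a pipeline, `fn` would wrap the query call (for example a lambda that collects a DataFrame), and the final `TimeoutError` can be caught to fail the job cleanly instead of leaving the pod waiting.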