Re: Network error on subsequent runs using serverl...

mark_ott · ‎10-29-2025

The error you’re seeing (“Network is unreachable” repeated during pip installs) on a DLT (Delta Live Tables) serverless pipeline, especially after the first successful run, appears to be a recurring issue for pipelines that run on serverless compute in rapid succession. Here’s a more detailed breakdown:

Likely Causes

Network Policy Reset or Resource Recycling: Serverless compute is managed by Databricks rather than provisioned in your own cloud account, and resources are often recycled aggressively between runs. This can result in a fresh network environment for each pipeline execution. In some cases, egress (outbound) connections are not immediately or correctly re-established after recycling, leading to intermittent “Network is unreachable” errors during pip installs.
Temporary IP Blocking or Firewall NAT Exhaustion: When pipelines run frequently, there are reports in the Databricks and PyPI communities that IP addresses from cloud-managed pools can hit temporary blocks, rate limits, or NAT port exhaustion, especially when connections are repeatedly opened and closed across rapid cluster lifecycles.
Cached Environments and Init Scripts: Sometimes, after the first run, the cluster environment may be cached or a previously downloaded wheel might still exist, but the network connectivity required to check requirement satisfiability is not re-established, resulting in pip’s repeated failure to connect to PyPI endpoints.
Library/Dependency Handling in DLT: DLT serverless clusters install dependencies anew each time. Because they are not persistent environments, any custom setup (including .whl files not located in DBFS/S3/ADLS, or direct pip installs against PyPI) can run into transient network access issues or library install race conditions. The usual recommendation is to pre-install via workspace library management and DBFS.
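
To tell these causes apart, it can help to record whether outbound connections work at all at the moment the install fails. The following is a minimal sketch, not a Databricks API: `probe_endpoint` is a hypothetical helper, and checking `pypi.org` and `files.pythonhosted.org` on port 443 is an assumption about which endpoints your pip installs use.

```python
import socket
import time


def probe_endpoint(host, port=443, timeout=5.0):
    """Attempt one TCP connection and report whether it succeeded and how long it took."""
    started = time.monotonic()
    try:
        # create_connection resolves the host and opens a TCP socket;
        # DNS failures, timeouts, and "Network is unreachable" all surface as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            reachable, error = True, None
    except OSError as exc:
        reachable, error = False, str(exc)
    return {
        "host": host,
        "port": port,
        "reachable": reachable,
        "error": error,
        "seconds": round(time.monotonic() - started, 3),
    }
```

Calling `probe_endpoint("pypi.org")` and `probe_endpoint("files.pythonhosted.org")` at the start of a run and logging the results gives you a timestamped record: a probe that fails only on the second and later runs points toward recycling or egress problems rather than a packaging issue.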

What You Can Try

Cluster Pool and Egress Policy: Review your network egress settings for serverless compute and make sure the outbound connections you need (to PyPI, your artifact server, etc.) are not restricted or rate limited. On Azure or AWS, verify that egress policies allow repeated rapid outbound traffic, and consider working with your cloud admin to allowlist PyPI endpoints.
Staggered Runs: Avoid running pipelines back-to-back in very quick succession. Leave enough time between runs for the managed compute to recycle and fully reinitialize, which may avoid the exhaustion issues described above.
Use Workspace Libraries/DBFS for Wheels: Store your .whl files on DBFS (Databricks File System) and reference them as workspace libraries at the start of your pipeline, rather than performing pip installs each time within notebook code. This may reduce dependency on live network connectivity at run time.
Init Script Management: Double-check any cluster init scripts and library install commands for idempotency and robustness. Misconfigured scripts may intermittently fail after the first run.
Contact Databricks Support: If the issue persists, document the timing and frequency of the failures and contact Databricks support, since there appear to be known issues with network reliability for serverless pipeline compute in edge cases.
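
For steps you control yourself (for example a connectivity check or a fetch from your own artifact server), the staggering idea can be applied in code as a retry with exponential backoff. This is a generic sketch under stated assumptions: `retry_with_backoff` is a hypothetical helper, the attempt count and delays are illustrative defaults, and it only retries errors raised as `OSError`, which is how Python reports “Network is unreachable”.

```python
import random
import time


def retry_with_backoff(action, attempts=4, base_delay=2.0, max_delay=60.0, sleep=time.sleep):
    """Call action() until it succeeds, waiting longer after each OSError."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            # Double the wait on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so parallel runs do not retry in lockstep.
            delay += random.uniform(0, delay / 2)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

The printed attempt log also doubles as the timing and frequency record that support is likely to ask for.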

Key Best Practices

Prefer using DBFS/S3 to host custom wheels and install from those URIs, not from workspace or ephemeral paths.
Avoid installing packages with pip in user code on serverless clusters; leverage cluster-level library configuration whenever possible.
Ensure your organization’s outbound firewall does not rate-limit or temporarily block IPs due to frequent cluster recycling.
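
As a quick way to audit the first point, the sketch below separates wheel locations on persistent storage from everything else. It is illustrative only: the helper names are hypothetical, and the prefix list (`dbfs:/`, `s3://`, `abfss://` for ADLS) is an assumption you should adjust to the storage your workspace actually uses.

```python
# Assumed prefixes for persistent storage; adjust to your environment.
PERSISTENT_PREFIXES = ("dbfs:/", "s3://", "abfss://")


def is_persistent_wheel_uri(uri):
    """Return True if uri names a .whl file under one of the assumed persistent prefixes."""
    normalized = uri.strip().lower()
    return normalized.endswith(".whl") and normalized.startswith(PERSISTENT_PREFIXES)


def split_wheels(uris):
    """Split uris into (persistent, other), preserving the input order."""
    persistent = [u for u in uris if is_persistent_wheel_uri(u)]
    other = [u for u in uris if not is_persistent_wheel_uri(u)]
    return persistent, other
```

Anything that lands in the second list (for example a wheel under `/tmp/`) is a candidate for moving to persistent storage before the next run.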

If you consistently see “Network is unreachable” after the first pipeline run, it’s likely a side effect of Databricks’ serverless cluster recycling, network egress policies, or rapid-fire resource re-allocation. These errors are not typically seen on classic clusters, which maintain more stable environments between runs.