Hey,
We have installed the com.databricks:spark-xml_2.12:0.18.0 library in our VNET-injected Databricks workspace to read XML files from a storage account. When the cluster is started without the library installed, the notebook runs successfully against text files. However, when we run the notebook against XML files with the library installed, the cluster gets stuck in a waiting state.
Our Databricks subnet has a route table attached, and all traffic is routed through our firewall. When we disassociate the route table from the Databricks public subnet, the notebook runs without any issues, indicating that the firewall is blocking the required connectivity. However, I am unable to determine which ports or FQDNs need to be opened to resolve this issue.
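To narrow this down, this is the kind of reachability probe we can run from a notebook cell with the route table attached. It is only a rough sketch: the host list is a guess on our side (repo1.maven.org is Maven Central, which we assume the library install pulls from), not a confirmed list of what Databricks needs, and `can_connect` and `probe` are helper names made up for this post.

```python
import socket

# Hypothetical candidates only; NOT a confirmed list of required endpoints.
CANDIDATES = [("repo1.maven.org", 443)]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def probe(targets, timeout=5.0):
    """Map 'host:port' to a reachability flag for each (host, port) pair."""
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in targets}


if __name__ == "__main__":
    for target, ok in probe(CANDIDATES).items():
        print(target, "reachable" if ok else "blocked or unreachable")
```

A successful TCP connect here would not prove that the firewall lets the actual HTTPS request through, since FQDN filtering may be applied later in the connection, so we would treat the result as a hint rather than a definitive answer.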
I would greatly appreciate any thoughts on this!