Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

On a Shared cluster with Unity Catalog, a Python process can't connect

6502
New Contributor III

The story is that I can access a service listening on port 80 using a single node cluster, but can't do the same using a shared node cluster. 

I read about the `spark.databricks.pyspark.iptable.outbound.whitelisted.ports` property; however, setting it like this:

`spark.databricks.pyspark.iptable.outbound.whitelisted.ports 587,9100,9243,443,22,80`

does not make it work.

I would like to know whether there are settings I'm missing, or whether the above setting is the only one supposed to be used in that case.

 

4 REPLIES

Yeshwanth
Honored Contributor

Hi, I hope you are doing well.

Can you confirm if the connectivity works well while using the "Single User" and "No-Isolation Shared" clusters? The Shared clusters, by default, block outbound traffic to some of the ports.

Workaround 1:
Add this Spark property to the shared cluster and rerun the same connectivity tests:

====
spark.databricks.pyspark.iptable.outbound.whitelisted.ports 4554
====

If this fails, please configure an init script to run on the shared cluster where the issue is observed. The init script uses the iptables firewall to open INPUT/OUTPUT TCP connections to port 4554 for the cluster.

Workaround 2:
Create an init script with the content below, attach it to the cluster, and run the connectivity test.

====
#!/bin/bash
iptables -A OUTPUT -p tcp --dport 4554 -j ACCEPT
iptables -A INPUT -p tcp --dport 4554 -j ACCEPT
====

Please try both of the workarounds and keep us posted regarding the progress. Also, do not hesitate to reach out to us if you need any help.

Note: I have taken port 4554 as an example. Please change it as per your use case.
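
For the connectivity test itself, here is a minimal Python sketch that can be run from a notebook cell on each cluster type. It is only an illustration: the thread does not show the test that was actually used, the function name `probe` is hypothetical, and the host in the commented usage line is a placeholder. Distinguishing a timeout from a refused connection may help narrow down whether packets are being dropped on the way out or the remote side is rejecting them.

====
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection and report how it ended.

    Returns "open", "timeout", "refused", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout (for example, packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


# Hypothetical usage; replace the host and port with your own service:
# print(probe("my-service.example.internal", 80))
====

Running the same call on the Single User cluster and on the Shared cluster gives a direct comparison of the two results.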

6502
New Contributor III

`spark.databricks.pyspark.iptable.outbound.whitelisted.ports` <-- this is not working

The INPUT rule is supposed to accept incoming traffic to port 80; however, my problem is the outgoing connection.

Are you sure I need to add this?  

iptables -A INPUT -p tcp --dport 80 -j ACCEPT 

6502
New Contributor III

I can't test the init.sh script yet. Attaching it fails with:
INVALID_PARAMETER_VALUE: Attempting to install the following init scripts that are not in the allowlist. /Volumes/main/default/datalake/libs/init.sh: PERMISSION_DENIED: '/Volumes/main/default/datalake/libs/init.sh' is not in the artifact allowlist
I'll be back as soon as possible.

Yeshwanth
Honored Contributor

@6502 Please try placing the init script in an S3 or Workspace location and share the results here.
