Administration & Architecture
In a shared cluster with Unity Catalog, a Python process can't connect

6502
New Contributor III

The story is that I can access a service listening on port 80 using a single-node cluster, but can't do the same using a shared cluster.

I read about the `spark.databricks.pyspark.iptable.outbound.whitelisted.ports` setting; however, setting it:

`spark.databricks.pyspark.iptable.outbound.whitelisted.ports 587,9100,9243,443,22,80`

does not make it work.

I would like to know whether there are settings I'm missing, or if the setting above is the only one that applies in this case.
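For reference, outbound reachability can be probed straight from a notebook shell cell before touching any cluster settings (a minimal sketch; `example.internal` is a placeholder for the actual service host):

```shell
#!/bin/bash
# Minimal TCP probe using bash's built-in /dev/tcp; no extra tools needed.
# "example.internal" is a placeholder -- substitute the real service host.
check_port() {
  local host="$1" port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "blocked"
  fi
}

check_port example.internal 80
```

Running this on both the single-node and the shared cluster makes it easy to confirm the block is cluster-specific rather than a network-wide restriction.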

 

4 REPLIES

Yeshwanth
Valued Contributor

Hi, I hope you are doing well.

Can you confirm if the connectivity works well while using the "Single User" and "No-Isolation Shared" clusters? The Shared clusters, by default, block outbound traffic to some of the ports.

Workaround 1:
Try adding this Spark property on the shared cluster and run the same connectivity tests:

====
spark.databricks.pyspark.iptable.outbound.whitelisted.ports 4554
====

If this fails, please try configuring an init script to run on the shared cluster where this issue is observed. The init script uses the iptables firewall to open INPUT/OUTPUT TCP connections to port 4554 for the cluster.

Workaround 2:
Create an init script with the script below, attach it to the cluster, and run the connectivity test.

====
#!/bin/bash
iptables -A OUTPUT -p tcp --dport 4554 -j ACCEPT
iptables -A INPUT -p tcp --dport 4554 -j ACCEPT
====

Please try both of the workarounds and keep us posted regarding the progress. Also, do not hesitate to reach out to us if you need any help.

Note: I have taken port 4554 as an example. Please change it as per your use case.
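Since the port is only an example, one way to avoid editing the script by hand for each port is to generate it (a hedged sketch; `make_init_script` is a hypothetical helper, not a Databricks utility):

```shell
#!/bin/bash
# Hypothetical helper: emit the init script text for a given port, so the
# same iptables workaround can be reused for 80, 443, etc.
make_init_script() {
  local port="$1"
  cat <<EOF
#!/bin/bash
iptables -A OUTPUT -p tcp --dport ${port} -j ACCEPT
iptables -A INPUT -p tcp --dport ${port} -j ACCEPT
EOF
}

# Print the script for port 80; redirect to a file to use it as an init script.
make_init_script 80
```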

6502
New Contributor III

spark.databricks.pyspark.iptable.outbound.whitelisted.ports <-- this is not working 

This rule is supposed to accept incoming connections to port 80; however, my problem is with the outgoing connection.

Are you sure I need to add this?  

iptables -A INPUT -p tcp --dport 80 -j ACCEPT 

6502
New Contributor III

The init.sh can't be tested:
INVALID_PARAMETER_VALUE: Attempting to install the following init scripts that are not in the allowlist. /Volumes/main/default/datalake/libs/init.sh: PERMISSION_DENIED: '/Volumes/main/default/datalake/libs/init.sh' is not in the artifact allowlist
I'll be back as soon as possible.

Yeshwanth
Valued Contributor

@6502 Please try placing the init script in an S3 or Workspace location and share the results here.
