10-18-2024 12:01 AM
Summary:
We use Zscaler and are trying to use Databricks Connect to develop pyspark code locally. At first, we received SSL HTTP errors, which we resolved by ensuring Python's requests library could find Zscaler's CA cert (setting the REQUESTS_CA_BUNDLE env var).
We continued to get SSL errors, which came from the GRPC library used by Spark Connect. We resolved this by setting GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.
But now, we receive "Cannot check peer: missing selected ALPN property" from the GRPC library. GRPC uses HTTP/2, and MITM proxies like Zscaler don't play nicely with HTTP/2.
Is there any workaround for this? Can we use HTTP/1.1 as the protocol for Databricks Connect? Or add an exception for the Databricks domain to our proxy?
Note: Databricks JDBC Driver appears to be unaffected
System:
Operating system: OSX 14.6
Python version: 3.11
Python Libraries:
databricks-connect==15.4.2
databricks-sdk==0.33.0
delta-spark==3.2.1
pyspark==3.5.3
grpcio==1.66.2
grpcio-status==1.66.2
requests==2.32.3
Steps to Reproduce:
export REQUESTS_CA_BUNDLE=/path/to/my/root/ca.pem
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/path/to/my/root/ca.pem
If you skip the two exports above, you will receive the following _InactiveRpcError: failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
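For anyone who prefers to set the two variables from Python rather than the shell, here is a minimal sketch. The helper name configure_ca_bundle is hypothetical (it is not part of Databricks Connect or gRPC); it only checks that the CA file exists and writes the same two environment variables used in the exports above.

```python
import os
from pathlib import Path


def configure_ca_bundle(ca_path: str) -> dict:
    """Point requests and gRPC at the same CA bundle (hypothetical helper).

    Sets the two environment variables from the repro steps. Raises
    FileNotFoundError if the bundle is missing, so a wrong path fails here
    rather than later as a CERTIFICATE_VERIFY_FAILED handshake error.
    """
    path = Path(ca_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")

    settings = {
        "REQUESTS_CA_BUNDLE": str(path),
        "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH": str(path),
    }
    os.environ.update(settings)
    return settings
```

Call it before importing databricks.connect or creating the session. As far as I understand, gRPC reads its roots file when it initialises, so setting the variable after a channel exists may have no effect. This only addresses the certificate error, not the ALPN error below.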
Stack Trace:
Traceback (most recent call last):
File ".venv/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py", line 1853, in config
resp = self._stub.Config(req, metadata=self.metadata())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
return _end_unary_response_blocking(state, call, False, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Cannot check peer: missing selected ALPN property."
debug_error_string = "UNKNOWN:Error received from peer {
grpc_message:"failed to connect to all addresses;
last error: UNKNOWN:
ipv4:[REDACTED WORKSPACE IP]:443:
Cannot check peer: missing selected ALPN property.",
grpc_status:14,
created_time:"2024-10-17T12:51:04.105548+01:00"
}"
>
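To confirm that the proxy is what drops ALPN, a rough diagnostic using only Python's standard ssl module is sketched below. It opens a TLS connection that offers "h2" and "http/1.1" and reports which protocol the peer selected. The function names are hypothetical, and the workspace hostname and CA path are placeholders you would need to supply.

```python
import socket
import ssl
from typing import Optional


def build_alpn_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that offers HTTP/2 first, then HTTP/1.1, via ALPN."""
    context = ssl.create_default_context(cafile=cafile)
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_alpn(host: str, port: int = 443,
                    cafile: Optional[str] = None,
                    timeout: float = 10.0) -> Optional[str]:
    """Return the ALPN protocol the peer selected, or None if it selected none."""
    context = build_alpn_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


# Example (placeholders, not real values):
# print(negotiated_alpn("<your-workspace-host>", cafile="/path/to/my/root/ca.pem"))
```

gRPC needs the peer to select "h2". If this returns None or "http/1.1" from behind the proxy but "h2" from an unproxied network, that would be consistent with the "missing selected ALPN property" error coming from TLS inspection rather than from the workspace itself.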
10-24-2024 07:30 PM
Hello stevenayers-bge,
Do you have a solution for this problem? I am facing the same issue when connecting to Databricks through Databricks Connect from behind a proxy.
10-30-2024 04:44 PM
Hello Stevenayers-bge,
Just checking whether you have come across a solution to the issue described above.
If so, could you please post it here? I would really appreciate it.
10-31-2024 02:28 AM
Hi Ganeshpendu. We got on a call with Zscaler to talk through this. There are two options: