Proxy (Zscaler) & Databricks/Spark Connect "Cannot check peer: missing selected ALPN property"
10-18-2024 12:01 AM
Summary:
We use Zscaler and are trying to use Databricks Connect to develop PySpark code locally. At first we received SSL errors on HTTP requests, which we resolved by ensuring Python's requests library could find Zscaler's CA cert (setting the REQUESTS_CA_BUNDLE env var).
We then continued to get SSL errors, this time from the gRPC library used by Spark Connect. We resolved those by setting GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.
But now we receive "Cannot check peer: missing selected ALPN property" from the gRPC library. gRPC uses HTTP/2, and MITM proxies like Zscaler don't play nicely with HTTP/2.
Is there any workaround for this? Can we use HTTP/1.1 as the protocol for Databricks Connect? Or can we add an exception for the Databricks domain to our proxy?
Note: the Databricks JDBC driver appears to be unaffected.
System:
Operating system: OSX 14.6
Python version: 3.11
Python Libraries:
databricks-connect==15.4.2
databricks-sdk==0.33.0
delta-spark==3.2.1
pyspark==3.5.3
grpcio==1.66.2
grpcio-status==1.66.2
requests==2.32.3
Steps to Reproduce:
- Be connected to a proxy which conducts man-in-the-middle inspections, such as Zscaler
- Set the Python requests library's CA file using the REQUESTS_CA_BUNDLE env var:
export REQUESTS_CA_BUNDLE=/path/to/my/root/ca.pem
- Set the Python gRPC library's CA file using the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH env var:
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/path/to/my/root/ca.pem
If you skip this step, you will instead receive the following _InactiveRpcError:
failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
- Execute the following code with your Databricks profile configured:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
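The two exports above can also be set from within the script itself, as long as they are applied before the session (and therefore the gRPC channel) is created. A minimal sketch, reusing the placeholder CA path from the steps above:

```python
import os

# Placeholder path: substitute your proxy's root CA bundle.
CA_BUNDLE = "/path/to/my/root/ca.pem"

# Both variables must be set before the gRPC channel is created,
# i.e. before DatabricksSession.builder.getOrCreate() runs.
os.environ["REQUESTS_CA_BUNDLE"] = CA_BUNDLE
os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = CA_BUNDLE
```

Setting them in the shell profile instead keeps them out of the code, but the in-process form is handy when the CA path differs per developer machine.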
Stack Trace:
Traceback (most recent call last):
File ".venv/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py", line 1853, in config
resp = self._stub.Config(req, metadata=self.metadata())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
return _end_unary_response_blocking(state, call, False, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Cannot check peer: missing selected ALPN property."
debug_error_string = "UNKNOWN:Error received from peer {
grpc_message:"failed to connect to all addresses;
last error: UNKNOWN:
ipv4:[REDACTED WORKSPACE IP]:443:
Cannot check peer: missing selected ALPN property.",
grpc_status:14,
created_time:"2024-10-17T12:51:04.105548+01:00"
}"
>
10-24-2024 07:30 PM
Hello stevenayers-bge,
Have you found a solution to this problem? I am facing the same issue when trying to connect to Databricks through Databricks Connect behind a proxy.
10-30-2024 04:44 PM
Hello stevenayers-bge,
Checking in: have you come across any solution to the issue mentioned above? If so, could you please post it here? I would really appreciate it.
10-31-2024 02:28 AM
Hi Ganeshpendu. We got on a call with Zscaler to talk through this. There are two options:
- Add an SSL inspection exemption for your Databricks domain in Zscaler's control panel.
- Zscaler does support HTTP/2, but it is not enabled by default, and the options to enable it are not exposed in the control panel by default. If you want to turn it on and configure it, you need to contact your Zscaler account representative, who can make the feature appear in the Zscaler control panel.

