Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Proxy (Zscaler) & Databricks/Spark Connect "Cannot check peer: missing selected ALPN property"

stevenayers-bge
Contributor

Summary:

We use Zscaler and are trying to use Databricks Connect to develop PySpark code locally. At first, we received SSL errors on HTTP requests, which we resolved by ensuring Python's requests library could find Zscaler's CA cert (by setting the REQUESTS_CA_BUNDLE env var).

We continued to get SSL errors, this time from the gRPC library used by Spark Connect. We resolved those by setting GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.

But now, we receive "Cannot check peer: missing selected ALPN property" from the gRPC library. gRPC runs over HTTP/2 and expects it to be negotiated via ALPN during the TLS handshake, and MITM proxies like Zscaler don't play nicely with HTTP/2.
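One way to check what the proxy negotiates is to open a TLS connection with Python's standard ssl module and read back the selected ALPN protocol. The following is a minimal diagnostic sketch, not part of Databricks Connect: the function names (build_context, negotiated_alpn, explain) are hypothetical, and the interpretation strings assume that the gRPC error above means no ALPN protocol was selected on the connection.

```python
import socket
import ssl
from typing import Optional, Sequence


def build_context(ca_file: Optional[str] = None,
                  protocols: Sequence[str] = ("h2", "http/1.1")) -> ssl.SSLContext:
    """Create a verifying TLS client context that offers the given ALPN protocols."""
    # With ca_file=None the system default trust store is used.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.set_alpn_protocols(list(protocols))
    return ctx


def negotiated_alpn(host: str, port: int = 443,
                    ca_file: Optional[str] = None,
                    timeout: float = 10.0) -> Optional[str]:
    """Connect to host:port and return the ALPN protocol the peer selected, if any."""
    ctx = build_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def explain(selected: Optional[str]) -> str:
    """Turn the negotiated ALPN value into a short human-readable diagnosis."""
    if selected == "h2":
        return "h2 negotiated: the TLS endpoint accepts HTTP/2"
    if selected is None:
        return "no ALPN protocol selected: consistent with the gRPC error in this post"
    return f"{selected!r} selected instead of h2: HTTP/2 is not offered on this path"
```

Calling explain(negotiated_alpn(host, ca_file="/path/to/my/root/ca.pem")) with the workspace hostname, once behind the proxy and once from an uninspected network, should show whether the proxy is the component dropping ALPN.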

Is there any workaround for this? Can we use HTTP/1.1 as the protocol for Databricks Connect? Or add an exception for the Databricks domain to our proxy?

Note: The Databricks JDBC driver appears to be unaffected.

System:

Operating system: macOS 14.6

Python version: 3.11

Python Libraries:
databricks-connect==15.4.2
databricks-sdk==0.33.0
delta-spark==3.2.1
pyspark==3.5.3
grpcio==1.66.2
grpcio-status==1.66.2
requests==2.32.3

Steps to Reproduce:

  1. Connect through a proxy that performs man-in-the-middle TLS inspection, such as Zscaler.
  2. Set the Python requests library's CA file using the REQUESTS_CA_BUNDLE env var:
    export REQUESTS_CA_BUNDLE=/path/to/my/root/ca.pem
  3. Set the Python gRPC library's CA file using the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH env var:
    export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/path/to/my/root/ca.pem
     If you skip this step, you will receive the following _InactiveRpcError:
    failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
  4. Execute the following code with your Databricks profile configured:
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
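Steps 2 to 4 can also be done from within Python instead of the shell. This is a minimal sketch under assumptions: configure_ca_bundle and create_session are hypothetical helper names, and setting the variables before the libraries are imported is assumed to be the safest ordering rather than a documented requirement.

```python
import os
from pathlib import Path


def configure_ca_bundle(ca_path: str) -> dict:
    """Point requests and gRPC at the same CA bundle via the env vars from steps 2 and 3."""
    path = Path(ca_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")
    settings = {
        "REQUESTS_CA_BUNDLE": str(path),
        "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH": str(path),
    }
    os.environ.update(settings)
    return settings


def create_session(ca_path: str):
    """Configure the CA bundle, then create the session as in step 4."""
    configure_ca_bundle(ca_path)
    # Imported only after the env vars are set (assumed safest ordering).
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```

This only reproduces the setup; behind the inspecting proxy, create_session("/path/to/my/root/ca.pem") is expected to fail with the same ALPN error shown in the stack trace.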

 

Stack Trace:

Traceback (most recent call last):
  File ".venv/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py", line 1853, in config
    resp = self._stub.Config(req, metadata=self.metadata())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state) # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:[REDACTED WORKSPACE IP]:443: Cannot check peer: missing selected ALPN property."
debug_error_string = "UNKNOWN:Error received from peer {
grpc_message:"failed to connect to all addresses;
last error: UNKNOWN:
ipv4:[REDACTED WORKSPACE IP]:443:
Cannot check peer: missing selected ALPN property.",
grpc_status:14,
created_time:"2024-10-17T12:51:04.105548+01:00"
}"
>
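The two handshake failures in this post (the untrusted CA from step 3 and the ALPN failure above) can be told apart by the error details string. A small sketch, assuming only the message fragments quoted in this post; classify_handshake_error and its return labels are hypothetical names:

```python
def classify_handshake_error(details: str) -> str:
    """Map a gRPC error details string to the failure modes seen in this post."""
    if "CERTIFICATE_VERIFY_FAILED" in details:
        # The proxy's root CA is not trusted; see GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.
        return "untrusted-ca"
    if "missing selected ALPN property" in details:
        # TLS succeeded but no ALPN protocol was selected on the connection.
        return "alpn-not-negotiated"
    return "other"
```

The details string can be taken from the exception, for example by catching grpc.RpcError and passing exc.details() to the function.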

 
