Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark UI reverse Proxy blocked on GCP

samst
New Contributor III

Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1.

Databricks is deployed on Google Cloud Platform and I was using the trial.

It is quite difficult to debug if the Spark UI is only semi-accessible.

Parts of the results are visible as raw HTML, but all CSS and JS assets, as well as any details when I click on jobs in the UI popup, are blocked (403).

302 https://dp-....gcp.databricks.com/driver-proxy/o/.../1007-074754-weeps90/41616/jobs/job?id=4&csrf=3.....

403 https://dp-....gcp.databricks.com/driver-proxy/o/.../1007-074754-weeps90/48884/proxy/local-163474460...

403 https://dp-....gcp.databricks.com/driver-proxy/o/.../1007-074754-weeps90/48884/proxy/local-163474460...

(everything behind the reverse proxy /driver-proxy/o/.../1007-074754-weeps90/48884/)

The Spark UI works if I terminate the cluster and check it in the Cluster or notebook tab. It's the history server that works fine, but the live-access proxy does not. I assume there are some issues with OAuth, but I cannot find any logs to tell me where the 403 happens.
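For reference, a quick check from a notebook cell on the affected cluster (a sketch only, using the built-in `spark` session that Databricks notebooks provide) shows which port and application ID the driver-local UI is actually bound to, which can be compared against the blocked /driver-proxy/.../<port>/ paths:

```python
# Sketch: print where the driver-local Spark UI is listening, using the
# notebook's built-in SparkSession. Handy for comparing against the 403 URLs.
sc = spark.sparkContext
print(sc.uiWebUrl)        # e.g. http://<driver-ip>:<ui-port>
print(sc.applicationId)   # the application id the live UI is serving
```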


9 REPLIES

samst
New Contributor III

Are there any answers on this? Is this supposed to work or not?

I have just a standard Google Marketplace Databricks deployment. Getting a usable Spark UI only after job completion and cluster shutdown is not a functioning deployment for developing code.

It seems odd that this bug still exists without any changes.

Anonymous
Not applicable

Hi @Samuel Stütz, the implementation of Spark on GCP runs on Kubernetes, so yes, the configuration is different compared to Databricks on Azure/AWS. For security purposes we do recommend being careful with the firewalls. As @Kaniz Fatma shared, please refer to the security guidelines for GCP here. For a properly configured cluster, the Spark UI should be accessible, but through an SSH tunnel.

If you can share more details, such as what job you are running, more screenshots of the issue, and steps to reproduce it, someone here should be able to help you out!
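For completeness, a local port-forward along those lines might look roughly like the sketch below. Everything in it is a placeholder rather than something taken from this thread: it assumes SSH access to the driver node has been set up, that the key below is the one registered for cluster access, and that the Spark UI is listening on port 4040 on the driver.

```python
# Sketch of an SSH local port-forward to a driver-local Spark UI.
# All hosts, paths, and ports here are placeholders.
import os
import subprocess

key = os.path.expanduser("~/.ssh/my_cluster_key")   # placeholder key path
driver_host = "<driver-hostname-or-ip>"             # placeholder driver address

subprocess.run([
    "ssh", "-i", key,
    "-N",                           # no remote command; just hold the tunnel open
    "-L", "4040:localhost:4040",    # forward local 4040 to the driver's Spark UI port
    f"ubuntu@{driver_host}",        # placeholder user and host
])
# While the tunnel is open, the UI would be reachable at http://localhost:4040
```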

samst
New Contributor III

Thanks for the answer. I am just running an interactive cluster and would expect to see the Spark UI in the side panel when I run a notebook.

The way I expect it to work, given the UI (the blocked URLs I can see in the Chrome dev tools) and how AWS works, is that it is an inverting proxy: https://github.com/google/inverting-proxy

However, if you say I have to use SSH, then URLs such as

403 https://dp-....gcp.databricks.com/driver-proxy/o/.../1007-074754-weeps90/48884/proxy/local-163474460...

will remain broken.

I know from AWS EMR Spark deployments that a SOCKS proxy and some FoxyProxy Chrome extension config can work, but the way I understand it, the UI in Databricks notebooks would not play well with that; the links would still be broken.

I am just starting a cluster and attaching a notebook, and then would expect the Spark UI to be reachable. I did use the single-node standalone setup; maybe that was the issue. I will try an actual multi-node cluster.

Anonymous
Not applicable

@samst, we don't generally recommend using other Chrome plugins, again for security purposes, as they may be collecting some info from the traffic. For a job fired by reading a text file with Spark and then doing a count, I was able to access the Spark UI in another tab. Attached is a screenshot of the same. So the Spark UI is definitely not blocked on GCP.
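For anyone who wants to repeat that check, a minimal version of the job described above might look like this (a sketch; the dataset path is an assumption, any readable text file will do). Run it in a notebook cell and open the cluster's Spark UI while or right after it runs:

```python
# Sketch of the repro described above: read a text file with Spark and count it,
# which fires a small job that should show up in the live Spark UI.
df = spark.read.text("dbfs:/databricks-datasets/README.md")  # assumed sample path
print(df.count())
```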

samst
New Contributor III

UPDATE: I have now tried a high-concurrency cluster with the 10.1 ML runtime.

I don't have time at the moment to investigate all options, but I am guessing that the Spark UI simply does not work on standalone single-node clusters, or it was the specific runtime (9.1 LTS, but I doubt that).

It is not that relevant on single-node clusters anyway.

I have a high-concurrency 10.1 ML cluster running, and this one shows all of the UI everywhere.

Hi @Samuel Stütz,

Are you still having this issue?

samst
New Contributor III

It is not a big issue anymore, but it remains as it is.

Standard and high-concurrency clusters have a working Spark UI; single-node clusters do not.

I tested with runtimes 9.1 and 10.1; the runtime does not matter. It is probably the slightly different setup of single-node clusters.
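For context on that difference: a single-node cluster runs the driver as both master and worker, which shows up in its Spark configuration. A rough sketch of such a cluster spec is below; the runtime key and node type are illustrative assumptions, while the two spark_conf entries and the ResourceClass tag are the documented single-node settings.

```python
# Illustrative single-node cluster spec (e.g. as sent to the Clusters API).
# The spark_conf entries are what distinguish it from a standard cluster.
single_node_cluster = {
    "cluster_name": "single-node-debug",       # illustrative name
    "spark_version": "10.1.x-ml-scala2.12",    # assumption: a 10.1 ML runtime key
    "node_type_id": "n1-standard-4",           # assumption: a GCP node type
    "num_workers": 0,                          # no workers: everything runs on the driver
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*, 4]",         # local mode instead of a cluster scheduler
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```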

The Spark UI history server works in either case.

Only interactive single-node clusters are problematic to work with; all others are fine. For jobs that complete it is not an issue.

At the moment my bigger issue is that there is no Databricks SQL or the respective dashboards on GCP, just the Data Engineering and Machine Learning views.

Anonymous
Not applicable

This has yet to be solved. The Spark UI for single-node clusters on GCP is still broken and throws 403s.

LucasArrudaW
New Contributor II

Any news about this?
