10-20-2021 09:16 AM
I am currently using a 9.1 ML cluster, but I also tried 7.3 and 8.1.
Databricks is deployed on Google Cloud Platform, and I am using the trial.
It is quite difficult to debug when the Spark UI is only partly accessible.
Part of the results are visible as raw HTML, but all CSS and JS assets, as well as any details if I click on jobs in the UI popup, are blocked (403).
(everything which is behind the reverseProxy /driver-proxy/o/.../1007-074754-weeps90/48884/)
The Spark UI works if I terminate the cluster and check it in the Cluster or notebook tab. So the history server works fine, but the live access proxy does not. I assume there are some issues with OAuth, but I cannot find any logs to tell me where the 403 happens.
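In the absence of server-side logs, one way to document where the 403 happens is to export the network tab of the browser dev tools as a HAR file and list the blocked requests. The snippet below is only a minimal sketch under that assumption: the helper names (`load_har`, `blocked_requests`, `count_by_extension`) and the file name in the usage comment are made up for illustration and are not part of any Databricks tooling.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def load_har(path):
    """Load a HAR file (JSON) exported from the browser dev tools."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def blocked_requests(har, status=403):
    """Return the URL of every HAR entry whose response has the given status."""
    entries = har.get("log", {}).get("entries", [])
    return [
        entry["request"]["url"]
        for entry in entries
        if entry.get("response", {}).get("status") == status
    ]


def count_by_extension(urls):
    """Count URLs by the file extension of their path (for example css or js)."""
    counts = Counter()
    for url in urls:
        last_segment = urlparse(url).path.rsplit("/", 1)[-1]
        if "." in last_segment:
            ext = last_segment.rsplit(".", 1)[-1]
        else:
            ext = "(none)"
        counts[ext] += 1
    return dict(counts)


# Hypothetical usage; "spark-ui.har" is a placeholder file name:
# blocked = blocked_requests(load_har("spark-ui.har"))
# print(count_by_extension(blocked))
```

The resulting list of URLs and the per-extension counts can be attached to a support request in place of screenshots.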
11-02-2021 05:27 AM
Are there any answers on this? Is this supposed to work or not?
I have just a standard Google Marketplace Databricks deployment. Getting a usable Spark UI only after job completion and cluster shutdown is not a functioning deployment for developing code.
It seems odd that this bug still exists without any changes.
11-23-2021 04:09 AM
Hi @Samuel Stütz, the implementation of Spark on GCP runs over Kubernetes, so yes, the configuration is different compared to Databricks on Azure/AWS. But for security purposes we do recommend being careful with the firewalls. As @Kaniz Fatma shared, please refer to the security guidelines for GCP here. For a properly configured cluster, the Spark UI should be accessible, but through an SSH tunnel.
If you can share more details, such as what job you are running, more screenshots of the issue, and steps to reproduce it, someone here should be able to help you out!
11-23-2021 04:39 AM
Thanks for the answer. I am just running an interactive cluster and would expect to see the Spark UI in the side panel when I run a notebook.
The way I expect it to work, given the UI (the blocked URLs I can see in the Chrome dev tools) and how AWS works, is that it is an inverting proxy: https://github.com/google/inverting-proxy
However, if you say I have to use SSH, then such URLs will remain broken.
I know from AWS EMR Spark deployments that a SOCKS proxy and some FoxyProxy Chrome extension config can work, but the way I understand it, the UI in Databricks notebooks would not work with that and the links would still be broken.
I am just starting a cluster and attaching a notebook, and then I would expect the Spark UI to be reachable. I did use the standalone single node setup, so maybe this was the issue. I will try an actual multi-node cluster.
11-23-2021 05:19 AM
@samst, we don't generally recommend using any other Chrome plugins, again for security purposes, as they may be collecting some info from the traffic. But for a job fired by reading a text file using Spark and then doing a count, I was able to access the Spark UI in another tab. Attached is the screenshot for the same. So the Spark UI is definitely not blocked on GCP.
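For anyone who wants to reproduce this, a test job of that kind (read a text file, then count) might look roughly like the sketch below. It assumes it runs in a notebook where the `spark` session is already defined; the function names and the path in the usage comment are placeholders, not the exact code I ran.

```python
def count_lines(spark, path):
    """Read a text file as a DataFrame (one row per line) and count the rows.

    The count() action triggers a Spark job, which should then show up
    in the live Spark UI while the cluster is running.
    """
    df = spark.read.text(path)
    return df.count()


def live_ui_url(spark):
    """Return the web UI address reported by the driver's SparkContext.

    Assumption: on Databricks this is the driver-internal address, not the
    proxied URL that the notebook links to.
    """
    return spark.sparkContext.uiWebUrl


# Hypothetical usage in a notebook cell; the path is a placeholder:
# print(count_lines(spark, "/path/to/some/file.txt"))
# print(live_ui_url(spark))
```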
11-23-2021 05:21 AM
UPDATE: I have now tried a high concurrency cluster with runtime 10.1 ML.
I don't have time atm to investigate all options, but I am guessing that the Spark UI is simply not working on standalone single node clusters, or that it was the specific runtime (9.1 LTS), though I doubt that.
It is not that relevant on single node clusters anyway.
I have a high concurrency 10.1 ML cluster running, and this one shows all of the UI everywhere.
12-13-2021 04:43 PM
Hi @Samuel Stütz,
Are you still having this issue?
12-19-2021 10:58 PM
It is not a big issue anymore, but it remains as it is.
Standard and High Concurrency clusters have a working Spark UI,
Single Node clusters do not.
I tested with runtimes 9.1 and 10.1. The runtime does not matter. It is probably the slightly different setup of single node clusters.
The Spark UI history server works in either case.
Only interactive single node clusters are problematic to work with; all others are fine. For jobs that complete it is not an issue.
At the moment my bigger issue is that there is no Databricks SQL, with its respective dashboards, on GCP. There are just the Data Engineering and Machine Learning views.
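To check whether a given cluster falls into the affected category, its Spark configuration can be inspected. This is only a sketch: it assumes that single node clusters carry the settings `spark.databricks.cluster.profile = singleNode` and a `local` value for `spark.master`, as in the single node cluster specs I have seen, and `is_single_node` is a name I made up.

```python
def is_single_node(spark_conf):
    """Guess from a cluster's Spark conf (a dict) whether it is a single node cluster.

    Assumption: single node clusters set the cluster profile to "singleNode"
    and run with a local Spark master such as "local[*]".
    """
    profile = spark_conf.get("spark.databricks.cluster.profile", "")
    master = spark_conf.get("spark.master", "")
    return profile == "singleNode" or master.startswith("local")


# Hypothetical usage in a notebook, where `spark` is already defined:
# conf = {
#     "spark.databricks.cluster.profile":
#         spark.conf.get("spark.databricks.cluster.profile", ""),
#     "spark.master": spark.conf.get("spark.master", ""),
# }
# print(is_single_node(conf))
```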
07-23-2023 06:05 PM
This has yet to be solved. The Spark UI for single node clusters in GCP is still broken and throwing 403.
10-12-2023 07:53 AM
Any news about this?