Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I have a Spark job processing a large data set, and it's taking too long. In the Spark UI, I can see it is running 1 task out of 9 tasks. Not sure how to run this in parallel. I have already enabled auto scaling and am providing up to...
Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud, and I was using the trial. It is quite difficult to debug if the Spark UI is only semi-accessible. Part of the results in raw HTML are visible, but all ...
I have a job with multiple tasks running asynchronously, and based on runtime I don't think it's leveraging all the nodes on the cluster. I open the Spark UI for the cluster, check out the executors, and don't see any tasks for my worker nodes. How ca...
We have a Spark pipeline producing more than 3k Spark jobs. After the pipeline finishes and the cluster shuts down, only a subset (<1k) of these can be recovered from the Spark UI. We would like to have access to the full Spark UI after the pipeline t...
@Vlad Crisan, you can use Databricks clusters to replay the events. Please follow this KB: https://kb.databricks.com/clusters/replay-cluster-spark-events. Note: please spin up a cluster with version 10.4 LTS.
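Outside Databricks, the same kind of replay can be approximated by pointing a standalone Spark History Server at the cluster's event logs. A minimal configuration sketch (the paths below are hypothetical; substitute wherever your event logs were delivered):

```
# spark-defaults.conf for a local Spark History Server (hypothetical paths)
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
spark.history.fs.logDirectory    file:///tmp/spark-events
```

With the logs copied into that directory, starting the history server lets you browse the completed jobs' UI after the original cluster is gone.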
Spill occurs as a result of executing various wide transformations. However, diagnosing a spill requires one to proactively look for key indicators. Where in the Spark UI are two of the primary indicators that a partition is spilling to disk? a- Exec...
@Aviral Bhardwaj I feel it is option e, stage and executor log files: consolidated details at the stage level, and details at the task and executor level. Please let me know if you feel any other option is better.
When running Spark under YARN, each script has its own self-contained set of logs. In Databricks, all I see is a list of jobs and stages that have been run on the cluster. From a support perspective this is a nightmare. How can notebook logs be grou...
Hi @Dean Lovelace, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...
Hi Team, we have a complex ETL job running in Databricks for 6 hours. The cluster has the below configuration: min workers: 16, max workers: 24, worker and driver node type: Standard_DS14_v2 (16 cores, 128 GB RAM). I have monitored the job progress in the Spark...
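Without the full plan it is hard to diagnose, but on a cluster of this size one common first check is whether the shuffle partition count matches the available cores. A rough sizing sketch, using the numbers from the post and a common 2x-cores heuristic (the heuristic is an assumption, not a Databricks recommendation):

```python
# Hypothetical sizing sketch: derive a shuffle partition count from cluster cores.
workers = 24               # Maxworkers from the post
cores_per_worker = 16      # Standard_DS14_v2
total_cores = workers * cores_per_worker
shuffle_partitions = total_cores * 2   # common 2x heuristic; tune per workload
print(total_cores, shuffle_partitions)
# On the cluster itself:
#   spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)
```

If the stage-level view in the Spark UI shows far fewer tasks than cores, too few partitions (or skew) is a likely culprit for the long runtime.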
Hi @Rajesh Kannan R, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedb...
Typo in my second point of the previous post: click the execution plan of your task (this is available under the SQL/DataFrame tab in the Spark UI). It explains which operations ran in the Photon engine and which were not executed by Photon.
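As a sketch of that check: Photon-executed operators appear in the physical plan with a "Photon" prefix, so scanning the plan text for nodes without the prefix shows what fell back to the JVM engine. The plan below is a hypothetical excerpt, not output from the poster's job:

```python
# Hypothetical plan excerpt: operators run by Photon carry a "Photon" prefix
# (e.g. PhotonGroupingAgg); nodes without it fell back to the JVM engine.
plan_text = """
== Physical Plan ==
PhotonGroupingAgg(keys=[country], functions=[sum(amount)])
+- PhotonShuffleExchangeSink ...
"""
fell_back = [ln for ln in plan_text.splitlines()
             if ln.strip().startswith("+-") and "Photon" not in ln]
print(len(fell_back))  # 0 -> every child operator shown ran in Photon
# On the cluster itself, inspect the real plan with: df.explain(mode="formatted")
```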
Hello all, I would like to know why task times (among other times in the Spark UI) display values like 1h or 2h when the task only really takes seconds or minutes. What is the meaning of these high time values I see all around the Spark UI? Thanks in adv...
Hi, we have two workspaces on Databricks, prod and dev. On prod, if we create a new all-purpose cluster through the web interface and go to Environment in the Spark UI, the spark.master setting is correctly set to the host IP. This results in a...
I found the same issue when choosing the default cluster setup on first setup: when I went to edit the cluster to add an instance profile, I was not able to save without fixing this. Thanks for the tip.
I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
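One common cause of a single executor doing all the work (and spilling) is that the JDBC read is not actually partitioned: Spark only parallelizes the read when all four partitioning options are supplied together. A minimal sketch with hypothetical connection details:

```python
# Hypothetical sketch: all four partitioning options must be set together,
# otherwise Spark issues a single query and one executor handles everything.
jdbc_options = {
    "url": "jdbc:postgresql://db-host:5432/mydb",  # hypothetical connection URL
    "dbtable": "public.big_table",                 # hypothetical table
    "partitionColumn": "id",    # must be a numeric, date, or timestamp column
    "lowerBound": "1",          # min value of partitionColumn to scan
    "upperBound": "10000000",   # max value of partitionColumn to scan
    "numPartitions": "16",      # number of parallel queries / partitions
}
# On the cluster: spark.read.format("jdbc").options(**jdbc_options).load()
print(sorted(jdbc_options))
```

Choosing a `partitionColumn` with roughly uniform values matters too: a skewed column reproduces the one-hot-executor pattern even with `numPartitions` set.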
Hi! I'm using the code from here to execute a query on Databricks. It goes flawlessly, and I can follow it from the Spark UI, etc. The problem is that, at the moment, it seems the download of the result (Spark is idle, there is a green check in the query his...
Hi @Alejandro Martinez, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...
Hi, I am running a notebook job calling JAR code (application code implemented in C#). On the Spark UI page, for almost 2 hours it's not showing any tasks; even the CPU usage is below 20% and memory usage is very small. Before this 2-hour window it shows...
This happens if the Spark driver is missing events. The jobs/tasks are not actually running; the Spark UI is reporting incorrect stats. This can be treated as a harmless UI issue. If you continue to see the issue consistently, then it might be good to review w...