Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sanjay
by Valued Contributor II
  • 7302 Views
  • 13 replies
  • 10 kudos

Spark tasks too slow and not doing parallel processing

Hi, I have a Spark job which is processing a large data set, and it's taking too long to process the data. In the Spark UI, I can see it's running 1 task out of 9 tasks. Not sure how to run this in parallel. I have already mentioned auto scaling and providing up to...
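The post is truncated, but a stage running only a handful of tasks usually means the data has very few partitions, so the extra workers from autoscaling sit idle. A minimal PySpark sketch of the usual diagnosis and fix; the input path and partition counts are illustrative assumptions, not details from the thread:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Databricks notebook

# Hypothetical input; replace with the real source.
df = spark.read.parquet("/mnt/raw/large_dataset")

# If this prints a small number (e.g. 9), only that many tasks can run at once.
print(df.rdd.getNumPartitions())

# Repartition to roughly 2-3x the total executor cores so tasks spread across workers.
df = df.repartition(64)

# Wide transformations (joins, groupBy) use this setting for their shuffle output.
spark.conf.set("spark.sql.shuffle.partitions", "64")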

Latest Reply
plondon
New Contributor II
  • 10 kudos

Will it be any different if using Spark but within Azure, i.e. faster? 

12 More Replies
samst
by New Contributor III
  • 5185 Views
  • 11 replies
  • 6 kudos

Resolved! Spark UI reverse Proxy blocked on GCP

Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud Platform and I was using the trial. It is quite difficult to debug if the Spark UI is only semi-accessible. Part of the results in raw HTML are visible, but all ...

Latest Reply
LucasArrudaW
New Contributor II
  • 6 kudos

Any news about this?

10 More Replies
dave_hiltbrand
by New Contributor II
  • 2071 Views
  • 3 replies
  • 0 kudos

I have a job with multiple tasks running asynchronously and, based on runtime, I don't think it's leveraging all the nodes on the cluster.

I have a job with multiple tasks running asynchronously and, based on runtime, I don't think it's leveraging all the nodes on the cluster. I open the Spark UI for the cluster, check out the executors, and don't see any tasks for my worker nodes. How ca...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Dave Hiltbrand, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.

2 More Replies
vladcrisan
by New Contributor II
  • 3453 Views
  • 5 replies
  • 1 kudos

Can Spark History server be created in Databricks?

We have a Spark pipeline producing more than 3k Spark jobs. After the pipeline finishes and the cluster shuts down, only a subset (<1k) of these can be recovered from the Spark UI. We would like to have access to the full Spark UI after the pipeline t...

Latest Reply
Sandeep
Contributor III
  • 1 kudos

@Vlad Crisan, you can use Databricks clusters to replay the events. Please follow this KB: https://kb.databricks.com/clusters/replay-cluster-spark-events. Note: please spin up a cluster with version 10.4 LTS.

4 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 7452 Views
  • 2 replies
  • 2 kudos

Resolved! Can anyone help with a spill question?

Spill occurs as a result of executing various wide transformations. However, diagnosing a spill requires one to proactively look for key indicators. Where in the Spark UI are two of the primary indicators that a partition is spilling to disk? a) Exec...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

@Aviral Bhardwaj, I feel it is option e, stage and executor log files: consolidated details at the stage level, and details at the task and executor level. Please let me know if you feel any other option is better.

1 More Replies
Dean_Lovelace
by New Contributor III
  • 3481 Views
  • 3 replies
  • 0 kudos

How to filter the Spark UI for a notebook

When running Spark under YARN, each script has its own self-contained set of logs. In Databricks, all I see is a list of jobs and stages that have been run on the cluster. From a support perspective this is a nightmare. How can notebook logs be grou...
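One workaround, offered here as a hedged sketch rather than the accepted answer in the thread: tag the jobs a notebook submits so they can be picked out in the cluster-wide Spark UI. The group id, description, and table name below are made up for the example:

# Runs in a Databricks notebook where `spark` is predefined.
# setJobGroup attaches a group id and description that appear in the Spark UI's Jobs table.
spark.sparkContext.setJobGroup("orders_notebook", "Orders ETL - load step")

# Any action triggered after this call carries the label above.
spark.table("my_catalog.my_schema.orders").count()  # placeholder action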

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Dean Lovelace, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...

2 More Replies
RajeshRK
by Contributor
  • 3438 Views
  • 7 replies
  • 2 kudos

How to optimize jobs performance

Hi Team, we have a complex ETL job running in Databricks for 6 hours. The cluster has the below configuration: min workers: 16, max workers: 24, worker and driver node type: Standard_DS14_v2 (16 cores, 128 GB RAM). I have monitored the job progress in Spark...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rajesh Kannan R, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedb...

6 More Replies
User16783854657
by New Contributor III
  • 2961 Views
  • 4 replies
  • 6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

Latest Reply
venkat09
New Contributor III
  • 6 kudos

Typo in my second point of the previous post. Click the execution plan of your task (this is available under the SQL/DataFrame tab in the Spark UI). It shows which operations ran in the Photon engine and which did not execute in Photon.
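To make the reply above concrete, here is a hedged notebook sketch; the table and column names are assumptions, and the operator names are examples of how Photon-executed steps typically appear in the plan:

# On a Photon-enabled cluster, operators executed by Photon show up in the physical
# plan with a "Photon" prefix (e.g. PhotonProject, PhotonGroupedAggregate).
# Operators without the prefix fell back to the regular Spark engine.
df = spark.table("samples.nyctaxi.trips")      # placeholder table
agg = df.groupBy("pickup_zip").count()         # placeholder query
agg.explain(mode="formatted")                  # same plan shown under the SQL/DataFrame tab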

3 More Replies
alvaro_databric
by New Contributor III
  • 1710 Views
  • 1 reply
  • 1 kudos

Resolved! Task time Spark UI

Hello all, I would like to know why task times (among other times in the Spark UI) display values like 1h or 2h when the task really only takes a few seconds or minutes. What is the meaning of these high time values I see all around the Spark UI? Thanks in adv...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

That is accumulated time: https://stackoverflow.com/questions/73302982/task-time-and-gc-time-calculation-in-spark-ui-in-executor-section

supremefist
by New Contributor III
  • 4668 Views
  • 5 replies
  • 2 kudos

Resolved! New Spark cluster being configured in local mode

Hi, we have two workspaces on Databricks, prod and dev. On prod, if we create a new all-purpose cluster through the web interface and go to Environment in the Spark UI, the spark.master setting is correctly set to the host IP. This results in a...
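A quick way to confirm which mode a cluster is actually in, sketched here as a generic diagnostic rather than the fix discussed later in the thread:

# Runs in a Databricks notebook where `spark` is predefined.
print(spark.conf.get("spark.master"))
# "local[*]"                 -> local mode: everything runs on the driver
# "spark://<driver-ip>:7077" -> standard mode: executors on worker nodes are used

# defaultParallelism also hints at how many cores the scheduler can actually see.
print(spark.sparkContext.defaultParallelism)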

Latest Reply
scottb
New Contributor II
  • 2 kudos

I found the same issue when choosing the default cluster setup on first setup: when I went to edit the cluster to add an instance profile, I was not able to save without fixing this. Thanks for the tip.

4 More Replies
huyd
by New Contributor III
  • 1035 Views
  • 0 replies
  • 4 kudos

Optimizing a batch load process, reading with the JDBC driver

I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
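Since the post has no replies, here is a hedged sketch of the partitioned JDBC read the poster mentions; all connection details, bounds, and names are placeholders, and the partition column must be numeric, date, or timestamp for Spark to split the read:

# Spark issues `numPartitions` parallel queries, each covering a slice of
# `partitionColumn` between lowerBound and upperBound, so the load is spread
# across executors instead of landing on a single one.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")   # placeholder URL
    .option("dbtable", "public.orders")                    # placeholder table
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "order_id")                 # placeholder column
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .option("fetchsize", "10000")                          # rows per round trip
    .load()
)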

alejandrofm
by Valued Contributor
  • 2797 Views
  • 7 replies
  • 1 kudos

Improve download speed or see download progress Python-Databricks SQL

Hi! I'm using the code from here to execute a query on Databricks; it goes flawlessly, and I can follow it from the Spark UI, etc. The problem is that at the moment it seems the download of the result (Spark is idle, there is a green check in the query his...
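One way to see progress on the client side is to page through the result with the Databricks SQL connector instead of fetching everything at once; a hedged sketch, with host, path, token, and query as placeholders:

from databricks import sql   # pip install databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-host>",        # placeholder
    http_path="<sql-warehouse-http-path>",     # placeholder
    access_token="<personal-access-token>",    # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_table")  # placeholder query
        fetched = 0
        while True:
            rows = cursor.fetchmany(10_000)       # download in chunks
            if not rows:
                break
            fetched += len(rows)
            print(f"downloaded {fetched} rows so far")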

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Alejandro Martinez, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...

6 More Replies
guruv
by New Contributor III
  • 4805 Views
  • 5 replies
  • 1 kudos

Resolved! Spark UI not showing any running tasks

Hi, I am running a notebook job calling JAR code (application code implemented in C#). In the Spark UI page, for almost 2 hrs it's not showing any tasks, and even the CPU usage is below 20%; memory usage is very small. Before this 2 hr window it shows...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @guruv, does @Atanu Sarkar's response answer your query?

4 More Replies
brickster_2018
by Esteemed Contributor
  • 2102 Views
  • 1 reply
  • 0 kudos

Resolved! Jobs running forever in Spark UI

In the Spark UI, jobs are running forever, but my notebook has already completed the operations. Why are the resources being wasted?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

This happens if the Spark driver is missing events. The jobs/tasks are not actually running; the Spark UI is reporting incorrect stats. This can be treated as a harmless UI issue. If you continue to see the issue consistently, then it might be good to review w...
