Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I have a Spark job processing a large data set, and it's taking too long. In the Spark UI, I can see it is running 1 task out of 9 tasks. Not sure how to run this in parallel. I have already enabled auto scaling and am providing up to...
Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud, and I was using the trial. It is quite difficult to debug if the Spark UI is only semi-accessible. Part of the results in raw HTML are visible, but all ...
I have a job with multiple tasks running asynchronously, and based on runtime I don't think it's leveraging all the nodes on the cluster. I open the Spark UI for the cluster, check out the executors, and don't see any tasks for my worker nodes. How ca...
We have a Spark pipeline producing more than 3k Spark jobs. After the pipeline finishes and the cluster shuts down, only a subset (<1k) of these can be recovered from the Spark UI. We would like to have access to the full Spark UI after the pipeline t...
@Vlad Crisan, you can use Databricks clusters to replay the events. Please follow this KB: https://kb.databricks.com/clusters/replay-cluster-spark-events. Note: please spin up a cluster with version 10.4 LTS.
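Outside Databricks, the same kind of replay can be approximated by pointing a standalone Spark History Server at the cluster's event logs. A minimal configuration sketch (the paths below are hypothetical; substitute wherever your event logs were delivered):

```
# spark-defaults.conf for a local Spark History Server (hypothetical paths)
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
spark.history.fs.logDirectory    file:///tmp/spark-events
```

With the logs copied into that directory, starting the history server lets you browse the completed jobs' UI after the original cluster is gone.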
Spill occurs as a result of executing various wide transformations. However, diagnosing a spill requires one to proactively look for key indicators. Where in the Spark UI are two of the primary indicators that a partition is spilling to disk? a- Exec...
@Aviral Bhardwaj I feel it is option e, stage and executor log files: consolidated details at the stage level, and details at the task and executor level. Please let me know if you feel any other option is better.
When running Spark under YARN, each script has its own self-contained set of logs. In Databricks, all I see is a list of jobs and stages that have been run on the cluster. From a support perspective this is a nightmare. How can notebook logs be grou...
Hi @Dean Lovelace, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...
Hi Team, we have a complex ETL job running in Databricks for 6 hours. The cluster has the below configuration: min workers: 16, max workers: 24, worker and driver node type: Standard_DS14_v2 (16 cores, 128 GB RAM). I have monitored the job progress in the Spark...
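Without the full plan it is hard to diagnose, but on a cluster of this size one common first check is whether the shuffle partition count matches the available cores. A rough sizing sketch, using the numbers from the post and a common 2x-cores heuristic (the heuristic is an assumption, not a Databricks recommendation):

```python
# Hypothetical sizing sketch: derive a shuffle partition count from cluster cores.
workers = 24               # Maxworkers from the post
cores_per_worker = 16      # Standard_DS14_v2
total_cores = workers * cores_per_worker
shuffle_partitions = total_cores * 2   # common 2x heuristic; tune per workload
print(total_cores, shuffle_partitions)
# On the cluster itself:
#   spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)
```

If the stage-level view in the Spark UI shows far fewer tasks than cores, too few partitions (or skew) is a likely culprit for the long runtime.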
Hi @Rajesh Kannan R, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedb...
Typo in my second point of the previous post: click the execution plan of your task (this is available under the SQL/DataFrame tab in the Spark UI). It explains which operations ran in the Photon engine and which were not executed by Photon.
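As a sketch of that check: Photon-executed operators appear in the physical plan with a "Photon" prefix, so scanning the plan text for nodes without the prefix shows what fell back to the JVM engine. The plan below is a hypothetical excerpt, not output from the poster's job:

```python
# Hypothetical plan excerpt: operators run by Photon carry a "Photon" prefix
# (e.g. PhotonGroupingAgg); nodes without it fell back to the JVM engine.
plan_text = """
== Physical Plan ==
PhotonGroupingAgg(keys=[country], functions=[sum(amount)])
+- PhotonShuffleExchangeSink ...
"""
fell_back = [ln for ln in plan_text.splitlines()
             if ln.strip().startswith("+-") and "Photon" not in ln]
print(len(fell_back))  # 0 -> every child operator shown ran in Photon
# On the cluster itself, inspect the real plan with: df.explain(mode="formatted")
```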
Hello all, I would like to know why task times (among other times in the Spark UI) display values like 1h or 2h when the task only really takes seconds or minutes. What is the meaning of these high time values I see all around the Spark UI? Thanks in adv...
Hi, we have two workspaces on Databricks, prod and dev. On prod, if we create a new all-purpose cluster through the web interface and go to Environment in the Spark UI, the spark.master setting is correctly set to the host IP. This results in a...
I found the same issue when choosing the default cluster setup on first setup: when I went to edit the cluster to add an instance profile, I was not able to save without fixing this. Thanks for the tip.
I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
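One common cause of a single executor doing all the work (and spilling) is that the JDBC read is not actually partitioned: Spark only parallelizes the read when all four partitioning options are supplied together. A minimal sketch with hypothetical connection details:

```python
# Hypothetical sketch: all four partitioning options must be set together,
# otherwise Spark issues a single query and one executor handles everything.
jdbc_options = {
    "url": "jdbc:postgresql://db-host:5432/mydb",  # hypothetical connection URL
    "dbtable": "public.big_table",                 # hypothetical table
    "partitionColumn": "id",    # must be a numeric, date, or timestamp column
    "lowerBound": "1",          # min value of partitionColumn to scan
    "upperBound": "10000000",   # max value of partitionColumn to scan
    "numPartitions": "16",      # number of parallel queries / partitions
}
# On the cluster: spark.read.format("jdbc").options(**jdbc_options).load()
print(sorted(jdbc_options))
```

Choosing a `partitionColumn` with roughly uniform values matters too: a skewed column reproduces the one-hot-executor pattern even with `numPartitions` set.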
Hi! I'm using the code from here to execute a query on Databricks. It goes flawlessly, and I can follow it from the Spark UI, etc. The problem is that, at the moment, it seems the download of the result (Spark is idle, there is a green check in the query his...
Hi @Alejandro Martinez, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...
Hi, I am running a notebook job calling JAR code (application code implemented in C#). On the Spark UI page, for almost 2 hours it's not showing any tasks; even the CPU usage is below 20% and memory usage is very small. Before this 2-hour window it shows...
This happens if the Spark driver is missing events. The jobs/tasks are not actually running; the Spark UI is reporting incorrect stats. This can be treated as a harmless UI issue. If you continue to see the issue consistently, then it might be good to review w...