Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark Master and the Spark driver.
This is a common misconception. The Spark Master and the Spark driver are two independent, isolated JVMs running on the same instance. The Spark Master's responsibilities are to ensure the Spark worker daemons are up and running and to monitor their health. Also...
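To make the distinction concrete, here is a minimal sketch, assuming a standalone cluster; the host and port are placeholders, not values from this thread. The process that executes this code is the driver, while the Master is only the address the session points at:

```python
# Minimal sketch (standalone mode; host/port are placeholders).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("driver-vs-master-demo")
    .master("spark://master-host:7077")  # address of the Spark Master daemon
    .getOrCreate()
)

# The Master is the daemon listening at the URL below; the driver is the
# JVM/process running this very script.
print(spark.sparkContext.master)
spark.stop()
```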
What is the problem? I am getting this error every time I run a Python notebook in my Repo in Databricks. Background: the notebook where I am getting the error creates a dataframe, and the last step is to write the dataframe to a Delta ...
Hi @Sara Corral, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
I have a Delta table whose size increases gradually; we now have around 1.5 crore (15 million) rows. While running the VACUUM command on that table I am getting the below error. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...
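For reference, a hedged sketch of the command in question (the table name is a placeholder); Delta's DRY RUN option lists the files that would be deleted without touching them, which can help gauge how heavy the real run will be:

```python
# Placeholder table name. DRY RUN only lists candidate files; nothing is deleted.
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS DRY RUN").show(truncate=False)

# The actual cleanup that is failing in the question:
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")
```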
Do you have access to the Executor 7 logs? Is there high GC or some other event that is causing the heartbeat timeout? Would you be able to check the failed stages?
In any Spark application, the Spark driver plays a critical role and performs the following functions (see the configuration sketch after this list):
1. Initiating a Spark Session
2. Communicating with the cluster manager to request resources (CPU, memory, etc.) for Spark's exec...
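As a rough illustration of step 2, the sketch below shows a driver asking the cluster manager for executor resources at session start; all values are made up for the example:

```python
# Illustrative values only: the driver forwards these requests to the cluster
# manager, which then launches the executors.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("resource-request-demo")
    .config("spark.executor.memory", "4g")    # memory per executor
    .config("spark.executor.cores", "2")      # CPU cores per executor
    .config("spark.executor.instances", "4")  # number of executors requested
    .getOrCreate()
)
```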
Getting the below error. Context: using a Databricks shared interactive cluster to run multiple parallel scheduled jobs at the same time, every 5 mins. When I check Ganglia, the driver node's memory reaches almost max and then the driver restarts an...
Hi, I am running concurrent notebooks in concurrent workflow jobs on a c5a.8xlarge job compute cluster with 5-7 worker nodes. Each job has 100 concurrent child notebooks and there are 10 job instances. 8/10 jobs give the error the spark driver has sto...
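One common mitigation (a sketch under assumptions, not advice from this thread) is to cap how many child notebooks run at once, since the driver coordinates every dbutils.notebook.run call; the pool size and notebook paths below are placeholders to tune:

```python
# Hedged sketch for a Databricks notebook: throttle child notebooks instead of
# launching all 100 at once. max_workers and the paths are assumptions.
from concurrent.futures import ThreadPoolExecutor

notebook_paths = [f"./children/child_{i}" for i in range(100)]  # hypothetical

def run_child(path):
    # dbutils.notebook.run blocks until the child notebook finishes (1 h timeout)
    return dbutils.notebook.run(path, 3600)

with ThreadPoolExecutor(max_workers=10) as pool:  # cap driver-side concurrency
    results = list(pool.map(run_child, notebook_paths))
```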
Hi @uzair mustafa, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
I want to use the same Spark session that was created in one notebook in another notebook within the same environment. For example, if some object (variable) got initialized in the first notebook, I need to use the same object in t...
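One caveat first: notebooks attached to the same cluster share one Spark application, but not each other's Python variables. A common workaround (sketched below with a hypothetical notebook name) is the %run magic, which executes another notebook inside the caller's context so its objects become visible:

```python
# In the first notebook, e.g. ./shared_setup (hypothetical name), define:
#     shared_df = spark.range(10)
#     threshold = 0.5
#
# In the second notebook, a cell containing only this magic inlines it:
#     %run ./shared_setup
#
# After that cell executes, shared_df and threshold are defined in the second
# notebook too, and both notebooks keep using the cluster's shared Spark session.
```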
I have a daily job run that occasionally fails with the error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. After I get the notification that this job failed on schedule, I manually run ...
I am running a process which has 4 steps (a skeletal sketch follows the list):
1. Querying s3 file paths from DynamoDB based on certain parameters given by the user (the function to do so is provided by the client, I just have to import it). Returns a list of files.
2. Check if those file paths have already been qu...
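A skeletal sketch of those steps; every name below is a placeholder standing in for the client-provided or user code described above, and the truncated later steps are left out:

```python
# All names are hypothetical placeholders.
def query_paths_from_dynamo(params):
    """Step 1: look up s3 file paths in DynamoDB for the user's parameters
    (in the real code this function is imported from the client's library)."""
    ...

def filter_new_paths(paths):
    """Step 2: drop paths that have already been queried/processed."""
    ...

def run(params):
    paths = query_paths_from_dynamo(params)  # returns a list of files
    new_paths = filter_new_paths(paths)
    # steps 3 and 4 are truncated in the question, so they are omitted here
    return new_paths
```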
Hi @uzair mustafa, thank you for posting your question in our community! We are happy to assist you. Does @Suteja Kanuri's answer help? If it does, would you be happy to mark it as best? This will help other community members who may have similar ques...
Hey all, we're trying to analyze the data in a 23 GB JSON file. We're using the basic starter cluster: one node, 2 CPU x 8 GB. We can read the JSON file into a Spark dataframe and print out the schema, but if we try to do any operations that won't c...
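Two things usually help on a single small node (a sketch; the path and schema are placeholders): give Spark an explicit schema so it does not rescan all 23 GB to infer one, and keep the result distributed, e.g. write it back out, instead of collecting it onto the 8 GB node:

```python
# Path and schema are hypothetical; adapt them to the real file.
from pyspark.sql.types import LongType, StringType, StructField, StructType

schema = StructType([
    StructField("id", LongType()),
    StructField("payload", StringType()),
])

# With an explicit schema, Spark skips the full-file schema-inference pass.
df = spark.read.schema(schema).json("/mnt/data/big_file.json")

# Write results out instead of collect()/toPandas(), which would pull the
# whole dataset into the one 8 GB node.
df.write.format("parquet").mode("overwrite").save("/mnt/data/big_file_parquet")
```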
Hi, I am executing a simple job in Databricks for which I am getting the below error. I increased the driver size but still faced the same issue.
Spark config:
from pyspark.sql import SparkSession
spark_session = SparkSession.builder.appName("Demand Forecasting...
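One hedged note: on Databricks, spark.driver.memory cannot be raised from inside a running notebook via SparkSession.builder; it has to be set in the cluster's Spark config (or by choosing a larger driver node type) before the driver JVM starts. A quick way to check what the driver actually received:

```python
# Print the driver memory the cluster was actually started with.
conf = spark.sparkContext.getConf()
print(conf.get("spark.driver.memory", "unset"))
```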
Hi all, I am establishing a connection to Databricks from Collibra through the Spark driver. Collibra expects these details for the connection (token-based):
- personal access token (PAT)
- server/workspace name
- httpPath
Upon successful connection, Collibra d...
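For reference, those three details are typically assembled into a Databricks JDBC URL along the lines of the hedged sketch below; the hostname and httpPath are placeholders, and the exact format should be checked against the JDBC driver version Collibra ships:

```python
# All values are placeholders. AuthMech=3 with UID "token" is the usual
# personal-access-token scheme for the Databricks JDBC driver, but verify
# against your driver's documentation.
server_hostname = "adb-1234567890123456.7.azuredatabricks.net"        # hypothetical
http_path = "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"  # hypothetical
pat = "<personal-access-token>"

jdbc_url = (
    f"jdbc:databricks://{server_hostname}:443;"
    f"httpPath={http_path};AuthMech=3;UID=token;PWD={pat}"
)
print(jdbc_url)
```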
Find the DriverDaemon:

%sh
jps

Take the heap dump:

%sh
jmap -dump:live,format=b,file=pbs_worker_DriverDaemon.hprof 2413

Copy out to download:

%sh
cp pbs_worker_DriverDaemon.hprof /dbfs/FileStore/pbs_worker_04-30-2021T15-50-00.hprof
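(Here 2413 is the DriverDaemon PID reported by jps in the first step; substitute the PID from your own driver node, and adjust the .hprof file name and timestamped DBFS path as needed.)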
Often I see "Executor heartbeat timed out" messages in the Spark driver logs, and sometimes the job fails with this error. Will increasing "spark.executor.heartbeatInterval" help mitigate the issue?
It is a common misconception that increasing "spark.executor.heartbeatInterval" will help mitigate or resolve heartbeat issues. In fact, increasing spark.executor.heartbeatInterval will increase the chance of the error and worsen the situ...
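The usual guidance (a sketch; the defaults quoted are stock Spark values) is to leave the heartbeat interval small and, if timeouts persist, raise spark.network.timeout instead, keeping the interval far below the timeout. Both are cluster-level settings, so on Databricks they belong in the cluster's Spark config rather than in notebook code:

```python
# Check the effective values on the running cluster.
# spark.executor.heartbeatInterval (default 10s) should stay well below
# spark.network.timeout (default 120s); raise the timeout, e.g.
#     spark.network.timeout 600s
# in the cluster's Spark config, rather than raising the heartbeat interval.
conf = spark.sparkContext.getConf()
print(conf.get("spark.executor.heartbeatInterval", "10s"))
print(conf.get("spark.network.timeout", "120s"))
```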