Data Engineering

Forum Posts

by nadia • New Contributor II

06-12-2022 2:19:33 PM

25204 Views
4 replies
2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table that is stored in PostgreSQL and contains 28 million rows. I get the following result: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply

SparkJun
Databricks Employee

06-18-2024 1:52:44 PM

2 kudos

Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory...

3 More Replies
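
A single failing task (Task 0 in stage 0.0) on a 28-million-row JDBC read is often a sign that the whole table is being pulled through one partition. The thread excerpt does not confirm this as the cause, so treat the following as a minimal sketch of one common mitigation: a partitioned JDBC read. The option keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are standard Spark JDBC options; the URL, table name, column name, and bounds are hypothetical placeholders, and credentials are omitted.

```python
def jdbc_partitioned_options(url, table, partition_column,
                             lower_bound, upper_bound,
                             num_partitions=16, fetch_size=10_000):
    """Build the option dict for a partitioned Spark JDBC read.

    Spark splits the range [lower_bound, upper_bound] of a numeric,
    date, or timestamp column into num_partitions parallel queries.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),  # rows fetched per round trip
    }


# Hypothetical connection details, for illustration only.
options = jdbc_partitioned_options(
    url="jdbc:postgresql://db.example.com:5432/mydb",
    table="public.big_table",
    partition_column="id",
    lower_bound=1,
    upper_bound=28_000_000,
)

# On a cluster, the dict would be passed to the reader like this
# (user and password options would need to be added):
# df = spark.read.format("jdbc").options(**options).load()
```

The bounds only control how the range is split; rows outside them are still read, so a rough minimum and maximum of the partition column is enough.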

by Swostiman • New Contributor II

05-25-2023 2:57:43 AM

6113 Views
5 replies
4 kudos

Consuming data from databricks[Hive metastore] sql endpoint using pyspark

I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but every value in the fetched table is the same as its column name. Even when I try to just show the data it giv...

Latest Reply

sucan
New Contributor II

08-11-2023 5:05:04 PM

4 kudos

Encountered the same issue and downgrading to 2.6.22 helped me resolve this issue.

4 More Replies

by THIAM_HUATTAN • Valued Contributor

06-19-2023 6:41:58 AM

7710 Views
3 replies
0 kudos

Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64

Latest Reply

Lakshay
Databricks Employee

06-20-2023 5:59:55 AM

0 kudos

Hi @THIAM HUAT TAN, the issue arises because the schema defines the column "Rainfall_Value" as DoubleType while the values present in the data frame are of integer type. This could be because of one or multiple values. Depending on the data, you ...

2 More Replies

by Nis • New Contributor II

05-17-2023 7:01:35 AM

1715 Views
1 reply
2 kudos

Best sequence of using Vacuum, optimize, fsck repair and refresh commands.

I have a Delta table whose size increases gradually; it now has around 1.5 crore (15 million) rows. While running the VACUUM command on that table, I get the error below. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

Latest Reply

jose_gonzalez
Databricks Employee

06-06-2023 11:39:55 AM

2 kudos

Do you have access to the Executor 7 logs? Is there high GC activity or some other event that is causing the heartbeat timeout? Would you be able to check the failed stages?

by kumarPerry • New Contributor II

04-11-2023 10:46:49 AM

3045 Views
3 replies
0 kudos

Notebook connectivity issue with aws s3 bucket using mounting

When connecting to an AWS S3 bucket using DBFS, the application throws an error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

Latest Reply

Anonymous
Not applicable

04-15-2023 11:50:12 PM

0 kudos

Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

2 More Replies

by SS2 • Valued Contributor

11-29-2022 12:06:54 PM

1997 Views
2 replies
1 kudos

Spark out of memory error.

Sometimes in Databricks you may see an out-of-memory error. In that case, you can change the cluster size as required to resolve the issue.

Latest Reply

jose_gonzalez
Databricks Employee

01-30-2023 4:38:22 PM

1 kudos

Hi @S S, could you provide more details on your issue, for example error stack traces, code snippets, etc.? We will be able to help you if you share more details.

1 More Replies

by BL • New Contributor III

01-14-2023 4:09:25 AM

5018 Views
4 replies
3 kudos

Error reading in Parquet file

I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the error below: spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet") org.apache.spark.SparkException: Job aborted due to stag...

Latest Reply

jose_gonzalez
Databricks Employee

01-30-2023 2:51:18 PM

3 kudos

Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

3 More Replies

by Prem1 • New Contributor III

08-10-2022 3:00:57 PM

16884 Views
21 replies
11 kudos

java.lang.IllegalArgumentException: java.net.URISyntaxException

I am using Databricks Autoloader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames contain a timestamp. The Autoloader works perfectly for a couple of days with the configuration below and breaks the next day ...

Latest Reply

jshields
New Contributor II

01-04-2023 6:56:35 AM

11 kudos

Hi everyone, I'm seeing this issue as well, with the same configuration as the previous posts: Autoloader with incremental file listing turned on. The strange part is that it mostly works despite almost all of the files we're loading having colons incl...

20 More Replies

by Manjusha • New Contributor II

10-13-2022 5:16:00 AM

2374 Views
1 reply
1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1 LTS. I have an R notebook that reads a CSV into an R dataframe, does some transformations, and finally converts it to a Spark dataframe using the createDataFrame function. After that, when I call the display function on this spark da...

Latest Reply

Anonymous
Not applicable

11-24-2022 10:36:21 PM

1 kudos

Hi @Manjusha Unnikrishnan, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks.

by shamly • New Contributor III

11-17-2022 11:35:32 AM

3692 Views
2 replies
3 kudos

spark exception error while reading a parquet file

When I try to read a Parquet file from an Azure Data Lake container in Databricks, I get a Spark exception. Below is my query:
import pyarrow.parquet as pq
from pyspark.sql.functions import *
from datetime import datetime
data = spark.read.parquet(f"/mnt...

Latest Reply

DavideAnghileri
Contributor

11-19-2022 3:28:43 AM

3 kudos

Hi @shamly pt, more info is needed to solve the issue. However, common problems are: the storage is not mounted, or the file doesn't exist in the mounted storage. Also, there is no need to use an f-string if there are no curly brackets with expressions in ...

1 More Replies
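
The f-string remark in the reply above can be illustrated with a short sketch. The mount path and date below are hypothetical, since the original path is truncated in the post: an f-string without any `{...}` expression evaluates to exactly the same string as a plain literal, so the prefix only matters once a value is actually interpolated.

```python
# Hypothetical mount point and file name, for illustration only.
mount_point = "/mnt/example-container"

# Without curly-bracket expressions, the f prefix changes nothing.
plain = "/mnt/example-container/data.parquet"
with_prefix = f"/mnt/example-container/data.parquet"
same_string = plain == with_prefix

# The f prefix is only needed when a value is interpolated.
run_date = "2022-11-17"
dated = f"{mount_point}/{run_date}/data.parquet"

print(same_string)
print(dated)
```

This is a style point only; it would not by itself explain the Spark exception.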

by Bujji • New Contributor II

11-10-2022 1:14:30 AM

5332 Views
1 reply
3 kudos

How to resolve out of memory error?

Hi, I am working as an Azure support engineer. I found this error while checking a pipeline failure; it shows the error below: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...

Latest Reply

Pat
Honored Contributor III

11-10-2022 1:55:58 AM

3 kudos

Hi @mahesh bmk, it would be nice to see the sql_query. Is some window function used? You might try running this on a bigger cluster.

by pjp94 • Contributor

09-19-2022 11:19:43 AM

2872 Views
1 reply
0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the below error when trying to run multi-threading - fails towards the end of the run. My guess is it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPU on the cluster - however that's n...

Latest Reply

pjp94
Contributor

09-19-2022 12:56:47 PM

0 kudos

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!
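
The workaround described above, capping the thread count at about half of the maximum, might look roughly like the sketch below. It uses the standard-library `ThreadPoolExecutor`; the post does not show the actual code, so `process`, the item list, and the maximum of 8 threads are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def capped_workers(max_threads, fraction=0.5):
    """Return a worker count limited to a fraction of max_threads (at least 1)."""
    return max(1, int(max_threads * fraction))


def process(item):
    """Hypothetical unit of work; a real job would do I/O or submit a query."""
    return item * 2


def run_all(items, max_workers):
    """Run process() over items with a bounded thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, items))


# Assume 8 threads is the maximum the job previously used.
workers = capped_workers(8)
results = run_all(range(10), workers)
print(workers, results)
```

The fraction is a tuning knob: a lower value reduces pressure on the containers at the cost of a longer run.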

by Brendon_Daugher • New Contributor II

09-06-2022 12:25:15 PM

1430 Views
0 replies
0 kudos

Understanding Dependency Update Failure

Heyooooo! I'm using Azure Databricks and sparklyr to do some geospatial analysis. Before I actually work with Spark DataFrames, I've been using the R packages stars and sf to do some preprocessing on my data so that it's easier to interact with later. In or...

by Himanshi • New Contributor III

08-03-2022 11:46:38 PM

1322 Views
0 replies
4 kudos

Databricks streaming job issue with Autoloader for new checkpoint.

Hi Team, I am trying to run a streaming job in Databricks, using the Autoloader approach to read files in Parquet format from Azure Data Lake Gen2. I have created a new checkpoint, so the first offset is getting created but throwing an erro...

by Will_Sullivan • New Contributor

07-26-2022 12:46:16 PM

1480 Views
0 replies
0 kudos

How to solve Error in Databricks Academy course DE 4.2 & 4.3, run classroom-setup-4.2 error, "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Does anyone know how to solve this error? Course: Data Engineering with Databricks. Notebook: DE 4.2 - Providing Options for External Sources. Attempts to fix: detached and reattached my cluster and started it again. %run ../Includes/Classroom-Setup-4.2 resul...
