Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nadia
by New Contributor II
  • 17079 Views
  • 3 replies
  • 2 kudos

Resolved! Executor heartbeat timed out

Hello, I'm trying to read a table that is located in PostgreSQL and contains 28 million rows. I get the following result: "SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in sta...

Latest Reply
JunYang
New Contributor III
  • 2 kudos

Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory...

  • 2 kudos
2 More Replies
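
A large JDBC read that runs as a single task (the error above names Task 0 in stage 0.0) can put the whole table on one executor. A minimal sketch, assuming the table has a numeric column to split on: the helper below only builds Spark's standard JDBC partitioning options. The column name `id`, the bounds, the partition count, and the connection URL are hypothetical and do not come from the original post.

```python
def jdbc_partition_options(url, table, column, lower, upper,
                           num_partitions, fetch_size=10000):
    """Build options for a partitioned Spark JDBC read.

    Spark splits the range [lower, upper] on `column` into
    `num_partitions` slices, so each task reads one slice
    instead of the full table.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


# Hypothetical values, for illustration only.
opts = jdbc_partition_options(
    url="jdbc:postgresql://example-host:5432/exampledb",
    table="public.big_table",
    column="id",
    lower=1,
    upper=28_000_000,
    num_partitions=16,
)

# In a Databricks notebook, where `spark` is already defined:
# df = spark.read.format("jdbc").options(**opts).load()
```

Whether this helps depends on what the Spark UI shows; if GC time and spill are low, the cause is likely elsewhere.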
Swostiman
by New Contributor II
  • 4298 Views
  • 5 replies
  • 4 kudos

Consuming data from Databricks [Hive metastore] SQL endpoint using PySpark

I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, all the values in the table are the same as the column name. Even when I try to just show the data it giv...

Latest Reply
sucan
New Contributor II
  • 4 kudos

I encountered the same issue, and downgrading to 2.6.22 resolved it.

  • 4 kudos
4 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 4542 Views
  • 3 replies
  • 0 kudos

Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64

df.printSchema()
root
 |-- Device_ID: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- Latitude: double (nullable = true)
 |-- Longitude: double (nullable = true)
 |-- DateTime: timestamp (nullable = true)
 |-- Rainfall_Value: double (n...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

Hi @THIAM HUAT TAN, the issue is that the schema defines the column "Rainfall_Value" as DoubleType, while the values present in the data are of integer type. This could be because of one or multiple values. Depending on the data, you ...

  • 0 kudos
2 More Replies
Nis
by New Contributor II
  • 1124 Views
  • 1 reply
  • 2 kudos

Best sequence for using the VACUUM, OPTIMIZE, FSCK REPAIR and REFRESH commands.

I have a Delta table whose size increases gradually; we now have around 1.5 crore (15 million) rows. While running the VACUUM command on that table, I am getting the below error. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Do you have access to the Executor 7 logs? Is there high GC or some other event that is making the heartbeat time out? Would you be able to check the failed stages?

  • 2 kudos
kumarPerry
by New Contributor II
  • 1889 Views
  • 3 replies
  • 0 kudos

Notebook connectivity issue with AWS S3 bucket using mounting

When connecting to an AWS S3 bucket using DBFS, the application throws an error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 0 kudos
2 More Replies
SS2
by Valued Contributor
  • 1430 Views
  • 2 replies
  • 1 kudos

Spark out of memory error.

Sometimes in Databricks you can see an out of memory error; in that case, you can change the cluster size as required to resolve the issue.

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @S S, could you provide more details on your issue? For example, error stack traces, code snippets, etc. We will be able to help you if you share more details.

  • 1 kudos
1 More Replies
BL
by New Contributor III
  • 3536 Views
  • 4 replies
  • 3 kudos

Error reading in Parquet file

I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the below error:
spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")
org.apache.spark.SparkException: Job aborted due to stag...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

  • 3 kudos
3 More Replies
Prem1
by New Contributor III
  • 11043 Views
  • 21 replies
  • 11 kudos

java.lang.IllegalArgumentException: java.net.URISyntaxException

I am using Databricks Auto Loader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames have a timestamp in them. Auto Loader works perfectly for a couple of days with the below configuration and breaks the next day ...

Latest Reply
jshields
New Contributor II
  • 11 kudos

Hi everyone, I'm seeing this issue as well: same configuration as the previous posts, using Auto Loader with incremental file listing turned on. The strange part is that it mostly works despite almost all of the files we're loading having colons incl...

  • 11 kudos
20 More Replies
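
The posts above mention timestamps and colons in the source file names. A sketch, assuming the colons are the trigger and that you control how the files are named upstream: rewrite the time part of the name into a colon-free form before the files land in the monitored directory. The function name and the example file name are hypothetical.

```python
import re

# Matches an ISO-like timestamp such as 2023-01-14T08:01:29
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2})T(\d{2}):(\d{2}):(\d{2})")


def colon_free_name(filename: str) -> str:
    """Rewrite HH:MM:SS inside a timestamped file name as HHMMSS."""
    return _TIMESTAMP.sub(r"\g<1>T\g<2>\g<3>\g<4>", filename)


# Hypothetical file name, for illustration only.
print(colon_free_name("events_2023-01-14T08:01:29.json"))
```

Names without a matching timestamp are returned unchanged, so the helper is safe to apply to every file.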
Manjusha
by New Contributor II
  • 1661 Views
  • 1 reply
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1 LTS. I have an R notebook that reads a CSV into an R data frame, does some transformations, and finally converts it to a Spark DataFrame using the createDataFrame function. After that, when I call the display function on this Spark da...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks.

  • 1 kudos
Bujji
by New Contributor II
  • 3953 Views
  • 2 replies
  • 3 kudos

How to resolve out of memory error?

Hi, I am working as an Azure support engineer. I found this error while checking a pipeline failure; it shows the below error: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @mahesh bmk, we haven't heard from you since the last response from @Pat Sienkiewicz, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to...

  • 3 kudos
1 More Replies
shamly
by New Contributor III
  • 2619 Views
  • 2 replies
  • 3 kudos

spark exception error while reading a parquet file

When I try to read a Parquet file from an Azure Data Lake container from Databricks, I am getting a Spark exception. Below is my query:
import pyarrow.parquet as pq
from pyspark.sql.functions import *
from datetime import datetime
data = spark.read.parquet(f"/mnt...

Latest Reply
DavideAnghileri
Contributor
  • 3 kudos

Hi @shamly pt, more info is needed to solve the issue. However, common problems are:
  • The storage is not mounted
  • That file doesn't exist in the mounted storage
Also, there is no need to use an f-string if there are no curly brackets with expressions in ...

  • 3 kudos
1 More Replies
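
To illustrate the f-string remark in the reply above: a plain string literal and an f-string without placeholders produce the same value, so the `f` prefix only matters when an expression is interpolated. The mount path and the `run_date` variable below are hypothetical.

```python
# Hypothetical mount path, for illustration only.
plain = "/mnt/example/data.parquet"
with_prefix = f"/mnt/example/data.parquet"  # no {} expressions, so the prefix is redundant

# The prefix is useful only when something is interpolated.
run_date = "2023-01-14"
dated = f"/mnt/example/{run_date}/data.parquet"

print(plain == with_prefix)
print(dated)
```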
pjp94
by Contributor
  • 2280 Views
  • 1 reply
  • 0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the below error when trying to run multi-threading - fails towards the end of the run. My guess is it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPU on the cluster - however that's n...

Latest Reply
pjp94
Contributor
  • 0 kudos

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!

  • 0 kudos
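
The workaround described above, capping the thread count at roughly half of the maximum, might look like the following. This is a sketch under assumptions: "the actual max" is taken to be the CPU count reported by `os.cpu_count()`, and `process` stands in for the real per-item work, which the post does not show.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def capped_workers(fraction: float = 0.5) -> int:
    """Return `fraction` of the available CPUs as a worker count, at least 1."""
    cpus = os.cpu_count() or 1
    return max(1, int(cpus * fraction))


def process(item: int) -> int:
    # Placeholder for the real per-item work (hypothetical).
    return item * 2


items = list(range(10))
with ThreadPoolExecutor(max_workers=capped_workers()) as pool:
    # map() preserves input order in its results.
    results = list(pool.map(process, items))
```

The right fraction depends on how much memory each thread's work needs; half is simply the value the poster reported as working.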
Brendon_Daugher
by New Contributor II
  • 880 Views
  • 0 replies
  • 0 kudos

Understanding Dependency Update Failure

Heyooooo! I'm using Azure Databricks and sparklyr to do some geospatial analysis. Before I actually work with Spark DataFrames, I've been using the R packages stars and sf to do some preprocessing on my data so that it's easier to interact with later. In or...

Will_Sullivan
by New Contributor
  • 1134 Views
  • 0 replies
  • 0 kudos

How to solve Error in Databricks Academy course DE 4.2 & 4.3, run classroom-setup-4.2 error, "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Does anyone know how to solve this error?
Course: Data Engineering with Databricks, Notebook: DE 4.2 - Providing Options for External Sources
Attempts to fix: Detached and reattached my cluster and started it again.
%run ../Includes/Classroom-Setup-4.2
resul...
