- 3739 Views
- 5 replies
- 4 kudos
I was trying to read some Delta data from a Databricks (Hive metastore) SQL endpoint using PySpark, but after fetching, every value in the table is the same as its column name. Even when I just try to show the data it giv...
Latest Reply
I encountered the same issue; downgrading to 2.6.22 resolved it.
- 3777 Views
- 3 replies
- 0 kudos
df.printSchema()
root
 |-- Device_ID: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- Latitude: double (nullable = true)
 |-- Longitude: double (nullable = true)
 |-- DateTime: timestamp (nullable = true)
 |-- Rainfall_Value: double (n...
Latest Reply
Hi @THIAM HUAT TAN, the issue is that the schema defines the column "Rainfall_Value" as DoubleType, while the values present in the data frame are of integer type. This could be caused by one or multiple values. Depending on the data, you ...
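The mismatch the reply describes can be sketched in plain Python (an illustration only, not Spark itself; the values and the `float` stand-in for DoubleType are assumptions for the example):

```python
# Minimal illustration: a column declared as double rejects raw ints
# until they are cast. In PySpark the fix is col.cast("double").
declared = float  # stands in for Spark's DoubleType

raw_rainfall = [12, 7.5, 0]  # hypothetical values; the ints trigger the error
assert not all(isinstance(v, declared) for v in raw_rainfall)

cast_rainfall = [float(v) for v in raw_rainfall]  # Spark: col.cast("double")
assert all(isinstance(v, declared) for v in cast_rainfall)
```

The same principle applies whether one value or many are integers: the declared type has to match every value after casting.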
by Nis • New Contributor II
- 943 Views
- 1 reply
- 2 kudos
I have a Delta table whose size increases gradually; we now have around 1.5 crore (15 million) rows. While running the VACUUM command on that table I am getting the below error. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...
Latest Reply
Do you have access to the executor 7 logs? Is there high GC or some other event that is making the heartbeat time out? Would you be able to check the failed stages?
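If high GC pauses do turn out to be the cause, one commonly tuned pair of settings (an assumption based on the heartbeat-timeout symptom, not something confirmed in this thread) is the executor heartbeat interval and the matching network timeout, set in the cluster's Spark config:

```
spark.executor.heartbeatInterval 60s
spark.network.timeout 600s
```

Note that spark.network.timeout must stay larger than the heartbeat interval, or executors are marked dead before they ever get a chance to report.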
- 1505 Views
- 3 replies
- 0 kudos
When connecting to an AWS S3 bucket using DBFS, the application throws an error like: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...
Latest Reply
Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
by SS2 • Valued Contributor
- 1266 Views
- 2 replies
- 1 kudos
Sometimes in Databricks you can see an out-of-memory error; in that case you can change the cluster size as required to resolve the issue.
Latest Reply
Hi @S S, could you provide more details on your issue? For example, error stack traces, a code snippet, etc. We will be able to help you if you share more details.
by BL • New Contributor III
- 3121 Views
- 4 replies
- 3 kudos
I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the below error: spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet") org.apache.spark.SparkException: Job aborted due to stag...
Latest Reply
Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...
by Prem1 • New Contributor III
- 9180 Views
- 21 replies
- 11 kudos
I am using Databricks Autoloader to load JSON files from ADLS Gen2 incrementally in directory listing mode. All source filenames have a timestamp in them. The Autoloader works perfectly for a couple of days with the below configuration and then breaks the next day ...
Latest Reply
Hi everyone, I'm seeing this issue as well: the same configuration as the previous posts, using Autoloader with incremental file listing turned on. The strange part is that it mostly works, despite almost all of the files we're loading having colons incl...
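The colon detail matters because Hadoop-style path handling can misread text before a colon as a URI scheme separator. One workaround sketch (a hypothetical renaming step at ingestion time, not something proposed in the thread) is to strip colons from timestamped names before Autoloader ever lists them:

```python
def sanitize_name(name: str) -> str:
    """Replace colons, which Hadoop-style path parsing can misread as a
    URI scheme separator (the assumed cause of the listing failures)."""
    return name.replace(":", "-")

# A timestamped source name like the ones described in the question:
assert sanitize_name("raw_2023-01-14T08:01:29.json") == "raw_2023-01-14T08-01-29.json"
```

Renaming at the source keeps directory listing mode usable without touching the Autoloader configuration itself.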
- 1494 Views
- 1 reply
- 1 kudos
I am using runtime 9.1 LTS. I have an R notebook that reads a CSV into an R dataframe, does some transformations, and finally converts it to a Spark dataframe using the createDataFrame function. After that, when I call the display function on this Spark da...
Latest Reply
Hi @Manjusha Unnikrishnan, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.
by Bujji • New Contributor II
- 3493 Views
- 2 replies
- 3 kudos
Hi, I am working as an Azure support engineer. I found this error while checking a pipeline failure: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...
Latest Reply
Hi @mahesh bmk, we haven't heard from you since the last response from @Pat Sienkiewicz, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to...
by shamly • New Contributor III
- 2330 Views
- 2 replies
- 3 kudos
When I try to read a parquet file from an Azure Data Lake container from Databricks, I am getting a Spark exception. Below is my query:
import pyarrow.parquet as pq
from pyspark.sql.functions import *
from datetime import datetime
data = spark.read.parquet(f"/mnt...
Latest Reply
Hi @shamly pt, more info is needed to solve the issue. However, common problems are:
- The storage is not mounted
- The file doesn't exist in the mounted storage
Also, there is no need to use an f-string if there are no curly brackets with expressions in ...
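On the f-string point from the reply: without any {} placeholders the f-prefix is inert, so the two literals below are identical; it only changes anything once an expression is interpolated (the paths here are hypothetical, for illustration):

```python
plain = "/mnt/landing/data.parquet"      # a made-up mount path
prefixed = f"/mnt/landing/data.parquet"  # f-prefix does nothing here
assert plain == prefixed

# The f-string earns its keep only with an interpolated expression:
mount = "landing"
assert f"/mnt/{mount}/data.parquet" == plain
```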
- 2060 Views
- 1 reply
- 0 kudos
I get the below error when trying to run multi-threading; it fails towards the end of the run. My guess is that it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPUs on the cluster; however, that's n...
Latest Reply
Since I don't have permission to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, I'm open to any other optimization ideas!
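The workaround in the reply, capping threads rather than resizing the cluster, can be sketched with a standard ThreadPoolExecutor (the halving factor is the poster's heuristic, not a Spark rule, and the task body here is a stand-in):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Cap the pool at roughly half the available CPUs so the containers
# are not overloaded, as the reply describes.
max_workers = max(1, (os.cpu_count() or 2) // 2)

def task(n: int) -> int:
    return n * n  # stand-in for the real per-thread work

with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = list(pool.map(task, range(8)))
```

Capping the pool bounds peak memory per run, which is often enough when cluster-level settings are out of reach.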
- 737 Views
- 0 replies
- 0 kudos
Heyooooo! I'm using Azure Databricks and sparklyr to do some geospatial analysis. Before I actually work with Spark DataFrames, I've been using the R packages stars and sf to do some preprocessing on my data so that it's easier to interact with later. In or...
- 767 Views
- 0 replies
- 4 kudos
Hi team, I am trying to run a streaming job in Databricks, using the Autoloader approach for reading files from Azure Data Lake Gen2 in parquet format. I have created a new checkpoint, so the first offset is getting created, but it is throwing an erro...
- 999 Views
- 0 replies
- 0 kudos
Does anyone know how to solve this error?
Course: Data Engineering with Databricks, Notebook: DE 4.2 - Providing Options for External Sources
Attempts to fix: Detached and reattached my cluster and started it again.
%run ../Includes/Classroom-Setup-4.2
resul...
- 1361 Views
- 2 replies
- 1 kudos
Hello friends, I have a DataFrame with specific values, and I am trying to find specific values in it.
I/P:
| ID | text |
|:--|:------|
| 1 | select distinct Col1 as OrderID from Table1 WHERE ( (Col3 Like '%ABC%') OR (Col3 Like '%DEF%') OR (Col3 Like '...
Latest Reply
What is the logic for the substring function? Can't you use str1[idxi+14:3] for the substring?
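If slicing by fixed offsets proves brittle, a regex sketch can pull the quoted LIKE patterns out of each stored statement. The sample text below is reconstructed from the question's first row (completed just enough to form a valid snippet, so treat it as an assumption):

```python
import re

# Sample statement modeled on the question's first row.
sql = ("select distinct Col1 as OrderID from Table1 "
       "WHERE ((Col3 Like '%ABC%') OR (Col3 Like '%DEF%'))")

# Capture whatever sits between the %...% markers of each LIKE clause.
patterns = re.findall(r"Like\s+'%([^%']+)%'", sql, flags=re.IGNORECASE)
```

Unlike a fixed str1[start:stop] slice, this keeps working when the column names or the number of OR branches change.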