cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Swostiman
by New Contributor II
  • 3363 Views
  • 5 replies
  • 4 kudos

Consuming data from databricks[Hive metastore] sql endpoint using pyspark

I was trying to read some delta data from databricks[Hive metastore] sql endpoint using pyspark, but while doing so I encountered that all the values of the table after fetching are same as the column name.Even when I try to just show the data it giv...

  • 3363 Views
  • 5 replies
  • 4 kudos
Latest Reply
sucan
New Contributor II
  • 4 kudos

Encountered the same issue and downgrading to 2.6.22 helped me resolve this issue.

  • 4 kudos
4 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 3168 Views
  • 3 replies
  • 0 kudos

Parquet column cannot be converted. Column: [Rainfall_Value], Expected: DoubleType, Found: INT64

df.printSchema()root |-- Device_ID: string (nullable = true) |-- Location: string (nullable = true) |-- Latitude: double (nullable = true) |-- Longitude: double (nullable = true) |-- DateTime: timestamp (nullable = true) |-- Rainfall_Value: double (n...

  • 3168 Views
  • 3 replies
  • 0 kudos
Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

Hi @THIAM HUAT TAN​ , The issue is because the schema defined for the column "Rainfall_Value" is of DoubleType and the values present in the data frame are of Integer type. This could be because of one or multiple values. Depending on the data, you ...

  • 0 kudos
2 More Replies
Nis
by New Contributor II
  • 817 Views
  • 1 replies
  • 2 kudos

Best sequence of using Vacuum, optimize, fsck repair and refresh commands.

I have a delta table whose size will increases gradually now we have around 1.5 crores of rows while running vacuum command on that table i am getting the below error.ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

  • 817 Views
  • 1 replies
  • 2 kudos
Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Do you have access to the Executor 7 logs? is there a high GC or some other events that is making the heartbeat timeout? would you be able to check the failed stages?

  • 2 kudos
kumarPerry
by New Contributor II
  • 1303 Views
  • 3 replies
  • 0 kudos

Notebook connectivity issue with aws s3 bucket using mounting

When connecting to aws s3 bucket using dbfs, application throws error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

  • 1303 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Amrendra Kumar​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 0 kudos
2 More Replies
SS2
by Valued Contributor
  • 1107 Views
  • 2 replies
  • 1 kudos

Spark out of memory error.

Sometimes in Databricks you can see the out of memory error then in that case you can change the cluster size. As per requirement to resolve the issue.

  • 1107 Views
  • 2 replies
  • 1 kudos
Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @S S​,Could you provide more details on your issue? for example, error stack traces, code snippet, etc. We will be able to help you if you share more details

  • 1 kudos
1 More Replies
BL
by New Contributor III
  • 2767 Views
  • 4 replies
  • 3 kudos

Error reading in Parquet file

I am trying to read a .parqest file from a ADLS gen2 location in azure databricks . But facing the below error:spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")org.apache.spark.SparkException: Job aborted due to stag...

  • 2767 Views
  • 4 replies
  • 3 kudos
Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Can you access the executor logs? When you cluster is up and running, you can access the executor's logs. For example, the error shows:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

  • 3 kudos
3 More Replies
Prem1
by New Contributor III
  • 7959 Views
  • 21 replies
  • 11 kudos

java.lang.IllegalArgumentException: java.net.URISyntaxException

I am using Databricks Autoloader to load JSON files from ADLS gen2 incrementally in directory listing mode. All source filename has Timestamp on them. The autoloader works perfectly couple of days with the below configuration and breaks the next day ...

  • 7959 Views
  • 21 replies
  • 11 kudos
Latest Reply
jshields
New Contributor II
  • 11 kudos

Hi Everyone,I'm seeing this issue as well - same configuration of the previous posts, using autoloader with incremental file listing turned on. The strange part is that it mostly works despite almost all of the files we're loading having colons incl...

  • 11 kudos
20 More Replies
Manjusha
by New Contributor II
  • 1338 Views
  • 1 replies
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1LTSI have a R notebook that reads a csv into a R dataframe and does some transformations and finally is converted to spark dataframe using the createDataFrame function.after that when I call the display function on this spark da...

  • 1338 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks.

  • 1 kudos
Bujji
by New Contributor II
  • 3169 Views
  • 2 replies
  • 3 kudos

How to resolve our of memory error?

Hi, I am working as azure support engineerI found this error while I am checking the pipeline failure, and showing below error"org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...

  • 3169 Views
  • 2 replies
  • 3 kudos
Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @mahesh bmk​, We haven’t heard from you since the last response from @Pat Sienkiewicz​​, and I was checking back to see if their suggestions helped you. Or else, If you have any solution, please share it with the community, as it can be helpful to...

  • 3 kudos
1 More Replies
shamly
by New Contributor III
  • 2057 Views
  • 2 replies
  • 3 kudos

spark exception error while reading a parquet file

when I try to read parquet file from Azure datalake container from databricks, I am getting spark exception. Below is my queryimport pyarrow.parquet as pqfrom pyspark.sql.functions import *from datetime import datetimedata = spark.read.parquet(f"/mnt...

  • 2057 Views
  • 2 replies
  • 3 kudos
Latest Reply
DavideAnghileri
Contributor
  • 3 kudos

Hi @shamly pt​ , more info are needed to solve the issue. However common problems are:The storage is not mountThat file doesn't exists in the mounted storageAlso, there is no need to use an f-string if there are no curly brackets with expressions in ...

  • 3 kudos
1 More Replies
pjp94
by Contributor
  • 1852 Views
  • 1 replies
  • 0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the below error when trying to run multi-threading - fails towards the end of the run. My guess is it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPU on the cluster - however that's n...

  • 1852 Views
  • 1 replies
  • 0 kudos
Latest Reply
pjp94
Contributor
  • 0 kudos

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!

  • 0 kudos
Brendon_Daugher
by New Contributor II
  • 667 Views
  • 0 replies
  • 0 kudos

Understanding Dependency Update Failure

Heyooooo!I'm using Azure Databricks and sparklyr to do some geospatial analysis.Before I actually work with Spark Dataframes, I've been using the R packagesstarsandsfto do some preprocessing on my data so that it's easier to interact with later.In or...

  • 667 Views
  • 0 replies
  • 0 kudos
Will_Sullivan
by New Contributor
  • 914 Views
  • 0 replies
  • 0 kudos

How to solve Error in Databricks Academy course DE 4.2 & 4.3, run classroom-setup-4.2 error, "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Any one know how to solve this error?Course: Data Engineering with Databricks, Notebook: DE 4.2 - Providing Options for External SourcesAttempts to fix: Detached and reattached my cluster and started it again.%run ../Includes/Classroom-Setup-4.2resul...

  • 914 Views
  • 0 replies
  • 0 kudos
Tejas1987
by New Contributor II
  • 1180 Views
  • 2 replies
  • 1 kudos

Resolved! Finding multiple substrings from a DataFrame column dynamically?

Hello friends,I have a DataFrame with specific values. I am trying to find specific values out of it.   *I/P -|ID | text ||:--|:------||1 | select distinct Col1 as OrderID from Table1 WHERE ( (Col3 Like '%ABC%') OR (Col3 Like '%DEF%') OR (Col3 Like '...

  • 1180 Views
  • 2 replies
  • 1 kudos
Latest Reply
AmanSehgal
Honored Contributor III
  • 1 kudos

What is the logic for substring function?Can't you use str1[idxi+14:3] for substring?

  • 1 kudos
1 More Replies
Labels