Data Engineering

Forum Posts

Anonymous
by Not applicable
  • 1408 Views
  • 3 replies
  • 3 kudos

Resolved! 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11) Connect Timeout

"Notebook detached. Exception when creating execution context: java.net.SocketTimeoutException: Connect Timeout" when trying to connect my cluster to a notebook. Then "Error trying to handle that request. We failed to handle that request, please try a...

Latest Reply
Wolverine
New Contributor II
  • 3 kudos

Hello @Kaniz, I am facing the same issue. I tried changing the DBR version, but it still gives me an error and the cluster does not start. Regards, MS

  • 3 kudos
2 More Replies
FarBo
by New Contributor III
  • 2293 Views
  • 4 replies
  • 5 kudos

Spark issue handling data from JSON when a schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

  • 5 kudos
3 More Replies
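
The schema-mismatch question above can be illustrated outside Spark. What follows is a minimal plain-Python sketch, not Spark's JSON reader and not a fix for the reported bug. It assumes the goal is that a value which fails its declared type nulls only that one field, while the rest of the record is kept. The schema, the column names and the `parse_record` helper are all hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical schema for illustration: column name -> target type.
SCHEMA = {"id": int, "name": str, "amount": Decimal}


def parse_record(line, schema=SCHEMA):
    """Parse one JSON line, nulling only the fields that fail conversion.

    Returns (row, bad_columns) so the caller can see which fields were dropped.
    """
    raw = json.loads(line)
    row, bad_columns = {}, []
    for column, target in schema.items():
        value = raw.get(column)
        if value is None:
            row[column] = None
            continue
        try:
            # Go through str() for Decimal so floats are not expanded in binary.
            row[column] = target(str(value)) if target is Decimal else target(value)
        except (ValueError, TypeError, InvalidOperation):
            row[column] = None
            bad_columns.append(column)
    return row, bad_columns


# "12x.5" is not a valid decimal, so only `amount` is nulled.
row, bad = parse_record('{"id": 1, "name": "a", "amount": "12x.5"}')
print(row, bad)
```

Returning the list of failed columns next to the row makes it possible to route bad records to a separate location for inspection instead of silently losing them.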
User16869510359
by Esteemed Contributor
  • 8065 Views
  • 3 replies
  • 5 kudos

Resolved! How to add custom logging in Databricks

I want to add custom logs that are redirected to the Spark driver logs. Can I use the existing logger classes to write my application logs or progress messages to the Spark driver logs?

Latest Reply
Kaizen
Contributor III
  • 5 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logged with all the other cluster logs (see image). 2) Also, it seems like a lot of blank files are being created for this in Databricks. Is this a bug? This include...

  • 5 kudos
2 More Replies
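
For the custom-logging thread above, here is a minimal sketch using only Python's standard `logging` module. It assumes that whatever the driver process writes to stderr ends up in the driver logs, which is worth verifying on your own cluster. The logger name `my_app`, the `get_app_logger` helper and the optional `log_file` path are hypothetical. The optional file handler is one possible way to keep application messages in their own file, as the latest reply asks.

```python
import logging
import sys


def get_app_logger(name="my_app", log_file=None):
    """Return a named logger that writes to stderr and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not pass records up to the root logger, to avoid duplicate lines.
    logger.propagate = False
    # Attach handlers only once, even if a notebook cell is re-run.
    if not logger.handlers:
        formatter = logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s - %(message)s"
        )
        stream_handler = logging.StreamHandler(sys.stderr)
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
        if log_file:
            file_handler = logging.FileHandler(log_file)
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)
    return logger


logger = get_app_logger()
logger.info("Stage 1 of the job finished")
```

The `if not logger.handlers` guard matters in notebooks: without it, every re-run of the cell adds another handler and each message is printed several times.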
Smitha1
by Valued Contributor II
  • 1579 Views
  • 9 replies
  • 3 kudos

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks Certified Associate Developer for Apache Spark 3.0

Latest Reply
Shivam_Patil
New Contributor II
  • 3 kudos

Hey, I am looking for sample papers for the above exam other than the one provided by Databricks. Does anyone have any idea about it?

  • 3 kudos
8 More Replies
DJey
by New Contributor III
  • 4159 Views
  • 5 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta table looks like below: Now I have incremental data with an additional column, i.e. owner: DataFrame name --> scdDF. Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...

Latest Reply
DJey
New Contributor III
  • 2 kudos

@Vidula Khanna Enabling the below property resolved my issue: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", True) Thanks very much!

  • 2 kudos
4 More Replies
User15787040559
by New Contributor III
  • 1008 Views
  • 2 replies
  • 0 kudos

How to translate Apache Pig FILTER statement to Spark?

If you have the following Apache Pig FILTER statement: XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y'; the equivalent code in Apache Spark is: XCOCD_ACT_Y_DF = (XCOCD_DF.filter(col("act_ind") == "Y"))

Latest Reply
FeliciaWilliam
New Contributor III
  • 0 kudos

Translating an Apache Pig FILTER statement to Spark requires understanding the differences in syntax and functionality between the two processing frameworks. While both aim to filter data, Spark uses a different syntax and approach, typically involvi...

  • 0 kudos
1 More Replies
User16869510359
by Esteemed Contributor
  • 1412 Views
  • 2 replies
  • 0 kudos

Resolved! The driver is temporarily unavailable

My job fails with Driver is temporarily unavailable. Apparently, it's permanently unavailable, because the job is not pausing but failing.

Latest Reply
Chalki
New Contributor III
  • 0 kudos

I am facing the same issue. I am writing in batches using a simple for loop. I don't have any collect statements inside the loop. I am rewriting the partitions with dynamic partition overwrite mode in a huge, wide Delta table (several TB). The incr...

  • 0 kudos
1 More Replies
Smitha1
by Valued Contributor II
  • 3393 Views
  • 10 replies
  • 9 kudos

Resolved! Request for reattempt voucher. Databricks Certified Associate Developer for Apache Spark 3.0 exam

Hi, I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam today but missed by one percent. I got 68.33% and the pass mark is 70%. I am planning to reattempt the exam; could you kindly give me another opportunity and provide reattempt voucher...

Latest Reply
shriya
New Contributor II
  • 9 kudos

Hi, I took the Databricks Certified Associate Developer for Apache Spark 3.0 Python exam yesterday but missed by three percent. I got 66.66% and the pass mark is 70%. I am planning to reattempt the exam; could you kindly give me another opportunity and provide reat...

  • 9 kudos
9 More Replies
Sujitha
by Community Manager
  • 1243 Views
  • 3 replies
  • 2 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers t...

KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. These...

Latest Reply
martinez
New Contributor III
  • 2 kudos

Thanks for sharing!  

  • 2 kudos
2 More Replies
Vsleg
by Contributor
  • 2066 Views
  • 5 replies
  • 3 kudos

Resolved! Issue with Apache Spark™ Programming with Databricks course

Hello, I found an issue with the Apache Spark™ Programming with Databricks course on Databricks Academy when trying to do the labs. The mount that the course uses for training data is failing with what looks to me like an authentication issue (see sc...

Latest Reply
Vsleg
Contributor
  • 3 kudos

I found the course Git repo at https://github.com/databricks-academy/apache-spark-programming-with-databricks-english. This works, so I am using that instead of the 'apache-spark-programming-with-databricks.dbc' file available in the learning portal. #DA...

  • 3 kudos
4 More Replies
sevvalmehder
by New Contributor II
  • 1108 Views
  • 3 replies
  • 3 kudos

Databricks Runtime 12.2 LTS drop function problem

I am getting an error from the PySpark `drop` function on a cluster using 12.2 LTS. When I checked the error, I saw that Spark had fixed that bug; see SPARK-42444. Also, when I checked the maintenance updates page, I saw that this fix was included in the Databricks R...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Sevval Mehder​ Elevate our community by acknowledging exceptional contributions. Your participation in marking the best answers is a testament to our collective pursuit of knowledge.

  • 3 kudos
2 More Replies
PriyaV
by New Contributor II
  • 6117 Views
  • 5 replies
  • 10 kudos

Suppress output in python notebooks

My dilemma is this: we use PySpark to connect to external data sources via JDBC from within Databricks. Every time we issue a Spark command, it prints the connection options, including the username, URL and password, which is not advisable. So, is ...

Latest Reply
Pabeggetur
New Contributor II
  • 10 kudos

Thanks for taking the time to discuss this. I feel strongly about it and love learning more on this topic.

  • 10 kudos
4 More Replies
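
For the output-suppression thread above, one possible approach is to discard whatever a call prints at the Python level. The sketch below uses only the standard library. `run_quietly` is a hypothetical helper, and `noisy_connect` is a hypothetical stand-in for a call that echoes its connection options. Note the assumption: this captures only writes to Python's `sys.stdout` and `sys.stderr`. Output produced directly by the JVM, or a value echoed because it is the last expression in a notebook cell, is not affected. Assigning the result to a variable avoids that echo.

```python
import contextlib
import io


def run_quietly(func, *args, **kwargs):
    """Call func and discard anything it prints to Python's stdout or stderr."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        return func(*args, **kwargs)


def noisy_connect(options):
    """Hypothetical stand-in for a call that prints its connection options."""
    print(f"connecting with {options}")
    return "connection-handle"


# Nothing is printed, and the return value is still available.
result = run_quietly(noisy_connect, {"user": "u", "password": "secret"})
```

Hiding output does not remove the credentials from the code itself, so it is still worth keeping them out of the notebook source where possible.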
MohamedThanveer
by New Contributor II
  • 515 Views
  • 1 replies
  • 0 kudos

Databricks Certified Associate Developer for Apache Spark 3.0 - Python Cancellation

I scheduled an examination for 1st June 2023 and, for personal reasons, cancelled it on 26th May 2023 (more than 72 hours in advance), but I am yet to receive the refund. In the auto-generated mail it is mentioned that the refund ...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

adding @Suteja Kanuri​  and @Vidula Khanna​ for visibility

  • 0 kudos
Nis
by New Contributor II
  • 787 Views
  • 1 replies
  • 2 kudos

Best sequence for using the VACUUM, OPTIMIZE, FSCK REPAIR and REFRESH commands

I have a Delta table whose size increases gradually; we now have around 1.5 crore (15 million) rows. While running the VACUUM command on that table, I am getting the below error. ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Do you have access to the Executor 7 logs? Is there high GC activity or some other event that is causing the heartbeat timeout? Would you be able to check the failed stages?

  • 2 kudos