Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

FRG96
by New Contributor III
  • 17884 Views
  • 6 replies
  • 7 kudos

Resolved! How to programmatically get the Spark Job ID of a running Spark Task?

In Spark we can get the Spark Application ID inside the task programmatically using SparkEnv.get.blockManager.conf.getAppId, and we can get the Stage ID and Task Attempt ID of the running task using TaskContext.get.stageId and TaskContext.get.taskAttemptId...

Latest Reply
FRG96
New Contributor III
  • 7 kudos

Hi @Gaurav Rupnar, I have Spark SQL UDFs (implemented as Scala methods) in which I want to get the details of the Spark SQL query that called the UDF, especially a unique query ID, which in Spark SQL is the Spark Job ID. That's why I wanted a way to...

5 More Replies
Thom
by New Contributor
  • 298 Views
  • 0 replies
  • 0 kudos

Missing lesson files in the repo for the Data Engineering with Databricks course

There seem to be missing lesson files in the repo I downloaded for the Data Engineering with Databricks course. The lesson Advanced SQL Transformations refers to files that aren't in the repo. One or two other lessons were missing files as well.

KC_1205
by New Contributor III
  • 1751 Views
  • 4 replies
  • 3 kudos

Resolved! NumPy update 1.18-1.21

Hi all, I am planning to update DBR from 7.3 LTS to 9.1 LTS; the corresponding NumPy version will be 1.19, and later I would like to update to 1.21 in the notebooks. At the cluster level I have the Spark version for 9.1 LTS, which will support 1.19, and notebook ...
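When a notebook depends on behavior from a newer NumPy than the runtime ships, a guard on the installed version avoids confusing downstream failures. A minimal sketch; `version_tuple` is our helper name, and the comparison threshold is illustrative.

```python
def version_tuple(v):
    # "1.21.4" -> (1, 21, 4); stops at the first non-numeric component.
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

# In a notebook you would compare the installed version, e.g.:
# import numpy as np
# if version_tuple(np.__version__) < (1, 21):
#     raise RuntimeError("This notebook expects NumPy >= 1.21")
```

Tuple comparison handles versions of different lengths correctly, e.g. `(1, 21, 4) >= (1, 19)` is true while `(1, 18, 0) >= (1, 19)` is false.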

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Kiran Chalasani, how are you doing? Do you still need help, or have you solved your problem?

3 More Replies
Lincoln_Bergeso
by New Contributor II
  • 5413 Views
  • 10 replies
  • 5 kudos

Resolved! How do I read the contents of a hidden file in a Spark job?

I'm trying to read a file from a Google Cloud Storage bucket. The filename starts with a period, so Spark assumes the file is hidden and won't let me read it. My code is similar to this: from pyspark.sql import SparkSession; spark = SparkSession.build...
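The behavior described here comes from Hadoop's hidden-file filter, which Spark's file-based sources inherit: any path whose final segment starts with "." or "_" is skipped. A minimal model of that rule (the helper name is ours); common workarounds are renaming or copying the object to a non-hidden name, or reading it with the GCS client library outside Spark.

```python
def is_hidden_for_spark(path):
    # Mirrors Hadoop's hiddenFileFilter: a file name (last path segment)
    # beginning with '.' or '_' is skipped by Spark's file-based sources.
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return name.startswith(".") or name.startswith("_")

print(is_hidden_for_spark("gs://bucket/data/.secret.csv"))   # True
print(is_hidden_for_spark("gs://bucket/data/visible.csv"))   # False
```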

Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Lincoln Bergeson, did @Dan Zafar's response help you solve your problem?

9 More Replies
amil
by New Contributor
  • 559 Views
  • 1 replies
  • 0 kudos

Verification completed successfully but the confirmation email hasn't arrived

Hi Kaniz Fatma, I have completed verification successfully; however, the email hasn't arrived. Mail: ss4699@srmist.edu.in. Kindly help. Regards, Siva

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @sivabalan Selvaraj, thank you for reaching out! Let us look into this for you, and we'll follow up with an update.

amil
by New Contributor
  • 472 Views
  • 1 replies
  • 0 kudos

Unable to access Databricks Community Edition even after solving the puzzle

Hi Kaniz, I am unable to access Databricks Community Edition even after solving the puzzle. Mail: amilsivabalan@gmail.com. Kindly help. Regards, Sivabalan S

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @sivabalan Selvaraj, thank you for reaching out! Let us look into this for you, and we'll follow up with an update.

YFL
by New Contributor III
  • 4454 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in the query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...
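One approximation discussed for Delta sources is to diff the table's latest version against the version recorded in the stream's end offset from the progress log. A sketch under stated assumptions: the offset field name `reservoirVersion` is our reading of the Delta source's JSON offset format, and `delta_versions_behind` is a hypothetical helper name.

```python
import json

def delta_versions_behind(last_progress, latest_table_version):
    # last_progress: dict form of StreamingQuery.lastProgress.
    # The Delta source's endOffset is a JSON string whose 'reservoirVersion'
    # field (an assumption about the offset layout) is the table version
    # the stream has read up to.
    end_offset = json.loads(last_progress["sources"][0]["endOffset"])
    return latest_table_version - end_offset["reservoirVersion"]

# Hypothetical progress snapshot: stream has read up to version 41 of a
# table whose latest version is 45, i.e. 4 versions behind.
progress = {"sources": [{"endOffset": '{"reservoirVersion": 41, "index": -1}'}]}
print(delta_versions_behind(progress, 45))  # 4
```

This measures lag in table versions, not rows or wall-clock time; versions of very different sizes would make it a rough signal only.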

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
LukaszJ
by Contributor III
  • 579 Views
  • 0 replies
  • 0 kudos

Real time query plotting

Hello, I have a table on Azure Databricks that I keep updating with the "A" notebook, and I want to plot the query results from the table in real time (let's say SELECT COUNT(name), name FROM my_schema.my_table GROUP BY name). I know about Azure Applica...
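A simple approach for a notebook-driven dashboard is a polling loop that re-runs the query on an interval and redraws the result. A minimal sketch; `poll` is our helper name, and the Databricks usage below it (the `spark.sql`/`display` calls) is illustrative.

```python
import time

def poll(fetch, handle, interval_s=5.0, iterations=None):
    # Re-runs `fetch` every `interval_s` seconds and passes the result to
    # `handle` (e.g. redraw a chart). iterations=None loops forever.
    n = 0
    while iterations is None or n < iterations:
        handle(fetch())
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)

# Hypothetical Databricks usage: re-run the aggregate every 10 seconds.
# poll(lambda: spark.sql(
#          "SELECT name, COUNT(name) AS c FROM my_schema.my_table GROUP BY name"
#      ).toPandas(),
#      lambda pdf: display(pdf),
#      interval_s=10)
```

Each iteration re-reads the table, so the chart reflects whatever the "A" notebook has committed by then; this trades some query cost for simplicity compared with a push-based setup.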

LukaszJ
by Contributor III
  • 1246 Views
  • 3 replies
  • 1 kudos

Table access control cluster with R language

Hello, I want to have a high-concurrency cluster with table access control, and I want to use the R language on it. I know that the documentation says that R and Scala are not available with table access control, but maybe you have some tricks or best practic...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Łukasz Jaremek, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke's response help you find the solution? Please let us know.

2 More Replies
Vee
by New Contributor
  • 3909 Views
  • 2 replies
  • 1 kudos

Cluster configuration and optimal number for fs.s3a.connection.maximum , fs.s3a.threads.max

Could you please suggest the best cluster configuration for the use case stated below, and tips to resolve the errors shown below? Use case: There could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and spits out 120 output files ...
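The two settings named in the title are real S3A connector keys; on a cluster they are set under the Spark config with the `spark.hadoop.` prefix. The values below are illustrative starting points only, not tuned recommendations; concurrent jobs multiply the demand on the connection pool, so size it to the combined workload.

```
# Illustrative starting points; tune to workload and instance size.
spark.hadoop.fs.s3a.connection.maximum 200
spark.hadoop.fs.s3a.threads.max 64
```

Pool-exhaustion errors ("Timeout waiting for connection from pool") typically ease when `fs.s3a.connection.maximum` comfortably exceeds the number of S3-touching threads across all concurrent tasks on a node.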

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Vetrivel Senthil, just a friendly follow-up. Do you still need help? Please let us know.

1 More Replies
samrachmiletter
by New Contributor III
  • 2248 Views
  • 4 replies
  • 5 kudos

Resolved! Is it possible to set order of precedence of spark SQL extensions?

I have the Iceberg SQL extension installed, but running commands such as MERGE INTO results in the error pyspark.sql.utils.AnalysisException: MERGE destination only supports Delta sources. This seems to be due to using Delta's MERGE command as opposed ...
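For context on why there is no per-command precedence: Spark applies the classes in `spark.sql.extensions` in the order listed, with each injected parser wrapping the previous one, so the only ordering control is the registration order itself (the class names below are the actual Delta and Iceberg extension classes). As the thread concludes, this does not let one route MERGE INTO to Iceberg while Delta's extension is active.

```
spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```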

Latest Reply
samrachmiletter
New Contributor III
  • 5 kudos

This does help. I tried going through the DataFrameReader as well but ran into the same error, so it seems it is indeed not possible. Thank you @Hubert Dudek!

3 More Replies
laus
by New Contributor III
  • 17756 Views
  • 4 replies
  • 2 kudos

Resolved! How to solve Py4JJavaError: An error occurred while calling o5082.csv. : org.apache.spark.SparkException: Job aborted. when writing to csv

Hi, I get the error Py4JJavaError: An error occurred while calling o5082.csv.: org.apache.spark.SparkException: Job aborted. when writing to CSV. Screenshot below with the detailed error. Any idea how to solve it? Thanks!

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Please try output.coalesce(1).write.option("header", "true").format("csv").save("path"). It seems to be the same as https://community.databricks.com/s/topic/0TO3f000000CjVqGAK/py4jjavaerror

3 More Replies
Ben_Spark
by New Contributor III
  • 4989 Views
  • 9 replies
  • 2 kudos

Resolved! Databricks Spark XML parser : support for namespace declared at the ancestor level.

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end we are giving up on the XSD-path parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

8 More Replies
alejandrofm
by Valued Contributor
  • 2305 Views
  • 6 replies
  • 1 kudos

Resolved! Can't see execution plan graph on all-purpose cluster

On a currently running all-purpose cluster, enter the Spark UI, then SQL, and go into a task: you can see the details and SQL properties, but the visualization doesn't appear. That graph is very useful for debugging certain scenarios. It works fine on jobs. Any ...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

I'm on Chrome; sometimes it appears, sometimes not. I will look into it more to give something reproducible. Thanks!

5 More Replies
