Data Engineering
Forum Posts

Thom
by New Contributor
  • 189 Views
  • 0 replies
  • 0 kudos

There seem to be missing lesson files in the repo I downloaded for the Data Engineering with Databricks course. The lesson Advanced SQL Transformations refers to files that aren't in the repo. One or two other lessons were missing as well.

KC_1205
by New Contributor III
  • 1108 Views
  • 4 replies
  • 3 kudos

Resolved! NumPy update 1.18-1.21

Hi all, I am planning to update the DBR to 9.1 LTS from 7.3 LTS; the corresponding NumPy version will be 1.19, and later I would like to update to 1.21 in the notebooks. At the cluster I have the Spark version related to 9.1 LTS, which will support 1.19, and notebook ...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Kiran Chalasani​, how are you doing? Do you still need help, or have you solved your problem?

3 More Replies
Lincoln_Bergeso
by New Contributor II
  • 3935 Views
  • 10 replies
  • 5 kudos

Resolved! How do I read the contents of a hidden file in a Spark job?

I'm trying to read a file from a Google Cloud Storage bucket. The filename starts with a period, so Spark assumes the file is hidden and won't let me read it. My code is similar to this: from pyspark.sql import SparkSession   spark = SparkSession.build...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Lincoln Bergeson​, did @Dan Zafar​'s response help you solve your problem?

9 More Replies
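For context on the hidden-file question above: Spark's Hadoop input layer silently filters out files whose names start with "." or "_". A commonly suggested workaround is to copy the dot-file to a visible name before handing the path to Spark. A minimal sketch of that rename step using only the standard library (the file names are placeholders; on an actual GCS bucket you would do the copy with the google-cloud-storage client or `hadoop fs` rather than `shutil`):

```python
import shutil
import tempfile
from pathlib import Path

def unhide(path: str) -> str:
    """Copy a dot-prefixed file to a visible name so Spark will pick it up.

    Returns the new, non-hidden path (or the original path if it was visible).
    """
    src = Path(path)
    if not src.name.startswith("."):
        return str(src)  # already visible, nothing to do
    dst = src.with_name(src.name.lstrip("."))
    shutil.copyfile(src, dst)
    return str(dst)

# Demo with a local temp file standing in for the GCS object.
tmp = Path(tempfile.mkdtemp())
hidden = tmp / ".data.csv"
hidden.write_text("a,b\n1,2\n")
visible = unhide(str(hidden))
print(visible)  # ends with "data.csv"; pass this path to spark.read.csv(...)
```

The copy costs extra storage but avoids fighting Hadoop's `hiddenFileFilter`; the alternative of plugging in a custom `PathFilter` requires a JVM class on the cluster classpath.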
amil
by New Contributor
  • 366 Views
  • 1 replies
  • 0 kudos

Hi Kaniz Fatma, I have completed verification successfully; however, the mail hasn't arrived. Mail: ss4699@srmist.edu.in. Kindly help. Regards, Siva

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @sivabalan Selvaraj​ , Thank you for reaching out! Let us look into this for you, and we'll follow up with an update.

amil
by New Contributor
  • 276 Views
  • 1 replies
  • 0 kudos

Hi Kaniz, I am unable to access Databricks Community Edition even after solving the puzzle. Mail: amilsivabalan@gmail.com. Kindly help. Regards, Sivabalan S

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @sivabalan Selvaraj​ , Thank you for reaching out! Let us look into this for you, and we'll follow up with an update.

YFL
by New Contributor III
  • 3090 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in the query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman​, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
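One way to approximate consumer lag for the Delta streaming source discussed above is to compare the table's latest version with the version recorded in the stream's endOffset (available as JSON in the query progress). A sketch of that version-diff arithmetic against a canned progress payload — note the `reservoirVersion` field name is an assumption based on the Delta source offset format, not something confirmed in the thread:

```python
import json

def delta_version_lag(end_offset_json: str, latest_table_version: int) -> int:
    """Lag, in Delta table versions, between the table head and what the stream has read."""
    offset = json.loads(end_offset_json)
    processed = offset["reservoirVersion"]  # field name assumed from Delta source offsets
    return max(latest_table_version - processed, 0)

# Example endOffset as it might appear in StreamingQuery.lastProgress["sources"][0]["endOffset"].
end_offset = json.dumps({"reservoirVersion": 42, "index": 7, "isStartingVersion": False})
print(delta_version_lag(end_offset, 45))  # 3 versions behind
```

This measures lag in versions, not rows or seconds; for wall-clock lag you would additionally look up the commit timestamps of the two versions in the table history.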
LukaszJ
by Contributor III
  • 396 Views
  • 0 replies
  • 0 kudos

Real time query plotting

Hello, I have a table on Azure Databricks that I keep updating with the "A" notebook, and I want to plot, in real time, the query result from the table (let's say SELECT COUNT(name), name FROM my_schema.my_table GROUP BY name). I know about Azure Applica...

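Absent a dedicated dashboard tool, one simple approach to the real-time plotting question above is a polling loop in a second notebook: re-run the aggregate query on an interval and redraw the chart from each snapshot. A sketch with a stand-in `run_query` function (hypothetical; in Databricks it would be `spark.sql(...)` feeding `display` or a plotting library):

```python
import time
from collections import Counter

def run_query() -> dict:
    """Stand-in for spark.sql('SELECT name, COUNT(name) FROM my_schema.my_table GROUP BY name').

    Returns {name: count}; replace the body with the real query in the notebook.
    """
    return dict(Counter(["alice", "bob", "alice"]))

def poll(iterations: int, interval_s: float = 0.0) -> list:
    """Re-run the query on a fixed interval, collecting each snapshot for plotting."""
    snapshots = []
    for _ in range(iterations):
        snapshots.append(run_query())  # each snapshot would redraw the chart
        time.sleep(interval_s)
    return snapshots

snaps = poll(iterations=3)
print(snaps[-1])  # latest counts, e.g. {'alice': 2, 'bob': 1}
```

Polling trades freshness for simplicity; for true push-based updates a streaming query with a sink the dashboard reads would be the heavier-weight alternative.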
LukaszJ
by Contributor III
  • 893 Views
  • 3 replies
  • 1 kudos

Table access control cluster with R language

Hello, I want to have a high-concurrency cluster with table access control, and I want to use the R language on it. I know that the documentation says R and Scala are not available with table access control, but maybe you have some tricks or best practic...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Łukasz Jaremek​ , Just a friendly follow-up. Do you still need help, or did @Aashita Ramteke​'s response help you find the solution? Please let us know.

2 More Replies
Vee
by New Contributor
  • 3199 Views
  • 2 replies
  • 1 kudos

Cluster configuration and optimal number for fs.s3a.connection.maximum , fs.s3a.threads.max

Please could you suggest the best cluster configuration for the use case stated below, and tips to resolve the errors shown below. Use case: There could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and spits out 120 output files ...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Vetrivel Senthil​ , Just a friendly follow-up. Do you still need help? Please let us know.

1 More Replies
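For the S3A tuning question above: connection-pool timeouts under concurrent S3-heavy jobs are typically mitigated by raising the S3A pool and thread limits in the cluster's Spark config. A hedged starting point — the values below are illustrative, not tuned for this specific workload:

```
spark.hadoop.fs.s3a.connection.maximum 200
spark.hadoop.fs.s3a.threads.max 100
spark.hadoop.fs.s3a.connection.timeout 300000
```

A rough sizing rule: `fs.s3a.connection.maximum` should comfortably exceed the peak number of concurrent S3 requests, which scales with total executor cores across the simultaneously running jobs, and should be at least as large as `fs.s3a.threads.max`.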
samrachmiletter
by New Contributor III
  • 1485 Views
  • 4 replies
  • 5 kudos

Resolved! Is it possible to set order of precedence of spark SQL extensions?

I have the Iceberg SQL extension installed, but running commands such as MERGE INTO results in the error pyspark.sql.utils.AnalysisException: MERGE destination only supports Delta sources. This seems to be due to using Delta's MERGE command as opposed ...

Latest Reply
samrachmiletter
New Contributor III
  • 5 kudos

This does help. I tried going through the DataFrameReader as well but ran into the same error, so it seems it is indeed not possible. Thank you @Hubert Dudek​!

3 More Replies
laus
by New Contributor III
  • 14278 Views
  • 4 replies
  • 2 kudos

Resolved! How to solve Py4JJavaError: An error occurred while calling o5082.csv. : org.apache.spark.SparkException: Job aborted. when writing to csv

Hi, I get the error Py4JJavaError: An error occurred while calling o5082.csv.: org.apache.spark.SparkException: Job aborted. when writing to CSV. Screenshot below with the detailed error. Any idea how to solve it? Thanks!

Screenshot 2022-03-31 at 17.33.26
Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Please try output.coalesce(1).write.option("header","true").format("csv").save("path"). It seems to be the same issue as https://community.databricks.com/s/topic/0TO3f000000CjVqGAK/py4jjavaerror

3 More Replies
Ben_Spark
by New Contributor III
  • 3558 Views
  • 9 replies
  • 2 kudos

Resolved! Databricks Spark XML parser : support for namespace declared at the ancestor level.

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end we are giving up on the XSD path parser: this option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

8 More Replies
alejandrofm
by Valued Contributor
  • 1535 Views
  • 6 replies
  • 1 kudos

Resolved! Can't see execution plan graph on all-purpose cluster

On a currently running all-purpose cluster, enter the Spark UI, then SQL, then a task: you can see the details and SQL properties, but the visualization doesn't appear. That graph is very useful for debugging certain scenarios. It works fine on jobs. Any ...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

I'm on Chrome; sometimes it appears, sometimes not. I will look into it more to give you something reproducible. Thanks!

5 More Replies
AmanSehgal
by Honored Contributor III
  • 4397 Views
  • 2 replies
  • 10 kudos

Resolved! How to merge all the columns into one column as JSON?

I have a task to transform a dataframe. The task is to collect all the columns in a row and embed them into a JSON string as a column. Source DF / Target DF: (see images in the original post)

Latest Reply
AmanSehgal
Honored Contributor III
  • 10 kudos

I was able to do this by converting the df to an RDD and then applying a map function to it: rdd_1 = df.rdd.map(lambda row: (row['ID'], row.asDict() ) )   ...

1 More Replies
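The `rdd.map(row.asDict())` approach accepted above boils down to serializing each row's dict to a JSON string keyed by ID. That step can be sketched in plain Python, with rows as dicts standing in for Spark `Row` objects (no Spark needed to see the shape of the output); in Spark SQL the same result is commonly produced with `to_json(struct(*df.columns))`, though which fits best depends on the schema:

```python
import json

# Stand-ins for the rows of the source dataframe.
rows = [
    {"ID": 1, "name": "alice", "score": 10},
    {"ID": 2, "name": "bob", "score": 7},
]

# Mirror of rdd.map(lambda row: (row['ID'], row.asDict())), followed by JSON encoding:
packed = [(row["ID"], json.dumps(row, sort_keys=True)) for row in rows]
for key, payload in packed:
    print(key, payload)
```

The RDD route keeps full Python control over the encoding; the `to_json`/`struct` route stays in the SQL engine and avoids the Python serialization overhead of going through RDDs.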