Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

by Nis (New Contributor II)
  • 1767 Views
  • 1 reply
  • 2 kudos

Best sequence for running the VACUUM, OPTIMIZE, FSCK REPAIR, and REFRESH commands

I have a Delta table whose size increases gradually; it currently has around 1.5 crore (15 million) rows. While running the VACUUM command on that table, I get the following error: ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...
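For the ordering question in the title, here is a hedged sketch assuming one common-sense order: FSCK REPAIR TABLE only when data files were deleted outside Delta, then OPTIMIZE, VACUUM, and REFRESH TABLE. The helper `delta_maintenance_sql` and the table name are hypothetical, and the order is an assumption rather than an official Databricks recommendation. The function only builds SQL strings, which a notebook would pass to `spark.sql(...)`.

```python
from typing import List


def delta_maintenance_sql(table: str, repair: bool = False, retain_hours: int = 168) -> List[str]:
    """Build Databricks SQL maintenance statements for one Delta table, in run order.

    Hypothetical helper: the ordering is an assumption, not an official recommendation.
    """
    statements = []
    if repair:
        # Removes transaction-log entries for data files that can no longer be
        # found in storage; only useful if files were deleted outside of Delta.
        statements.append(f"FSCK REPAIR TABLE {table}")
    # Compacts small files into larger ones; the replaced files stay in
    # storage but are no longer referenced by the table.
    statements.append(f"OPTIMIZE {table}")
    # Deletes unreferenced files older than the retention window
    # (168 hours is the 7-day default), so files replaced by the OPTIMIZE
    # above are only removed by a later VACUUM once they age past it.
    statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    # Invalidates cached data and metadata for the table.
    statements.append(f"REFRESH TABLE {table}")
    return statements


if __name__ == "__main__":
    for stmt in delta_maintenance_sql("my_db.my_table", repair=True):
        print(stmt)  # in a notebook: spark.sql(stmt)
```

Appending `DRY RUN` to the VACUUM statement lists the files that would be deleted without actually deleting them, which is a safer first check on a large table.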

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Do you have access to the Executor 7 logs? Is there high GC activity or some other event causing the heartbeat timeout? Could you check the failed stages?

by Jujiro (New Contributor III)
  • 10281 Views
  • 11 replies
  • 7 kudos

Random error: At least one column must be specified for the table?

I have the following code in a notebook. It randomly gives me the error "At least one column must be specified for the table." The error occurs (if it occurs at all) only on the first run after attaching to a cluster. Cluster details: Summary 5-1...

dbr-bug
Latest Reply
Harold
New Contributor II
  • 7 kudos

Please check whether setting the following Spark configuration helps: spark.databricks.delta.catalog.update.enabled false

by yunna_wei (Databricks Employee)
  • 915 Views
  • 0 replies
  • 3 kudos

In any Spark application, the Spark driver plays a critical role and performs the following functions: 1. Initiating a Spark session 2. Communicating with...

In any Spark application, the Spark driver plays a critical role and performs the following functions:
1. Initiating a Spark session
2. Communicating with the cluster manager to request resources (CPU, memory, etc.) for Spark's exec...

by Sweetnesh (New Contributor)
  • 2126 Views
  • 2 replies
  • 0 kudos

Not able to read S3 object through AssumedRoleCredentialProvider

SparkSession spark = SparkSession.builder() .appName("SparkS3Example") .master("local[1]") .getOrCreate(); spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", S3_ACCOUNT_KEY); spark.sparkContext().hadoopConf...
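The snippet configures the S3A connector through `spark.sparkContext().hadoopConfiguration()`. As a sketch, assuming the standard hadoop-aws assumed-role settings (key names from the Hadoop S3A "Working with IAM Assumed Roles" documentation; verify them against the Hadoop version on the cluster), the role-related entries could look like this. The role ARN and session name are hypothetical placeholders. The code is Python to keep all examples in one language; each entry maps one-to-one onto a `hadoopConfiguration().set(key, value)` call in the Java snippet.

```python
from typing import Dict

# Credential provider class shipped with hadoop-aws for assuming an IAM role.
ASSUMED_ROLE_PROVIDER = "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider"
# Provider used to authenticate the STS AssumeRole call itself; it reads
# fs.s3a.access.key / fs.s3a.secret.key (the snippet above sets the access key).
STS_CALLER_PROVIDER = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"


def s3a_assumed_role_settings(role_arn: str, session_name: str = "spark-s3-example") -> Dict[str, str]:
    """Return S3A settings for reading S3 through an assumed IAM role (sketch)."""
    return {
        "fs.s3a.aws.credentials.provider": ASSUMED_ROLE_PROVIDER,
        "fs.s3a.assumed.role.arn": role_arn,
        "fs.s3a.assumed.role.session.name": session_name,
        "fs.s3a.assumed.role.credentials.provider": STS_CALLER_PROVIDER,
    }


def apply_settings(hadoop_conf, settings: Dict[str, str]) -> None:
    """Call hadoop_conf.set(key, value) for every entry, mirroring the Java code."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)


if __name__ == "__main__":
    # Hypothetical role ARN; replace with your own.
    arn = "arn:aws:iam::123456789012:role/example-reader"
    for key, value in s3a_assumed_role_settings(arn).items():
        print(key, "=", value)
```

In PySpark, the Hadoop configuration object is commonly reached through the internal accessor `spark.sparkContext._jsc.hadoopConfiguration()`, which can be passed as `hadoop_conf`.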

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Sweetnesh Dholariya, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thanks!

by RayelightOP (New Contributor II)
  • 2173 Views
  • 1 reply
  • 2 kudos

Azure Blob Storage SAS keys expired for Apache Spark tutorial

The "Apache Spark Programming with Databricks" tutorial uses Parquet files in Azure Blob Storage. To access those files, a SAS key is used in the configuration files. Those keys were generated 5 years ago; however, they expired at the beginning of this mont...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Adding @Vidula Khanna and @Kaniz Fatma for visibility to help with your request.

by Smitha1 (Valued Contributor II)
  • 1895 Views
  • 3 replies
  • 1 kudos

December exam voucher for Databricks Certified Associate Developer for Apache Spark 3.0 exam

Dear @Jose Gonzalez, hope you're having a great day. This is HIGH priority for me: I have to schedule the exam in December before the slots fill up. I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam on 30th Nov but missed by one perc...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Smitha Nelapati, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

by Anwar_Patel (New Contributor III)
  • 3138 Views
  • 3 replies
  • 0 kudos

Resolved! Certificate not received after passing Databricks Certified Associate Developer for Apache Spark 3.0 - Python

I've successfully passed Databricks Certified Associate Developer for Apache Spark 3.0 - Python but have still not received the certificate. Email: anwarpatel91@gmail.com

Latest Reply
Anwar_Patel
New Contributor III
  • 0 kudos

@Nadia Elsayed, could you please help me with this issue? I need to send my certificate to my team.

by asethia (New Contributor)
  • 5685 Views
  • 1 reply
  • 0 kudos

Delta Lake in Apache Spark

Hi, as per the documentation (https://docs.delta.io/latest/quick-start.html), we can configure DeltaCatalog using spark.sql.catalog.spark_catalog. Iceberg supports two catalog implementations (https://iceberg.apache.org/docs/latest/spark-configuration/#...
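The Delta Lake quick start linked above registers Delta through two session settings: the SQL extension and the `spark_catalog` override. A minimal sketch, assuming the open-source `delta-spark` package and its JARs are available to the session; `with_delta` is a hypothetical helper that works with any builder exposing `.config(key, value)`, such as `SparkSession.builder`.

```python
from typing import Dict

# Session settings from the Delta Lake quick start.
DELTA_SESSION_CONFIGS: Dict[str, str] = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def with_delta(builder):
    """Apply the Delta session settings to a SparkSession-style builder (sketch)."""
    for key, value in DELTA_SESSION_CONFIGS.items():
        # SparkSession.builder.config(...) returns the builder, so calls chain.
        builder = builder.config(key, value)
    return builder
```

In PySpark this would be used as `spark = with_delta(SparkSession.builder.appName("delta-demo")).getOrCreate()`.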

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Arun Sethia: Yes, Delta Lake also supports custom catalogs. Delta Lake uses the Spark Catalog API, which allows for pluggable catalog implementations. You can implement your own custom catalog to use with Delta Lake. To use a custom catalog, you can...

by Sameer_876675 (New Contributor III)
  • 5356 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user, and I am using Azure Databricks to process a ~1000GiB nested JSON file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Storage Gen2 and read it into a DataFrame: df=spark.read.option("...

Cluster Summary OOM Error
Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 2 kudos

Hi Sameer, please refer to the following documents on how to work with nested JSON: https://docs.databricks.com/optimizations/semi-structured.html and https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html
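To illustrate what working with nested JSON usually involves, here is a plain-Python sketch of two reshaping steps: flattening nested objects into dotted column names and exploding an array into one row per element. In Spark these correspond roughly to selecting fields such as `col("holder.address.city")` and to `pyspark.sql.functions.explode`. The policy record is hypothetical, the dotted output names are an illustrative choice, and this code only demonstrates the shape change; it is not a way to process 100 GiB of data.

```python
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dot-separated keys; arrays are kept as-is."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def explode(flat: Dict[str, Any], array_key: str) -> List[Dict[str, Any]]:
    """Return one row per element of flat[array_key].

    Like Spark's explode(), a null or empty array yields no rows
    (a missing key is treated the same way here).
    """
    base = {k: v for k, v in flat.items() if k != array_key}
    rows = []
    for element in flat.get(array_key) or []:
        row = dict(base)
        if isinstance(element, dict):
            row.update(flatten(element, array_key))
        else:
            row[array_key] = element
        rows.append(row)
    return rows


# Hypothetical insurance-policy record.
policy = json.loads(
    '{"policy_id": "P1", "holder": {"name": "Ana", "address": {"city": "Pune"}},'
    ' "coverages": [{"type": "fire", "limit": 100}, {"type": "flood", "limit": 50}]}'
)
rows = explode(flatten(policy), "coverages")
for row in rows:
    print(row)
```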

by rammy (Contributor III)
  • 2736 Views
  • 2 replies
  • 3 kudos

How can we save a DataFrame in docx format using PySpark?

I am trying to save a DataFrame into a document, but it fails with the following error: java.lang.ClassNotFoundException: Failed to find data source: docx. Please find packages at http://spark.apache.org/third-party-projects.htm   #f_d...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi, you cannot do this from PySpark, but you can try using pandas to save to Excel. There is no docx data source.

by Smitha1 (Valued Contributor II)
  • 4318 Views
  • 6 replies
  • 10 kudos

Databricks Learning: Full screen disabled in self-paced course

The Databricks Learning self-paced course for Databricks Certified Associate Developer for Apache Spark 3 has the following issues: 1) The Full screen button is disabled, so the small font is difficult to read and strains the eyes (screenshot attached). 2) The courses do not have captions. Sometimes it is d...

font is very small as full screen is disabled.
Latest Reply
Anonymous
Not applicable
  • 10 kudos

Hi @Smitha Nelapati, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us ...

by Jyo777 (Contributor)
  • 2201 Views
  • 2 replies
  • 3 kudos

Resolved! Can't do "Full screen" while taking Databricks Apache Spark developer course.

Hi, I see the "Full screen" option at the bottom right, but it's disabled/inactive. Attached is a screenshot of it. Please advise, as it's hard to read or see the contents on half a screen. Thanks

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

Press F11 and the browser will go full screen.

by MC006 (New Contributor III)
  • 7232 Views
  • 4 replies
  • 2 kudos

Resolved! java.lang.NoSuchMethodError after upgrade to Databricks Runtime 11.3 LTS

Hi, I am using Databricks and want to upgrade to Databricks Runtime 11.3 LTS, which uses Spark 3.3. Current system environment: Operating System: Ubuntu 20.04.4 LTS; Java: Zulu 8.56.0.21-CA-linux64; Python: 3.8.10; Delta Lake: 1.1.0. Target system ...

Latest Reply
Meghala
Valued Contributor II
  • 2 kudos

Hi everyone, this information helped me. Thanks!

by Smitha1 (Valued Contributor II)
  • 2188 Views
  • 3 replies
  • 2 kudos

December exam free voucher for Databricks Certified Associate Developer for Apache Spark 3.0 exam.

Dear @Vidula Khanna, hope you're having a great day. This is HIGH priority for me: I have to schedule the exam in December before the slots fill up. I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam on 30th Nov but missed by one perc...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

Hey @Smitha Nelapati, you can attend the webinars below and get 75% off in January.

by SIRIGIRI (Contributor)
  • 890 Views
  • 1 reply
  • 1 kudos

medium.com

Sorting in Spark: how do you sort null values first or last among the records in a Spark DataFrame? Please find the answers here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4
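For reference, Spark sorts nulls first in ascending order and last in descending order by default, and the `Column` methods `asc_nulls_first()`, `asc_nulls_last()`, `desc_nulls_first()`, and `desc_nulls_last()` (SQL: `NULLS FIRST` / `NULLS LAST`) override that. The plain-Python sketch below only mirrors those semantics on a list, with `None` standing in for SQL NULL; the function name `sort_with_nulls` is hypothetical.

```python
from typing import List, Optional


def sort_with_nulls(values: List[Optional[int]], descending: bool = False,
                    nulls_first: Optional[bool] = None) -> List[Optional[int]]:
    """Sort values, placing None first or last.

    nulls_first=None follows Spark's default: NULLS FIRST for ascending,
    NULLS LAST for descending.
    """
    if nulls_first is None:
        nulls_first = not descending
    present = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + present if nulls_first else present + nulls


data = [3, None, 1, None, 2]
print(sort_with_nulls(data))                     # like col.asc():            [None, None, 1, 2, 3]
print(sort_with_nulls(data, nulls_first=False))  # like col.asc_nulls_last(): [1, 2, 3, None, None]
print(sort_with_nulls(data, descending=True))    # like col.desc():           [3, 2, 1, None, None]
```

In PySpark the equivalent of the second call would be `df.orderBy(df.score.asc_nulls_last())`, where `score` is a hypothetical column name.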

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yeah, this is a really good post. Keep it up, man!
