Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ImAbhishekTomar
by New Contributor III
  • 9900 Views
  • 7 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using a Spark stream. #Extract source_stream_df = (spark.readStream .format("cosmos.oltp.changeFeed") .option("spark.cosmos.container", PARM_CONTAINER_NAME) .option("spark.cosmos.read.inferSchema.en...

Latest Reply
devmehta
New Contributor III
  • 4 kudos

Which Event Hubs namespace were you using? I had the same problem and resolved it by changing the pricing plan from Basic to Standard, as Kafka applications are not supported on the Basic plan. Let me know if there is anything else. Thanks

6 More Replies
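
Not from the thread itself, but a minimal sketch of what the Kafka write side might look like once the namespace is on the Standard tier. The namespace, secret scope, topic, and checkpoint path are placeholders, and source_stream_df is the Cosmos DB change-feed stream from the question; the kafkashaded prefix in the JAAS class matches the shaded Kafka client used on Databricks clusters.

    # Placeholder values: replace with your own namespace, secret scope, topic, and paths.
    EH_NAMESPACE = "my-eventhubs-namespace"        # must be Standard tier or above for the Kafka endpoint
    EH_CONNECTION_STRING = dbutils.secrets.get("my-scope", "eh-connection-string")
    TOPIC = "topic-downstream-data-nonprod"

    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{EH_CONNECTION_STRING}";'
    )

    (source_stream_df
        .selectExpr("to_json(struct(*)) AS value")   # Kafka sink expects a string/binary 'value' column
        .writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config", jaas)
        .option("topic", TOPIC)
        .option("checkpointLocation", "/tmp/checkpoints/kafka-sink")  # placeholder path
        .start())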
Anonymous
by Not applicable
  • 3447 Views
  • 1 reply
  • 0 kudos

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@vikashk84 The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you ...

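
For reference, a rough sketch of the workflow described in the question; df and db.tbl come from the post, while the version value and the commented config workaround are assumptions added here.

    from datetime import datetime
    from pyspark.sql.functions import lit

    # Reproduces the steps from the question; `df` is the existing Parquet DataFrame.
    current_timestamp = datetime.utcnow().isoformat()      # placeholder version value

    (df.withColumn("version", lit(current_timestamp))
       .write
       .format("parquet")
       .mode("append")
       .partitionBy("version")
       .saveAsTable("db.tbl"))

    # Pick up partitions added outside the metastore's knowledge.
    spark.sql("MSCK REPAIR TABLE db.tbl")

    # If the MetaException persists, one workaround sometimes suggested (at a
    # performance cost) is to disable metastore partition management. This is a
    # static config, so set it in the cluster's Spark config rather than at runtime:
    #   spark.sql.hive.manageFilesourcePartitions false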
zak
by New Contributor II
  • 4213 Views
  • 1 reply
  • 1 kudos

Add custom metadata to an Avro file with PySpark

Hello, I need to add custom metadata to an Avro file. The Avro file contains data. We have tried to use "option" within the write function, but it is not applied and no error is generated. df.write.format("avro").option("avro.codec", "snappy").option...

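
No accepted answer appears in this excerpt. As an assumption-laden workaround (outside the DataFrame writer), the fastavro library lets you place arbitrary key/value pairs in the Avro container header; the schema, columns, and paths below are invented for illustration.

    import fastavro

    schema = {
        "type": "record",
        "name": "Example",
        "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}],
    }

    # Assumes df has matching columns and is small enough to collect to the driver.
    records = [row.asDict() for row in df.select("id", "name").collect()]

    with open("/dbfs/tmp/with_metadata.avro", "wb") as out:        # placeholder path
        fastavro.writer(out, schema, records, codec="snappy",
                        metadata={"my.custom.key": "my custom value"})

    # Read the header metadata back to verify.
    with open("/dbfs/tmp/with_metadata.avro", "rb") as inp:
        print(fastavro.reader(inp).metadata)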
SIRIGIRI
by Contributor
  • 863 Views
  • 2 replies
  • 2 kudos

sharikrishna26.medium.com

Spark Dataframe Metadata: A Spark DataFrame is structurally the same as a table. However, it does not store any schema information in the metadata store. Instead, we have a runtime metadata catalog to store the DataFrame schema information. It is simil...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

This is awesome, thanks!

1 More Replies
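
A small illustration of the article's point, assuming an active SparkSession named spark: a DataFrame's schema exists only in the runtime catalog, while saving it as a table records the schema in the metastore. The table name is a placeholder.

    # DataFrame schema lives only in the runtime catalog of the current session.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
    df.printSchema()                      # schema known at runtime, nothing persisted

    # Persisting as a table writes the schema into the metastore,
    # so other sessions can discover it through the catalog.
    df.write.mode("overwrite").saveAsTable("demo_tbl")
    print(spark.catalog.listColumns("demo_tbl"))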
User16826994223
by Honored Contributor III
  • 1904 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

This can be addressed by setting the parameter ‘spark.cleaner.ttl’, or by dividing the long-running jobs into different batches and writing the intermediate results to disk.

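
spark.cleaner.ttl only exists in older Spark releases, so here is a hedged sketch of the second suggestion instead: splitting a long-running job into batches and persisting intermediate results to disk. The table, column, and path names are invented for illustration.

    # Hypothetical batching pattern: process a large table in slices and persist
    # each intermediate result so lineage does not grow unbounded.
    CHECKPOINT_DIR = "/tmp/intermediate"          # placeholder location

    batches = spark.table("big_input_table").select("batch_id").distinct().collect()

    for row in batches:
        batch_id = row["batch_id"]                # assumed numeric for the filter below
        result = (spark.table("big_input_table")
                  .where(f"batch_id = {batch_id}")
                  .groupBy("key")
                  .count())                       # stand-in for the real transformation
        # Writing to disk breaks the lineage chain for the next iteration.
        result.write.mode("overwrite").parquet(f"{CHECKPOINT_DIR}/batch={batch_id}")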
User16826992666
by Valued Contributor
  • 1541 Views
  • 1 reply
  • 0 kudos

Resolved! How much space does the metadata for a Delta table take up?

If you have a lot of transactions in a table it seems like the Delta log keeping track of all those transactions would get pretty large. Does the size of the metadata become a problem over time?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, the size of the metadata can become a problem over time, not because of performance but because of storage costs. Delta performance will not degrade due to the size of the metadata, but your cloud storage bill can increase. By default Delta h...

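
Related to keeping the log in check, a sketch of inspecting and tuning Delta log retention through the delta.logRetentionDuration table property; the table name and interval below are placeholders.

    # Inspect current table properties, including any log retention override.
    (spark.sql("DESCRIBE DETAIL my_schema.my_delta_table")
        .select("properties")
        .show(truncate=False))

    # Shorten how long old log entries are kept before log cleanup
    # (placeholder interval; the default keeps them for roughly 30 days).
    spark.sql("""
        ALTER TABLE my_schema.my_delta_table
        SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 14 days')
    """)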
olisch
by New Contributor
  • 21306 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a Parquet file, do transformations, and write the modified DataFrame back to the same Parquet file? If I attempt to do so, I get an error, understandably, because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II
  • 0 kudos

Hi, you can use insertInto instead of save. It will overwrite the target, so there is no need to cache or persist your DataFrame: df.write.format("parquet").mode("overwrite").insertInto("/file_path") ~Saravanan

2 More Replies
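
A hedged sketch of the pattern behind that reply, with placeholder names: note that DataFrameWriter.insertInto takes a table identifier rather than a file path, and the result is staged to a temporary location before the source is overwritten.

    from pyspark.sql.functions import col

    # Assumes the source data is registered as a Parquet table named src_tbl
    # with a numeric column named value (both placeholders).
    df = spark.table("src_tbl")
    transformed = df.withColumn("value", col("value") * 2)   # keep the schema: insertInto is position-based

    # Materialise the result away from the source first, otherwise Spark would be
    # lazily reading and overwriting the same files.
    transformed.write.mode("overwrite").parquet("/tmp/staging/src_tbl")

    (spark.read.parquet("/tmp/staging/src_tbl")
        .write.mode("overwrite")
        .insertInto("src_tbl"))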