I am facing an error when trying to write data to Kafka using Spark Structured Streaming.

# Extract
source_stream_df = (spark.readStream
    .format("cosmos.oltp.changeFeed")
    .option("spark.cosmos.container", PARM_CONTAINER_NAME)
    .option("spark.cosmos.read.inferSchema.en...
Which Event Hubs namespace tier were you using? I had the same problem and resolved it by changing the pricing plan from Basic to Standard, since Kafka applications are not supported on the Basic tier. Let me know if you run into anything else. Thanks
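For reference, a minimal sketch of what the Kafka sink side of such a streaming job can look like against an Event Hubs Kafka endpoint; the namespace, topic, connection string, and checkpoint path below are illustrative placeholders, not values from the original post.

# Hedged sketch: write a streaming DataFrame to an Event Hubs Kafka endpoint (Standard tier or above).
# On Databricks the shaded JAAS class name below is used; on vanilla Spark drop the "kafkashaded." prefix.
EH_BOOTSTRAP = "NAMESPACE.servicebus.windows.net:9093"
EH_SASL = (
    'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
    'username="$ConnectionString" password="<EVENT_HUBS_CONNECTION_STRING>";'
)

(source_stream_df
    .selectExpr("to_json(struct(*)) AS value")   # the Kafka sink expects a 'value' column
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", EH_BOOTSTRAP)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", EH_SASL)
    .option("topic", "TOPIC_NAME")
    .option("checkpointLocation", "/tmp/checkpoints/eh_kafka_sink")
    .start())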
I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl with format Parquet, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl. I have then create...
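A condensed sketch of the workflow described above; the table and column names follow the post, while the source path and timestamp value are illustrative.

from pyspark.sql.functions import lit
import datetime

# Illustrative stand-in for the precomputed timestamp the post wraps in lit().
currentTimestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

df = spark.read.parquet("/path/to/source")   # illustrative source path

# Tag every row with a version column and append into the partitioned Parquet table.
(df.withColumn("version", lit(currentTimestamp))
   .write
   .format("parquet")
   .mode("append")
   .partitionBy("version")
   .saveAsTable("db.tbl"))

# Sync the Hive metastore with the partition directories on storage.
spark.sql("MSCK REPAIR TABLE db.tbl")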
@vikashk84 The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with the Hive metadata related to partitioning in Databricks. Here are a few steps you ...
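The reply above is truncated; one commonly suggested mitigation (an assumption here, not confirmed by the thread) is to stop Spark from pushing partition filters down to the Hive metastore, at the cost of fetching the full partition list:

# Assumption: disable metastore-side partition pruning so Spark retrieves all
# partitions and filters them itself, avoiding the MetaException raised on the
# get-partitions-by-filter call. This can be slower for tables with many partitions.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")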
Hello, I need to add custom metadata to an Avro file. The Avro file contains data. We have tried to use "option" within the write function, but it is not applied and no error is generated. df.write.format("avro").option("avro.codec", "snappy").option...
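As far as I can tell, the Spark Avro writer options cover writer settings such as the codec but not arbitrary file-level metadata, so one workaround is to write the file with a library that exposes the Avro metadata block directly. A minimal sketch using fastavro; the library choice, schema, and metadata keys are assumptions for illustration, not from the post.

import fastavro

schema = {
    "type": "record",
    "name": "Example",
    "fields": [{"name": "id", "type": "long"}],
}
records = [{"id": 1}, {"id": 2}]

# fastavro.writer accepts a metadata dict that is stored in the Avro file header
# alongside the schema and codec.
with open("/tmp/example.avro", "wb") as out:
    fastavro.writer(
        out,
        schema,
        records,
        codec="snappy",
        metadata={"custom.key": "custom value"},
    )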
Spark DataFrame Metadata: a Spark DataFrame is structurally the same as a table. However, it does not store any schema information in the metadata store. Instead, there is a runtime metadata catalog that stores the DataFrame schema information. It is simil...
setting the parameter 'spark.cleaner.ttl', or by dividing long-running jobs into separate batches and writing the intermediate results to disk.
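A minimal sketch of the second approach, materializing intermediate results to storage between stages so that one long job becomes several shorter ones; the paths and transformations are illustrative.

# Stage 1: compute an intermediate result and persist it to disk.
stage1_df = spark.read.parquet("/data/raw").filter("status = 'active'")
stage1_df.write.mode("overwrite").parquet("/data/intermediate/stage1")

# Stage 2: start again from the materialized copy, so the new plan does not
# carry the full lineage of the previous stage.
stage2_df = spark.read.parquet("/data/intermediate/stage1")
result_df = stage2_df.groupBy("customer_id").count()
result_df.write.mode("overwrite").parquet("/data/output/counts")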
If you have a lot of transactions in a table it seems like the Delta log keeping track of all those transactions would get pretty large. Does the size of the metadata become a problem over time?
Yes, the size of the metadata can become a problem over time, though because of storage costs rather than performance. Delta performance will not degrade due to the size of the metadata, but your cloud storage bill can increase. By default Delta h...
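If the transaction log does become a storage concern, its retention can be tuned per table; a hedged sketch follows, where the table name is a placeholder and the 7-day interval is an arbitrary example (lowering it also limits how far back time travel can go).

# Shorten how long Delta keeps transaction log entries for this table.
spark.sql("""
    ALTER TABLE my_db.my_delta_table
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")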
How can I read a DataFrame from a Parquet file, do transformations, and write the modified DataFrame back to the same Parquet file?
If I attempt to do so, I get an error, understandably, because Spark reads from the source and one cannot writ...
Hi,
You can use insertInto instead of save. It will overwrite the target, so there is no need to cache or persist your DataFrame first.
# insertInto writes into a table registered in the metastore (a table name, not a file path);
# the target table's format is used, so format("parquet") is not needed here.
df.write.mode("overwrite").insertInto("db.target_table")
~Saravanan
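If the data is not registered as a table, another common pattern (a sketch with illustrative paths, not taken from the thread) is to materialize the transformed result to a temporary location first and only then overwrite the original path:

df = spark.read.parquet("/data/source")                        # original Parquet path
transformed_df = df.withColumnRenamed("old_name", "new_name")  # any transformation

# Write to a temporary location first, so the source is not overwritten
# while Spark is still lazily reading from it.
transformed_df.write.mode("overwrite").parquet("/data/source_tmp")

# Re-read the materialized copy and overwrite the original path.
spark.read.parquet("/data/source_tmp").write.mode("overwrite").parquet("/data/source")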