Data Engineering

Forum Posts

alonisser
by Contributor
  • 3332 Views
  • 6 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

Question about Spark checkpoints and offsets in a running stream. When the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...

Latest Reply
Leszek
Contributor
  • 3 kudos

@Jose Gonzalez thanks for that information! This is super useful. I was struggling to understand why my streaming job was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

5 More Replies
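
A note on the thread above: for stateful Structured Streaming queries, the value of spark.sql.shuffle.partitions is captured when the query first starts and is persisted along with the state in the checkpoint, so setting it later via spark.conf has no effect on an existing checkpoint. Below is a minimal sketch of the setup in Python; the table names, column name, and checkpoint path are hypothetical placeholders, not taken from the thread.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()

    # Must be set BEFORE the stateful query first starts; once a checkpoint
    # exists, the partition count recorded there wins and this is ignored.
    spark.conf.set("spark.sql.shuffle.partitions", "5000")

    events = spark.readStream.table("source_events")  # hypothetical source table

    # Stateful aggregation: the shuffle partition count sizes the state store.
    counts = events.groupBy("key").count()  # "key" is a hypothetical column

    query = (counts.writeStream
        .format("delta")
        .outputMode("complete")
        .option("checkpointLocation", "/tmp/checkpoints/demo")  # hypothetical path
        .toTable("event_counts"))  # hypothetical sink table

To actually change the partition count of an already-running stateful query, you generally have to start over with a fresh checkpoint location, which reprocesses the source; that is exactly the pain Leszek describes above.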
ImAbhishekTomar
by New Contributor III
  • 5355 Views
  • 6 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using a Spark stream.

    #Extract
    source_stream_df = (spark.readStream
        .format("cosmos.oltp.changeFeed")
        .option("spark.cosmos.container", PARM_CONTAINER_NAME)
        .option("spark.cosmos.read.inferSchema.en...

Latest Reply
Zainaboladokun
New Contributor III
  • 4 kudos


5 More Replies
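
For context on the error above: a TimeoutException of the form "topic ... not present in metadata after 60000 ms" usually means the client could not reach the Kafka brokers from the cluster, or the topic does not exist and broker-side auto-creation is disabled. Below is a hedged sketch of a stream-to-Kafka write in Python; a rate source stands in for the Cosmos change feed from the post, and the broker addresses, topic, and checkpoint path are placeholders. It also assumes the spark-sql-kafka connector is on the classpath (it is bundled on Databricks).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-sink-demo").getOrCreate()

    # Stand-in source; the original post reads the Cosmos DB change feed.
    source_stream_df = spark.readStream.format("rate").load()

    # The Kafka sink requires a string/binary 'value' column (optionally 'key').
    kafka_ready_df = source_stream_df.select(
        F.to_json(F.struct("timestamp", "value")).alias("value"))

    query = (kafka_ready_df.writeStream
        .format("kafka")
        # If these brokers are unreachable, or the topic is missing and
        # auto-creation is off, the write fails with the metadata timeout.
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder
        .option("topic", "topic-downstream-data-nonprod")
        .option("checkpointLocation", "/tmp/checkpoints/kafka-sink")  # placeholder
        .start())

Checking broker connectivity from the cluster and confirming the topic exists (or pre-creating it) is typically the first diagnostic step for this error.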
Anonymous
by Not applicable
  • 826 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

In this scenario, the best option would be to have a single readStream reading a source Delta table. Since checkpoint logs are controlled when writing to Delta tables, you would be able to maintain separate checkpoint logs for each of your writeStreams. I would...
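
The pattern described in this reply looks roughly like the following sketch in Python; the table names, filter column, and checkpoint paths are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fan-out-demo").getOrCreate()

    # One readStream against the source Delta table...
    source_df = spark.readStream.table("source_table")  # hypothetical table

    # ...fanned out to several writeStreams. Each sink keeps its own
    # checkpoint log, so the streams progress and restart independently.
    q1 = (source_df.writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/sink_a")  # hypothetical path
        .toTable("sink_a"))

    q2 = (source_df.filter("value > 0")  # hypothetical column/filter
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/sink_b")  # hypothetical path
        .toTable("sink_b"))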
