Data Engineering

Forum Posts

alonisser
by Contributor
  • 3332 Views
  • 6 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

Question about Spark checkpoints and offsets in a running stream. When the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...

Latest Reply
Leszek
Contributor
  • 3 kudos

@Jose Gonzalez thanks for that information! This is super useful. I was struggling to understand why my streaming job was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

5 More Replies
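
A note on the thread above: for stateful Structured Streaming queries, the value of spark.sql.shuffle.partitions is captured when the query first starts and is persisted along with the state in the checkpoint, so setting it later via spark.conf has no effect on an existing checkpoint. Below is a minimal sketch of the setup in Python; the table names, column name, and checkpoint path are hypothetical placeholders, not taken from the thread.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()

    # Must be set BEFORE the stateful query first starts; once a checkpoint
    # exists, the partition count recorded there wins and this is ignored.
    spark.conf.set("spark.sql.shuffle.partitions", "5000")

    events = spark.readStream.table("source_events")  # hypothetical source table

    # Stateful aggregation: the shuffle partition count sizes the state store.
    counts = events.groupBy("key").count()  # "key" is a hypothetical column

    query = (counts.writeStream
        .format("delta")
        .outputMode("complete")
        .option("checkpointLocation", "/tmp/checkpoints/demo")  # hypothetical path
        .toTable("event_counts"))  # hypothetical sink table

To actually change the partition count of an already-running stateful query, you generally have to start over with a fresh checkpoint location, which reprocesses the source; that is exactly the pain Leszek describes above.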
ImAbhishekTomar
by New Contributor III
  • 5355 Views
  • 6 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using a Spark stream.

    #Extract
    source_stream_df = (spark.readStream
        .format("cosmos.oltp.changeFeed")
        .option("spark.cosmos.container", PARM_CONTAINER_NAME)
        .option("spark.cosmos.read.inferSchema.en...

Latest Reply
Zainaboladokun
New Contributor III
  • 4 kudos


5 More Replies
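
For context on the error above: a TimeoutException of the form "topic ... not present in metadata after 60000 ms" usually means the client could not reach the Kafka brokers from the cluster, or the topic does not exist and broker-side auto-creation is disabled. Below is a hedged sketch of a stream-to-Kafka write in Python; a rate source stands in for the Cosmos change feed from the post, and the broker addresses, topic, and checkpoint path are placeholders. It also assumes the spark-sql-kafka connector is on the classpath (it is bundled on Databricks).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-sink-demo").getOrCreate()

    # Stand-in source; the original post reads the Cosmos DB change feed.
    source_stream_df = spark.readStream.format("rate").load()

    # The Kafka sink requires a string/binary 'value' column (optionally 'key').
    kafka_ready_df = source_stream_df.select(
        F.to_json(F.struct("timestamp", "value")).alias("value"))

    query = (kafka_ready_df.writeStream
        .format("kafka")
        # If these brokers are unreachable, or the topic is missing and
        # auto-creation is off, the write fails with the metadata timeout.
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder
        .option("topic", "topic-downstream-data-nonprod")
        .option("checkpointLocation", "/tmp/checkpoints/kafka-sink")  # placeholder
        .start())

Checking broker connectivity from the cluster and confirming the topic exists (or pre-creating it) is typically the first diagnostic step for this error.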
Anonymous
by Not applicable
  • 826 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

In this scenario, the best option would be to have a single readStream reading a source Delta table. Since checkpoint logs are controlled when writing to Delta tables, you would be able to maintain separate checkpoint logs for each of your writeStreams. I would...
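
The pattern described in this reply looks roughly like the following sketch in Python; the table names, filter column, and checkpoint paths are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fan-out-demo").getOrCreate()

    # One readStream against the source Delta table...
    source_df = spark.readStream.table("source_table")  # hypothetical table

    # ...fanned out to several writeStreams. Each sink keeps its own
    # checkpoint log, so the streams progress and restart independently.
    q1 = (source_df.writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/sink_a")  # hypothetical path
        .toTable("sink_a"))

    q2 = (source_df.filter("value > 0")  # hypothetical column/filter
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/sink_b")  # hypothetical path
        .toTable("sink_b"))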
