cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

alonisser
by Contributor
  • 5997 Views
  • 7 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

Question about spark checkpoints and offsets in a running streamWhen the stream started I needed tons of partitions, so we've set it with spark.conf to 5000As expected offsets in the checkpoint contain this info and the job used this value. Then we'...

  • 5997 Views
  • 7 replies
  • 3 kudos
Latest Reply
Leszek
Contributor
  • 3 kudos

@Jose Gonzalez​ thanks for that information! This is super useful. I was struggling why my streaming still using 200 partitions. This is quite a paint for me because changing checkpoint will insert all data from the source. Do you know where this can...

  • 3 kudos
6 More Replies
ImAbhishekTomar
by New Contributor III
  • 9094 Views
  • 7 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data on Kafka using spark stream.#Extract source_stream_df= (spark.readStream .format("cosmos.oltp.changeFeed") .option("spark.cosmos.container", PARM_CONTAINER_NAME) .option("spark.cosmos.read.inferSchema.en...

  • 9094 Views
  • 7 replies
  • 4 kudos
Latest Reply
devmehta
New Contributor III
  • 4 kudos

What event hub namespace you were using?I had same problem and resolved by changing pricing plan from basic to standard as Kafka apps is not supporting in basic planLet me know if you had anything else. Thanks

  • 4 kudos
6 More Replies
Anonymous
by Not applicable
  • 1582 Views
  • 1 replies
  • 0 kudos
  • 1582 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

In this scenario, the best option would be to have a single readStream reading a source delta table. Since checkpoint logs are controlled when writing to delta tables you would be able to maintain separate logs for each of your writeStreams. I would...

  • 0 kudos
Labels