02-01-2022 03:01 AM
Hi Team,
I'm trying to build a real-time solution using Databricks and Event Hubs.
Something weird happens some time after the process starts.
At the beginning, the messages flow through the process at the expected rate; note that the "last updated" time is 50 seconds.
However, after a while, the messages stop flowing; note that the "last updated" time is now 11 hours.
If I restart the job, the messages flow again as expected (even recovering the messages that were not processed in the last 11 hours, in this case).
In a graph of the issue, the last peak was when I restarted the job.
Any idea what could be happening?
Accepted Solutions
02-01-2022 07:01 AM
- Please check whether .option("checkpointLocation", "/mnt/your_storage/") is specified for Structured Streaming (see the sketch below).
- It can also depend on what is then done with the stream (writeStream).
- The connection to Event Hubs is quite straightforward, so please also verify the flow on the Azure side, where we can see streaming messages in real time (go to Entities, select the Event Hub, under "Features" select "Process data", then "Explore", then "Create").
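For reference, a minimal sketch of the kind of pipeline being discussed, assuming the azure-eventhubs-spark connector in a Databricks notebook (where `spark` is predefined); the connection string, paths, and names are placeholders:

```python
# Minimal sketch, assuming the azure-eventhubs-spark connector is installed.
# Connection string, paths, and names below are placeholders.
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=..."

eh_conf = {
    # The connector expects the connection string to be encrypted:
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string),
}

stream = (
    spark.readStream
    .format("eventhubs")
    .options(**eh_conf)
    .load()
)

query = (
    stream.writeStream
    .format("delta")
    .outputMode("append")
    # The checkpoint is what lets a restarted job resume from the last
    # committed offsets, matching the recovery behaviour described above.
    .option("checkpointLocation", "/mnt/your_storage/checkpoints/eventhubs")
    .start("/mnt/your_storage/delta/events")
)
```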
02-01-2022 07:18 AM
Thanks for your answer @Hubert Dudek,
- It is already specified.
- What do you mean by this?
- This is the weird part: the data flows fine, but at some point it's as if the job stops reading or something like that, and if I restart the job, everything continues working well.
02-01-2022 10:38 AM
I mean that you read a stream for some purpose, usually to transform it and write it somewhere. So the problem may not be with the reading but with the writing part.
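One way to narrow down which side has stalled (a diagnostic sketch using standard StreamingQuery introspection, not something from this thread; `query` is the handle returned by start() in the sketch above):

```python
import json

progress = query.lastProgress  # metrics of the most recent micro-batch (or None)
if progress:
    print("numInputRows:", progress["numInputRows"])        # rows read from Event Hubs this batch
    print("sink:", json.dumps(progress["sink"], indent=2))  # what the sink reported writing
print("status:", query.status)  # e.g. waiting for data vs. stuck mid-batch
```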
02-01-2022 11:52 AM
I'm assuming that the issue is not the writing part, because the database does not show any kind of blocking or conflicts.
02-08-2022 05:14 PM
Hi @Jhonatan Reyes,
Do you control/limit the max number of events processed per trigger in your Event Hubs? Check "maxEventsPerTrigger". What's your trigger interval? Also, how many partitions are you reading from? What's your sink?
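For reference, a hedged sketch of where those settings go, assuming the same connector and placeholders as in the accepted solution; the values are purely illustrative:

```python
# Rate-limit the read side: cap on events pulled per micro-batch.
eh_conf["maxEventsPerTrigger"] = "100000"

stream = spark.readStream.format("eventhubs").options(**eh_conf).load()

query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/your_storage/checkpoints/eventhubs")
    .trigger(processingTime="30 seconds")  # explicit trigger interval
    .start("/mnt/your_storage/delta/events")
)
```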
02-16-2022 11:49 AM
@Jhonatan Reyes
Do you still need help with this, or has the issue been mitigated/solved?

