Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.


SamarthJain
New Contributor II

Hi All,

I'm facing an issue with my Spark Streaming Job. It gets stuck in the "Stream Initializing" phase for more than 3 hours.

I need your help to understand what happens internally during the "Stream Initializing" phase of a Spark Streaming job and why it is taking so much time. Here is some more information:

1. The streaming job where we are facing the issue reads data from the silver table and inserts it into the gold table.

2. The silver table gets its data from the raw table through another streaming job.

3. There is also a batch job that reads from silver and writes to gold (same functionality as the streaming job), and it works without any issue.

4 REPLIES

amr
Valued Contributor

I am not sure about the exact issue here, but there are some general guidelines for streaming data from another table. First, make sure the tables that you are streaming from are Delta tables. Also, if the source table has many updates or deletes, the best way to stream changes from it is through CDF (Change Data Feed); if it is insert-only, regular streaming is enough. One thing you might explore to simplify this is Delta Live Tables and its Streaming(<table-name>) function.
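
To make the CDF vs. plain streaming choice concrete, here is a minimal sketch, not a definitive implementation. It assumes an existing SparkSession passed in as `spark`, a Delta source table whose name you supply, and that change data feed has already been enabled on that table. The helper names `delta_stream_options` and `read_silver_stream` are hypothetical and exist only for illustration; `readChangeFeed` is the Delta reader option for consuming the change feed.

```python
from typing import Dict


def delta_stream_options(use_change_feed: bool) -> Dict[str, str]:
    """Reader options for streaming from a Delta table.

    use_change_feed=True  -> source has updates/deletes, so read row-level
                             changes through the change data feed (CDF).
    use_change_feed=False -> source is insert-only, so a plain Delta stream
                             needs no extra options.
    """
    return {"readChangeFeed": "true"} if use_change_feed else {}


def read_silver_stream(spark, table_name: str, use_change_feed: bool):
    """Return a streaming DataFrame over a Delta table.

    `spark` is an existing SparkSession; `table_name` is a placeholder for
    your own table (hypothetical helper, not part of any library).
    """
    return (
        spark.readStream.format("delta")
        .options(**delta_stream_options(use_change_feed))
        .table(table_name)
    )
```

With CDF on, the stream also carries change metadata columns such as `_change_type`, so the downstream write usually has to handle inserts, updates, and deletes explicitly (for example with a merge) rather than a plain append.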

NK_123
New Contributor II

Hey Samarth

Did your issue get resolved? I am facing the same issue. Could you help here?

strangan
New Contributor II

I had the same issue this morning. I was able to resolve it by running the notebook on a job cluster instead of an all-purpose interactive cluster. It is also best to set the cluster up so that it does not autoscale. These are Databricks recommendations.
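
In case it helps, here is a rough sketch of what "job cluster, no autoscaling" can look like as a `new_cluster` spec in a job definition. It is an assumption-laden illustration: the helper name `fixed_size_job_cluster` is hypothetical, and the runtime version and node type are placeholders you would replace with your own. The point is only that the spec sets a fixed `num_workers` and has no `autoscale` block.

```python
from typing import Any, Dict


def fixed_size_job_cluster(
    spark_version: str, node_type_id: str, num_workers: int
) -> Dict[str, Any]:
    """Build a `new_cluster` spec with a fixed worker count.

    Hypothetical helper for illustration. It deliberately omits the
    `autoscale` block so the cluster keeps a constant size.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "spark_version": spark_version,  # placeholder: your runtime version
        "node_type_id": node_type_id,    # placeholder: your instance type
        "num_workers": num_workers,      # fixed size instead of autoscale
    }
```

For example, `fixed_size_job_cluster("<spark-version>", "<node-type-id>", 4)` gives a spec you could attach to the streaming task as its job cluster, rather than pointing the task at an existing all-purpose cluster.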

MohsenJ
Contributor

I'm facing the same issue when I try to run the Inference Lakehouse Monitor regression example notebook from "Create a monitor using the API | Databricks on AWS". Any idea?
