Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.


SamarthJain
New Contributor II

Hi All,

I'm facing an issue with my Spark Streaming Job. It gets stuck in the "Stream Initializing" phase for more than 3 hours.

I need your help to understand what happens internally during the "Stream Initializing" phase of a Spark Streaming job and why it is taking so long. Some more information:

1. This streaming job (the one with the issue) reads data from the silver table and inserts it into the gold table.

2. The silver table gets its data from the raw table through another streaming job.

3. A separate batch job reads from silver and writes to gold (same functionality as the streaming job) and works without any issue.
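
For readers following along, the silver-to-gold stream described in point 1 might look roughly like the PySpark sketch below. This is not the poster's actual job: the table names `silver` and `gold`, the checkpoint path, and the helper name `start_silver_to_gold` are placeholders chosen for illustration, and the sketch assumes both tables are Delta tables and that `spark` is an existing `SparkSession`.

```python
def start_silver_to_gold(spark, source="silver", target="gold",
                         checkpoint="/tmp/checkpoints/silver_to_gold"):
    """Start a stream that appends new rows from `source` to `target`.

    All names and the checkpoint path are hypothetical placeholders.
    """
    return (
        spark.readStream
        .table(source)                              # incremental read of the source table
        .writeStream
        .option("checkpointLocation", checkpoint)   # stream progress is tracked here
        .outputMode("append")
        .toTable(target)                            # starts the query, returns a StreamingQuery
    )
```

On a cluster this would be started with `query = start_silver_to_gold(spark)`, and `query.status` or `query.lastProgress` can then be inspected to see what the query is doing.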

4 REPLIES

amr
Databricks Employee

I am not sure about the exact issue here, but there are some general guidelines for streaming data from another table. First, make sure the tables you are streaming from are Delta tables. Second, if the source table receives many updates or deletes, the best way to stream its changes is through Change Data Feed (CDF); if it only receives inserts, plain streaming is fine. One thing you might explore to simplify this is the Delta Live Tables Streaming(<table-name>) function.
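
As a rough illustration of the CDF-versus-plain-streaming choice above, the sketch below builds the reader options for each case. The helper names and the `has_updates_or_deletes` flag are made up for this example; `readChangeFeed` and `startingVersion` are the Delta reader options for Change Data Feed, which also has to be enabled on the source table (table property `delta.enableChangeDataFeed = true`) before it can be read this way.

```python
def delta_stream_options(has_updates_or_deletes, starting_version=None):
    """Return reader options for streaming from a Delta table.

    Hypothetical helper: uses Change Data Feed when the source table sees
    updates or deletes, and no extra options for an insert-only table.
    """
    opts = {}
    if has_updates_or_deletes:
        opts["readChangeFeed"] = "true"
        if starting_version is not None:
            opts["startingVersion"] = str(starting_version)
    return opts


def read_delta_stream(spark, table_name, has_updates_or_deletes,
                      starting_version=None):
    """Open a streaming read on `table_name` with the options chosen above."""
    opts = delta_stream_options(has_updates_or_deletes, starting_version)
    return spark.readStream.format("delta").options(**opts).table(table_name)
```

For example, `read_delta_stream(spark, "silver", has_updates_or_deletes=True, starting_version=0)` would read the change feed of a table named `silver` from its first version, while passing `False` falls back to an ordinary streaming read.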

NK_123
New Contributor II

Hey Samarth

Did your issue get resolved? I am facing the same issue. Could you help here?

strangan
New Contributor II

I had the same issue this morning. I was able to resolve it by running the notebook on a job cluster instead of an all-purpose interactive cluster. It is also best to disable autoscaling. These are Databricks recommendations.
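
A fixed-size job cluster of the kind described above could be specified along these lines. This is only a sketch: the helper name is invented, and the runtime version, node type, and worker count in the usage line are placeholder values, not recommendations. The key point is that the spec sets `num_workers` and leaves out the `autoscale` block (`min_workers` / `max_workers`).

```python
def fixed_size_job_cluster(num_workers, spark_version, node_type_id):
    """Build a `new_cluster` spec for a job task with a fixed worker count.

    Hypothetical helper: no `autoscale` key is included, so the cluster
    does not resize while the stream is running.
    """
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


# Placeholder values for illustration only.
new_cluster = fixed_size_job_cluster(
    num_workers=4,
    spark_version="13.3.x-scala2.12",
    node_type_id="i3.xlarge",
)
```

The resulting dictionary would go under the `new_cluster` field of a task in the job definition, so the notebook runs on a cluster created for that job run rather than on a shared interactive cluster.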

MohsenJ
Contributor

I'm facing the same issue when I try to run this example: Create a monitor using the API | Databricks on AWS (Inference Lakehouse Monitor regression example notebook). Any idea?
