11-04-2025 12:38 AM
Hi Community, I recently built some streaming pipelines (Auto Loader-based) that read JSON data from the data lake and, after parsing and logging, write it to the Delta Lake bronze layer. Since these are streaming pipelines, they are supposed to run indefinitely until I deliberately stop them. However, I’ve noticed that the Databricks clusters (All-Purpose Compute) tend to become unstable after a day or two of continuous execution.
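For context, a pipeline of this kind might look roughly like the minimal sketch below. It is an illustration, not the actual job: the function names, paths, and table name are hypothetical placeholders, `spark` is assumed to be the session that a Databricks notebook or job provides, and the parsing and logging steps are left out.

```python
def autoloader_options(schema_location: str) -> dict:
    """Build reader options for an Auto Loader (cloudFiles) JSON source.

    schema_location is where Auto Loader keeps the inferred schema;
    the value passed in here is a placeholder chosen by the caller.
    """
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
    }


def start_bronze_stream(spark, source_path, checkpoint_path, target_table):
    """Start a stream from a JSON landing path into a bronze Delta table.

    Assumes a Databricks runtime, where the "cloudFiles" source is available.
    All three path/table arguments are hypothetical placeholders.
    """
    raw = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(f"{checkpoint_path}/_schema"))
        .load(source_path)
    )
    # Parsing and logging would go here; omitted in this sketch.
    return (
        raw.writeStream
        .option("checkpointLocation", checkpoint_path)  # lets the stream resume after a restart
        .toTable(target_table)  # starts the query and returns a StreamingQuery
    )
```

Because the checkpoint location records progress, stopping and re-triggering a stream like this resumes from where it left off rather than reprocessing files.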
To keep things running, I’ve implemented an optimizer job, scheduled daily, that stops the cluster, restarts it, and then re-triggers the streaming pipeline.
I feel this might not be a best practice. Could you please suggest what type of clusters are most suitable for streaming jobs/pipelines and what the best practices are for managing streaming systems in Databricks?