- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-10-2024 07:38 AM
@dmytro,
Autoscaling is managed by Databricks and it's logic is mostly automatic. But If you're planning on structured streaming for production I suggest you to go for a fixed amount of workers and limiting your streaming query input rate or create a DLT pipeline that uses enhanced autoscaling.
This doc covers the production considerations for structured streaming workloads: https://docs.databricks.com/en/structured-streaming/production.html.
As mentioned in the docs above, when working with compute auto-scaling, the auto-scaling algorithm will have some difficulties scaling down for structured streaming workloads:
Compute auto-scaling has limitations scaling down cluster size for Structured Streaming workloads. Databricks recommends using Delta Live Tables with Enhanced Autoscaling for streaming workloads. See Optimize the cluster utilization of Delta Live Tables pipelines with Enhanced Autoscaling.
Compute auto-scaling docs: https://docs.databricks.com/en/compute/configure.html#benefits-of-autoscaling
Raphael Balogo
Sr. Technical Solutions Engineer
Databricks