Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Streaming Delta Live Tables Cluster Management

Shawn_Eary
Contributor

If I use code like this:

-- 8:56
-- https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536
CREATE STREAMING LIVE TABLE report
AS SELECT *
FROM cloud_files("/mydata", "json")

To create a STREAMING Delta Live Table through the Workflows section of Databricks, will the clusters associated with the STREAMING Delta Live Table stay on 24 hours a day, 7 days a week and get me a REALLY BIG bill?

1 ACCEPTED SOLUTION


Kaniz_Fatma
Community Manager

Hi @Shawn_Eary, when creating a STREAMING Delta Live Table through the Workflows section of Databricks, it’s essential to understand the associated costs and resource usage.

 

Let’s break it down:

 

Delta Live Tables (DLT) Pricing:

Cluster Usage:

  • When you create a streaming Delta Live Table, Databricks automatically manages the underlying compute resources (clusters) for you.
  • The clusters associated with your DLT pipelines will be provisioned and terminated dynamically based on workload.
  • They won’t stay on 24/7 unless explicitly configured to do so.
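
Whether a pipeline keeps a cluster up around the clock is governed by its execution mode. A minimal sketch of pipeline settings JSON (field names follow the DLT Pipelines API; the pipeline name and autoscale bounds are illustrative assumptions) for a triggered pipeline, i.e. `"continuous": false`:

```json
{
  "name": "report-pipeline",
  "continuous": false,
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5
      }
    }
  ]
}
```

With `"continuous": false` the cluster is started for each triggered update and shut down when the update finishes; setting it to `true` is what opts you into an always-on pipeline.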

Dynamic Cluster Lifecycle:

  • Databricks dynamically provisions clusters when a pipeline update needs them (e.g., for data ingestion and transformation).
  • Clusters are terminated when idle to avoid unnecessary costs.
  • This dynamic lifecycle ensures efficient resource utilization and cost savings.

Resource Costs:

  • While clusters are active, you’ll incur costs based on the cluster configuration (e.g., number of nodes, instance types, memory, cores).
  • The actual cost depends on your workload, data volume, and query complexity.
  • If you’re using auto-scaling, costs adjust dynamically based on demand.

Monitoring and Optimization:

  • Monitor your cluster usage and adjust resources as needed.
  • Use the Databricks UI or APIs to track resource consumption and optimize performance.
  • Consider using smaller clusters during off-peak hours to reduce costs.
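
One hedged sketch of API-based monitoring: the endpoint and response field below are assumptions based on the Databricks Pipelines REST API (`GET /api/2.0/pipelines/{pipeline_id}`); verify both against your workspace's API documentation before relying on them.

```python
# Hedged sketch: check a DLT pipeline's state via the Databricks REST API.
# The endpoint path and the "state" response field are ASSUMPTIONS from the
# Pipelines API docs -- confirm them for your workspace before use.
import json
import urllib.request


def build_pipeline_request(host: str, token: str, pipeline_id: str) -> urllib.request.Request:
    """Build an authenticated GET request for one pipeline's metadata."""
    return urllib.request.Request(
        f"{host}/api/2.0/pipelines/{pipeline_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def get_pipeline_state(host: str, token: str, pipeline_id: str) -> str:
    """Return the pipeline state (e.g. 'IDLE' vs 'RUNNING'); a continuously
    RUNNING pipeline is the case that keeps a cluster up around the clock."""
    with urllib.request.urlopen(build_pipeline_request(host, token, pipeline_id)) as resp:
        return json.load(resp)["state"]
```

Polling this on a schedule (or just checking it in the UI) is an easy way to catch a pipeline that is unexpectedly still RUNNING outside its intended window.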

Billing Considerations:

  • DLT pricing includes both compute (DBUs) and storage costs.
  • Storage costs depend on the amount of data stored in Delta tables.
  • Compute costs depend on the cluster usage (active hours, node types, etc.).
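
The billing points above reduce to back-of-envelope arithmetic. The DBU rate and price per DBU below are placeholder assumptions, not real Databricks prices; check the pricing page for your cloud, region, and DLT edition:

```python
# Back-of-envelope DLT compute cost: active_hours x DBU/hour x $/DBU.
# The rates used below are HYPOTHETICAL placeholders, not Databricks prices.

def estimate_dlt_compute_cost(active_hours: float,
                              dbu_per_hour: float,
                              usd_per_dbu: float) -> float:
    """Estimated compute cost in USD for a DLT pipeline over a period."""
    return active_hours * dbu_per_hour * usd_per_dbu

# Triggered pipeline active ~1 hour/day vs. continuous 24/7, over 30 days:
triggered = estimate_dlt_compute_cost(active_hours=30, dbu_per_hour=4, usd_per_dbu=0.25)
continuous = estimate_dlt_compute_cost(active_hours=720, dbu_per_hour=4, usd_per_dbu=0.25)
print(f"triggered:  ${triggered:.2f}")   # -> triggered:  $30.00
print(f"continuous: ${continuous:.2f}")  # -> continuous: $720.00
```

The ~24x gap between the two numbers is exactly the “REALLY BIG bill” risk of leaving a pipeline in continuous mode when a scheduled triggered run would do.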

In summary, the clusters associated with your streaming Delta Live Table won’t run 24/7 by default: in triggered mode they are provisioned for each pipeline update and terminated when the update finishes, and only a pipeline configured for continuous mode keeps a cluster running. To avoid a “REALLY BIG bill,” monitor your usage, optimize resources, and choose the appropriate DLT edition for your requirements.


