To calculate storage costs for Azure Databricks and see which data you are being charged for, consider both the Databricks compute charges (DBUs) and the storage resources (Azure Data Lake Storage or Blob Storage) linked to your workspace, including the storage account in the workspace's managed resource group. Azure provides integrated cost tracking and billing tools that help you monitor, analyze, and manage these expenses.
Calculating Databricks Storage Cost
Storage cost is determined by:
- The volume of data stored in Azure Data Lake Storage (ADLS) or Blob Storage.
- The storage tier (hot/cool/archive) and redundancy option. As a rough illustration, Hot tier LRS capacity in US regions lists at around $0.018/GB-month and Cool at around $0.01/GB-month; rates vary by region and change over time, so confirm them on the Azure pricing page.
- Transactions, data transfer, and networking: read/write operations are billed per request, and cross-region or outbound data movement incurs additional charges.
Use the Azure Pricing Calculator to estimate costs by adding Azure Databricks and a storage account to the estimate and configuring the storage options along with the associated compute and network features. The calculator lets you model scenarios and view projected monthly expenses.
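For a quick back-of-the-envelope check before opening the calculator, the capacity part of the bill is just volume multiplied by the per-GB rate of each tier. The sketch below assumes placeholder rates (`PLACEHOLDER_RATES_PER_GB_MONTH` is a hypothetical name with made-up values) and ignores transaction and data transfer charges; substitute the rates that apply to your region and redundancy option.

```python
# Placeholder per-GB monthly rates. These are assumptions for illustration,
# not quoted prices: replace them with the rates for your region and tier.
PLACEHOLDER_RATES_PER_GB_MONTH = {"hot": 0.02, "cool": 0.01}


def estimate_monthly_storage_cost(gb_by_tier, rates=PLACEHOLDER_RATES_PER_GB_MONTH):
    """Return (total, per-tier breakdown) for a dict of {tier: GB stored}.

    Only capacity is modelled; transactions and egress are not included.
    """
    breakdown = {tier: gb * rates[tier] for tier, gb in gb_by_tier.items()}
    return sum(breakdown.values()), breakdown


total, breakdown = estimate_monthly_storage_cost({"hot": 2000, "cool": 5000})
print(f"Estimated capacity cost: ${total:.2f}/month")
for tier, cost in breakdown.items():
    print(f"  {tier}: ${cost:.2f}")
```

Treat the result as a lower bound, since workloads that read and write many small files can add noticeable transaction charges on top of capacity.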
Viewing and Managing Stored Data
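- The DBFS root is stored in the storage account inside your workspace's managed resource group. It typically holds DBFS files, Hive metastore managed tables, logs, and libraries. Unmanaged (external) tables and Unity Catalog managed tables and volumes usually live in the storage accounts configured for their external locations or for the metastore or catalog, so include those accounts when you total storage cost.
- You can explore stored data with commands such as dbutils.fs.ls("dbfs:/") for files and SQL commands such as DESCRIBE DETAIL, which reports the location, file count, and size in bytes of a Delta table.
- In the Azure portal, navigate to the resource group and open the storage account to review it. The storage account in the managed resource group is normally protected by a deny assignment, so direct access to its contents is usually blocked and DBFS root data is managed from within Databricks; storage accounts you own can be browsed and cleaned up from the portal. Deleting unused files, tables, and volumes reduces storage charges.
- For Unity Catalog volumes, you can also use the Databricks workspace UI to browse, view, and manage files and folders directly.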
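To find out which folders account for most of the stored volume, the file listing can be summed recursively. The following is a minimal sketch: `total_size_bytes` is a hypothetical helper, and it assumes the listing entries expose `.path`, `.size`, and `.isDir()` the way the objects returned by `dbutils.fs.ls` do. The listing function is passed in as an argument so the helper can also be exercised outside a notebook.

```python
def total_size_bytes(ls, path):
    """Recursively sum file sizes under `path`.

    `ls` is any callable that behaves like dbutils.fs.ls: it returns entries
    exposing .path, .size and .isDir().
    """
    total = 0
    for entry in ls(path):
        if entry.isDir():
            # Folders report no useful size themselves, so descend into them.
            total += total_size_bytes(ls, entry.path)
        else:
            total += entry.size
    return total


# In a Databricks notebook, where dbutils is predefined:
# for entry in dbutils.fs.ls("dbfs:/"):
#     size = total_size_bytes(dbutils.fs.ls, entry.path) if entry.isDir() else entry.size
#     print(f"{entry.path}: {size / 1024**3:.2f} GiB")
```

Recursive listing issues one call per folder, so it can be slow on very large directory trees; for Delta tables, the size reported by DESCRIBE DETAIL is usually quicker to obtain.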
Checking and Deleting Data
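- In your Databricks workspace, use the DBFS file browser for DBFS or Catalog Explorer for Unity Catalog data to see what is being stored.
- To delete data, use Databricks utilities (e.g., dbutils.fs.rm("dbfs:/my-folder/", True), where True deletes recursively) or drop SQL tables. Dropping a managed table deletes its data files; dropping an unmanaged (external) table removes only the metadata, so the underlying files must be deleted separately.
- In the Azure portal, delete blobs or folders as needed in the linked storage account, provided you have access to it.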
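Because recursive deletes cannot be undone, it can help to review the list of paths before anything is removed. The sketch below wraps the delete call in a dry-run switch; `delete_paths` is a hypothetical helper, and the `rm` argument is assumed to accept a path and a recurse flag in the same order as `dbutils.fs.rm`.

```python
def delete_paths(rm, paths, dry_run=True):
    """Delete each path recursively via `rm` (for example dbutils.fs.rm).

    With dry_run=True nothing is deleted; the function only returns the
    paths that would have been removed.
    """
    affected = []
    for path in paths:
        if not dry_run:
            rm(path, True)  # second argument: recurse into folders
        affected.append(path)
    return affected


# In a Databricks notebook, where dbutils is predefined:
# print(delete_paths(dbutils.fs.rm, ["dbfs:/my-folder/"]))          # preview only
# delete_paths(dbutils.fs.rm, ["dbfs:/my-folder/"], dry_run=False)  # actually delete
```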
Billing and Cost Split-Up
- Azure billing is integrated: Databricks DBU (Databricks Unit) usage appears under the Azure Databricks service on your bill, while the VMs, storage, and networking that the workspace consumes are billed under their own Azure services.
- For a cost breakdown by resource (compute, storage, etc.), use Cost Management > Cost analysis and filter by the specific resource group. This shows storage-related charges, and grouping by tag (for example, the cluster tags that Databricks applies to the VMs it creates) gives per-cluster and per-workspace granularity.
- Download detailed daily and monthly usage and costs from the Cost Management + Billing section in the Azure portal.
- For advanced billing analysis and per-cluster breakdowns, you need read access to cost data in Azure (for example, the Cost Management Reader or Billing Reader role at the relevant scope) and, for workspace-level usage data, admin rights or granted access in Databricks.
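Once a usage file has been downloaded, the split between compute, storage, and other services can be reproduced offline by summing cost per service category. The sketch below assumes the file has `MeterCategory` and `CostInBillingCurrency` columns; column names differ between agreement types and export versions, so adjust them to match your file. The sample rows are made-up values used only to show the shape of the input.

```python
import csv
import io
from collections import defaultdict


def cost_by_category(csv_text, group_col="MeterCategory",
                     cost_col="CostInBillingCurrency"):
    """Sum the cost column per value of the grouping column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_col]] += float(row[cost_col])
    return dict(totals)


# Made-up sample rows; a real export has many more columns.
sample = """MeterCategory,CostInBillingCurrency
Azure Databricks,12.50
Virtual Machines,30.00
Storage,4.25
Storage,1.75
"""

for category, cost in sorted(cost_by_category(sample).items()):
    print(f"{category}: {cost:.2f}")
```

Passing a different `group_col`, such as a resource group or resource name column if your export has one, gives the same breakdown along another dimension.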
Monitoring Databricks Usage
- Azure Databricks offers system tables (such as system.billing.usage) for monitoring billable usage by clusters, jobs, and SQL warehouses.
- Use these tables alongside Azure's Cost analysis tools to get a full picture of resource consumption and costs.
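As an illustration of the system-table route, the sketch below builds a query that sums billable usage per day and SKU from `system.billing.usage`. It assumes system tables are enabled and that you have been granted read access to the billing schema; `dbu_usage_query` is a hypothetical helper name. The table reports usage quantities rather than currency amounts, so these figures complement the Azure cost views instead of replacing them.

```python
def dbu_usage_query(days=30):
    """Build a SQL query that sums usage per day and SKU for the last `days` days."""
    days = int(days)  # coerce to int so arbitrary text cannot end up in the SQL
    return f"""
        SELECT usage_date, sku_name, usage_unit,
               SUM(usage_quantity) AS total_usage
        FROM system.billing.usage
        WHERE usage_date >= date_sub(current_date(), {days})
        GROUP BY usage_date, sku_name, usage_unit
        ORDER BY usage_date, sku_name
    """


print(dbu_usage_query(7))

# In a Databricks notebook, where spark and display are predefined:
# display(spark.sql(dbu_usage_query(7)))
```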
Recommended Steps
- In the Azure portal, go to Cost Management and filter by your Databricks managed resource group to review charges.
- In Databricks, use the workspace command line or UI tools to inspect files, tables, and volumes; delete unnecessary items to reduce cost.
- Use the Azure Pricing Calculator for detailed estimates, and consider system tables or the billing API for granular resource analysis.
This approach lets you understand what data is stored, how it's charged, and how to track, split, and reduce storage costs for Azure Databricks. Every billable resource in Databricks (compute, storage, networking) can be reviewed and managed through Azure's native management tools and Databricks' workspace utilities.