Expected size of managed Storage Accounts

EDDatabricks
Contributor

Dear all,

We are monitoring the size of the managed storage accounts associated with our deployed Azure Databricks instances.

We have 5 Databricks instances for specific components of our platform, replicated in 4 environments (DEV, TEST, PREPROD, PROD).

During our analysis we observed storage account sizes ranging from a few MB to a couple of TB. Note that we do not store any production tables on the managed storage accounts, nor do we upload any form of data to them. Production instances usually have the larger volume.

Note that the size is comparable to that of our production tables (a few TB).

Our main questions are:

  1. What do these Storage Accounts contain?
  2. What is the best way to reduce the size?
  3. How can we manually delete unimportant files (e.g. logs)?
  4. Can we automate the process in #3?

Thanks a lot,

Kind regards,
The European Dynamics team

1 REPLY

Kaniz
Community Manager

Hi @EDDatabricks, let's address your questions regarding Azure-managed storage accounts:

  1. What do these Storage Accounts contain?

    • An Azure storage account contains various data objects, including:
      • Blobs: Used for storing unstructured data like images, videos, and backups.
      • Files: Provides a file system interface for sharing files across VMs.
      • Queues: Used for reliable messaging between components.
      • Tables: A NoSQL data store for semi-structured data.
    • These storage accounts provide a unique namespace accessible globally via HTTP or HTTPS. Data within them is durable, secure, and highly available.
  2. Best way to reduce the size:

  3. Manual deletion of unimportant files (e.g., logs):

    • You can manually delete files from the storage account using tools such as Azure Storage Explorer.
  4. Automating file deletion:

Remember to test any deletion policies in a non-production environment first to ensure they behave as expected. If you have specific retention requirements, consider adjusting the rules accordingly. 🚀🔍
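
As a way to try out a retention rule before deleting anything, the selection step can be sketched as a dry run in plain Python. This is only an illustration under stated assumptions: the `BlobInfo` record, the `select_expired` and `total_bytes` helpers, the `logs/` prefix, and the 30-day retention are all hypothetical and do not come from the thread or from any Azure or Databricks API. It also assumes you are able to obtain a listing of the objects (name, last-modified time, size) from the storage account in the first place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class BlobInfo:
    """Hypothetical listing record for one stored object."""
    name: str                 # full object path, e.g. "logs/2024/01/driver.txt"
    last_modified: datetime   # timezone-aware timestamp (UTC)
    size: int                 # size in bytes


def select_expired(
    blobs: Iterable[BlobInfo],
    prefixes: Sequence[str],
    retention_days: int,
    now: Optional[datetime] = None,
) -> List[BlobInfo]:
    """Return objects under one of `prefixes` that are older than the retention window.

    Nothing is deleted here; the result is only a list of candidates to review.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [
        b for b in blobs
        if b.name.startswith(wanted) and b.last_modified < cutoff
    ]


def total_bytes(blobs: Iterable[BlobInfo]) -> int:
    """Sum the sizes of the given objects, to estimate how much space a rule would free."""
    return sum(b.size for b in blobs)


if __name__ == "__main__":
    # Made-up listing for demonstration; real data would come from your own inventory.
    reference = datetime(2024, 1, 31, tzinfo=timezone.utc)
    listing = [
        BlobInfo("logs/old.txt", reference - timedelta(days=90), 100),
        BlobInfo("logs/recent.txt", reference - timedelta(days=2), 50),
        BlobInfo("data/old.parquet", reference - timedelta(days=400), 1000),
    ]
    candidates = select_expired(listing, ["logs/"], retention_days=30, now=reference)
    for blob in candidates:
        print(f"would delete: {blob.name} ({blob.size} bytes)")
    print(f"total reclaimable: {total_bytes(candidates)} bytes")
```

Keeping selection separate from deletion means the candidate list can be reviewed in a non-production environment before any rule is applied for real.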
