Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Expected size of managed Storage Accounts

EDDatabricks
Contributor

Dear all,

we are monitoring the size of the managed storage accounts associated with our deployed Azure Databricks instances.

We have 5 Databricks instances for specific components of our platform, each replicated in 4 environments (DEV, TEST, PREPROD, PROD).

During our analysis we observed storage account sizes ranging from a few MB to a couple of TB. Note that we do not store any production tables on the managed storage accounts, nor do we upload any form of data to them. Production instances usually have the larger volume.

Note that the size is comparable to that of our production tables (a few TB).

Our main questions are:

  1. What do these Storage Accounts contain?
  2. What is the best way to reduce the size?
  3. How can we manually delete unimportant files (e.g. logs)?
  4. Can we automate the process in #3?

Thanks a lot,

Kind regards,
The European Dynamics team

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @EDDatabricks, let’s address your questions regarding Azure-managed storage accounts:

  1. What do these Storage Accounts contain?

    • An Azure storage account contains various data objects, including:
      • Blobs: Used for storing unstructured data like images, videos, and backups.
      • Files: Provides a file system interface for sharing files across VMs.
      • Queues: Used for reliable messaging between components.
      • Tables: A NoSQL data store for semi-structured data.
    • These storage accounts provide a unique namespace accessible globally via HTTP or HTTPS. Data within them is durable, secure, and highly available.
  2. Best way to reduce the size:

  3. Manual deletion of unimportant files (e.g., logs):

    • You can manually delete files from the storage account using tools like Azure Storage Explorer.
  4. Automating file deletion:

Remember to test any deletion policies in a non-production environment first to ensure they behave as expected. If you have specific retention requirements, consider adjusting the rules accordingly. 🚀🔍
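To make the dry-run advice concrete, here is a minimal Python sketch of an age-based cleanup. It is only an illustration under stated assumptions: `FileInfo`, `select_expired`, `cleanup`, the `logs/` prefix and the 30-day retention are all hypothetical names and values, not part of any Databricks or Azure API, and the sketch assumes you can already obtain a file listing and are permitted to delete from the account in question (this thread does not confirm that for Databricks-managed accounts).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical listing record: path, size in bytes, last-modified time (UTC)."""
    path: str
    size_bytes: int
    last_modified: datetime


def select_expired(files: Iterable[FileInfo], prefix: str,
                   retention_days: int, now: datetime) -> List[FileInfo]:
    """Return files under `prefix` last modified before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [f for f in files
            if f.path.startswith(prefix) and f.last_modified < cutoff]


def cleanup(files: Iterable[FileInfo], prefix: str, retention_days: int,
            now: datetime, delete_fn: Optional[Callable[[str], None]] = None,
            dry_run: bool = True) -> int:
    """Report (and optionally delete) expired files; return the bytes affected.

    `delete_fn` is an assumption: a callable you supply that deletes one path
    using whatever storage client you actually have access to.
    """
    expired = select_expired(files, prefix, retention_days, now)
    if not dry_run:
        if delete_fn is None:
            raise ValueError("delete_fn is required when dry_run is False")
        for f in expired:
            delete_fn(f.path)
    return sum(f.size_bytes for f in expired)


if __name__ == "__main__":
    # Made-up listing, purely for illustration.
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    listing = [
        FileInfo("logs/old.log", 100, datetime(2023, 12, 1, tzinfo=timezone.utc)),
        FileInfo("logs/new.log", 50, datetime(2024, 1, 30, tzinfo=timezone.utc)),
    ]
    # Dry run: nothing is deleted, only the reclaimable size is reported.
    print(cleanup(listing, "logs/", retention_days=30, now=now))
```

Running it with `dry_run=True` first (the default) shows how many bytes a given prefix and retention period would free before anything is removed; the same function can then be scheduled with `dry_run=False` once the selection looks right.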

 

This does not seem to be a relevant answer, as the question is about the Databricks-managed storage account...
