Hi All,
I'm after some guidance on how to identify massive (100,000%) spikes in bandwidth usage (and the related costs) in the storage account inside the Azure Databricks provisioned/managed resource group, and how to stop them.
These blips are adding 30-50% to our monthly costs, which is creating uncertainty in the organisation because we can't accurately predict our usage costs.
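To pin down exactly when the spikes happen, you can pull the Egress metric for the managed storage account with the Azure CLI. A minimal sketch; the resource group and account names below are placeholders, and the call is guarded so it degrades gracefully when the CLI isn't installed or authenticated:

```shell
# Placeholders: substitute your managed resource group and dbstorage account name
RG="databricks-managed-rg"     # the Databricks managed resource group
SA="dbstorageexample"          # the auto-created storage account (dbstorage...)
METRIC="Egress"                # bytes leaving the account, including replication traffic

# Query hourly egress for the last 7 days (requires `az login`)
if command -v az >/dev/null 2>&1; then
  az monitor metrics list \
    --resource "$(az storage account show -g "$RG" -n "$SA" --query id -o tsv)" \
    --metric "$METRIC" \
    --interval PT1H \
    --offset 7d \
    --aggregation Total \
    -o table || echo "az query failed (not logged in, or names are placeholders)"
fi
```

Deny assignments block writes and deletes, not reads, so metric queries should still work even though you're locked out of the account itself. Correlating the hourly totals with your cluster/job schedule will confirm (or rule out) that the traffic happens while everything is off.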
Our environment is still very much in the POC stage.
I believe it's related to geo-redundancy: Databricks copying massive amounts of data to another region even when the clusters are turned off.
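A quick back-of-the-envelope check makes the geo-redundancy theory testable. Using an assumed bandwidth rate (a placeholder, not a quoted Azure price; check the pricing page for your region), you can work out how much data movement the extra cost implies:

```shell
# Rough sanity check: how much transfer would explain the billed bandwidth?
RATE=0.02            # USD per GB, an ASSUMED rate for illustration only
EXTRA_COST=300       # hypothetical unexplained monthly bandwidth charge in USD
FOOTPRINT=56.9       # known data: 2.9 GB raw/datalake + 54 GB Unity Catalog tables

IMPLIED_GB=$(awk -v c="$EXTRA_COST" -v r="$RATE" 'BEGIN { printf "%.0f", c / r }')
RATIO=$(awk -v g="$IMPLIED_GB" -v f="$FOOTPRINT" 'BEGIN { printf "%.0f", g / f }')
echo "\$${EXTRA_COST} at \$${RATE}/GB implies ~${IMPLIED_GB} GB moved (~${RATIO}x the ${FOOTPRINT} GB footprint)"
```

If the implied transfer volume dwarfs the actual data footprint, the traffic is almost certainly not your tables being replicated once; something is re-reading or re-writing data repeatedly.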
I can't dig deeper because the "system deny" security settings (a deny assignment) created by Databricks when provisioning the workspace lock me out, even though I'm the subscription admin.
(Azure support have not been helpful, just repeating that "yes, these costs were incurred by the Databricks managed storage account".)
This is weird because I use externally managed tables for everything, linked to a storage account I control. I don't use DBFS for anything, certainly not thousands of gigabytes of data.
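It's also worth confirming the redundancy setting directly: the Databricks-managed root storage account is, in many regions, created with a geo-redundant SKU. Since the deny assignment blocks modifications but not read operations, you should be able to inspect the SKU (names below are placeholders):

```shell
# Placeholders: substitute your managed resource group and dbstorage account
RG="databricks-managed-rg"
SA="dbstorageexample"

# Read-only: deny assignments block writes/deletes, not GET operations
if command -v az >/dev/null 2>&1; then
  SKU=$(az storage account show -g "$RG" -n "$SA" --query sku.name -o tsv 2>/dev/null) \
    || SKU="unknown (not logged in, or names are placeholders)"
  echo "Root storage SKU: ${SKU:-unknown}"
  # If this prints Standard_GRS or Standard_RAGRS, cross-region replication
  # bandwidth on this account is billed as geo-replication data transfer.
fi
```

Changing the SKU on the managed account is what the deny assignment will stop you from doing; that's a question for Databricks support rather than Azure support.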
1. What is happening?
- The timestamps are on weekends/overnight; our SQL warehouse and clusters are off outside business hours and set to auto-stop. The spikes don't align with clusters being started or with any jobs, all of which are relatively simple and small.
- Our 'raw data/datalake' is 2.9 GB.
- The Unity Catalog + Databricks tables are 54 GB.
2. Why is this happening?
3. What do I do so it doesn't happen in the future?
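On question 3: until the root cause is found, a budget alert on the subscription at least removes the surprise factor. A sketch with the Azure CLI, where the name, amount, and dates are assumptions to adapt (budgets notify, they don't cap spend):

```shell
# Placeholders: adjust name, amount, and dates to your environment
BUDGET_NAME="databricks-bandwidth-guard"
AMOUNT=500                     # monthly budget in billing currency (assumption)
START="$(date +%Y-%m-01)"      # budgets must start on the first of a month
END="2026-06-01"               # placeholder end date, adjust as needed

if command -v az >/dev/null 2>&1; then
  az consumption budget create \
    --budget-name "$BUDGET_NAME" \
    --amount "$AMOUNT" \
    --category cost \
    --time-grain monthly \
    --start-date "$START" \
    --end-date "$END" \
    || echo "budget create failed (not logged in, or parameters need adjusting)"
fi
```

Cost Management's anomaly detection alerts (configured in the portal) are a complementary option, since they flag unusual daily spend patterns rather than a fixed threshold.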