Hi All,
I'm after some guidance on how to identify massive (100,000%) spikes in bandwidth usage (and the related costs) in the storage account inside the Azure Databricks provisioned/managed resource group, and how to stop them.
These blips are adding 30-50% to our monthly costs, which is creating uncertainty in the organisation because we can't accurately predict our usage costs.
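To pin down exactly when the spikes happen, you can pull the Egress metric for the managed storage account with the Azure CLI. A minimal sketch; the resource group and account names below are placeholders, and the call is guarded so it degrades gracefully when the CLI isn't installed or authenticated:

```shell
# Placeholders: substitute your managed resource group and dbstorage account name
RG="databricks-managed-rg"     # the Databricks managed resource group
SA="dbstorageexample"          # the auto-created storage account (dbstorage...)
METRIC="Egress"                # bytes leaving the account, including replication traffic

# Query hourly egress for the last 7 days (requires `az login`)
if command -v az >/dev/null 2>&1; then
  az monitor metrics list \
    --resource "$(az storage account show -g "$RG" -n "$SA" --query id -o tsv)" \
    --metric "$METRIC" \
    --interval PT1H \
    --offset 7d \
    --aggregation Total \
    -o table || echo "az query failed (not logged in, or names are placeholders)"
fi
```

Deny assignments block writes and deletes, not reads, so metric queries should still work even though you're locked out of the account itself. Correlating the hourly totals with your cluster/job schedule will confirm (or rule out) that the traffic happens while everything is off.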
Our environment is still very much in the POC stage.
I believe it's related to geo-redundancy: Databricks copying massive amounts of data to another region even when the clusters are turned off.
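A quick back-of-the-envelope check makes the geo-redundancy theory testable. Using an assumed bandwidth rate (a placeholder, not a quoted Azure price; check the pricing page for your region), you can work out how much data movement the extra cost implies:

```shell
# Rough sanity check: how much transfer would explain the billed bandwidth?
RATE=0.02            # USD per GB, an ASSUMED rate for illustration only
EXTRA_COST=300       # hypothetical unexplained monthly bandwidth charge in USD
FOOTPRINT=56.9       # known data: 2.9 GB raw/datalake + 54 GB Unity Catalog tables

IMPLIED_GB=$(awk -v c="$EXTRA_COST" -v r="$RATE" 'BEGIN { printf "%.0f", c / r }')
RATIO=$(awk -v g="$IMPLIED_GB" -v f="$FOOTPRINT" 'BEGIN { printf "%.0f", g / f }')
echo "\$${EXTRA_COST} at \$${RATE}/GB implies ~${IMPLIED_GB} GB moved (~${RATIO}x the ${FOOTPRINT} GB footprint)"
```

If the implied transfer volume dwarfs the actual data footprint, the traffic is almost certainly not your tables being replicated once; something is re-reading or re-writing data repeatedly.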
I can't dig deeper because the "system deny" security settings (a deny assignment) created by Databricks when provisioning the workspace lock me out, even though I'm the subscription admin.
(Azure support have not been helpful, just repeating that "yes, these costs were incurred by the Databricks managed storage account".)
This is weird because I use externally managed tables for everything, linked to a storage account I control. I don't use DBFS for anything, certainly not thousands of gigabytes of data.
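It's also worth confirming the redundancy setting directly: the Databricks-managed root storage account is, in many regions, created with a geo-redundant SKU. Since the deny assignment blocks modifications but not read operations, you should be able to inspect the SKU (names below are placeholders):

```shell
# Placeholders: substitute your managed resource group and dbstorage account
RG="databricks-managed-rg"
SA="dbstorageexample"

# Read-only: deny assignments block writes/deletes, not GET operations
if command -v az >/dev/null 2>&1; then
  SKU=$(az storage account show -g "$RG" -n "$SA" --query sku.name -o tsv 2>/dev/null) \
    || SKU="unknown (not logged in, or names are placeholders)"
  echo "Root storage SKU: ${SKU:-unknown}"
  # If this prints Standard_GRS or Standard_RAGRS, cross-region replication
  # bandwidth on this account is billed as geo-replication data transfer.
fi
```

Changing the SKU on the managed account is what the deny assignment will stop you from doing; that's a question for Databricks support rather than Azure support.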
1. What is happening?
- The timestamps are on weekends/overnight; our SQL warehouse and clusters are off outside business hours and set to auto-stop. The spikes don't align with clusters being started or with any jobs, all of which are relatively simple and small.
- Our 'raw data/datalake' is 2.9 GB.
- The Unity Catalog + Databricks tables are 54 GB.
2. Why is this happening?
3. What do I do so it doesn't happen in the future?
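On question 3: until the root cause is found, a budget alert on the subscription at least removes the surprise factor. A sketch with the Azure CLI, where the name, amount, and dates are assumptions to adapt (budgets notify, they don't cap spend):

```shell
# Placeholders: adjust name, amount, and dates to your environment
BUDGET_NAME="databricks-bandwidth-guard"
AMOUNT=500                     # monthly budget in billing currency (assumption)
START="$(date +%Y-%m-01)"      # budgets must start on the first of a month
END="2026-06-01"               # placeholder end date, adjust as needed

if command -v az >/dev/null 2>&1; then
  az consumption budget create \
    --budget-name "$BUDGET_NAME" \
    --amount "$AMOUNT" \
    --category cost \
    --time-grain monthly \
    --start-date "$START" \
    --end-date "$END" \
    || echo "budget create failed (not logged in, or parameters need adjusting)"
fi
```

Cost Management's anomaly detection alerts (configured in the portal) are a complementary option, since they flag unusual daily spend patterns rather than a fixed threshold.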