2 weeks ago
Hello,
We have configured our Databricks environment with private endpoint connections injected into our VNET, which includes two subnets (public and private). We have disabled public IPs and are using Network Security Groups (NSGs) on the subnet, as suggested by Microsoft. Additionally, we have a private endpoint for our Azure Data Lake Storage account, where our tables are created, and this storage is located within the same VNET.
We also utilize a private endpoint for authentication in a separate VNET that has been successfully peered with our main VNET. Currently, our developers are running shared or job compute clusters to create tables by transferring data from Storage Account A to Storage Account B.
However, we are seeing a lot of ingress traffic that we believe should not be occurring. Given that the cluster and both storage accounts are in the same VNET, my understanding is that there should be no costs associated with ingress/egress traffic between these resources.
Could you please provide guidance on whether other teams have encountered similar issues? Additionally, any insights into how we might resolve this unexpected ingress traffic would be greatly appreciated.
Thank you for your assistance.
2 weeks ago
Are all the resources created within the same region? If there is any cross-region traffic, even within the same VNET, it could incur additional costs.
2 weeks ago
Hi @Fkebbati ,
There will always be some costs related to data transfer between those accounts. Have a look at the Private Link pricing page. So it's expected, but MS likes to hide this kind of information 🙂
2 weeks ago
Everything is in the same region, actually. I just ran this job workflow 3 minutes ago: it reads from storage A and creates a table in storage B, all in the same VNet and the same region. When I sort resource cost by job tag, it is classified as Databricks cost.
2 weeks ago
I found this article: Azure virtual network service endpoints | Microsoft Learn
I'm guessing that moving from Private Link to virtual network service endpoints could be a good replacement to reduce the cost.
2 weeks ago
@Fkebbati
First, traffic costs in Azure are not reported as a separate Resource Type; they are appended to the main resource causing the traffic. If you want to distinguish them, group by Service Name, for instance. In this case the traffic cost is appended to Databricks and not to the Storage Accounts.
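To illustrate why the traffic cost seems to vanish into the Databricks line, here is a minimal Python sketch that groups the same rows two ways. The rows, the column names (`Resource`, `ServiceName`, `Cost`) and the service names are made up for illustration and are not taken from a real cost export; adjust them to whatever your own export contains.

```python
from collections import defaultdict

# Hypothetical cost rows; names and amounts are assumptions, not real billing data.
rows = [
    {"Resource": "databricks-workspace", "ServiceName": "Azure Databricks", "Cost": 1.25},
    {"Resource": "databricks-workspace", "ServiceName": "Virtual Network", "Cost": 0.75},
    {"Resource": "databricks-workspace", "ServiceName": "Virtual Network", "Cost": 0.125},
]


def cost_by(rows, key):
    """Sum the Cost column per distinct value of `key`, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["Cost"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Grouped by resource, everything collapses into one Databricks figure.
print(cost_by(rows, "Resource"))
# Grouped by service name, the network share becomes visible.
print(cost_by(rows, "ServiceName"))
```

With this made-up data, the per-resource view shows a single total, while the per-service view separates the compute charge from the network charge.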
Network traffic cost for Databricks and Storage Accounts with Private Endpoints is not a trivial case.
Simply using Databricks clusters to read and write data over a Private Endpoint incurs Inbound and Outbound costs. Those costs are not just "some" costs, as @szymon_dybczak mentioned; in my experience they can double the overall Databricks cost, and they scale as traffic volume changes.
Also from experience, Inbound cost will be much higher than Outbound cost, roughly 6 times higher. I imagine that Databricks/Spark adds a lot of overhead on reads, for example when distributing data to worker nodes or reading entire Delta Lake Parquet files. Read a lot, write some, I imagine.
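As a rough back-of-the-envelope sketch of how such a charge adds up, the Python snippet below multiplies processed volume by a per-GB rate for each direction. The volumes and both rates are placeholders chosen only to mirror the roughly 6:1 ratio described above; take the actual rates from the Private Link pricing page for your region.

```python
def private_endpoint_transfer_cost(inbound_gb, outbound_gb, rate_in_per_gb, rate_out_per_gb):
    """Estimate data-processing cost for traffic through a Private Endpoint."""
    inbound = round(inbound_gb * rate_in_per_gb, 2)
    outbound = round(outbound_gb * rate_out_per_gb, 2)
    return {"inbound": inbound, "outbound": outbound, "total": round(inbound + outbound, 2)}


# Placeholder inputs (assumptions, not measured values or official prices).
estimate = private_endpoint_transfer_cost(
    inbound_gb=6000,       # data read by the cluster
    outbound_gb=1000,      # data written back
    rate_in_per_gb=0.01,   # hypothetical rate per GB
    rate_out_per_gb=0.01,  # hypothetical rate per GB
)
print(estimate)
```

If the per-GB rates are equal in both directions, the cost ratio simply follows the volume ratio, so a read-heavy job shows up mostly as inbound cost.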
It is also worth remembering that if you work across different peered VNets, you will be charged for peering transit in addition to the Private Link cost. The worker node VMs simply do not have a Private Endpoint, while the storage accounts in your case do.
And this is all regarding transfer within a single region.
If you need to use Private Endpoints, you have to accept those extra transfer costs.
An alternative that gives you some security and avoids the transit cost is Service Endpoints, as you mentioned.