Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

[VNET injection] Container and container subnet

data_bricklayer
New Contributor III

Hi,

I have researched everywhere and could not find the answer. I understand that when a workspace is created, it has 2 subnets, host and container. The VM, which runs the Databricks container, is in the host subnet, which logically means the container is also in the host subnet. So why does Databricks say that the container is in the container subnet? It doesn't make any sense to me. I hope some experts can help. Thanks in advance.


szymon_dybczak
Contributor

Hi @data_bricklayer ,

Public Subnet (host): The public subnet is typically used for resources that need to communicate with the internet or other Azure services. In Azure Databricks, this subnet is used for driver nodes of the clusters that require outbound internet access for various reasons, such as downloading Maven packages.

Private Subnet (container): The private subnet, on the other hand, is used for resources that do not need direct internet access. In Azure Databricks, this subnet is used for worker nodes of the clusters. They communicate with the driver nodes and other Azure services like Azure Blob storage or Azure Data Lake Storage, without needing a direct internet connection.

Of course, when you have SCC (secure cluster connectivity) enabled, the host subnet will not contain public IPs.

Azure Databricks requires two IP addresses for each cluster node: one IP address for the host in the host subnet and one IP address for the container in the container subnet.
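Because every node takes one address from each subnet, the smaller subnet caps the number of nodes the workspace can run. The following is a rough sketch of that arithmetic, not an official sizing tool. The function name `max_cluster_nodes` and the example CIDR ranges are hypothetical, and the figure of 5 reserved addresses per subnet is an assumption based on how Azure virtual networks generally reserve addresses.

```python
import ipaddress

# Assumption: Azure reserves 5 addresses in every subnet
# (network, gateway, two for DNS, broadcast).
AZURE_RESERVED_IPS_PER_SUBNET = 5


def max_cluster_nodes(host_cidr: str, container_cidr: str) -> int:
    """Estimate how many cluster nodes fit, given one IP per node per subnet."""
    host = ipaddress.ip_network(host_cidr)
    container = ipaddress.ip_network(container_cidr)
    usable_host = host.num_addresses - AZURE_RESERVED_IPS_PER_SUBNET
    usable_container = container.num_addresses - AZURE_RESERVED_IPS_PER_SUBNET
    # Each node needs one address in BOTH subnets, so the smaller one is the limit.
    return max(0, min(usable_host, usable_container))


if __name__ == "__main__":
    # Hypothetical ranges: two /26 subnets (64 addresses each).
    print(max_cluster_nodes("10.0.0.0/26", "10.0.1.0/26"))
```

Making one subnet larger than the other does not raise the limit, since a node cannot start without an address in both.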

 

Hi @szymon_dybczak ,

Thanks for getting back to me. So what you are saying is that both subnets have VMs? I thought only the host subnet has VMs, and the container subnet has the Databricks runtime container?

Technically speaking, neither subnet contains VMs; the subnets contain NICs. As stated in the documentation, each cluster node needs 2 IP addresses:

- one IP address for the host in the host subnet

- and one IP address for the container in the container subnet.

A good illustration of this is the picture that you've attached. One VM has 2 NICs: one in the host subnet and another in the container subnet.

And why does each node require 2 IPs? For an optimal processing experience, Databricks segregates the Spark application traffic from the management traffic to avoid network contention. Spark application traffic is the communication between the driver and the executors, and among the executors themselves, where the computation is happening. Management traffic includes things such as communication between the Control Plane <-> Data Plane, etc.
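To make the "subnets contain NICs, not VMs" point concrete, here is a small illustrative model. It is only a sketch: the class names `Nic` and `Node`, the helper `subnet_of`, and the address ranges are all hypothetical and do not correspond to any Databricks or Azure API.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical address ranges for the two workspace subnets.
SUBNETS = {
    "host": ipaddress.ip_network("10.0.0.0/26"),
    "container": ipaddress.ip_network("10.0.1.0/26"),
}


@dataclass(frozen=True)
class Nic:
    ip: str


@dataclass(frozen=True)
class Node:
    """One cluster VM. The VM itself is not 'in' a subnet; its NICs are."""
    name: str
    host_nic: Nic        # carries management traffic
    container_nic: Nic   # carries Spark application traffic


def subnet_of(nic: Nic) -> str:
    """Return the name of the subnet whose range contains the NIC's address."""
    address = ipaddress.ip_address(nic.ip)
    for name, network in SUBNETS.items():
        if address in network:
            return name
    raise ValueError(f"{nic.ip} is not in any known subnet")


if __name__ == "__main__":
    worker = Node("worker-0", Nic("10.0.0.10"), Nic("10.0.1.10"))
    # A single VM appears in both subnets through its two NICs.
    print(subnet_of(worker.host_nic), subnet_of(worker.container_nic))
```

In this model a single node shows up in both subnets at once, which is why asking "which subnet is the VM in" has no single answer.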

Hi Slash,

Thanks for the clarification, it's clear now. I always thought that VMs are inside the subnet, which caused the confusion. Have a great day! Cheers.
