08-05-2024 06:43 AM
Hi,
I was researching everywhere and could not find the answer. I understand that when workspace is created, it has 2 subnets, host and container. The VM, which runs the Databricks container, is in host subnet, which logically means the container is also in host subnet. Now why Databricks say that the container is in container subnet? It doesn't make any sense to me. Hope some experts could help. Thanks in advance.
08-05-2024 07:58 AM - edited 08-05-2024 08:00 AM
Hi @data_bricklayer ,
Public Subnet (host): The public subnet is typically used for resources that need to communicate with the internet or other Azure services. In Azure Databricks, this subnet is used for driver nodes of the clusters that require outbound internet access for various reasons, such as downloading Maven packages.
Private Subnet (container): The private subnet, on the other hand, is used for resources that do not need direct internet access. In Azure Databricks, this subnet is used for worker nodes of the clusters. They communicate with the driver nodes and other Azure services like Azure Blob storage or Azure Data Lake Storage, without needing a direct internet connection.
Of course, when you have SCC (Secure Cluster Connectivity) enabled, the host subnet will not contain public IPs.
Azure Databricks requires two IP addresses for each cluster node: one IP address for the host in the host subnet and one IP address for the container in the container subnet.
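Because each node consumes one address in each subnet, the smaller subnet bounds the maximum cluster size. A minimal sketch of that sizing arithmetic, using Python's `ipaddress` module (the function name and example CIDRs are illustrative; the 5 reserved addresses per subnet is Azure's standard per-subnet reservation):

```python
import ipaddress

# Azure reserves 5 IP addresses in every subnet (network address,
# broadcast address, gateway, and two for Azure DNS). Each Databricks
# cluster node needs one IP in the host subnet AND one in the
# container subnet, so the smaller subnet limits the cluster size.

def max_cluster_nodes(host_cidr: str, container_cidr: str) -> int:
    """Rough upper bound on cluster nodes for a pair of subnets."""
    azure_reserved = 5

    def usable(cidr: str) -> int:
        return ipaddress.ip_network(cidr).num_addresses - azure_reserved

    return min(usable(host_cidr), usable(container_cidr))

# A /26 has 64 addresses, 59 of them usable after Azure's reservation.
print(max_cluster_nodes("10.0.1.0/26", "10.0.2.0/26"))  # -> 59
```

This is only an upper bound; actual capacity also depends on other resources sharing the subnets.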
08-05-2024 12:41 PM
Hi @szymon_dybczak ,
Thanks for getting back to me. So what you are saying is that both subnets have VMs? I thought only the host subnet has VMs, and the container subnet has the Databricks runtime container?
08-05-2024 03:25 PM - edited 08-05-2024 03:27 PM
Technically speaking, neither subnet contains VMs, but NICs. As stated in the documentation, each cluster node needs 2 IP addresses:
- one IP address for the host in the host subnet
- and one IP address for the container in the container subnet.
A good illustration of this is the picture you've attached: one VM has 2 NICs, one in the host subnet and another in the container subnet.
And why does each node require 2 IPs? For an optimal processing experience, Databricks segregates the Spark application traffic from the management traffic to avoid network contention. Spark application traffic covers communication between the driver and the executors, and among the executors themselves, where the computation is happening. Management traffic includes things such as communication between the control plane and the data plane, etc.
08-06-2024 01:58 AM
hi Slash,
Thanks for the clarification, it's clear now. I always thought that the VMs were inside the subnets, which caused the confusion. Have a great day! Cheers.