What is a Databricks Workspace IP Access List?
The Databricks Workspace IP Access List is a security feature that lets administrators control access to a Databricks workspace by specifying which IP addresses or CIDR ranges are allowed or blocked. It is especially useful in sensitive or regulated industries, where access must be limited to known networks.
Key Features
- Allows configuration of allow lists and block lists.
- Supports individual IPv4 addresses and IPv4 CIDR ranges (IPv6 addresses are not supported).
- Ensures that unauthorized users outside the specified IP ranges cannot access the workspace.
Why Do We Use It?
The primary reason for implementing an IP access list is security. Here are some scenarios where this feature is indispensable:
- Restrict Unauthorized Access: By allowing only known IP ranges, you reduce the risk of unauthorized access to your data and computations.
- Compliance with Regulations: Many industries, such as finance and healthcare, require strict access controls to comply with data protection regulations.
- Network Segmentation: Organizations often want to ensure that only users within their corporate network or VPN can access sensitive data and resources.
- Auditing and Monitoring: Helps identify and block unexpected IP addresses attempting to access the workspace.
How Does It Operate?
- Definition of Rules: Administrators define a list of IP addresses or CIDR ranges to either allow or block access to the workspace.
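- Priority of Rules: Block rules are evaluated first, so an IP address that matches a block list is rejected even if it also matches an allow list. If at least one allow list exists, only addresses that match an allow list are admitted; if no allow list exists, every address that is not blocked is admitted.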
- Propagation: Once configured, the rules are applied to all endpoints of the Databricks workspace, including the web UI, REST APIs, and notebooks.
- Enforcement: Once at least one allow list is in place, any attempt to access the workspace from an IP address that is not on an allow list is blocked.
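The evaluation order can be sketched in a few lines of Python. This is only a local model built on the standard `ipaddress` module, assuming block lists are checked before allow lists and that an empty set of allow lists means no restriction; the function name `is_allowed` is made up for illustration and is not part of any Databricks API.

```python
import ipaddress


def is_allowed(client_ip, allow_lists, block_lists):
    """Return True if client_ip would be admitted under the modeled rules."""
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # Each entry is a single address or a CIDR range.
        # strict=False tolerates ranges written with host bits set.
        return any(ip in ipaddress.ip_network(e, strict=False) for e in entries)

    # 1. A match on any block entry rejects the connection.
    if matches(block_lists):
        return False
    # 2. With no allow entries at all, nothing else is restricted.
    if not allow_lists:
        return True
    # 3. Otherwise the address must match an allow entry.
    return matches(allow_lists)


print(is_allowed("203.0.113.10", ["203.0.113.0/24"], []))
print(is_allowed("203.0.113.10", ["203.0.113.0/24"], ["203.0.113.10"]))
```

The second call shows a blocked address being rejected even though it sits inside an allowed range.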
Real-World Use Case
Scenario: Securing Access to a Healthcare Analytics Workspace
A healthcare organization uses Databricks for advanced analytics on patient data. To ensure compliance with HIPAA regulations, they need to secure the workspace. They:
- Allow access only from their corporate VPN, which operates within the IP range 203.0.113.0/24.
- Block all other IP ranges by default.
Using the IP access list, they configure the allow rule for their corporate network and prevent any external unauthorized access.
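Before applying a rule like this, it can help to confirm what the CIDR range actually covers, and that the address you are configuring from falls inside it. The following is a small sketch with Python's standard `ipaddress` module; the helper name `covered` and the sample addresses are illustrative only.

```python
import ipaddress

# The corporate VPN range from the scenario above.
corporate = ipaddress.ip_network("203.0.113.0/24")


def covered(address, network=corporate):
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(address) in network


print(corporate.num_addresses)       # 256
print(corporate[0], corporate[-1])   # 203.0.113.0 203.0.113.255
print(covered("203.0.113.42"))       # True
print(covered("198.51.100.1"))       # False
```

A quick check of this kind is a cheap way to avoid locking yourself out with a mistyped range.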
Implementation
Using REST API
You can configure the IP access list via the Databricks REST API.
1. Authentication
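First, generate a Databricks Personal Access Token (PAT) from your workspace, using an account with workspace admin rights, since only administrators can manage IP access lists.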
2. Add an IP Access List
curl -X POST \
  -H "Authorization: Bearer <your_pat_token>" \
  -H "Content-Type: application/json" \
  https://<your-databricks-instance>/api/2.0/ip-access-lists \
  -d '{
    "label": "Corporate Network",
    "list_type": "ALLOW",
    "ip_addresses": ["203.0.113.0/24"]
  }'
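If you prefer to script this call, the same request can be assembled with Python's standard library. The sketch below only builds the request object and does not send it; the function name `build_create_request`, the host `example.cloud.databricks.com`, and the token value are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_create_request(host, token, label, list_type, ip_addresses):
    """Build (but do not send) the POST request that creates an IP access list."""
    body = json.dumps(
        {"label": label, "list_type": list_type, "ip_addresses": ip_addresses}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/ip-access-lists",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host and token; substitute your own values.
req = build_create_request(
    "example.cloud.databricks.com",
    "<your_pat_token>",
    "Corporate Network",
    "ALLOW",
    ["203.0.113.0/24"],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that, as far as the Databricks documentation describes, the IP access list feature also has to be switched on for the workspace (the `enableIpAccessLists` workspace setting) before any list is enforced.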
3. Retrieve Current Lists
curl -X GET \
  -H "Authorization: Bearer <your_pat_token>" \
  https://<your-databricks-instance>/api/2.0/ip-access-lists
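The ID of a list is needed whenever you want to modify or delete it, so it is handy to pull it out of the listing response by label. The sketch below assumes the response is a JSON object with an `ip_access_lists` array whose entries carry `list_id` and `label` fields; check the actual response from your workspace before relying on those names. The sample payload and its ID are invented for illustration.

```python
import json


def find_list_ids(response_text, label):
    """Return the IDs of all access lists whose label matches exactly."""
    payload = json.loads(response_text)
    return [
        entry["list_id"]
        for entry in payload.get("ip_access_lists", [])  # assumed field name
        if entry.get("label") == label
    ]


# Hypothetical response body, not real output from a workspace.
sample = json.dumps(
    {
        "ip_access_lists": [
            {
                "list_id": "hypothetical-id-123",
                "label": "Corporate Network",
                "list_type": "ALLOW",
                "ip_addresses": ["203.0.113.0/24"],
            }
        ]
    }
)
print(find_list_ids(sample, "Corporate Network"))
```

Returning a list rather than a single value keeps the helper safe if several lists happen to share a label.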
4. Remove an IP Access List
curl -X DELETE \
  -H "Authorization: Bearer <your_pat_token>" \
  https://<your-databricks-instance>/api/2.0/ip-access-lists/<ip_access_list_id>
Using Terraform
You can also use Terraform to manage your Databricks IP access list.
Terraform Code
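terraform {
  required_providers {
    # The provider is published under the databricks namespace, not hashicorp.
    databricks = { source = "databricks/databricks" }
  }
}

variable "databricks_pat_token" {
  type      = string
  sensitive = true
}

provider "databricks" {
  host  = "https://<your-databricks-instance>"
  token = var.databricks_pat_token
}

resource "databricks_ip_access_list" "corporate_network" {
  label        = "Corporate Network"
  list_type    = "ALLOW"
  ip_addresses = ["203.0.113.0/24"]
}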
Steps
- Save the above configuration as main.tf.
- Initialize Terraform: terraform init
- Apply the configuration: terraform apply
Conclusion
The Databricks Workspace IP Access List is an important control for securing your environment, supporting compliance, and protecting sensitive data. It can be managed through the REST API or Terraform with only a few lines of configuration. By restricting the workspace to known networks, you reduce the attack surface and add a network-level check on top of user authentication.
Ajay Kumar Pandey