Administration & Architecture
Public exposure for clusters in SCC-enabled workspaces

NadithK
Contributor

Hi,
We are facing a requirement where we need to somehow expose one of our Databricks clusters to an external service. Our organization's cyber team is running a security audit of all the resources we use, and they have tools they use to run scans on the VMs we work with.

Databricks clusters have also come up in the audit, since our sensitive data goes through them. Our workspaces are created with Secure Cluster Connectivity (SCC) enabled.

Is there a way for us to expose our Databricks clusters so that these external tools can access them and run their scans? At least some public IP or hostname that could be handed over? At first glance this does not seem to be possible.
Wondering if anyone else has run into the same requirement and whether they found a workaround.

We have our setup in the Azure cloud, and these cyber tools are running in AWS.

Thanks in advance.

2 REPLIES

Kaniz
Community Manager

Hi @NadithK, Exposing Databricks clusters to external services while maintaining security is a common challenge.

Let’s explore a few approaches you can consider:

  1. Access via Databricks User Groups and Table Access Control:

    • Create a Databricks User Group for each external consumer (e.g., your cyber team).
    • Enable Table Access Control for your Databricks account.
    • Set up a High-Concurrency cluster and enable Table Access Control for the cluster.
    • Grant the user group permission to attach to the cluster.
    • Create an external master database where you create tables for your data lake files (and optionally create views to transform data into a consumer-friendly schema).
    • Create a separate database with filtered data for each consumer and grant the user group access to objects in that database.
    • Connect to the consumer database from external tools (e.g., Redash, Power BI, Grafana) to access the data (see the sketch after this list).
  2. Using the Databricks Connect Package:

    • Databricks Connect lets you connect IDEs and custom applications to a Databricks cluster and run Spark code on it remotely, without exposing the cluster itself (a connection sketch follows this list).

  3. Azure Databricks with Secure Cluster Connectivity (SCC):

    • With SCC enabled, cluster nodes have no public IP addresses and open only outbound connections to the Databricks control plane, so the clusters are not directly reachable from the public internet.
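To make option 1 concrete, here is a minimal sketch of the setup, run as SQL from a notebook attached to the Table-ACL-enabled cluster. The group name `cyber_team`, the database/table names, and the storage path are hypothetical placeholders:

```python
# Minimal sketch of option 1 (legacy table access control), run from a
# notebook attached to a cluster with Table Access Control enabled.
# `cyber_team`, database/table names, and the storage path are placeholders.

# Master database with tables defined over data lake files.
spark.sql("CREATE DATABASE IF NOT EXISTS master_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS master_db.events
    USING DELTA
    LOCATION 'abfss://datalake@mystorageaccount.dfs.core.windows.net/events'
""")

# Consumer database exposing only filtered, consumer-friendly data.
spark.sql("CREATE DATABASE IF NOT EXISTS consumer_db")
spark.sql("""
    CREATE VIEW IF NOT EXISTS consumer_db.events_filtered AS
    SELECT event_id, event_type, event_time   -- no sensitive columns
    FROM master_db.events
    WHERE region = 'EU'
""")

# Grant the external consumer group read access to the consumer objects only.
spark.sql("GRANT USAGE ON DATABASE consumer_db TO `cyber_team`")
spark.sql("GRANT SELECT ON VIEW consumer_db.events_filtered TO `cyber_team`")
```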
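And for option 2, a minimal Databricks Connect sketch (the newer, Spark Connect-based flavor, databricks-connect 13.x+). The workspace URL, token, and cluster ID are placeholders:

```python
# Minimal Databricks Connect sketch. The host, token, and cluster ID
# below are placeholders for your own workspace values.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    token="<personal-access-token>",
    cluster_id="0123-456789-abcdefgh",
).getOrCreate()

# This query executes on the remote cluster, not on the local machine.
spark.range(5).show()
```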

Remember to evaluate these options based on your specific requirements and security policies. Each approach has its trade-offs, so choose the one that best aligns with your organization’s needs. If you have any further questions or need additional guidance, feel free to ask! 😊

 

NadithK
Contributor

Hi @Kaniz ,

Thank you very much for the reply, but I don't think this actually resolves our concern.
All of these solutions talk about utilizing the Databricks cluster to access/read data in Databricks. They focus on getting to the Databricks data through the Databricks CLI or REST APIs.

We are not concerned with getting to the data.
What we want to achieve is to connect to the underlying individual virtual machines of the clusters and run scans on those machines: scan the operating system, open ports, installed libraries, and Linux configurations of the cluster VMs.

We want to treat the Databricks clusters as yet another set of virtual machines used in our organization and assess the security of those machines.

I am not sure if this is achievable, but what we are looking for is a way to somehow expose these cluster virtual machines to external tools. For example, we could then give the public IPs of the cluster VMs to our tools so they can run port scans and the like. Maybe there is another way to do it using Private Link. Not sure. (A rough sketch of how the node addresses could at least be enumerated follows below.)
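As a starting point, the node addresses can at least be enumerated through the Clusters REST API; here is a rough sketch, where the workspace URL, token, and cluster ID are placeholders. With SCC enabled the `public_dns` fields come back empty (the nodes have no public IPs), so only the private IPs are usable, and only from a scanner with network reachability into the workspace VNet (e.g., via peering or Private Link):

```python
# Hypothetical sketch: enumerate the driver/executor node addresses of a
# cluster via the Clusters API, e.g. to feed a target list to a scanner.
# Workspace URL, token, and cluster ID are placeholders.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
CLUSTER_ID = "0123-456789-abcdefgh"

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"cluster_id": CLUSTER_ID},
    timeout=30,
)
resp.raise_for_status()
cluster = resp.json()

# The driver and executors are reported with private_ip and public_dns
# fields. With SCC enabled, public_dns is empty: the nodes have no public
# IPs, so only private_ip is usable, and only from inside the VNet.
nodes = [cluster.get("driver", {})] + cluster.get("executors", [])
for node in nodes:
    if node:
        print(node.get("node_id"), node.get("private_ip"), node.get("public_dns"))
```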

Would love to hear if this is possible.

Thanks for the feedback. Really appreciate it.