Traditional warehouse administrators face several challenges: streamlining operations and security, improving efficiency in high concurrency and low latency environments, reducing costs and overhead associated with cluster setup and under-utilization, maintaining systems, and minimizing dependency on cloud providers.
Databricks Serverless SQL addresses these issues and brings many benefits, including enhanced productivity, efficiency, and simplicity in data analytics operations. Even though setting up Serverless SQL on Databricks does not pose any significant technical challenges, a common question we hear is: “What are the security aspects of Serverless SQL?”
In this blog, we will walk you through the possible setup scenarios you can use when enabling Serverless warehouses in your account and the different security considerations. Before enabling Serverless, check the prerequisites for your cloud (AWS / Azure). Databricks Serverless SQL is available on Azure and AWS in the documented regions.
The serverless capability provides instantaneous and elastic compute resources, improving the customer experience as the infrastructure is available precisely when needed. This optimal performance is achieved through machine learning algorithms that provision and scale compute resources based on usage, eliminating the need for manual cluster management and shutdowns. The automatic up- and down-scaling of compute also minimizes costs associated with unnecessary idle time.
Moreover, it does not require network address allocation. This shift eliminates the burden of capacity management, patching, upgrading, and performance optimization of clusters, allowing users to focus solely on their data and insights without worrying about infrastructure management.
The simplified pricing model produces a single bill, which makes costs easier to track. Furthermore, the platform continuously improves performance and reduces costs through predictive I/O optimizations and persistent result caching, with a remote result cache for all analytics use cases. Together, these make it the simplest way to securely use the Databricks Data Intelligence Platform.
Security is an essential part of Databricks Serverless SQL, and it helps to understand the following concepts before you start your serverless setup. The security aspects of Serverless SQL fall into three areas: workload isolation (per cluster, workspace, and customer), secure network access to the data, and hardening of infrastructure.
Unity Catalog in Databricks ensures data security through centralized and granular access control over data assets, and data isolation. It also maintains secure data permissions and provides auditing and lineage capabilities. These measures collectively ensure that users can only access and query data they are entitled to in compliance with industry standards.
Below is an overview of the main features provided in our Serverless SQL architecture.
For more details, refer to serverless security and serverless computing (AWS/Azure).
This section will help you navigate the different decision points and methodologies for enabling Serverless connectivity for your Databricks SQL Warehouses. Depending on your setup and your company’s network security requirements, you might need to change several connectivity settings, including those of your storage (e.g., S3 or ADLS).
Before diving into the specifics, here are the high-level steps you will need to take for the upgrade:
For AWS, check here.
For Azure, check here.
For Databricks SQL Serverless to work, all cloud storage objects that Databricks communicates with will need to be configured to allow for Serverless compute.
If your storage account is enabled for public network access (i.e., the "Enabled from all networks" option is selected under Networking > Public network access), there are no configuration changes needed, as Serverless SQL will work out of the box.
If your storage account is behind a firewall (i.e., the "Enabled from selected virtual networks and IP addresses" option is selected under Networking > Public network access), you will need to configure your Azure Storage firewall according to the public documentation.
If your storage account is private (i.e., the "Disabled" option is selected under Networking > Public network access), you will need to perform the steps described in the public documentation for Serverless Private Link*.
*At the time of writing this blog, this feature is still in Gated Public Preview, so you need to contact your Databricks representative to be enrolled in the program.
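The three Azure storage cases above can be summarized as a small lookup. The following is only an illustrative sketch: the enum, the mapping, and the function name `required_action` are hypothetical and not part of any Databricks or Azure SDK; only the portal labels are taken from the text above.

```python
from enum import Enum


class PublicNetworkAccess(Enum):
    # Labels as shown under Networking > Public network access.
    ALL_NETWORKS = "Enabled from all networks"
    SELECTED_NETWORKS = "Enabled from selected virtual networks and IP addresses"
    DISABLED = "Disabled"


# Hypothetical mapping from the portal setting to the follow-up step
# described in this section.
REQUIRED_ACTION = {
    PublicNetworkAccess.ALL_NETWORKS: "No changes needed",
    PublicNetworkAccess.SELECTED_NETWORKS: "Configure the Azure Storage firewall for serverless",
    PublicNetworkAccess.DISABLED: "Set up Serverless Private Link (gated preview)",
}


def required_action(portal_label: str) -> str:
    """Return the follow-up step for a given portal label."""
    # Enum lookup by value raises ValueError for an unknown label.
    setting = PublicNetworkAccess(portal_label.strip())
    return REQUIRED_ACTION[setting]


if __name__ == "__main__":
    for setting in PublicNetworkAccess:
        print(f"{setting.value}: {required_action(setting.value)}")
```

A lookup like this can be handy when auditing many storage accounts at once, but the authoritative steps remain those in the public documentation.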
If you are using Gateway Endpoints for your AWS setup, you will need to perform the steps described in the public documentation for Gateway Endpoints.
Other options, such as private connectivity (i.e., Private Link), are not available at the time of writing this blog post.
Just like with the cloud storage objects, the Unity Catalog metastore service might need to be reconfigured to allow for serverless connectivity.
If you choose one of the private Serverless connectivity options in Databricks, you must create NCC entities in your account. NCCs, or Network Connectivity Configurations, are used in Databricks to manage serverless network connectivity.
Account administrators must create them in the account console, and they can be attached to one or more workspaces. Below are a few considerations when choosing how to configure your NCCs:
Unlock the full potential of your data analytics with Databricks Serverless SQL. You will get enhanced productivity, efficiency, and simplicity as you focus on insights, not infrastructure. With automatic scaling and fully managed capabilities, Serverless SQL will empower you to harness the power of the Databricks Data Intelligence Platform securely and cost-effectively. We invite you to try Serverless SQL today and get started on AWS or Azure.
If you need guidance or have questions, our expert team at Databricks is ready to assist you. Join our community to share your experiences, learn from others, and discover new possibilities. Start your journey now, and let us help you uncover the true value of your data.