dgomezm, Databricks Employee

Many customers have been eager to adopt serverless compute on Databricks, attracted by the promise of eliminating infrastructure management, gaining elastic scalability, and simplifying the user experience for their teams. Unlocking the full value of serverless involves rethinking environment configuration, cost governance, and dependency management to ensure a smooth transition. 

In this post, I’ll walk through a practical playbook for what a serverless migration actually entails, so you can approach these migrations with the right tools and considerations in hand.


Before getting started...

It’s worth taking a moment to understand how serverless compute differs from classic compute—because many of the operational and financial changes stem from that. 

With classic compute, workloads run on clusters directly provisioned within the customer’s own cloud account. This means that the customer is responsible for choosing cluster size, runtime version, and the specific instance types to spin up. As the job runs, Databricks charges DBUs for managing and orchestrating the workload, while the cloud provider bills separately for the underlying infrastructure being used.

Serverless flips this paradigm. Compute resources run fully within Databricks-managed cloud accounts. Databricks takes care of provisioning, scaling, and managing the underlying infrastructure behind the scenes — from the user’s perspective, it just works.

This shift introduces a few important differences worth highlighting, especially around how compute is billed and governed.

  • You only pay for the time your workload is actually running.

There’s no separate charge for clusters or VMs sitting idle between jobs. Billing starts when the workload begins processing and stops when it completes. Note: pricing details may vary across features; for a full breakdown of rates, refer to the official pricing page.

  • No virtual machine costs to manage.

With serverless, the DBU rate includes both the compute resources and the operational costs. While it may appear higher than the classic compute rate, it is fully inclusive.

To put it into perspective: 

If a job runs on classic compute for 5 minutes of processing but requires several minutes of startup time and stays up idle after completion, you’re billed for all of it: startup, processing, and idle time, in both DBU and VM charges.

With serverless, you’d only be billed for the 5 minutes of actual processing time. Startup is also faster (Databricks manages warm capacity pools), and you aren’t charged for any startup or idle time.
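To make the arithmetic concrete, here’s a toy calculation in Python. The rates are purely hypothetical placeholders (actual pricing varies by SKU, cloud, and region); what matters is which minutes are billable under each model.

# Toy cost comparison. All rates below are HYPOTHETICAL placeholders,
# not real Databricks pricing; see the official pricing page.
classic_dbu_rate = 0.15       # $ per DBU-hour (hypothetical)
vm_rate = 0.50                # $ per VM-hour (hypothetical)
serverless_dbu_rate = 0.35    # $ per DBU-hour, all-inclusive (hypothetical)

startup_min, processing_min, idle_min = 4, 5, 10

# Classic: billed (DBU + VM) for startup, processing, and idle time.
classic_hours = (startup_min + processing_min + idle_min) / 60
classic_cost = classic_hours * (classic_dbu_rate + vm_rate)

# Serverless: billed only for processing time, at the inclusive rate.
serverless_hours = processing_min / 60
serverless_cost = serverless_hours * serverless_dbu_rate

print(f"classic:    ${classic_cost:.3f}")     # $0.206
print(f"serverless: ${serverless_cost:.3f}")  # $0.029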

This difference is what has enabled many customers to see significant cost efficiencies, provided the proper controls are in place, which brings us to why cost governance is such an important part of a successful serverless adoption.


Laying the Groundwork: Cost Controls

Before even thinking about moving into serverless, it’s important to understand how to govern its costs—so you don’t end up surprised by unexpected charges later on. 

One key difference with serverless compute is that once it’s enabled in a workspace, it’s immediately available to all users by default. Since serverless operates within Databricks-managed cloud accounts, any existing compute policies that govern classic compute—such as cluster policies—don’t automatically apply to serverless workloads. 

While this flexibility allows teams to get up and running quickly, it also means that organizations need to establish clear cost controls upfront to ensure that serverless consumption aligns with their financial and operational expectations.

Let's make this more concrete by walking through an example.

In this case, serverless compute is being enabled in a development workspace used by three groups: data engineers, data scientists, and analysts. Each persona has different workload patterns and resource needs: 

  • Data engineers primarily run batch pipelines and ETL workloads.
  • Data scientists have started experimenting with deep learning models using newly released GPU Serverless instances.
  • Analysts mostly work with ad-hoc queries and dashboards using SQL Serverless.

Without proper guardrails, it would be easy for serverless consumption to scale unpredictably — particularly as GPU Serverless workloads can generate larger costs if left unchecked.

Budgets

The Budgets feature allows administrators to define spending thresholds and monitor usage across workspaces. In this case, we can configure two separate budgets within the development workspace:

  • One budget for data scientists, who will be running GPU Serverless workloads, with a higher threshold to give them room to experiment. It will apply to any usage tagged with identifiers related to the data science team, for example tags containing data-science or ds-team.

 

  • A second, smaller budget for data engineers and analysts, whose serverless workloads tend to be more predictable and stable. This budget will apply to jobs or compute resources tagged with identifiers like data-engineer or analytics.

Budgets give us visibility into spend as it grows, and allow us to configure alerts when usage approaches defined thresholds. These budgets serve as proactive guardrails, helping avoid surprises and giving administrators early signals before costs escalate unexpectedly. Budgets can be configured at both workspace and account levels, allowing flexibility in how spend is tracked and controlled across environments.


Serverless Budget Policies

With budgets in place, the next step is to enforce Budget Policies for serverless compute to automatically apply cost attribution tags and guard spending across different teams.

Serverless budget policies allow administrators to define one or more tag key-value pairs that get automatically applied to any serverless compute activity initiated by users or jobs assigned to the policy. As users launch serverless compute, these tags are automatically attached behind the scenes, ensuring accurate cost attribution and reliable billing records for all serverless workloads. 

The tags then surface directly in the system billing tables (for example, system.billing.usage → custom_tags), making it easy to track usage and attribute costs to specific teams, personas, or business units as needed.
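Because the tags land in the usage records themselves, a per-team breakdown is just a query away. Here’s a minimal PySpark sketch (run from a Databricks notebook, where spark and display are available) that assumes the team tag introduced by the policies described below:

# Daily serverless DBUs per team, based on the `team` tag applied
# by serverless budget policies.
per_team = spark.sql("""
    SELECT
        usage_date,
        custom_tags['team'] AS team,
        SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE sku_name LIKE '%SERVERLESS%'
    GROUP BY usage_date, custom_tags['team']
    ORDER BY usage_date DESC
""")
display(per_team)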

In our development workspace scenario, we’ll create budget policies aligned to each persona by applying tags: 

  • team=data-science for data scientists
  • team=data-engineer for data engineers
  • team=analyst for analysts


 

Monitoring Serverless Spend

Once the budgets and serverless budget policies are in place, the next step is to establish ongoing monitoring. This ensures that administrators have full visibility into serverless consumption as usage scales—and allows organizations to react quickly if spending starts to drift outside expected patterns.

The system.billing.usage system table provides detailed billing records across all compute types, which allows for more advanced usage queries, especially when paired with system.billing.list_prices. With this information, teams can build custom dashboards, configure alerts, and create periodic reporting pipelines to track serverless usage in near real-time — providing FinOps teams with the level of transparency required to confidently scale serverless adoption.

For example, the following query calculates total list-price cost for serverless compute by workspace over the last 30 days:

SELECT
   t1.workspace_id,
   SUM(t1.usage_quantity * list_prices.pricing.default) AS list_cost
FROM system.billing.usage t1
INNER JOIN system.billing.list_prices 
   ON t1.cloud = list_prices.cloud 
   AND t1.sku_name = list_prices.sku_name 
   AND t1.usage_start_time >= list_prices.price_start_time 
   AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
WHERE
   t1.sku_name LIKE '%SERVERLESS%'
   AND t1.billing_origin_product IN ('JOBS', 'INTERACTIVE')
   AND t1.usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY
   t1.workspace_id
HAVING
   list_cost > {budget}

By scheduling this query to run as an alert, administrators can receive proactive notifications as soon as a workspace approaches or exceeds its assigned budget.

Actually Migrating a Job Into Serverless

With cost controls and monitoring in place, we can now start migrating workloads into serverless compute. The actual migration process is fairly straightforward, but there are a few important considerations to be aware of when switching compute, managing dependencies, and validating performance. 

Switching Compute to Serverless 

The first step is updating the job to use serverless compute instead of classic clusters. For jobs that are already running in production, this can typically be done directly from the job configuration UI or the Jobs API.

Tip: When testing the migration, it’s often a good idea to create a duplicate of the job to avoid any unintended impact on production workloads.
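To make that concrete, here’s a minimal sketch of the duplicate-and-switch step using the Databricks Python SDK’s raw API client. The job ID and naming are hypothetical, and this assumes serverless jobs are enabled in the workspace; treat it as a starting point rather than a drop-in script.

# Minimal sketch: copy a classic job and strip its compute config so
# the copy runs on serverless. JOB_ID and naming are hypothetical.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
JOB_ID = 123  # hypothetical: the classic job to duplicate

# Fetch the existing job settings as a plain dict via the REST API.
job = w.api_client.do("GET", "/api/2.1/jobs/get", query={"job_id": JOB_ID})
settings = job["settings"]

# Remove classic compute references; tasks left without a compute
# spec are scheduled on serverless once it is enabled.
settings.pop("job_clusters", None)
for task in settings.get("tasks", []):
    task.pop("new_cluster", None)
    task.pop("existing_cluster_id", None)
    task.pop("job_cluster_key", None)

settings["name"] = settings["name"] + " (serverless test)"

# Create a duplicate instead of overwriting the production job.
created = w.api_client.do("POST", "/api/2.1/jobs/create", body=settings)
print("Created serverless copy, job_id =", created["job_id"])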

At this point, you’ve officially moved the workload onto serverless — just like that. The only catch is that serverless keeps things light by default, so you’ll need to make sure your job has all the libraries it needs before hitting run.

Understanding Serverless Environment Versions

One key difference when running on serverless is that you no longer control the exact runtime image underneath — Databricks takes care of that for you. Serverless environments are versioned, with Databricks periodically rolling out updates to maintain compatibility, security, and stability.

You can find the full list of available serverless environment versions in the Serverless Release Notes. Each version includes a set of pre-installed system libraries, drivers, and configurations.

When getting started, it’s generally a good practice to select the latest available serverless environment version to take advantage of the most recent fixes, performance improvements, and package updates.

Configuring Library Dependencies

Since serverless keeps the environment minimal by default, it’s important to define any additional libraries your workload requires. There are a few different ways to manage dependencies depending on your use case:

  • Workspace-Level Default Dependencies

You can configure default libraries that apply to all serverless compute in the workspace. This works well for common packages used across multiple teams. Details on how to set this up can be found in the documentation.

 

  • Job Task-Level Dependencies (Recommended for Production)

For jobs, you can define dependencies directly at the job task level. This provides finer control over which packages are installed for each workload, reduces the risk of package conflicts, and keeps things more predictable as multiple teams share the same workspace (see the payload sketch after this list).

 

 

  • Notebook-Scoped Dependencies (Useful for Development)

For notebooks running on serverless compute, Databricks provides an Environment panel directly inside the notebook UI. This allows you to configure dependencies (including uploading Python wheel files), memory settings, budget policies, and the serverless environment version, all from within the notebook itself. See the documentation for more details.

These settings will only apply when the notebook is attached to serverless compute.
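Circling back to the task-level option above: the Jobs API exposes an environments field for serverless jobs that pins an environment version and its pip dependencies per task. The payload below is a hedged sketch; the names, paths, and version are illustrative, so check the Jobs API reference for the exact schema.

# Hedged sketch of task-level dependencies for a serverless job,
# expressed as a Jobs API payload. Names and paths are illustrative.
job_settings = {
    "name": "daily-etl-serverless",
    "environments": [
        {
            "environment_key": "default",
            "spec": {
                "client": "3",                      # serverless environment version
                "dependencies": ["pandas==2.2.2"],  # pip-style requirements
            },
        }
    ],
    "tasks": [
        {
            "task_key": "etl",
            "environment_key": "default",  # binds this task to the environment above
            "spark_python_task": {"python_file": "/Workspace/etl/main.py"},
        }
    ],
}

For quick iteration in notebooks, running %pip install inside a serverless notebook also installs session-scoped packages, alongside whatever is configured in the Environment panel.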

Run, Monitor, and Compare

With your job now fully serverless-ready, you’re ready to hit run. This is where all the cost controls, budget policies, and monitoring we set up earlier start to pay off. 

We’ve created a dashboard that uses system tables to compare classic jobs against serverless. Just grab the job_ids, and you’ll be able to view duration and cost per run to better understand how the migration is performing.
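If you’d rather query directly, alongside or instead of the dashboard, here’s a minimal PySpark sketch that compares per-run DBU consumption for two jobs via the usage_metadata column; the job IDs are hypothetical placeholders.

# Per-run DBU usage for a classic job and its serverless copy.
# The IDs below are hypothetical; substitute your own job_ids.
classic_id, serverless_id = "123", "456"

runs = spark.sql(f"""
    SELECT
        usage_metadata.job_id AS job_id,
        usage_metadata.job_run_id AS run_id,
        SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id IN ('{classic_id}', '{serverless_id}')
    GROUP BY usage_metadata.job_id, usage_metadata.job_run_id
""")
display(runs)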


Resource: You can find the code and dashboard in this Git repository.

With these tools in place, you’ll be able to experiment safely, validate serverless performance for your workloads, and decide which jobs are worth fully moving over.

Conclusion

And with that — I hope this guide helps you feel better equipped to tackle serverless migrations on your own. You’ve got the controls, the monitoring, and the migration process in place — now you’re ready to put serverless to work.