The Lakeflow Declarative Pipelines framework makes it easy to build cost-effective streaming and batch ETL workflows using a simple and declarative syntax: you define the transformations for your data, and the platform will automatically manage typical data engineering challenges like task orchestration, scaling, monitoring, data quality, and error handling.
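
As a point of reference, the sketch below shows what this declarative style looks like with the Python dlt API inside a pipeline; the table names, source path, and expectation are purely illustrative.

# Minimal declarative pipeline sketch. The dlt module and the spark session are
# provided by the Lakeflow Declarative Pipelines runtime; names and paths are placeholders.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events ingested incrementally from cloud storage")
def raw_events():
    # Auto Loader discovers new files incrementally; the path is illustrative.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/demo/raw_events/"))

@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")  # declarative data quality rule
def clean_events():
    return dlt.read_stream("raw_events").where(col("event_type").isNotNull())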


This blog post explores how to automate the bulk conversion of pipelines to the serverless tier.

 

What are the advantages of serverless Lakeflow Declarative Pipelines?

Pipelines can be configured with one of three classic editions or two serverless modes, each providing different built-in capabilities.

The clearest advantage of the serverless tier is that you do not need to dedicate time and effort to deciding which virtual machine type best suits the workload, tuning autoscaling properties to handle spiky loads, or tackling out-of-memory issues. The Databricks platform takes care of these common data engineering and infrastructure challenges by leveraging telemetry and AI.

The serverless tier provides some unique differentiators:

  • Photon is the next-generation engine on the Databricks platform that provides extremely fast query performance at a low cost. Unlike classic tiers, it is included by default in the serverless tier with no unit cost multiplier.
  • Enzyme is a cost-based optimizer that enables automatic and incremental refresh of materialized views, eliminating the need for users to write complex logic. Compared to equivalent materialized view refreshes on Lakeflow Declarative Pipelines using classic compute, Enzyme can deliver up to 6.5x higher throughput and 85% lower latency.
  • Stream Pipelining improves the throughput of loading files and events in Lakeflow Declarative Pipelines when using streaming tables, providing up to 5x better price performance than the equivalent ingestion workload on a classic pipeline.


If you want to deep-dive into the recent optimization of serverless Lakeflow Declarative Pipelines, take a look at this Databricks blog.

 

Considerations for Databricks serverless compute adoption

Prerequisites

The adoption of Databricks’ serverless architecture requires some account-level configuration to satisfy common enterprise security best practices (Unity Catalog, Network Connectivity Configuration, Serverless Egress Control). These configurations are typically managed centrally by a core infrastructure or data team, so let’s assume you have the environment already prepared to use the Databricks serverless infrastructure. 

Cost monitoring

Databricks offers several features to help you monitor the cost of serverless compute (doc).

Tagging your resources with proper identifiers is fundamental for cost management and cost attribution. The serverless tier leverages a dedicated tagging mechanism called Budget Policies, which applies tags to any serverless compute activity performed by an identity assigned to the policy, similarly to how classic resource tagging works. We are going to integrate the Budget Policies into the migration process.

Serverless modes

The serverless tier for Lakeflow Declarative Pipelines is available in two different flavors:

  • Standard: Best suited for workloads that do not require very fast startup times and are focused on optimizing for cost.
  • Performance Optimized: Ideal for high-throughput and critical workloads that need to optimize for SLAs.

The configuration of serverless mode is specified in the Workflow that schedules the pipeline (doc); thus, the pipelines converted using the utility will use either the default value or the configuration explicitly set in the associated Workflow.

At the moment, the serverless Standard tier can be enabled from the Preview Portal.
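
As a rough illustration of where this setting lives, the sketch below updates the scheduling job through the Jobs API using the databricks-sdk escape hatch; the performance_target field and its values are an assumption based on the serverless jobs documentation, so verify them against the Jobs API reference before using this.

# Hedged sketch: set the serverless performance mode on the job that schedules a pipeline.
# The performance_target field and its values are assumptions; verify in the Jobs API reference.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.api_client.do(
    "POST",
    "/api/2.2/jobs/update",
    body={
        "job_id": 123456,  # placeholder job ID
        "new_settings": {"performance_target": "STANDARD"},  # or "PERFORMANCE_OPTIMIZED"
    },
)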

 

Automating Bulk Conversion of pipelines to serverless

The Lakeflow Declarative Pipelines serverless converter is an accelerator that speeds up migrating existing pipelines from classic to serverless computing, minimizing manual conversion effort.

The converter provides the following capabilities:

  • Workspace pipeline discovery 
  • Automated budget policy creation and grant association
  • Automated pipeline conversion to serverless computing
  • Automated backup and recovery

 

Step 1: Installation

Start by cloning the GitHub project, move to the serverless converter project folder, and install the utility in your environment.

The databricks-sdk Python package will be installed in your preferred Python environment as a dependency.

git clone https://github.com/databricks-solutions/databricks-blogposts
cd databricks-blogposts/2025-07-fast-track-to-serverless
pip install .

You can easily interact with the converter through the command line interface.


 

Step 2: Authentication

The converter authenticates using an identity recognized by your Databricks account.


You can set the required environment variables or provide the values through the command line to authenticate the converter using the selected identity:

  • DATABRICKS_WORKSPACE_HOST is the URL of the workspace where the target pipelines exist
  • DATABRICKS_CLIENT_ID is the identifier of the designated Service Principal
  • DATABRICKS_CLIENT_SECRET is the secret of the designated Service Principal (doc)
  • DATABRICKS_ACCOUNT_HOST is the URL of the Databricks Account Console (Azure, AWS, GCP)
  • DATABRICKS_ACCOUNT_ID is the ID of your Databricks account. You can retrieve it directly from the Account portal (doc).
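
Under the hood, these values map onto standard OAuth machine-to-machine authentication in the databricks-sdk; the following is a minimal sketch of how the workspace- and account-level clients can be built from them (client construction only, no converter logic).

# Minimal sketch of OAuth M2M authentication with the databricks-sdk, reusing the
# environment variables listed above.
import os
from databricks.sdk import WorkspaceClient, AccountClient

workspace = WorkspaceClient(
    host=os.environ["DATABRICKS_WORKSPACE_HOST"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)

account = AccountClient(
    host=os.environ["DATABRICKS_ACCOUNT_HOST"],
    account_id=os.environ["DATABRICKS_ACCOUNT_ID"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)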

 

Step 3: Conversion to serverless

serverless-converter convert [--backup-file FILE_PATH] [--budget-policy-id POLICY_ID] [--skip-budget-policy]

The convert command enables you to migrate all selected pipelines in place while backing up the current configuration.

The converter returns a list of all the workspace pipelines and asks you which objects you want to convert to serverless. If the converter does not list the pipeline you are searching for, ensure that the principal has the correct permissions (Step 2).
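
Conceptually, the discovery step is a listing of the workspace’s pipelines; a minimal sketch with the databricks-sdk is shown below (the converter’s actual filtering and permission checks may differ).

# Conceptual sketch of workspace pipeline discovery with the databricks-sdk.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for p in w.pipelines.list_pipelines():
    # PipelineStateInfo exposes basic metadata such as the ID, name, and creator.
    print(p.pipeline_id, p.name, p.creator_user_name)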


The user is then prompted to choose how Budget Policies should be applied.


 

Option 1: Use an existing budget policy

To use a Budget Policy for the pipeline execution, the pipeline owner must be granted at least the User permission on the selected policy (doc).


 

The user must input the budget policy ID. All selected pipelines are then easily converted in place to the serverless tier.
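
Conceptually, the in-place conversion is an update of each pipeline’s settings with serverless enabled and the budget policy attached; the sketch below illustrates the idea with the databricks-sdk. Field names follow the Pipelines API, and the converter’s real implementation may pass through additional settings, so treat this as illustrative.

# Hedged sketch of the in-place conversion: re-submit the pipeline settings with
# serverless enabled and an optional budget policy attached. Verify the field names
# against your databricks-sdk version.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

def convert_pipeline(pipeline_id: str, budget_policy_id: str | None = None) -> None:
    spec = w.pipelines.get(pipeline_id=pipeline_id).spec

    # pipelines.update replaces the specification, so existing settings are passed
    # through and only the compute-related fields change.
    w.pipelines.update(
        pipeline_id=pipeline_id,
        name=spec.name,
        catalog=spec.catalog,
        target=spec.target,
        libraries=spec.libraries,
        configuration=spec.configuration,
        continuous=spec.continuous,
        development=spec.development,
        channel=spec.channel,
        serverless=True,                     # switch to the serverless tier
        budget_policy_id=budget_policy_id,   # optional cost-attribution policy
    )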


 

Option 2: Generate policies matching the pipelines

A budget policy will be created for each pipeline, replicating the tags set on the classic compute. The user or service principal that owns the pipeline will be granted the permissions required to attach the new budget policy.
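
The tags that feed the new budget policy come from the pipeline’s classic cluster definitions; the sketch below shows how they can be collected with the databricks-sdk (the creation of the budget policy itself goes through the account-level Budget Policies API and is omitted here).

# Sketch: collect the custom tags defined on a pipeline's classic clusters, which
# the converter replicates into a matching budget policy.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

def classic_tags(pipeline_id: str) -> dict:
    spec = w.pipelines.get(pipeline_id=pipeline_id).spec
    tags = {}
    for cluster in spec.clusters or []:
        tags.update(cluster.custom_tags or {})
    return tags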


Let’s take the “Application_01” pipeline as an example. The existing pipeline has several tags used to track the cost of its executions.


These tags are replicated in a new budget policy with the same name as the pipeline.


The pipelines are then converted to serverless and associated with their respective budget policies.


The tags are preserved, while the compute configuration has been updated.


Option 3: No budget policy

You can also choose to convert the pipelines without associating any policy by simply providing the flag --skip-budget-policy and executing the convert command.

 

Step 4: Performance and cost analysis

Databricks System Tables (doc) provide a comprehensive view of pipeline executions, allowing users to easily compare different runs. With the unified monitoring offered by the Databricks Intelligence Platform, you can easily quantify and evaluate the benefits of the conversion to serverless.
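
For example, the DBU consumption of a pipeline can be aggregated from the billing system table and compared before and after the conversion; the sketch below uses the SQL Statement Execution API, with placeholder IDs, and column names taken from the system tables documentation that are worth verifying in your workspace.

# Hedged sketch: aggregate DBU usage for a pipeline from system.billing.usage so
# pre- and post-conversion runs can be compared. Verify column names in your workspace.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

pipeline_id = "<pipeline-id>"        # placeholder
warehouse_id = "<sql-warehouse-id>"  # placeholder

query = f"""
    SELECT sku_name, usage_date, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.dlt_pipeline_id = '{pipeline_id}'
    GROUP BY sku_name, usage_date
    ORDER BY usage_date
"""

resp = w.statement_execution.execute_statement(
    warehouse_id=warehouse_id,
    statement=query,
    wait_timeout="30s",
)
if resp.result and resp.result.data_array:
    for row in resp.result.data_array:
        print(row)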

 

[Optional] Rollback to a backup configuration

During the conversion process, a backup file containing all pipeline configurations is created. By using the rollback command, you can load the original configuration and revert the selected pipelines to the previous version. 

To prevent issues with cross-dependencies, the rollback process does not delete the Budget Policies; it simply detaches them from the related pipelines once they are restored to the classic tier.

serverless-converter rollback --backup-file FILE_PATH
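
Conceptually, the backup is a serialized copy of each pipeline specification captured before conversion; the sketch below illustrates the idea (the utility's actual backup format may differ). Restoring then amounts to re-applying the saved settings, with serverless disabled and the budget policy detached, through the same update call used for the conversion.

# Minimal sketch of the backup idea: persist each pipeline specification as JSON
# before conversion so the original settings can be re-applied later.
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

def backup_pipelines(pipeline_ids: list[str], path: str = "pipelines_backup.json") -> None:
    specs = {pid: w.pipelines.get(pipeline_id=pid).spec.as_dict() for pid in pipeline_ids}
    with open(path, "w") as f:
        json.dump(specs, f, indent=2)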


Conclusion

Transitioning your Lakeflow Declarative Pipelines to the serverless infrastructure can greatly enhance your data processing capabilities by improving performance, reducing latency, and lowering operational costs. Automating the bulk conversion removes significant manual effort, reduces human error, streamlines pipeline management, and lets you scale your move to serverless quickly.