dgomezm, Databricks Employee

In an era where data drives innovation and competitive advantage, protecting it is a non-negotiable priority. When sensitive information is involved, even minor lapses can translate into significant risk and loss. For organizations leveraging Databricks, regular Databricks Runtime (DBR) migrations aren't simply about staying current; they are essential to safeguarding your data, ensuring optimal performance, and driving business value from your analytics and AI investments.

For serverless workloads, Databricks offers a versionless experience that removes the need for customers to manage or upgrade runtime versions altogether. But not all workloads are suited to serverless today. For teams managing classic compute environments, staying current with DBR versions remains a best practice, and at scale it can be challenging.

In this blog, we delve into the DBR migration process, address the challenges organizations may face, and offer actionable best practices along with automation techniques to streamline the transition. Keeping your dependencies up to date not only enhances performance but also serves as a critical defense against vulnerabilities, ensuring that your data remains secure. 

Why do we need to migrate?

At its core, the Databricks Runtime is built upon Apache Spark and enriched by additional libraries and components designed to simplify your analytics and AI workloads. Regular updates help ensure your workflows run securely and smoothly, enabling your teams to focus on driving impactful insights and innovations rather than managing disruptions. 

Migrations also play a critical role in addressing vulnerabilities. Each release includes essential security patches and performance enhancements that strengthen the stability and resiliency of your Databricks environment—future-proofing your analytics and AI strategy. 

Important: End-of-support does not mean your workloads will stop running. However, once a DBR version reaches end-of-support, it no longer receives security updates, bug fixes, or technical support.

Understanding the DBR Lifecycle:

 

| DBR Version | Spark Version | Release Date | End-of-Support Date | Key Changes |
|---|---|---|---|---|
| 15.4 LTS | 3.5.0 | Aug 19, 2024 | Aug 19, 2027 | Stability improvements for large-scale workloads |
| 14.3 LTS | 3.5.0 | Feb 1, 2024 | Feb 1, 2027 | Predictive Optimization GA; automated Delta table maintenance |
| 13.3 LTS | 3.4.1 | Aug 22, 2023 | Aug 22, 2026 | Scala support for Unity Catalog shared clusters; volumes support for storing artifacts |
| 12.2 LTS | 3.3.2 | Mar 1, 2023 | Mar 1, 2026 | Delta Lake performance optimizations; new techniques for joins and aggregations |
| 11.3 LTS | 3.3.0 | Oct 19, 2022 | Oct 19, 2025 | Predictive I/O for accelerated reads (Photon engine) |
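
A quick way to see where a given cluster sits in this lifecycle is to check from a notebook. A minimal sketch in Python (the printed values are illustrative; DATABRICKS_RUNTIME_VERSION is set by the runtime environment):

```python
import os

# Spark version bundled with the current runtime, e.g. "3.5.0"
print(spark.version)

# DBR version exposed by the runtime environment, e.g. "15.4"
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))
```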

 

In addition to mitigating risks, upgrading to the latest DBR version unlocks a host of significant benefits. Each new release brings value enhancements, such as improved query performance, optimized resource utilization, and advanced data governance features like enhanced metadata management, robust access controls, and lineage tracking. These upgrades not only boost workload efficiency but also help ensure your entire data estate remains compliant with industry standards. For instance, DBR 12.2 leverages Unity Catalog to enable powerful features like row-level and column-level security.

Planning Your DBR Migration: 

Assess your workspace

A successful migration begins by assessing your existing workspace and clearly identifying the resources affected by the upcoming runtime transition. To simplify this critical step, we’ve developed an assessment dashboard [link] to help you quickly identify and prioritize workloads based on DBR versions and job spend, minimizing risk and accelerating your migration. 

Note: This dashboard is not officially supported by Databricks and is provided as a community contribution. Estimates may not reflect actual billing and require system tables and a Unity Catalog-enabled workspace.

[Image: DBR migration assessment dashboard]

Note: Be sure to account for external dependencies—such as Azure Data Factory, Apache Airflow, or other third-party tools directly triggering job clusters—as these will need to be updated accordingly. The impact to these external dependencies falls outside the scope of this dashboard.
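
If you want to explore the underlying data yourself, the same signal can be pulled directly from system tables. A minimal sketch in Python, assuming system tables are enabled; the join and the 30-day window are illustrative simplifications (system.compute.clusters keeps a row per cluster change, so a production query should deduplicate to the latest record):

```python
# Rank job workloads by recent DBU consumption, grouped by DBR version.
df = spark.sql("""
    SELECT
        c.dbr_version,
        u.usage_metadata.job_id AS job_id,
        SUM(u.usage_quantity)   AS dbus_last_30d
    FROM system.billing.usage u
    JOIN system.compute.clusters c
        ON u.usage_metadata.cluster_id = c.cluster_id
       AND u.workspace_id = c.workspace_id
    WHERE u.usage_date >= current_date() - INTERVAL 30 DAYS
      AND u.usage_metadata.job_id IS NOT NULL
    GROUP BY c.dbr_version, u.usage_metadata.job_id
    ORDER BY dbus_last_30d DESC
""")
display(df)
```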

Establishing a Development Environment:

A dedicated development environment is essential for safe testing and validation. This isolated workspace lets you test DBR upgrades, identify potential issues, and iterate without risking production workloads.

If you don’t already have a dev environment, we recommend setting up a separate Databricks workspace specifically for testing purposes. This setup offers better governance, separation of duties, and reproducibility of test results. [Create a Databricks workspace]

Note: Skip this section if a development environment already exists, but ensure the mirrored jobs exactly match their production counterparts to guarantee accurate testing.

Manually Mirroring Jobs into a Development Environment

Carefully replicate existing job configurations into your new development environment:

  1. Navigate to the targeted jobs in your current workspace.
  2. Within the Jobs UI, click on the ellipsis menu (...) and choose “View JSON”.
  3. Click “Create” to automatically generate a Databricks CLI command to recreate this job.
  4. Authenticate with your development workspace via Databricks CLI and run the command to replicate the job. 
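
If you have more than a handful of jobs, the same flow can be scripted. Here is a minimal sketch using the Databricks SDK for Python; the prod and dev profile names and the job_id are assumptions for illustration:

```python
from databricks.sdk import WorkspaceClient

# Assumed profile names from ~/.databrickscfg; adjust to your setup.
src = WorkspaceClient(profile="prod")
dst = WorkspaceClient(profile="dev")

# 123 is an illustrative job_id.
job = src.jobs.get(job_id=123)
s = job.settings

# Recreate the job in the dev workspace. The schedule is deliberately
# not copied, so the mirrored job never fires on its own.
created = dst.jobs.create(
    name=f"{s.name} (dev mirror)",
    tasks=s.tasks,
    job_clusters=s.job_clusters,
    tags=s.tags,
)
print(f"Created dev job {created.job_id}")
```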

 

 

[Animation: generating a job-recreation command via “View JSON” → “Create” in the Jobs UI]

 

Warning: Ensure all dependencies, including notebooks and libraries, are appropriately migrated.

Testing Jobs

With jobs mirrored into your development environment, validate compatibility proactively to identify potential issues early without impacting production workloads. Update each job’s DBR version and closely monitor performance, diagnosing and resolving issues directly within this isolated environment. 
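
Updating the runtime on a mirrored job can also be scripted. A minimal sketch with the Databricks SDK for Python; the dev profile, the job_id, and the target version string are illustrative:

```python
from databricks.sdk import WorkspaceClient

NEW_DBR = "15.4.x-scala2.12"  # illustrative target runtime

w = WorkspaceClient(profile="dev")   # assumed profile name
job = w.jobs.get(job_id=456)         # illustrative job_id
settings = job.settings

# Bump the runtime on job-level clusters...
for jc in settings.job_clusters or []:
    jc.new_cluster.spark_version = NEW_DBR

# ...and on any task-level clusters.
for task in settings.tasks or []:
    if task.new_cluster:
        task.new_cluster.spark_version = NEW_DBR

# reset() overwrites the job definition with the modified settings.
w.jobs.reset(job_id=456, new_settings=settings)
```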

Important: Watch for subtle behavioral changes, not just explicit errors. For example, Spark’s size() function behaves differently across versions in ANSI mode:
| ANSI Mode Enabled | ANSI Mode Disabled |
|---|---|
| size(NULL) → NULL | size(NULL) → -1 |
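
You can reproduce this difference directly in a notebook. A minimal sketch in Python (ANSI mode is a runtime SQL configuration, so it can be toggled per session):

```python
# With ANSI mode disabled (the historical default), size(NULL) returns -1.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT size(NULL)").show()

# With ANSI mode enabled, the same expression returns NULL.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT size(NULL)").show()
```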

Thorough testing with representative datasets ensures smoother migrations and robust CI/CD practices. For comprehensive testing guidelines, refer to Databricks’ documentation.

Once validated, migrate these configurations confidently to your production environment, ensuring minimal risk and seamless continuity.

Exploring automation opportunities

Manual migration quickly becomes resource-intensive, especially at scale. Databricks strongly encourages automation to simplify complex migrations and reduce the manual effort involved.

  • Databricks Terraform Exporter: Automates extraction and replication of workspace resources, including users, groups, jobs, and notebooks. [Find more here.]

We’re actively developing new automation tools to further streamline your DBR migration experience. Stay tuned for updates and new developments. 

Common Pitfalls and Best Practices:

Avoid migration complexity by:

  • Thoroughly documenting each step: Job configurations, compatibility issues, dependencies, and solutions. 

If you encounter challenges, reach out to your Databricks account team. They provide valuable resources and direct support to keep your migration on track. 

Transforming Migrations Into Strategic Advantage

Regular DBR migrations aren’t just technical housekeeping; they’re strategic opportunities. For example, companies migrating to DBR 14 leveraged Predictive Optimization to significantly reduce query costs and accelerate insights, unlocking new analytics-driven opportunities.

Conclusion

Migrating your Databricks Runtime is not merely routine; it is a strategic imperative. Proactively managing migrations enhances performance, strengthens data security, and simplifies governance, empowering your organization to leverage data and AI more effectively to solve your toughest challenges. Stay proactive and informed, and ensure your Databricks environment remains secure, agile, and ready to support your evolving data initiatives.