Learning Series | Databricks Performance Optimization - Databricks Community

Series: Part of the Advanced Data Engineering with Databricks series
Syllabus: 4 sections | 19 lessons
Duration: 2 hours
Skill level: Professional
Cost: Free
Includes labs: No

Databricks Academy offers the free Databricks Performance Optimization course to help data engineers improve workload performance on the Databricks Data Intelligence Platform. As part of the Advanced Data Engineering with Databricks series, it focuses on practical ways to optimize Spark and Delta Lake workloads, improve physical data layout, and use the Spark UI to troubleshoot performance issues.

You’ll learn to:

Use the Spark UI to debug performance: Learn how to identify jobs, stages, tasks, and key signals like shuffle, spill, file pruning, and storage access.
Improve data layout for better query speed: Understand small-file issues, compare Z-Ordering and Liquid Clustering, and use Databricks features to keep file layouts healthy.
Optimize code and query execution: Diagnose skew, shuffle, and UDF bottlenecks, then apply techniques such as Adaptive Query Execution (AQE), broadcast joins, and native (built-in) functions to improve performance.
Choose the right compute for the workload: Learn how to select cluster types, instance families, and Photon settings using practical Databricks rules of thumb.
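The file-pruning and data-layout topics above can be illustrated without a cluster. Delta Lake records per-file min/max column statistics, and a query can skip any file whose range cannot satisfy its filter. Clustering the data (for example with Z-Ordering or Liquid Clustering) tightens those ranges, so more files are skipped. The following is a simplified pure-Python model, not a Databricks or Delta Lake API. It makes these assumptions: the names `FileStats`, `write_files`, and `files_to_scan` are hypothetical, a single integer column stands in for a table, and a plain sort stands in for clustering.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FileStats:
    """Hypothetical per-file statistics: the min and max of one column."""
    min_value: int
    max_value: int


def write_files(values: Sequence[int], rows_per_file: int) -> List[FileStats]:
    """Split values, in the given order, into fixed-size 'files' and record min/max stats for each."""
    files: List[FileStats] = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append(FileStats(min(chunk), max(chunk)))
    return files


def files_to_scan(files: List[FileStats], lo: int, hi: int) -> List[FileStats]:
    """Skip any file whose [min, max] range cannot contain a row with lo <= x <= hi."""
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


if __name__ == "__main__":
    # 0..999 written in a scattered order (37 is coprime with 1000, so this is a permutation).
    values = [(i * 37) % 1000 for i in range(1000)]

    unclustered = write_files(values, rows_per_file=100)
    clustered = write_files(sorted(values), rows_per_file=100)  # a sort stands in for clustering

    # Filter equivalent to: WHERE x BETWEEN 250 AND 349
    print(f"unclustered: scan {len(files_to_scan(unclustered, 250, 349))} of {len(unclustered)} files")
    # -> unclustered: scan 10 of 10 files
    print(f"clustered: scan {len(files_to_scan(clustered, 250, 349))} of {len(clustered)} files")
    # -> clustered: scan 2 of 10 files
```

In the scattered layout, every file's min/max range spans almost the whole domain, so nothing can be pruned. After sorting, the same filter touches only the two files whose ranges overlap 250 to 349. The file-pruning signal in the Spark UI is meant to surface this kind of difference on real tables.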