Databricks Academy offers the free Databricks Performance Optimization course to help data engineers improve workload performance on the Databricks Data Intelligence Platform. As part of the Advanced Data Engineering with Databricks series, it focuses on practical ways to optimize Spark and Delta Lake workloads, improve physical data layout, and use the Spark UI to troubleshoot performance issues.
You’ll learn to:
- Use the Spark UI to debug performance: Identify jobs, stages, and tasks, and interpret key signals such as shuffle, spill, file pruning, and storage access.
- Improve data layout for better query speed: Understand small-file issues, compare Z-Ordering and Liquid Clustering, and use Databricks features to keep file layouts healthy.
- Optimize code and query execution: Diagnose skew, shuffle, and UDF bottlenecks, then apply techniques like AQE, broadcast joins, and native functions to improve performance.
- Choose the right compute for the workload: Select cluster types, instance families, and Photon settings using practical Databricks rules of thumb.
Designed for:
- Data engineers working with Spark and Delta Lake workloads
- Users with basic Databricks development skills
- Learners with intermediate PySpark and Delta Lake experience
Course format & details:
- Series: Part of the Advanced Data Engineering with Databricks series
- Syllabus: 4 sections | 19 lessons
- Duration: 2 hours
- Skill level: Professional
- Cost: Free
- Includes labs: No