💡 ML Training Tip Of The Week #4: Speed up your ML workload with one toggle

Machine learning workloads can be time-consuming for two reasons: training data processing, such as ETL and feature engineering, and iterative model training steps.

Recent Databricks ML Runtime (MLR) releases can make both faster, by up to 2X, with a single toggle at cluster creation:

Photon (available in MLR 15.2 or above), the C++ query execution engine, can speed up the Spark SQL and Spark DataFrame operations commonly used in ETL and feature engineering by 1.5X to 2X.
Graviton instances (available in MLR 15.4 LTS or above), powered by ARM-based CPUs designed by AWS, can speed up algorithms such as XGBoost and LightGBM by up to 1.5X.
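
To get a feel for what these per-stage numbers mean end to end, here is a rough back-of-the-envelope sketch. It assumes the workload consists only of an ETL/feature engineering stage and a training stage, and that each stage speeds up uniformly. The function name and the 40/60 time split are illustrative assumptions, not measurements.

```python
def overall_speedup(etl_fraction: float, etl_speedup: float, training_speedup: float) -> float:
    """Estimate end-to-end speedup when ETL and training are accelerated separately.

    etl_fraction is the share of the original wall-clock time spent in
    ETL/feature engineering; the remainder is assumed to be model training.
    """
    if not 0.0 <= etl_fraction <= 1.0:
        raise ValueError("etl_fraction must be between 0 and 1")
    if etl_speedup <= 0 or training_speedup <= 0:
        raise ValueError("speedups must be positive")

    training_fraction = 1.0 - etl_fraction
    # New runtime, expressed as a fraction of the original runtime.
    new_time = etl_fraction / etl_speedup + training_fraction / training_speedup
    return 1.0 / new_time


# Hypothetical split: 40% of time in ETL (2X with Photon),
# 60% in training (1.5X on Graviton).
print(f"Estimated overall speedup: {overall_speedup(0.4, 2.0, 1.5):.2f}X")
```

With this hypothetical split the estimate comes out to roughly 1.67X overall, which is why it pays to turn on the toggle that matches the stage where your workload spends most of its time.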

To enable Photon, select “Use Photon Acceleration” when creating a cluster as shown below:
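
If you create clusters programmatically rather than through the UI, the same choice can be expressed in the cluster spec. The sketch below only builds the request body and does not call any API. The field names follow the Databricks Clusters API as I understand it, where `runtime_engine` accepts `PHOTON` or `STANDARD`. The helper name `build_cluster_spec` is hypothetical, and the runtime version string and node type are placeholders that you should check against your own workspace.

```python
import json


def build_cluster_spec(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int,
    use_photon: bool = True,
) -> dict:
    """Build a cluster spec dict with Photon switched on or off (hypothetical helper)."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumed to be the API-side equivalent of the "Use Photon Acceleration" checkbox.
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }


spec = build_cluster_spec(
    cluster_name="ml-training-photon",
    spark_version="15.4.x-cpu-ml-scala2.12",  # placeholder; verify the exact ML Runtime key
    node_type_id="i3.xlarge",                 # placeholder instance type
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```

Keeping the spec in code makes it easy to create otherwise identical clusters with and without Photon and compare job runtimes.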