Databricks Community

Lanz · 10-25-2024

When running an AutoML experiment on Databricks, the default setup treats each data sample as equally important. However, this approach can be problematic when dealing with highly imbalanced datasets. To address this issue and accommodate users who w...
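To show why equal weighting hurts an imbalanced dataset, here is a minimal sketch of one common alternative: inverse-frequency ("balanced") sample weights. It assumes a plain Python list of class labels. The function name `balanced_sample_weights` is hypothetical, and this is not the Databricks AutoML API. The snippet only computes the weights. How the post passes them to AutoML is not reproduced in this excerpt.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Hypothetical helper: weight each sample by n_samples / (n_classes * count(label)).

    This is the usual "balanced" heuristic. Every class ends up with the same
    total weight, so a rare class is no longer drowned out by a frequent one.
    """
    counts = Counter(labels)  # number of samples per class
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# 8 majority-class samples and 2 minority-class samples
labels = [0] * 8 + [1] * 2
weights = balanced_sample_weights(labels)
# majority weight: 10 / (2 * 8) = 0.625, minority weight: 10 / (2 * 2) = 2.5
```

With these weights, the 8 majority samples and the 2 minority samples each contribute a total weight of 5.0. Under the default equal weighting, the split would be 8 to 2.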

Lanz · 10-17-2024

When launching an AutoML experiment on Databricks, the default run splits the dataset randomly with 60% for training, 20% for validation, and 20% for testing. Starting from ML Runtime 15.3, users can customize the dataset split in AutoML. Use Case #1...
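Here is a minimal sketch of the default 60/20/20 random split, written as a per-row split label. It assumes a custom split is expressed as one label per row. The function name `assign_split`, the `seed` parameter, and the label values `"train"`, `"validate"`, and `"test"` are hypothetical illustrations, not confirmed AutoML identifiers. The snippet does not call the Databricks API.

```python
import random


def assign_split(n_rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Hypothetical helper: return one split label per row.

    The default fractions reproduce the 60/20/20 train/validation/test split.
    Pass different fractions for a custom ratio.
    """
    n_train = round(n_rows * fractions[0])
    n_val = round(n_rows * fractions[1])
    n_test = n_rows - n_train - n_val  # remainder goes to test so counts sum to n_rows
    split = ["train"] * n_train + ["validate"] * n_val + ["test"] * n_test
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(split)
    return split


labels = assign_split(100)
```

Because the test count is derived from the other two, the labels always cover every row even when the fractions do not divide `n_rows` evenly.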

Lanz · 09-05-2024

When running distributed training or batch inference on multi-node GPU clusters with Spark, the GPUs on the Driver node often remain underutilized, resulting in unnecessary waste of GPU resources. The figures below illustrate this issue: Fig.1: Only ...



💡 ML Training Tip Of The Week #3 - Adjust Sample Weight for Imbalanced Dataset in AutoML

💡 ML Training Tip Of The Week #2 - Custom Dataset Split in AutoML

💡 ML Training Tip Of The Week #1: Optimizing GPU Utilization in Multi-Node Spark Clusters