- Dedicated Cluster for AutoML:
  - Create a dedicated single-user cluster for your AutoML pipeline and use it exclusively for running your AutoML experiments.
  - Deploy your AutoML pipeline on this cluster.
  - Keep in mind that other users won’t be able to use this cluster for their own workloads.
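As a rough illustration of the dedicated option, here is a minimal sketch of a cluster definition you could send to the Databricks Clusters API. The field names `data_security_mode` and `single_user_name` come from that API; the cluster name, runtime version, node type, worker count, and user email are placeholder assumptions that you would replace with values valid in your workspace.

```python
import json


def single_user_cluster_spec(user_name, cluster_name="automl-dedicated"):
    """Build a cluster definition for a single-user (dedicated) cluster.

    All concrete values below are placeholders, not recommendations.
    """
    return {
        "cluster_name": cluster_name,              # hypothetical name
        "spark_version": "14.3.x-cpu-ml-scala2.12",  # assumed ML runtime; AutoML needs an ML runtime
        "node_type_id": "i3.xlarge",               # assumed node type; depends on your cloud
        "num_workers": 2,                          # assumed size
        "autotermination_minutes": 60,             # stop the cluster when idle
        "data_security_mode": "SINGLE_USER",       # single-user access mode
        "single_user_name": user_name,             # the one user allowed to run on it
    }


spec = single_user_cluster_spec("automl-owner@example.com")
print(json.dumps(spec, indent=2))
```

You could paste the printed JSON into the cluster's JSON editor or pass it to your own deployment tooling; nothing here calls Databricks directly.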
- Shared Cluster with AutoML:
  - If you want to use a shared cluster, you would run your AutoML pipeline on a cluster that other users can also attach to, rather than on a single-user cluster.
  - This approach has some limitations:
    - Resource Contention: Because the cluster is shared, other users’ workloads might compete with your AutoML pipeline for resources, which can affect performance.
    - Compatibility: As you mentioned, AutoML is not fully compatible with shared clusters. You might run into issues with resource allocation, libraries, or dependencies.
    - Isolation: Running AutoML on a shared cluster might not give your production workload the level of isolation you want.
- Hybrid Approach:
  - Consider a hybrid approach where you use a dedicated cluster for AutoML training and a shared cluster for serving predictions.
  - Train your AutoML models on the dedicated cluster, ensuring optimal resources and compatibility.
  - Once the model is trained, deploy it to a separate shared cluster that serves predictions to production applications.
  - This way, you maintain isolation during training and utilize shared resources for serving.
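The hand-off between training and serving in the hybrid approach could look roughly like the sketch below. It assumes a classification problem and that you pass the model between clusters through the MLflow Model Registry; `train_with_automl`, `register_best_model`, and the model name are hypothetical names for illustration. The training function only works on a Databricks ML runtime, so the dry run at the end uses a stand-in summary object instead of a real experiment.

```python
from types import SimpleNamespace


def train_with_automl(df, target_col, timeout_minutes=30):
    """Run an AutoML classification experiment on the dedicated cluster."""
    # Imported lazily because the package only exists on Databricks Runtime ML.
    from databricks import automl

    return automl.classify(
        dataset=df, target_col=target_col, timeout_minutes=timeout_minutes
    )


def register_best_model(summary, model_name, register_fn=None):
    """Register the best trial's model so another cluster can load it."""
    if register_fn is None:
        import mlflow  # lazy import keeps the dry run below dependency-free

        register_fn = mlflow.register_model
    model_uri = summary.best_trial.model_path  # location of the best trial's logged model
    return register_fn(model_uri, model_name)


# Dry run with a fake summary and a fake registry call (no Databricks needed).
fake_summary = SimpleNamespace(
    best_trial=SimpleNamespace(model_path="dbfs:/fake/path/model")
)
registered = []
register_best_model(
    fake_summary,
    "automl_demo_model",  # hypothetical registry name
    register_fn=lambda uri, name: registered.append((uri, name)),
)
print(registered)
```

On the serving side, the shared cluster would then load the registered model by name (for example with `mlflow.pyfunc.load_model`) rather than running AutoML itself.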
Remember that the choice depends on your specific requirements, resource availability, and performance considerations. Evaluate the trade-offs carefully and choose the approach that best aligns with your production needs. 😊
For more detailed guidance, you can refer to the official Databricks documentation on AutoML.