
AutoML in production

HiraNisar
New Contributor

I have a workflow in Databricks that contains an AutoML pipeline.

I want to deploy that pipeline in production on a shared cluster. Since AutoML is not compatible with shared clusters, what would be a workaround?

(Is it possible to deploy the pipeline in production on a single-user cluster and let other users use that cluster too, or can we somehow deploy the single-user AutoML pipeline on a shared cluster in production?)

1 REPLY

Kaniz
Community Manager

Hi @HiraNisar, deploying an AutoML pipeline in production while also using a shared cluster can be tricky, but there are some workarounds you can consider:

  1. Dedicated Cluster for AutoML:

    • Create a dedicated single-user cluster specifically for your AutoML pipeline. This cluster will be used exclusively for running your AutoML experiments.
    • Deploy your AutoML pipeline on this dedicated cluster.
    • However, keep in mind that other users won’t be able to share this cluster for their workloads.
  2. Shared Cluster with AutoML:

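    • If other users need access to the same compute, you could try deploying your AutoML pipeline on a single-user cluster and then opening that cluster to them. Note that a single-user cluster is normally restricted to the user assigned to it, so this may not be possible in your workspace.
    • Even where it is possible, this approach has some limitations: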
      • Resource Contention: Since the cluster is shared, other users’ workloads might compete for resources with your AutoML pipeline, potentially affecting performance.
      • Compatibility: As you mentioned, AutoML is not compatible with shared clusters, so the AutoML step itself may not run there, and you might also encounter issues related to resource allocation, libraries, or dependencies.
      • Isolation: Running AutoML on a shared cluster might not provide the desired level of isolation for your production workload.
  3. Hybrid Approach:

    • Consider a hybrid approach where you use a dedicated cluster for AutoML training and a shared cluster for serving predictions.
    • Train your AutoML models on the dedicated cluster, ensuring optimal resources and compatibility.
    • Once the model is trained, deploy it to a separate shared cluster that serves predictions to production applications.
    • This way, you maintain isolation during training and utilize shared resources for serving.
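
As a rough illustration of the hybrid approach, the sketch below builds a Jobs API 2.1 style payload with two tasks: AutoML training on a single-user job cluster, followed by scoring on an existing shared cluster. It only constructs and prints the JSON and does not call Databricks. The helper name `build_automl_job_spec`, the notebook paths, the user name, the cluster ID, the runtime version, and the node type are all placeholder assumptions, so replace them with values from your own workspace and check the field names against the Jobs API reference before using it.

```python
import json


def build_automl_job_spec(owner, train_notebook, score_notebook, shared_cluster_id,
                          spark_version="14.3.x-cpu-ml-scala2.12",
                          node_type_id="i3.xlarge"):
    """Return a Jobs API 2.1 style payload (hypothetical helper, not a Databricks API)."""
    return {
        "name": "automl-train-and-score",
        "job_clusters": [
            {
                # Dedicated cluster for AutoML: single-user access mode.
                "job_cluster_key": "automl_single_user",
                "new_cluster": {
                    "spark_version": spark_version,   # assumed ML runtime version
                    "node_type_id": node_type_id,     # assumed node type, cloud specific
                    "num_workers": 2,
                    "data_security_mode": "SINGLE_USER",
                    "single_user_name": owner,
                },
            }
        ],
        "tasks": [
            {
                # Training task runs the AutoML notebook on the dedicated cluster.
                "task_key": "train_automl",
                "job_cluster_key": "automl_single_user",
                "notebook_task": {"notebook_path": train_notebook},
            },
            {
                # Scoring task runs on the existing shared cluster after training.
                "task_key": "score",
                "depends_on": [{"task_key": "train_automl"}],
                "existing_cluster_id": shared_cluster_id,
                "notebook_task": {"notebook_path": score_notebook},
            },
        ],
    }


# All values below are placeholders for illustration only.
spec = build_automl_job_spec(
    owner="someone@example.com",
    train_notebook="/Workspace/prod/automl_train",
    score_notebook="/Workspace/prod/score",
    shared_cluster_id="0000-000000-example",
)
print(json.dumps(spec, indent=2))
```

The printed JSON could then be submitted as the body of a job-creation request. The scoring notebook is assumed to load the trained model from wherever the training notebook registered it, so only the training step needs the single-user cluster.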

Remember that the choice depends on your specific requirements, resource availability, and performance considerations. Evaluate the trade-offs carefully and choose the approach that best aligns with your production needs. 😊

For more detailed guidance, you can refer to the official Databricks documentation on AutoML.