Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Move files

Gilg
Contributor II

Hi

I am using DLT with Autoloader.

DLT pipeline is running in Continuous mode.

Autoloader is in Directory Listing mode (Default)

Question.

I want to move files that have been processed by the DLT pipeline to another folder (archived), and I plan to have a separate notebook pipeline do this. Is this possible by setting ignoreMissingFiles in my DLT pipeline?
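To make the archiving step concrete, here is a minimal sketch of what the separate notebook might do. It is only an illustration under assumptions: `archive_processed_files` is a hypothetical helper, the list of already-processed paths is assumed to come from somewhere else, and local filesystem paths stand in for cloud storage. It does not show whether ignoreMissingFiles makes this safe for the running pipeline.

```python
import shutil
from pathlib import Path


def archive_processed_files(processed_paths, archive_dir):
    """Move each already-processed file into archive_dir and return the new paths.

    Hypothetical helper: how the list of processed paths is obtained is
    outside this sketch and is an assumption.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in map(Path, processed_paths):
        if not path.exists():
            # File is already gone (archived earlier or deleted); skip it.
            continue
        target = archive / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

For example, `archive_processed_files(["landing/a.json"], "archived")` would move `landing/a.json` to `archived/a.json` and skip any path that no longer exists.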


Kaniz_Fatma
Community Manager

Hi @Gilg. Deploying an AutoML pipeline in production while using a shared cluster in Databricks can be a bit tricky due to compatibility constraints.

Let’s explore some potential workarounds:

  1. Shared Cluster with AutoML Compatibility:

    • As you mentioned, AutoML is not compatible with shared clusters. However, if you have a specific shared cluster that you want to use for production, consider the following steps:
      • Create a Dedicated Cluster: Set up a dedicated cluster specifically for your AutoML pipeline. This cluster should not be shared with other users.
      • Deploy AutoML Pipeline: Deploy your AutoML pipeline on this dedicated cluster.
      • Access Control: Ensure that only authorized users have access to this dedicated cluster. You can manage access through Databricks workspace permissions.
      • Monitoring and Scaling: Monitor the cluster’s performance and scale it up or down as needed based on workload demands.
  2. Single User Cluster with Shared Access:

    • If you want to use a single user cluster for deploying your AutoML pipeline, you can allow other users to utilize the same cluster. Here’s how:
      • Deploy on Single User Cluster: Deploy your AutoML pipeline on a single user cluster.
      • Shared Access Mode: Change the cluster’s access mode to “Shared.” This allows other users to share the same cluster.
      • Resource Allocation: Be cautious about resource allocation. Since multiple users will be sharing the cluster, ensure that it has sufficient resources (CPU, memory, etc.) to handle the workload.
      • Concurrency and Performance: Keep in mind that shared clusters may experience performance variations due to concurrent usage by multiple users.
  3. Hybrid Approach:

    • Consider a hybrid approach where you use a dedicated cluster for AutoML training and a shared cluster for serving predictions:
      • Training Cluster: Use a dedicated cluster for training your AutoML models. This ensures optimal performance during model training.
      • Deployment Cluster: Deploy the trained models to a separate shared cluster for serving predictions in production. This way, other users can also utilize the deployment cluster without affecting the training workload.

Remember to thoroughly test your chosen approach in a staging environment before deploying it in production. Additionally, consult Databricks documentation or community forums for any specific best practices r...12.

Keep in mind that the choice of approach depends on your specific requirements, resource availability, and organizational policies. Choose the one that aligns best with your production needs and scalability considerations.


Thanks for the reply, but it looks like it was meant for someone else's question, not mine.
