Why not enable "decommissioning" in Spark?

Erik
Valued Contributor II

You can enable "decommissioning" in Spark, which causes it to drain work from a worker when Spark gets a notification from the cloud provider that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to activate.

So: why is it not enabled by default in Databricks, and is there any downside to enabling it?
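
For reference, here is a minimal sketch of the property in question (assuming Spark 3.1+ and a plain PySpark session; on Databricks the same key would normally go in the cluster's Spark config, since the session already exists by the time notebook code runs):

```python
from pyspark.sql import SparkSession

# Minimal sketch: enable executor decommissioning (Spark 3.1+).
# With this on, Spark drains tasks from an executor once it learns the
# underlying instance is being reclaimed (e.g. a spot interruption),
# instead of letting those tasks die with the node.
spark = (
    SparkSession.builder
    .appName("decommissioning-sketch")  # hypothetical app name
    .config("spark.decommission.enabled", "true")
    .getOrCreate()
)
```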

 

1 REPLY

Kaniz
Community Manager

Hi @Erik,

Enabling decommissioning in Spark is valuable, especially in cloud environments with transient capacity such as spot instances.

Let’s delve into the reasons behind its default state and potential downsides:

  1. Why Not Enabled by Default?

    • Databricks, as a managed Spark platform, makes certain design choices based on a balance of flexibility, performance, and ease of use.
    • By default, decommissioning is turned off in Databricks clusters. The following considerations likely influence this decision:
      • Stability: Enabling decommissioning introduces additional complexity. Ensuring stability and predictable behaviour across various scenarios is crucial.
      • Data Consistency: When a worker is decommissioned, any cached data or shuffle files stored on that worker are lost. This can impact performance and data consistency.
      • Resource Management: Databricks aims to maintain a stable cluster size. Decommissioning could lead to frequent node replacements, affecting resource availability.
      • User Expectations: Default settings prioritize simplicity and reliability for a broad user base. Not all users may be aware of or require decommissioning.
  2. Potential Downsides of Enabling Decommissioning:

    • Data Loss: As mentioned, cached data and shuffle files on a decommissioned worker are lost. If your workload relies heavily on caching, this could impact performance.
    • Increased Instability: Frequent node replacements due to decommissioning might lead to instability, especially if autoscaling is enabled.
    • Complexity: Managing decommissioned nodes requires additional logic and coordination. Ensuring proper data migration and task re-computation can be challenging.
  3. Databricks Improvements:

    • Recent Spark releases (3.1 and later) soften the data-loss downside: when storage decommissioning is enabled, Spark can migrate cached RDD blocks and shuffle files off an executor before its node is reclaimed, rather than simply losing them (see the sketch below).
    • Databricks exposes these Spark properties through the cluster's Spark config, so the feature can be enabled per cluster where spot instances are used.
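
To make the mitigation concrete, here is a sketch of enabling decommissioning together with block migration, which addresses the data-loss downside above (assuming Spark 3.1+; the keys come from the open-source Spark configuration docs, and the fallback-storage path is a hypothetical example):

```python
from pyspark.sql import SparkSession

# Sketch: decommissioning plus storage migration (Spark 3.1+).
spark = (
    SparkSession.builder
    .appName("decommissioning-with-migration")  # hypothetical app name
    # Drain tasks from executors whose instances are going away.
    .config("spark.decommission.enabled", "true")
    # Also migrate block-manager data off the decommissioning executor
    # instead of losing it with the node.
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    # Optional: if no healthy executor can take the blocks, spill them
    # to object storage instead (path below is hypothetical).
    .config("spark.storage.decommission.fallbackStorage.path",
            "s3a://my-bucket/spark-fallback/")
    .getOrCreate()
)
```

With these set, losing a spot node costs far less recomputation, at the price of extra network traffic while blocks are migrated.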

In summary, while enabling decommissioning can be beneficial, it is essential to weigh the trade-offs for your specific use case. Databricks lets you turn the feature on or off as needed, so you can make an informed decision.
