Why not enable "decommissioning" in spark?

Erik
Valued Contributor II

You can enable "decommissioning" in Spark, which causes it to migrate work off a worker when the cloud provider notifies it that the instance is about to go away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to activate.

So: why is it not enabled by default in Databricks, and is there any downside to enabling it?
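For context, enabling it in open-source Spark 3.1+ looks roughly like the sketch below. On a Databricks cluster these keys would normally go in the cluster's Spark config box rather than in application code, so treat the builder-style setup as illustrative:

```python
from pyspark.sql import SparkSession

# Minimal sketch (Spark 3.1+). The builder pattern is illustrative; on
# Databricks these keys usually belong in the cluster's Spark config.
spark = (
    SparkSession.builder
    .appName("decommissioning-demo")
    # Stop scheduling new tasks on an executor once the cloud provider
    # signals that its instance is being reclaimed.
    .config("spark.decommission.enabled", "true")
    # Migrate blocks off the executor before it disappears, rather than
    # recomputing them later.
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    .getOrCreate()
)
```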

 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Erik,

Enabling decommissioning in Spark is valuable, especially in cloud environments with transient capacity such as spot instances.

Let’s delve into the reasons behind its default state and potential downsides:

  1. Why Not Enabled by Default?

    • Databricks, as a managed Spark platform, makes certain design choices based on a balance of flexibility, performance, and ease of use.
    • By default, decommissioning is turned off in Databricks clusters. The following considerations likely influence this decision:
      • Stability: Enabling decommissioning introduces additional complexity. Ensuring stability and predictable behaviour across various scenarios is crucial.
      • Data Consistency: When a worker is decommissioned, any cached data or shuffle files stored on that worker are lost. This can impact performance and data consistency.
      • Resource Management: Databricks aims to maintain a stable cluster size. Decommissioning could lead to frequent node replacements, affecting resource availability.
      • User Expectations: Default settings prioritize simplicity and reliability for a broad user base. Not all users may be aware of or require decommissioning.
  2. Potential Downsides of Enabling Decommissioning:

    • Data Loss: As mentioned, cached data and shuffle files on a decommissioned worker are lost unless they are migrated first. If your workload relies heavily on caching, this could impact performance (see the sketch after this list for one mitigation).
    • Increased Instability: Frequent node replacements due to decommissioning might lead to instability, especially if autoscaling is enabled.
    • Complexity: Managing decommissioned nodes requires additional logic and coordination. Ensuring proper data migration and task re-computation can be challenging.
  3. Databricks Improvements:

    • Spark 3.1 added graceful decommissioning support, including the ability to migrate cached RDD blocks and shuffle files off a node before it is reclaimed, which addresses much of the data-loss concern above. Databricks Runtime versions built on Spark 3.1+ can take advantage of this when the feature is enabled.
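As one concrete mitigation for the data-loss downside above, Spark 3.1+ can migrate shuffle blocks to surviving executors and optionally fall back to durable storage when no executor has room. Here is a hedged sketch; nothing below is a Databricks default, and the fallback path is a hypothetical example:

```python
from pyspark import SparkConf

# Sketch of the migration-related settings (Spark 3.1+). The fallback
# storage path is a made-up example, not a Databricks default.
conf = (
    SparkConf()
    .set("spark.decommission.enabled", "true")
    .set("spark.storage.decommission.enabled", "true")
    # Copy shuffle files to healthy executors before the node is
    # reclaimed, so downstream stages do not have to recompute them.
    .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
    # If no live executor can accept the blocks, spill them to durable
    # storage instead (hypothetical DBFS path shown).
    .set("spark.storage.decommission.fallbackStorage.path",
         "dbfs:/tmp/decommission-fallback/")
)
```

With block migration enabled, the main remaining cost of decommissioning is the migration traffic itself, which is usually cheaper than recomputing lost shuffle data, though that trade-off depends on the workload.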

In summary, while enabling decommissioning can be beneficial, weigh the trade-offs for your specific use case. Databricks lets you turn this feature on or off as needed, so you can make an informed decision for your workload.
