mark_ott
Databricks Employee

To resolve the error that occurs when spark.scheduler.allocation.file points to a workspace file in DBR 16.4 LTS, adjust the cluster's Spark configuration and environment variables. The error message:

com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper$WorkspaceFilesystemException: Failed to update command details for Files in Repos. Set cluster environment variable WSFS_ENABLE=false if you do not need the feature on this cluster.

indicates that using workspace files (such as file:/Workspace/init/fairscheduler.xml) for Spark scheduler configuration is constrained by the Databricks Workspace Filesystem (WSFS) feature. By default, DBR 16.x may try to access workspace files through credential forwarding, which can fail if your cluster or storage setup does not fully support it.

Correct File Location and Syntax

  • The spark.scheduler.allocation.file configuration usually expects a path accessible by all cluster nodes, typically either:

    • A local path on each node (placed by an init script)

    • A path in DBFS, such as dbfs:/path/to/fairscheduler.xml (preferred for portability)

  • A workspace path such as file:/Workspace/... is not always supported for Spark configs, especially in newer runtime versions, unless WSFS is enabled and functioning properly.
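As a rough illustration of the path types above, here is a minimal Python sketch that classifies a candidate value for spark.scheduler.allocation.file before you put it in the cluster config. The function name and the category labels are hypothetical, and the check only inspects the string; it does not touch the cluster or verify that the file exists.

```python
from urllib.parse import urlparse


def classify_allocation_path(path: str) -> str:
    """Label a candidate spark.scheduler.allocation.file value by its scheme.

    The labels are illustrative only; they are not Databricks or Spark terms.
    """
    parsed = urlparse(path)
    if parsed.scheme == "dbfs":
        return "dbfs"
    if parsed.scheme == "file":
        # Workspace files are exposed under /Workspace on the local filesystem.
        if parsed.path.startswith("/Workspace/"):
            return "workspace"
        return "local"
    if parsed.scheme == "":
        # No scheme: how this resolves depends on the runtime's default filesystem.
        return "unqualified"
    return "other"


for candidate in (
    "file:/Workspace/init/fairscheduler.xml",
    "dbfs:/path/to/fairscheduler.xml",
    "file:/tmp/fairscheduler.xml",
):
    print(candidate, "->", classify_allocation_path(candidate))
```

A value classified as "workspace" is the case that depends on WSFS working on the cluster.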

Resolving the WSFS Error

  • Disable the Workspace Filesystem (WSFS) feature: Set the environment variable WSFS_ENABLE=false in your cluster's environment. This disables WSFS credential forwarding, so the cluster falls back to classic DBFS and driver-local files and avoids the Workspace Filesystem issue.

  • To set the environment variable:

    • Go to your Databricks cluster configuration.

    • Open Edit > Advanced Options > Environment Variables.

    • Add: WSFS_ENABLE=false

    • Save and restart the cluster.

  • Ensure your Spark allocation file is accessible to the driver and all worker nodes. The most portable approach is to upload your fairscheduler.xml to DBFS and set:

    spark.scheduler.allocation.file=dbfs:/path/to/fairscheduler.xml

    Use this DBFS path instead of a Workspace path whenever WSFS is not available.
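If you manage clusters through JSON specs rather than the UI, the two settings above can be expressed together. The following is a minimal Python sketch, assuming the spec uses spark_conf and spark_env_vars keys for Spark configuration and environment variables; the helper name is hypothetical and the DBFS path is the placeholder from this answer, not a real location.

```python
import json


def scheduler_cluster_settings(allocation_file: str, disable_wsfs: bool = True) -> dict:
    """Build the Spark config / environment variable fragment of a cluster spec.

    Hypothetical helper: it only assembles a dict and does not call any API.
    """
    settings = {
        "spark_conf": {
            # Points Spark at the fair scheduler pool definitions.
            "spark.scheduler.allocation.file": allocation_file,
        },
        "spark_env_vars": {},
    }
    if disable_wsfs:
        # Environment variable values are strings, so "false" rather than False.
        settings["spark_env_vars"]["WSFS_ENABLE"] = "false"
    return settings


fragment = scheduler_cluster_settings("dbfs:/path/to/fairscheduler.xml")
print(json.dumps(fragment, indent=2))
```

Merge the printed fragment into the rest of your cluster definition, then restart the cluster so both settings take effect.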

Example Setup

  • Place fairscheduler.xml on DBFS:

    dbfs:/databricks/init/fairscheduler.xml

  • Set the Spark configuration:

    spark.scheduler.allocation.file=dbfs:/databricks/init/fairscheduler.xml

  • Add the environment variable in your cluster's environment variables section:

    WSFS_ENABLE=false

This should resolve the error and allow your cluster to start and use your fair scheduler configuration as expected.
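If you do not have a fairscheduler.xml yet, the sketch below generates one in Python using the standard Spark pool layout (an allocations root with pool elements containing schedulingMode, weight, and minShare). The pool names "etl" and "adhoc" and their values are made up for illustration; replace them with your own pools.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_fair_scheduler_xml(pools: dict) -> str:
    """Render pool settings as fair scheduler allocation XML.

    Missing properties fall back to Spark's pool defaults:
    schedulingMode=FIFO, weight=1, minShare=0.
    """
    root = ET.Element("allocations")
    for name, props in pools.items():
        pool = ET.SubElement(root, "pool", {"name": name})
        ET.SubElement(pool, "schedulingMode").text = props.get("schedulingMode", "FIFO")
        ET.SubElement(pool, "weight").text = str(props.get("weight", 1))
        ET.SubElement(pool, "minShare").text = str(props.get("minShare", 0))
    raw = ET.tostring(root, encoding="unicode")
    # Pretty-print so the file is easy to review before uploading.
    return minidom.parseString(raw).toprettyxml(indent="  ")


# Hypothetical pools, for illustration only.
example_pools = {
    "etl": {"schedulingMode": "FAIR", "weight": 2, "minShare": 1},
    "adhoc": {},
}
xml_text = build_fair_scheduler_xml(example_pools)
print(xml_text)
```

The resulting text can then be saved to the DBFS location used above, for example from a notebook with dbutils.fs.put, before the cluster is restarted with the new Spark configuration.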