Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Location of spark.scheduler.allocation.file

bidek56
Contributor

In DBR 16.4 LTS, I am trying to add the following Spark config:

spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml
But the all-purpose cluster is throwing this error:
 
Spark error: Driver down cause: com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper$WorkspaceFilesystemException: Failed to update command details for Files in Repos. Set cluster environment variable WSFS_ENABLE=false if you do not need the feature on this cluster..

What is the location and syntax to resolve this error? Thx

1 ACCEPTED SOLUTION


mark_ott
Databricks Employee

Here are some solutions that don't rely on DBFS.

There are ways to use the Spark scheduler allocation file on Databricks without DBFS, but the options are limited and depend on your environment and access controls.

Alternatives to DBFS for Scheduler Files

1. Local Driver or Worker Node Files

  • You can place the fairscheduler.xml directly on each cluster node’s local filesystem (e.g., /databricks/driver/init/fairscheduler.xml).

  • Use an init script to distribute the file to these locations at cluster startup.

  • Set your Spark config as:

    spark.scheduler.allocation.file: file:/databricks/driver/init/fairscheduler.xml

    This method is only reliable if the file placement is consistent across nodes and managed by an init script.
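
A minimal sketch of such a cluster-scoped init script, assuming the node-local path above; the pool names and weights in the embedded fairscheduler.xml are illustrative placeholders, not something from this thread. Spark reads the file on the driver, but running the script on every node keeps placement consistent:

#!/bin/bash
# Sketch of a cluster-scoped init script: writes a small fair scheduler
# pool definition to a node-local path at cluster startup.
set -e
mkdir -p /databricks/driver/init
cat > /databricks/driver/init/fairscheduler.xml <<'EOF'
<?xml version="1.0"?>
<allocations>
  <!-- Pool names and weights below are placeholders for illustration -->
  <pool name="etl">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>1</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
EOF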

2. Classpath Inclusion

  • Package fairscheduler.xml in your application’s JAR, and reference it via classpath:

    spark.scheduler.allocation.file: fairscheduler.xml

    If it’s present on the classpath, Spark can pick it up. However, this does not work reliably in all Databricks environments, as cluster packaging and bind-mounting policies may vary.
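
Assuming a standard Maven or sbt project layout (an assumption about your build, not something stated in this thread), placing the file under the resources directory puts it at the root of the JAR, which is where a bare classpath lookup expects it:

    src/
      main/
        resources/
          fairscheduler.xml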

3. Workspace Filesystem with Disabled WSFS

  • If you are using workspace files (e.g., under /Workspace/init/), disabling WSFS (WSFS_ENABLE=false) may allow fallback to classic filesystem access.

  • However, this often does not resolve the error on newer runtimes unless you also ensure the file is locally accessible to all cluster nodes (either on the classpath or at a node-local path). Community users have reported that disabling WSFS does not always have the intended effect.

4. Unity Catalog Volumes

  • If you have access to Unity Catalog, you may be able to store the file in a Unity Catalog volume and reference it from Spark, but this requires setup and may have accessibility limitations for non-tabular files.
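
    As a sketch only: the catalog, schema, and volume names below are placeholders, and whether the scheduler can read a volume path at cluster start carries the same caveats as the Workspace path.

    spark.scheduler.allocation.file: /Volumes/<catalog>/<schema>/<volume>/fairscheduler.xml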

What Doesn't Work Reliably

  • Direct Workspace paths (e.g., file:/Workspace/init/fairscheduler.xml) tend to fail without proper WSFS credential setup and cluster config, which is the error you are seeing.

  • On some clusters, disabling WSFS does not make workspace file paths usable, especially on DBR 16.x and newer Databricks versions.


5 REPLIES

mark_ott
Databricks Employee

To resolve the error when setting spark.scheduler.allocation.file to a workspace file in DBR 16.4 LTS, you must adjust the cluster configuration and environment variables. The error message:

com.databricks.backend.daemon.driver.WSFSCredentialForwardingHelper$WorkspaceFilesystemException: Failed to update command details for Files in Repos. Set cluster environment variable WSFS_ENABLE=false if you do not need the feature on this cluster.

indicates that the use of workspace files (like file:/Workspace/init/fairscheduler.xml) for Spark scheduler configuration is limited by Databricks' Workspace Filesystem (WSFS) feature. By default, DBR 16.x may try to leverage workspace files with credential forwarding, but this can fail if your cluster or storage setup does not support it fully.

Correct File Location and Syntax

  • The spark.scheduler.allocation.file configuration usually expects a path accessible by all cluster nodes, typically either:

    • A local path on each node (placed by an init script)

    • Or, a path in DBFS: dbfs:/path/to/fairscheduler.xml (preferred for portability)

  • The workspace path format like file:/Workspace/... is not always supported for Spark configs, especially in newer runtime versions, unless WSFS is enabled and functioning properly.

Resolving the WSFS Error

  • Disable the Workspace Filesystem (WSFS) feature: Set the environment variable WSFS_ENABLE=false in your cluster's environment. This disables WSFS credential forwarding and falls back to classic DBFS and driver-local files, avoiding Workspace Filesystem issues.

  • To set the environment variable:

    • Go to your Databricks cluster configuration

    • Under Edit > Advanced Options > Environment Variables

    • Add: WSFS_ENABLE=false

    • Save and restart the cluster.

  • Ensure your Spark allocation file is accessible to all driver and worker nodes. The most portable approach is to upload your fairscheduler.xml to DBFS and set:

    spark.scheduler.allocation.file=dbfs:/path/to/fairscheduler.xml

    This DBFS path should be used instead of a Workspace path if WSFS is not available.

Example Setup

  • Place fairscheduler.xml on DBFS:

    dbfs:/databricks/init/fairscheduler.xml

  • Set the Spark configuration:

    spark.scheduler.allocation.file=dbfs:/databricks/init/fairscheduler.xml

  • Add the environment variable:

    WSFS_ENABLE=false

    in your cluster's environment variables section.
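
One way to stage the file on DBFS, as a sketch (it assumes the Databricks CLI is installed and configured; the local and DBFS paths are placeholders):

    databricks fs cp ./fairscheduler.xml dbfs:/databricks/init/fairscheduler.xml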

This should resolve the error and allow your cluster to start and use your fair scheduler configuration as expected.

@mark_ott Was this answer generated by AI? It conflicts with other docs b/c:

1. DBFS is deprecated in the latest DBR versions; workspace files are recommended for storing cluster-scoped init scripts and other workspace-related files. Cluster startup fails when I use DBFS.

2. I include fairscheduler.xml in the .jar file used to run the job, which gets loaded properly in the open-source version of Spark, but somehow it does not work in Databricks.

Thx for your help.

bidek56
Contributor

@mark_ott 
Setting WSFS_ENABLE=false does not affect anything. Thx

mark_ott
Databricks Employee
Databricks Employee

I used AI but maybe it was based on older docs?
