Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

smukhi
New Contributor II

As of this morning, we started receiving the following error message on a Databricks job with a single PySpark notebook task. The job has not had any code changes in 2 months, and the cluster configuration has not changed either. The last successful run of the job was the previous night.

The job now fails with this error message:


"Py4JJavaError: An error occurred while calling o1830.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task creation failed: com.databricks.unity.error.MissingCredentialScopeException: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals.. SQLSTATE: XXKUC com.databricks.unity.error.MissingCredentialScopeException: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals.. SQLSTATE: XXKUC"

At a high level, the job takes a list of UUID strings. For each UUID, it reads a Delta table stored in Unity Catalog and filters it to the rows matching that UUID, checkpoints the DataFrame halfway through the transformations, and finally writes the data to AWS S3 before moving on to the next UUID. Interestingly, the first pass through the loop runs without issues and the files are written to S3 successfully. Every time the job fails, however, it fails on the second pass with the error above (consistently reproducible). If I kick off a separate job for each UUID in the original list, the job succeeds, but this is sub-optimal because of cluster startup time. To reiterate, this worked fine yesterday and none of the code has changed.
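For readers trying to reproduce this, the loop described above might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual job code: the function name `process_uuids`, the `uuid` column name, the Parquet output format, and the overwrite mode are all assumptions made for illustration. It also assumes a checkpoint directory has already been configured on the cluster.

```python
from typing import Iterable, List


def process_uuids(spark, uuids: Iterable[str], table_name: str, output_root: str) -> List[str]:
    """Hypothetical reconstruction of the per-UUID loop; not the poster's code."""
    written = []
    for uuid in uuids:
        # Read the Unity Catalog Delta table and keep only rows for this UUID.
        # The column name "uuid" is an assumption.
        df = spark.read.table(table_name).filter(f"uuid = '{uuid}'")

        # ... first half of the transformations would go here ...

        # Eager checkpoint partway through. Assumes a checkpoint directory was
        # already set with spark.sparkContext.setCheckpointDir(...).
        df = df.checkpoint(eager=True)

        # ... second half of the transformations would go here ...

        # Write the result to S3 (output format and mode are assumptions).
        target = f"{output_root}/{uuid}"
        df.write.mode("overwrite").parquet(target)
        written.append(target)
    return written
```

The function takes the `spark` session as a parameter, so it can be called from a notebook as `process_uuids(spark, uuid_list, "<catalog>.<schema>.<table>", "s3://<bucket>/<prefix>")` with real values substituted for the placeholders.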

This looks like it may be a bug caused by a recent change in the Unity Catalog service. Does anyone have any ideas?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @smukhi, The error message you’re encountering, specifically the “Py4JJavaError” with the “Missing Credential Scope” issue, can be quite puzzling.

Let’s explore some potential solutions and ideas to troubleshoot this problem:

  1. Check Cluster Configuration:

    • Even though you mentioned that the cluster configuration hasn’t changed, it’s still worth verifying that there haven’t been any inadvertent changes. Ensure that the cluster has the necessary credentials and permissions to access the Unity Catalog service.
  2. Unity Catalog Service:

    • The error message points to an issue related to the Unity Catalog service. Unity Catalog is a Databricks service that provides a unified metadata catalog for managing data and metadata across different storage systems.
    • Verify that the Unity Catalog service is running and accessible. If there have been any updates or changes to the Unity Catalog service, it might be worth investigating those.
    • Check if there are any known issues or updates related to Unity Catalog in the Databricks documentation or community forums.
  3. Credential Scopes:

    • The error specifically mentions “Missing Credential Scope.” This suggests that the job is expecting a certain credential scope that isn’t available.
    • Review the credentials used by your job. Ensure that the necessary credentials (such as AWS S3 credentials) are correctly configured and accessible.
    • Double-check the credential scope associated with the job. If it’s missing or incorrect, that could be the root cause.
  4. Session and Thread Context:

    • The error message also refers to “Unity Credential Scope id not found in thread locals.” This indicates that the credential scope might not be properly propagated to the thread where the job is executing.
    • Investigate how the credential scope is set and propagated within your code. Make sure it’s correctly established at the session level and accessible by all tasks.
    • If you’re using any custom libraries or code that modifies thread context, ensure that it doesn’t interfere with the credential scope.
  5. Reproducibility:

    • You mentioned that the issue occurs consistently during the second loop. This suggests that there might be some state or context that changes between the first and second loop.
    • Look for any differences in the environment, data, or configuration between the loops. It could be related to the specific UUIDs being processed or other factors.
  6. Logging and Debugging:

    • Enable detailed logging for your job. Check the logs to see if there are any additional clues about what’s going wrong.
    • If possible, run the job in debug mode or step through the code to identify the exact point where the error occurs.
    • If you have a Databricks support plan, consider reaching out to their support team for assistance.

Good luck, and I hope you find a solution soon! 😊

smukhi
New Contributor II

As advised, I double-checked that no code or cluster configuration had changed (a second person also reviewed it and confirmed the same).

I was able to find a "fix" that puts a band-aid on the issue:

I was able to pinpoint that the issue seems to only occur when I checkpoint the DataFrame using "df.checkpoint(eager=True)". After replacing this checkpoint with a write to S3 using "df.write.parquet(...)", followed by a read back from the same location "df = spark.read.parquet(...)", I'm able to circumvent the above error message while still creating a "checkpoint" without using the df.checkpoint method.
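A minimal sketch of that workaround, assuming a writable scratch location on S3, could look like the following. The helper name `materialize_via_parquet` and the overwrite mode are assumptions for illustration; only `df.write.parquet(...)` and `spark.read.parquet(...)` come from the description above.

```python
def materialize_via_parquet(spark, df, path: str):
    """Stand-in for df.checkpoint(eager=True): write to Parquet, then read back.

    Like a checkpoint, the returned DataFrame is backed by the files at
    `path`, so later steps read those files instead of recomputing the
    upstream transformations.
    """
    # Overwrite so that each loop iteration can reuse the same scratch path
    # (the write mode is an assumption, not stated in the post).
    df.write.mode("overwrite").parquet(path)

    # Read the materialized data back as a fresh DataFrame.
    return spark.read.parquet(path)
```

In the job loop, `df = df.checkpoint(eager=True)` would then become something like `df = materialize_via_parquet(spark, df, scratch_path)`, where `scratch_path` is a hypothetical S3 location. Unlike checkpoint files, this scratch data is managed entirely by the job, so it may need explicit cleanup afterwards.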

I'm still unsure how the Unity Catalog service pertains to the df.checkpoint method, or what may have changed last Friday with this functionality, as I'm unable to read the proprietary Databricks modules that appear in the stack trace. It may still be worth an investigation on your end!
