Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

smukhi
New Contributor II

As of this morning, we started seeing failures on a Databricks job with a single PySpark notebook task. The job has not had any code changes in 2 months, and the cluster configuration has not changed either. The last successful run of the job was the previous night.

The job now fails with the following error message:


"Py4JJavaError: An error occurred while calling o1830.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task creation failed: com.databricks.unity.error.MissingCredentialScopeException: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals.. SQLSTATE: XXKUC com.databricks.unity.error.MissingCredentialScopeException: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals.. SQLSTATE: XXKUC"

At a high level, the job takes a list of string UUIDs and, for each UUID, reads from a Delta table stored in Unity Catalog and filters that table on matches to the UUID. It checkpoints the DataFrame halfway through the transformations and finishes by writing data to AWS S3 before moving on to the next UUID. Interestingly, the first pass through the loop completes without issue and the files are successfully written to S3, but every time the job fails, it fails on the second iteration with the error above (consistently reproducible). If I kick off a separate job for each UUID in the original list, the jobs all succeed (a sub-optimal approach due to cluster startup time). To reiterate, this worked fine yesterday and none of the code has changed.
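
Roughly, the loop looks like the sketch below. The table name, UUID column, bucket, and paths are placeholders rather than our actual values, and `spark` is the SparkSession provided by the Databricks notebook.

```python
from pyspark.sql import functions as F

# Placeholder checkpoint location (our real checkpoint directory is different).
spark.sparkContext.setCheckpointDir("s3://example-bucket/checkpoints/")

uuid_list = ["..."]  # the actual list of string UUIDs is elided

for uuid in uuid_list:
    # Read the Delta table registered in Unity Catalog and filter on the current UUID.
    df = spark.table("main.example_schema.example_table").filter(F.col("uuid") == uuid)

    # ... first half of the transformations ...

    # Checkpoint halfway through; eager=True materializes the DataFrame immediately.
    # This is the call that fails on the second loop iteration with
    # UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE.
    df = df.checkpoint(eager=True)

    # ... remaining transformations ...

    # Write the results to S3 before moving on to the next UUID.
    df.write.mode("overwrite").parquet(f"s3://example-bucket/output/{uuid}/")
```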

It appears this may be a bug introduced by a change in the Unity Catalog service. Does anyone have any ideas?

3 REPLIES

smukhi
New Contributor II

As advised, I double-checked that no code or cluster configuration was changed (and even got a second set of eyes on it, who confirmed the same).

I was able to find a "fix" that puts a band-aid on the issue:

I was able to pinpoint that the issue only seems to occur when I checkpoint the DataFrame using `df.checkpoint(eager=True)`. After replacing this checkpoint with a write to S3 using `df.write.parquet(...)`, followed by a read back from the same location with `df = spark.read.parquet(...)`, I'm able to circumvent the above error message while still creating a "checkpoint" without using the `df.checkpoint` method.
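
In other words, the replacement looks roughly like this (the staging path below is just a placeholder):

```python
# Hypothetical S3 staging location used in place of the Spark checkpoint directory.
staging_path = f"s3://example-bucket/checkpoint-staging/{uuid}/"

# Instead of df = df.checkpoint(eager=True), materialize the DataFrame by writing it out...
df.write.mode("overwrite").parquet(staging_path)

# ...and read it back, which truncates the lineage much like an eager checkpoint would.
df = spark.read.parquet(staging_path)
```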

I'm still unsure how the Unity Catalog service pertains to the df.checkpoint method, or what may have changed last Friday with this functionality, as I'm unable to read the proprietary Databricks modules that appear in the stack trace. It may still be worth an investigation on your end!

johnmacnamara
New Contributor

@smukhi Did you ever pinpoint what exactly was going on here? We just started hitting a similar issue and we're at a bit of a loss.

smukhi
New Contributor II

@johnmacnamara Unfortunately not; we're still using the "temporary" fix I described above, where we just write out to S3 using `df.write.parquet` and then read the data back in.
