12-23-2024 05:47 AM
Hi All,
Recently we implemented a change to make the Databricks workspace accessible only via a private network. After this change, we saw a lot of connectivity errors, for example from Power BI to Databricks and from Azure Data Factory to Databricks.
I was able to resolve the issues mentioned above, but I am currently stuck on another issue whose root cause looks like the same implementation.
The Error I get is:
run failed with error message Library installation failed for library due to user error for whl: "dbfs:/Volumes/any.whl" Error messages: Library installation attempted on the driver node of cluster XXXX and failed. Please refer to the following error message to fix the library or contact Databricks support. Error code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error message: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: 403: Unauthorized access to workspace: 12345543234
I have executed this job about 200 times, and it failed with the above error on 13 of those runs. So it is clearly an intermittent issue that happens only sometimes.
Has anyone faced a similar issue, or does anyone have an idea how to proceed with this?
12-23-2024 06:08 AM
Hi @Uj337,
A couple of questions:
How are you installing the library? I assume through cluster libraries?
Are you the only one starting the cluster, or do other users start it as well?
What DBR version are you using? Also, is it single user access mode or a shared cluster?
Based on the error, it does look like the wheel package is downloaded from a Volume, right? Are any packages being downloaded from the internet?
12-23-2024 06:44 AM
How are you installing the library? I assume through cluster libraries?
It's a job cluster created from an ADF pipeline. Within the ADF activity, we pass this wheel library as a DBFS URI (a rough sketch of what that amounts to is shown after this list).
Are you the only one starting the cluster, or do other users start it as well? Yes.
What DBR version are you using (14.3 LTS)? Also, is it single access mode or a shared cluster (single)?
Based on the error it does look like the wheel package is downloaded from a Volume, right? YES.
Are any packages being downloaded from the internet? - No.
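For context, this is roughly equivalent to attaching the wheel to the cluster through the Libraries API. A rough sketch with the Databricks SDK for Python is below, just to illustrate; ADF actually passes the library as part of the job submission, and the cluster ID and Volume path are placeholders, not our actual values:

```python
# Rough sketch only: roughly what attaching the wheel to the job cluster amounts to,
# expressed with the Databricks SDK for Python. The cluster ID and Volume path are
# placeholders; in our setup ADF passes the library as part of the job submission.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

w = WorkspaceClient()  # auth picked up from environment variables or ~/.databrickscfg

w.libraries.install(
    cluster_id="0123-456789-abcdefgh",                               # placeholder cluster ID
    libraries=[Library(whl="/Volumes/catalog/schema/vol/any.whl")],  # placeholder path
)
```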
12-23-2024 08:17 AM
@Uj337 The relevant part is "org.apache.spark.sql.AnalysisException: 403: Unauthorized access to workspace: 12345543234". Is there anything in the driver logs correlated with this analysis exception? Does it come with a full stack trace?
12-23-2024 09:19 AM
24/12/23 11:22:33 INFO DriverCorral: [Thread 161] AttachLibraries - candidate libraries: List(dbfs:/Volumes/external_location/any.whl)
24/12/23 11:22:33 INFO DriverCorral: [Thread 161] AttachLibraries - new libraries to install (including resolved dependencies): List(dbfs:/Volumes/external_location/any.whl)
24/12/23 11:22:33 INFO SharedDriverContext: [Thread 161] attachLibrariesToSpark dbfs:/Volumes/external_location/any.whl
24/12/23 11:22:33 INFO SharedDriverContext: Attaching Python lib: dbfs:/Volumes/external_location/any.whl to clusterwide nfs path
24/12/23 11:22:33 INFO DriverConf: Configured feature flag data source LaunchDarkly
24/12/23 11:22:33 WARN DriverConf: REGION environment variable is not defined. getConfForCurrentRegion will always return default value
24/12/23 11:22:33 INFO LibraryDownloadManager: Downloading a library that was not in the cache: dbfs:/Volumes/external_location/any.whl
24/12/23 11:22:33 INFO LibraryDownloadManager: Attempt 1: wait until library dbfs:/Volumes/external_location/any.whl is downloaded
24/12/23 11:22:33 INFO LibraryDownloadManager: Preparing to download library file from UC Volume path: dbfs:/Volumes/external_location/any.whl
24/12/23 11:22:34 INFO RDriverLocal: 9. RDriverLocal.b7abce53-a76c-4c5e-ba76-f0e5221260b6: R process started with RServe listening on port 1100.
24/12/23 11:22:34 INFO RDriverLocal: 10. RDriverLocal.b7abce53-a76c-4c5e-ba76-f0e5221260b6: starting interpreter to talk to R process ...
24/12/23 11:22:35 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
24/12/23 11:22:35 INFO ROutputStreamHandler: Successfully connected to stdout in the RShell.
24/12/23 11:22:35 INFO ROutputStreamHandler: Successfully connected to stderr in the RShell.
24/12/23 11:22:35 INFO RDriverLocal: 11. RDriverLocal.b7abce53-a76c-4c5e-ba76-f0e5221260b6: R interpreter is connected.
24/12/23 11:22:35 INFO RDriverWrapper: setupRepl:ReplId-4fde1-7f7fb-47ee1-1: finished to load
24/12/23 11:22:39 INFO LibraryDownloadManager: Attempt 2: wait until library dbfs:/Volumes/external_location/any.whl is downloaded
24/12/23 11:22:39 INFO LibraryDownloadManager: Preparing to download library file from UC Volume path: dbfs:/Volumes/external_location/any.whl
24/12/23 11:22:44 INFO LibraryDownloadManager: Attempt 3: wait until library dbfs:/Volumes/external_location/any.whl is downloaded
24/12/23 11:22:44 INFO LibraryDownloadManager: Preparing to download library file from UC Volume path: dbfs:/Volumes/external_location/any.whl
24/12/23 11:22:44 ERROR LibraryDownloadManager: Could not download dbfs:/Volumes/external_location/any.whl.
org.apache.spark.sql.AnalysisException: 403: Unauthorized access to workspace: 12345563455323
12-23-2024 01:10 PM
@Uj337 - based on the error, it looks like the identity making the request from your ADF pipeline is not authorized to perform the action. It could be related to a secret or token. Are you generating the token per request, or how is the authentication being done?
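As a quick sanity check, you could also run something like this on the same job cluster to confirm that the executing identity can actually read the wheel from the Volume (a rough sketch; the Volume path and file name are placeholders):

```python
# Rough sketch: run on the same job cluster to confirm the executing identity
# can list and read the Volume path the library installer uses.
# The Volume path and file name are placeholders.
volume_path = "/Volumes/catalog/schema/vol"

# dbutils is available on Databricks clusters; this lists the Volume contents
# with the identity the job actually runs as.
for entry in dbutils.fs.ls(volume_path):
    print(entry.path, entry.size)

# Reading the wheel directly exercises the same Unity Catalog permission check
# that the intermittent 403 "Unauthorized access to workspace" error points at.
with open(f"{volume_path}/any.whl", "rb") as f:
    print(len(f.read()), "bytes read")
```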
12-24-2024 03:34 AM
Adding to @Alberto_Umana's questions, this looks like an ephemeral token/session issue, a race condition, or perhaps a network/Private Link glitch. I'd suggest increasing the log verbosity and capturing detailed logs to compare successful vs. failed runs, and ensuring that the service principal or job user is consistently recognized. If this still doesn't help isolate the root cause, please raise a support ticket with our Databricks Support team for deeper analysis.
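For comparing runs, if you have cluster log delivery enabled, a small script along these lines can pull out the 403 lines from the delivered driver logs (a rough sketch only; the log destination is a placeholder and assumes logs are delivered to a DBFS path):

```python
# Rough sketch: scan delivered driver logs for the intermittent 403 so that
# failed runs can be compared against successful ones. Assumes cluster log
# delivery is configured to a DBFS destination; the path below is a placeholder.
import glob

LOG_ROOT = "/dbfs/cluster-logs"   # placeholder log delivery destination
PATTERN = "403: Unauthorized access to workspace"

for path in glob.glob(f"{LOG_ROOT}/*/driver/log4j-active.log"):
    with open(path, errors="replace") as f:
        hits = [line.rstrip() for line in f if PATTERN in line]
    if hits:
        print(path)
        for line in hits:
            print("   ", line)
```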
4 weeks ago
Hi everyone,
This issue appeared a couple of times but later disappeared on its own, so I didn't have to do anything 😐
4 weeks ago
Hi @Uj337,
How are you doing today?
This issue seems to be tied to the private network setup affecting access to the .whl file on DBFS. I recommend starting by ensuring the driver node has proper access to the dbfs:/Volumes/any.whl path and that all permissions are correctly configured.
If this doesn't resolve it, consider hosting the .whl file in an accessible cloud storage location, such as Azure Blob Storage or S3, and updating your job to reference it there.
Additionally, check your network and cluster settings to ensure they allow access to the storage location and that the driver node isn't resource-constrained.
Since this is an intermittent issue, you can also implement a retry mechanism for your job to handle these temporary failures; a rough sketch is shown below.
If the problem persists, reaching out to Databricks support with the error code and cluster details might help identify a solution.
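On the retry piece, ADF activities have their own retry settings, but if you prefer to retry from the Databricks side, here is a rough sketch using the Databricks SDK for Python. The job ID, attempt count, and wait time are placeholders, not your actual configuration:

```python
# Rough sketch: resubmit the job run only when it fails with the intermittent
# library-installation error. Job ID, attempt count, and wait time are placeholders;
# in the real setup the job is triggered from ADF, which has its own retry policy.
import time
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
JOB_ID = 123456789   # placeholder job ID
MAX_ATTEMPTS = 3

for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        run = w.jobs.run_now(job_id=JOB_ID).result()  # blocks until the run terminates
        message = (run.state.state_message or "") if run.state else ""
        failed = bool(run.state and run.state.result_state
                      and run.state.result_state.value != "SUCCESS")
    except Exception as e:  # the waiter may raise if the run ends in an error state
        message, failed = str(e), True
    if not failed:
        print(f"Run succeeded on attempt {attempt}")
        break
    if "DRIVER_LIBRARY_INSTALLATION_FAILURE" in message and attempt < MAX_ATTEMPTS:
        print(f"Attempt {attempt} hit the library installation error, retrying...")
        time.sleep(60)
        continue
    raise RuntimeError(f"Run failed: {message}")
```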
Give it a try and let me know if you have any questions.
Thanks,
Brahma