03-02-2023 04:07 PM
I'm just getting started with Databricks and DLTs. I've followed all the docs and tutorials I can find on this and believe I have set up everything in Azure correctly: service principal, paths, and spark configs. When I run a simple DLT autoloader pipeline I get a bizarre error about Google Compute Engine (see attached). What could this mean? I'm on Azure so it's very confusing. Obviously I'm making a basic mistake but have no idea what it is. Perhaps someone else has been here and can tell me.
03-13-2023 04:51 AM
@David Bernstein : The error message you attached suggests that Databricks is trying to authenticate with a Google Cloud service account even though you are on Azure. This can happen if the GOOGLE_APPLICATION_CREDENTIALS environment variable is set on your Databricks cluster and points to a Google Cloud service account JSON file. Check the environment variables set on your cluster: go to the "Advanced" tab of the cluster configuration page and look for the "Environment Variables" section. Make sure that GOOGLE_APPLICATION_CREDENTIALS is not set there; it is only meaningful for Google Cloud credentials, so it should not be needed for an Azure service principal. Can you please check and confirm that this has been ruled out?
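If the UI is inconclusive, the same check can be done from code running on the cluster in question. This is only a minimal sketch: the helper name `google_env_vars` is made up for illustration, and it inspects the environment of the process it runs in (typically the driver), not every worker.

```python
import os


def google_env_vars(env=None):
    """Return the environment variables whose names start with GOOGLE_."""
    # Default to the current process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return {name: value for name, value in env.items() if name.startswith("GOOGLE_")}


found = google_env_vars()
if found:
    # Print names only, so credential paths do not end up in logs.
    print("GOOGLE_* variables set:", sorted(found))
else:
    print("no GOOGLE_* environment variables set")
```

An empty result rules out the variable for that process only; it says nothing about how the pipeline's own cluster is configured.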
03-13-2023 01:37 PM
No, that environment variable is not set in the advanced section of any of my (very few) cluster configurations. Is it possible that it gets set in some other way?
03-13-2023 01:43 PM
Actually, this error occurs in a Delta Live Tables pipeline, and I don't know what the cluster configuration is for that.
03-13-2023 01:49 PM
Now I have added some code to set the Spark configs (oauth2.client.id and so on).
This changed the error message to the one below, which is frustratingly obtuse. What request? What operation? What permission?
org.apache.spark.sql.streaming.StreamingQueryException: Query sandbox_autoloader [id = eb65bca3-7d32-4392-a6ff-f187803322b4, runId = a5031fb6-5e6d-4071-8e47-e9d4d810994e] terminated with exception: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://pallidustoolsupload.dfs.core.windows.net/albany/Sandbox/csv?upn=false&action=getStatus&timeo...
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:382)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:251)
04-01-2023 10:37 PM
@David Bernstein :
The error message "This request is not authorized to perform this operation using this permission" suggests that the credentials you provided in the spark configs do not have the necessary permissions to access the Azure storage account where you are trying to load the data.
You may need to check the following:
It may also be helpful to check the Azure portal logs to see if there are any additional error messages or information about the failed request.
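One thing worth verifying is that the OAuth settings are keyed to the same storage account host that appears in the error URL. The sketch below assumes a service principal using client-credentials OAuth against ADLS Gen2. The helper name `abfs_oauth_conf` and the angle-bracket placeholder values are hypothetical, and the storage account name is taken from the error message above.

```python
def abfs_oauth_conf(storage_account, client_id, client_secret, tenant_id):
    """Build the ABFS OAuth (client credentials) settings for one storage account."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values: substitute your own, ideally read from a secret scope.
conf = abfs_oauth_conf(
    "pallidustoolsupload", "<application-id>", "<client-secret>", "<tenant-id>"
)

for key in conf:
    # In a notebook each pair would be applied with spark.conf.set(key, conf[key]);
    # here only the keys are printed so they can be compared with what is configured.
    print(key)
```

If the keys match and the 403 persists, the credentials themselves are probably being accepted and the service principal is more likely missing a data-access role on the storage account or container (for example Storage Blob Data Reader or Storage Blob Data Contributor).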
04-11-2023 03:55 PM
Thank you, I will look into this.