Databricks Community

david_bernstein · ‎03-02-2023

I'm just getting started with Databricks and DLTs. I've followed all the docs and tutorials I can find on this and believe I have set up everything in Azure correctly: service principal, paths, and spark configs. When I run a simple DLT autoloader pipeline I get a bizarre error about Google Compute Engine (see attached). What could this mean? I'm on Azure so it's very confusing. Obviously I'm making a basic mistake but have no idea what it is. Perhaps someone else has been here and can tell me.

Anonymous · ‎03-13-2023

@David Bernstein : The error message you attached suggests that Databricks is trying to authenticate using a Google Cloud service account, even though you are on Azure. This can happen if the GOOGLE_APPLICATION_CREDENTIALS environment variable is set on your Databricks cluster and points to a Google Cloud service account JSON file. Check the environment variables set on your cluster: go to the "Advanced" tab of your cluster configuration page and look for the "Environment Variables" section. Make sure that the GOOGLE_APPLICATION_CREDENTIALS variable is not set there. It only applies to Google Cloud credentials, so there is no valid value for it on an Azure workspace. Can you please check and confirm that this has been ruled out?

david_bernstein · ‎03-13-2023

No, that environment variable is not set in the advanced portion of any of my (very few) cluster configurations. Is it possible it gets set in some other way?

david_bernstein · ‎03-13-2023

Actually, this error occurs in a Delta Live Tables pipeline, and I don't know what the cluster configuration is for that.

david_bernstein · ‎03-13-2023

Now I have added some code to set spark configs, oauth2.client.id and all that.

This changed the error message to what is attached below. This error message is frustratingly obtuse. What request? What operation? What permission?

org.apache.spark.sql.streaming.StreamingQueryException: Query sandbox_autoloader [id = eb65bca3-7d32-4392-a6ff-f187803322b4, runId = a5031fb6-5e6d-4071-8e47-e9d4d810994e] terminated with exception: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://pallidustoolsupload.dfs.core.windows.net/albany/Sandbox/csv?upn=false&action=getStatus&timeo...

at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:382)

at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:251)

Anonymous · ‎04-01-2023

@David Bernstein :

The error message "This request is not authorized to perform this operation using this permission" suggests that the credentials you provided in the spark configs do not have the necessary permissions to access the Azure storage account where you are trying to load the data.
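To answer "What request? What operation?": the URL in the exception already carries most of that information. Here is a minimal sketch that pulls it apart using only the Python standard library. The helper name describe_adls_request is just something I made up for illustration, and the URL is copied exactly as it appears in your post, including the truncated tail.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the exception in this thread. The forum truncated it,
# so the trailing "timeo..." is left exactly as posted.
FAILED_URL = (
    "https://pallidustoolsupload.dfs.core.windows.net/albany/Sandbox/csv"
    "?upn=false&action=getStatus&timeo..."
)


def describe_adls_request(url: str) -> dict:
    """Split an ADLS Gen2 (dfs endpoint) URL into account, container, path, action.

    Hypothetical helper for illustration; not part of any Databricks or Azure SDK.
    """
    parts = urlsplit(url)
    # Host looks like <storage-account>.dfs.core.windows.net
    account = parts.hostname.split(".")[0]
    # First path segment is the container (filesystem); the rest is the path in it
    container, _, path = parts.path.lstrip("/").partition("/")
    # Parameters without a value (such as the truncated one) are dropped by parse_qs
    query = parse_qs(parts.query)
    return {
        "storage_account": account,
        "container": container,
        "path": path,
        "action": query.get("action", [None])[0],
    }


print(describe_adls_request(FAILED_URL))
```

So the failing call was a HEAD request with action=getStatus on Sandbox/csv in the albany container of the pallidustoolsupload account. As far as I can tell, that is a metadata lookup on the source directory, which would mean the stream was refused before it read any files. A 403 at that point usually indicates that the identity authenticated but is missing a data-plane role on the storage account or container, although I can't confirm that from the message alone.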

You may need to check the following:

Ensure that the service principal you created in Azure has the correct permissions to access the storage account.
Ensure that the storage account key or the OAuth 2.0 credentials used for authentication are correct and have the necessary permissions.
Check if there are any network or firewall restrictions that might be blocking access to the storage account.
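For the second check, it can help to compare your settings against the full set of keys the ABFS driver expects for service principal (OAuth client credentials) authentication. Below is a rough sketch that builds those keys for one storage account. The function name abfs_oauth_conf and the placeholder argument values are my own and purely illustrative; substitute your real application (client) ID, secret, and tenant ID.

```python
def abfs_oauth_conf(storage_account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> dict:
    """Build the Spark/Hadoop settings for ABFS OAuth with a service principal.

    Hypothetical helper for illustration. The keys are scoped to one storage
    account by the "<account>.dfs.core.windows.net" suffix.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        # In a real workspace, read the secret from a secret scope
        # rather than hard-coding it.
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values only; none of these are real credentials.
conf = abfs_oauth_conf(
    storage_account="pallidustoolsupload",
    client_id="<application-id>",
    client_secret="<client-secret>",
    tenant_id="<tenant-id>",
)
for key, value in conf.items():
    print(key, "=", value)
```

In a notebook you would apply each pair with spark.conf.set(key, value). For a Delta Live Tables pipeline, my understanding is that the same key/value pairs belong in the pipeline's own configuration settings rather than on an interactive cluster, so please verify that against your pipeline settings. Note that correct values here only get you authenticated; the service principal still needs a data role on the storage account (for example Storage Blob Data Contributor), which is the first check in the list.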

It may also be helpful to check the storage account's logs in the Azure portal to see if they contain any additional error messages or details about the failed request.