Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into

dilkushpatel
New Contributor II

I see two articles in the Databricks documentation:

https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python

https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal

 

The PolyBase one is the legacy option and is no longer supported.

The COPY INTO one is the newer option, and ideally it should not need CONTROL access on the whole DW.

 

Here comes the confusion:

Looking at the query structure itself, I do not see any difference in how we access the DW; the code block looks exactly the same. So how do we make sure we are using the COPY option and not POLYBASE?
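Since the connector call looks identical either way, one way to tell which mechanism actually ran is to look at the SQL statements the connector issued on the Synapse side (for example via the `sys.dm_pdw_exec_requests` DMV on a dedicated pool) and classify them by keyword. This is a sketch under that assumption, not an official diagnostic:

```python
def classify_load_mechanism(sql_text: str) -> str:
    """Classify a statement captured from the Synapse side.

    The Synapse connector emits COPY INTO statements on the COPY path and
    CREATE EXTERNAL TABLE (PolyBase-style) statements on the legacy path.
    """
    s = sql_text.lstrip().upper()
    if s.startswith("COPY INTO"):
        return "COPY"
    if "CREATE EXTERNAL TABLE" in s:
        return "POLYBASE"
    return "OTHER"
```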

 

I ran the code and it still gives an access issue. I granted read access on the schema and table, CREATE access on the DW, and the ADMINISTER DATABASE BULK OPERATIONS permission, as mentioned in the document.

It's not clear what I am missing here.
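One way to see the effective permissions without guessing is to push a self-check query through the same connector connection using its `query` option instead of `dbtable` (`fn_my_permissions` is a standard SQL Server/Synapse function). A hedged sketch; the helper name and placeholders are mine:

```python
# Lists the database-scope permissions the connector's login actually holds.
PERM_CHECK_SQL = "SELECT permission_name FROM fn_my_permissions(NULL, 'DATABASE')"

def permission_check_options(server, database, user, password, temp_dir):
    """Reader options for spark.read.format('sqldw') that run a
    permission self-check query instead of reading a table."""
    return {
        "host": f"{server}.sql.azuresynapse.net",
        "port": "1433",
        "user": user,
        "password": password,
        "database": database,
        "query": PERM_CHECK_SQL,  # replaces "dbtable"
        "tempDir": temp_dir,
        "forwardSparkAzureStorageCredentials": "true",
    }
```

`spark.read.format("sqldw").options(**permission_check_options(...)).load().show()` should then list what the login can do; note the read itself still stages through tempDir, so it exercises the same permissions as a table read.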

4 REPLIES

Kaniz
Community Manager

Hi @dilkushpatel, thank you for sharing your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse.

  1. PolyBase (Legacy):

  2. COPY INTO:

  3. Access Issue:

    • Even though the query structure and code blocks appear similar for both PolyBase and COPY INTO, the underlying behaviour differs.
    • To ensure you’re using COPY INTO:
    • If the issue persists, consider:
      • Checking the Azure Synapse side for any specific requirements.
      • Verifying that the file location and credentials are correctly configured.
      • Examining any additional error messages or logs.
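For the "file location correctly configured" point, a quick shape check on the tempDir URI catches typos before Spark is involved. A sketch assuming the ADLS Gen2 `abfss://` layout used in this thread:

```python
import re

# abfss://<container>@<account>.dfs.core.windows.net/<path> (ADLS Gen2)
ABFSS_RE = re.compile(
    r"^abfss://"
    r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]@"   # container naming rules
    r"[a-z0-9]{3,24}"                     # storage account naming rules
    r"\.dfs\.core\.windows\.net"          # must be the Gen2 (dfs) endpoint
    r"(/.*)?$"
)

def valid_temp_dir(uri: str) -> bool:
    """True if the URI looks like a usable ADLS Gen2 staging directory."""
    return ABFSS_RE.match(uri) is not None
```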

Remember that COPY INTO is the way forward, and it should not require CONTROL access on the entire data warehouse.

Feel free to share more details or ask additional questions—I’m here to help! 😊

 

dilkushpatel
New Contributor II

Hi @Kaniz 

Thanks for taking the time to write a detailed reply.

So I'm not writing an explicit COPY INTO; I'm just trying to read from a table in the SQL DW. The syntax looks like this:

df = (spark.read
    .format("sqldw")  # Databricks Azure Synapse connector
    .option("host", "<server name>.sql.azuresynapse.net")
    .option("port", "1433")
    .option("user", "<user name>")
    .option("password", "<password>")
    .option("database", "<DB Name>")
    .option("dbtable", "<schema>.<table>")
    # ADLS Gen2 staging directory the connector uses to move data
    .option("tempDir", "abfss://<storage container>@<storage account>.dfs.core.windows.net/temp-data")
    # forward the session's storage credentials to Synapse for the staging step
    .option("forwardSparkAzureStorageCredentials", "true")
    .load())
 
Before this I have a code block where I assign the storage app ID and secret for access from the DW and Databricks.
 
The error I get when I run df.show() is:
 
com.microsoft.sqlserver.jdbc.SQLServerException: User does not have permission to perform this action. [ErrorCode = 15247] [SQLState = S0001]
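Error 15247 is SQL Server's generic "user does not have permission" error. With the Synapse connector, a read through tempDir also runs a hidden staging step on the DW (creating external objects to export the data), so the error can come from that step even when a plain SELECT works in SSMS; that interpretation is my assumption, not confirmed by the log. A small parser for this message shape:

```python
import re

CODE_RE = re.compile(r"\[ErrorCode = (\d+)\]\s*\[SQLState = (\w+)\]")

# Hypothetical mapping for the code seen in this thread.
KNOWN = {
    15247: "permission denied; with tempDir reads, check grants for the "
           "staging step (external data source / external table), not just SELECT",
}

def explain_sql_error(message: str) -> str:
    """Extract the SQL Server error code from a JDBC exception message."""
    m = CODE_RE.search(message)
    if not m:
        return "unrecognized message format"
    code = int(m.group(1))
    return KNOWN.get(code, f"unmapped SQL Server error code {code}")
```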
 
I have granted:
schema-level read,
CREATE at DB level,
ADMINISTER DATABASE BULK OPERATIONS,
and INSERT at DB level.
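Concretely, those grants look like this in T-SQL (placeholders kept; this is just how I read the docs, so I may be missing something the staging step needs, such as permission to create external objects):

```python
# T-SQL grants matching the list above; <schema>/<user> are placeholders.
GRANTS = [
    "GRANT SELECT ON SCHEMA::<schema> TO <user>;",           # schema-level read
    "GRANT CREATE TABLE TO <user>;",                         # DB-level create
    "GRANT ADMINISTER DATABASE BULK OPERATIONS TO <user>;",  # bulk operations
    "GRANT INSERT TO <user>;",                               # DB-level insert
]
```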

Kaniz
Community Manager

Hi @dilkushpatel,

  • Double-check the credentials (username and password) you’re using. Ensure they are correct and have the necessary privileges.
  • Confirm that the server name and port are accurate.
  • You mentioned blocking where you assign storage app ID and secret for access from both DW and Databricks.
  • Ensure that the storage app ID and secret are correctly configured and have the required permissions to access the storage container.
  • Verify that the storage container path (abfss://<storage container>@<storage account>.dfs.core.windows.net/temp-data) is accurate.
  • Confirm that the firewall rules on the Azure Synapse side allow connections from the Databricks cluster.
  • Whitelist the Databricks cluster’s IP address if necessary.
  • Temporarily grant the user more permissions (e.g., CONTROL on the entire data warehouse) to see if the issue persists.
  • If it works with additional permissions, gradually reduce them to the minimum required.
  • Check the Azure Synapse logs for any additional error messages or warnings.
  • Look for specific details related to the access issue.
  • Consider using the COPY INTO command (which is the recommended method) for loading data into Azure Synapse.
  • COPY INTO is more reliable and efficient than PolyBase.
  • If you decide to switch, ensure that your Databricks Runtime version supports COPY INTO.
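One more option, since the goal here is reading rather than loading: Spark's built-in JDBC source reads the table over a plain JDBC connection, with no tempDir staging and therefore no bulk or external-table permissions, only SELECT. It is slower for large tables because every row streams through one JDBC connection. A sketch (option names follow the standard Spark JDBC source; the helper is hypothetical):

```python
def jdbc_read_options(server, database, user, password, table):
    """Options for spark.read.format('jdbc'): no tempDir, no bulk permissions."""
    return {
        "url": (f"jdbc:sqlserver://{server}.sql.azuresynapse.net:1433;"
                f"database={database};encrypt=true;loginTimeout=30;"),
        "user": user,
        "password": password,
        "dbtable": table,
    }
```

If `spark.read.format("jdbc").options(**jdbc_read_options(...)).load()` succeeds with the same login, that points at the connector's staging step rather than the SELECT as the source of error 15247.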

dilkushpatel
New Contributor II

Hi @Kaniz 

Double-check the credentials (username and password) you’re using. Ensure they are correct and have the necessary privileges.
I'm able to run SELECT using the same user from SSMS.

Confirm that the server name and port are accurate.
They are

You mentioned blocking where you assign storage app ID and secret for access from both DW and Databricks.
Sorry, I meant a code block.

Ensure that the storage app ID and secret are correctly configured and have the required permissions to access the storage container.
They are correct, I can list and read files from that location.

Verify that the storage container path (abfss://<storage container>@<storage account>.dfs.core.windows.net/temp-data) is accurate.
It is

Confirm that the firewall rules on the Azure Synapse side allow connections from the Databricks cluster
IP blocking is not configured; anyone from anywhere can connect to the DW.

Whitelist the Databricks cluster’s IP address if necessary.
IP blocking is not configured; anyone from anywhere can connect to the DW.

Temporarily grant the user more permissions (e.g., CONTROL on the entire data warehouse) to see if the issue persists.
Will try this and update

If it works with additional permissions, gradually reduce them to the minimum required.
Other than CONTROL, the rest of the permissions look fine as far as I can tell.

Consider using the COPY INTO command (which is the recommended method) for loading data into Azure Synapse.
I'm not loading anything into the Synapse DW; I'm reading from the DW.

If you decide to switch, ensure that your Databricks Runtime version supports COPY INTO.
The Databricks Runtime version is 13.1, if I remember correctly.