
SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

Fz1
New Contributor III

I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2.

The ADLS storage is mounted at /mnt/<storage-account>.
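
For reference, the mount was created with the usual service-principal OAuth pattern, roughly like this (a minimal sketch; the secret scope, key, and all bracketed names are placeholders):

# Minimal sketch of an ADLS Gen2 mount via a service principal (OAuth).
# All <bracketed> values are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<storage-account>",
    extra_configs=configs,
)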

The tables are created successfully and are accessible from my notebooks, as is the ADLS storage.

I have configured the `Data Access Configuration` in the Admin Settings with my service principal, and this service principal was added to my storage.
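
The data access configuration uses the per-storage-account OAuth properties that Databricks documents for SQL warehouses, roughly like this (all bracketed values are placeholders):

spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token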

I have created a SQL Warehouse with the SERVERLESS option and am trying to query the tables from the SQL editor with `select * from hive_metastore.<schema>.<table_name>;`, but I always get this error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation.

I have created another SQL Warehouse, this time with the PRO option and without changing anything else, and the same query works and returns the expected data.

Can you please advise what the issue could be? Does the serverless type not support external tables in the Hive metastore?

Please note that Unity Catalog is enabled on my workspace but not used in my case here (I tried switching it on/off on my warehouse and got the same results as above for both types).

PS: If this is helpful, Databricks is deployed on Azure within a private network.

1 ACCEPTED SOLUTION

Anonymous
Not applicable

If you have a firewall configured on the ADLS Gen2 storage account that blocks public access, you will need to configure network rules to allow access for the subnets containing the compute resources for Databricks SQL Serverless in your workspace region.
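
To confirm that the firewall is what is blocking the request, you can check the storage account's default network action (placeholder names; a value of Deny means public network access is blocked):

az storage account show --name <storage-account-name> --resource-group <resource-group-name> --query networkRuleSet.defaultAction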

A detailed explanation can be found here, and the Databricks SQL Serverless subnets for your region can be found here.

If you need an automated network rule creation utility, you can use this PowerShell script.

param(
    [Parameter(Mandatory)]
    [string]
    $SubnetFile,
    [Parameter(Mandatory)]
    [string]
    $AccountName,
    [Parameter(Mandatory)]
    [string]
    $ResourceGroup
)

# Resolve the subnet list file relative to the current directory.
[string]$subnetFilePath = Join-Path -Path $PWD.Path -ChildPath $SubnetFile

# Read the subnet resource IDs, one per line.
$subnets = Get-Content -Path $subnetFilePath

# Allow each subnet through the storage account firewall.
foreach ($subnet in $subnets) {
    az storage account network-rule add -g $ResourceGroup --account-name $AccountName --subnet $subnet
}

Save this as configure-subnets.ps1 and copy the list of subnets into a separate file. You can then run it like:

./configure-subnets.ps1 -AccountName "<storage-account-name>" -ResourceGroup "<resource-group-name>" -SubnetFile subnet_list.txt
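
Each line of subnet_list.txt should be a full subnet resource ID (the script does not pass --vnet-name, so az needs the full ID rather than just the subnet name), for example:

/subscriptions/<subscription-id>/resourceGroups/<vnet-resource-group>/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>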


6 REPLIES

Anonymous
Not applicable

SQL Serverless works with external tables; we only use external tables with serverless. But no Unity Catalog (yet), and not deployed on a private network.

I think the latter could be the issue: with the serverless option, the actual VMs are not deployed within your own account/resource group, whereas classic warehouses are deployed within your resource group and therefore also belong to your private network.

Anonymous
Not applicable


The storage account I want to connect to is not under my management.
Would a private endpoint achieve the same result?

Fz1
New Contributor III

Thank you @Anonymous @Anonymous for your replies.

@Anonymous I will check the details shared and will come back once I have an answer. Thank you again.

Fz1
New Contributor III

@Anonymous the provided link concerning the Databricks SQL Serverless subnets works.

Thank you again!!

TjommeV-Vlaio
New Contributor III

Can this be done using Terraform as well?
