Publish data to Azure ADLS Gen2 from Delta Live Tables pipelines?
12-06-2022 11:23 AM
I would like to publish data from a Delta Live Tables (DLT) pipeline to Azure ADLS Gen2 storage. I checked the documentation but cannot get it to work. Any insights on how to do so?
I've tried specifying the "Storage location" with many combinations of abfs://root@storageaccountname.dfs.core.windows.net/dev/delta_live_tables/ and also abfss://root@storageaccountname.dfs.core.windows.net/dev/delta_live_tables/ without any success.
So far I have only succeeded in writing to the hive_metastore and DBFS. But I want to write to an external location (ADLS Gen2) so the tables can be referenced by Unity Catalog.
12-06-2022 11:38 AM
If you are using the abfs protocol, you need some additional configuration to read files over it. So first mount that location in the Databricks workspace, then you can use the abfs path, I guess.
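For reference, a minimal sketch of the mount approach suggested here, using OAuth with a service principal (all names, scopes, and paths below are placeholders, not taken from this thread); note that the reply below recommends configuring credentials in the pipeline settings instead of mounting:
# Hypothetical service-principal credentials, pulled from a secret scope
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<secret-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

# Mount the container so it can be referenced via /mnt/... instead of abfss://
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)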
12-06-2022 11:45 AM
I can write to and read from ADLS Gen2 using both the abfs and abfss protocols from regular notebooks. But for some reason it is not working for DLT.
Do you know if there is a need to mount ADLS Gen2 to make it work specifically with DLT?
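A minimal sketch of the kind of direct notebook access being described (paths are placeholders; it assumes the cluster or session already has ADLS credentials configured, e.g. via the OAuth settings discussed below):
# Direct reads and writes with abfss paths from an interactive notebook
df = spark.read.format("delta").load("abfss://<container>@<storage-account>.dfs.core.windows.net/some/input/")
df.write.format("delta").mode("overwrite").save("abfss://<container>@<storage-account>.dfs.core.windows.net/some/output/")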
12-06-2022 11:57 AM
Mounting an ADLS container is no longer recommended. If you want DLT to materialize your data in ADLS, you need to do two things:
- In the DLT pipeline settings, configure ADLS credentials using either a SAS token or a Service Principal. See here for more details. If you are using a SAS token, we also recommend using Secrets to store the key. See here for a concrete example, and see the sketch after this list.
- Specify the ADLS location as your DLT pipeline storage path, e.g. abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/my_folder/
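A minimal sketch of the SAS-token option expressed as pipeline configuration key/value pairs (key names are the standard Hadoop ABFS driver settings; the scope and secret names are placeholders, and as noted later in the thread the keys may need a spark.hadoop. prefix):
fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net SAS
fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider
fs.azure.sas.fixed.token.<storage-account-name>.dfs.core.windows.net {{secrets/<scope-name>/<sas-token-secret-name>}}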
12-07-2022 12:59 AM
In the DLT settings you need to go to Compute -> Advanced and add the configuration (of course, each key needs the value required in your case):
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id {{secrets/scope/databricks-clientid}}
fs.azure.account.oauth2.client.secret {{secrets/scope/databricks-secret}}
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<YOUR_DIRECTORY_ID>/oauth2/token
If you have problems, you can prefix all the keys with spark_conf.spark.hadoop.
12-09-2022 06:45 AM
Great, will try this. Thanks a lot!
12-20-2022 02:50 AM
Finally got it to work and here's what we did:
Pre-conditions:
- Service Principal created
- Secret created in Azure KeyVault
- Scope & Secret created in Databricks (append "#secrets/createScope" to the workspace URL to get the Databricks page where you create the secret scope; a quick check is sketched right after this list)
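As a quick sanity check that the scope and secret are readable from the workspace (a minimal sketch; the scope and key names are the same placeholders used in the JSON further down):
# List the secret scopes visible to the workspace
dbutils.secrets.listScopes()

# List the keys inside the scope
dbutils.secrets.list("<scope-name-in-databricks>")

# Fetch the secret; the value is redacted when displayed, but an error here means
# the {{secrets/...}} reference in the pipeline settings will fail too
dbutils.secrets.get(scope="<scope-name-in-databricks>", key="<secret-name-in-azure-keyvault>")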
Note:
- Only one DLT pipeline per external storage location
- The keys related to ADLS access are prefixed with "spark.hadoop."
- You won't get lineage in Unity Catalog when using DLT; however, you can see the lineage in DLT instead.
DLT Pipeline settings - JSON (example of what worked; substitute your own parameter values)
{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
        "mode": "ENHANCED"
      }
    },
    {
      "label": "maintenance"
    }
  ],
  "development": true,
  "continuous": false,
  "channel": "PREVIEW",
  "edition": "ADVANCED",
  "photon": false,
  "libraries": [
    {
      "notebook": {
        "path": "/Repos/repo_folder/delta-live-tables-notebooks/sql/Retail Sales"
      }
    }
  ],
  "name": "Retail Sales SQL External Tables",
  "storage": "abfss://<container>@<azure-storage-account>.dfs.core.windows.net/delta_live_tables/",
  "configuration": {
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.id": "<application-id>",
    "spark.hadoop.fs.azure.account.oauth2.client.secret": "{{secrets/<scope-name-in-databricks>/<secret-name-in-azure-keyvault>}}",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
  }
}
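The "Retail Sales" notebook referenced above is not reproduced here; as a minimal hypothetical stand-in (the real one is SQL, and the table name and source path below are made up), a DLT notebook that materializes a table under the storage path could look like this in Python:
import dlt

@dlt.table(comment="Hypothetical example table materialized under the pipeline storage path")
def customers():
    # Any source works; this just reads a made-up CSV folder from the same storage account
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load("abfss://<container>@<azure-storage-account>.dfs.core.windows.net/raw/customers/")
    )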
Unity Catalog - Register a new External Table
CREATE TABLE IF NOT EXISTS customers
LOCATION 'abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<folder>/tables/customers/'
Summary - How to get DLT & Unity Catalog to Work Together
This way you can get Delta Live Tables (DLT) to work together with Unity Catalog by referring to external tables.
02-25-2023 06:37 PM
Thanks for the write-up, it worked for me. One question: in my case, I let DLT write its tables to an abfss://data@... storage path using your config pattern above. It successfully writes the tables, but when I view the tables in hive_metastore in Data Explorer, I get the fs.configuration error. If I manually run the OAuth routine to connect to that account, it connects, but I still can't view the data in Data Explorer. However, I can create an external table in Unity Catalog just fine (as it's using the Unity storage credentials).
05-01-2023 03:07 PM
I am trying to load from ADLS with DLT. After adding the Spark configurations I got the error below:
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 818323fc-80d5-4833-9f46-7d1afc9c5bf7, runId = 722e9aac-0fdd-4206-9d49-683bb151f0bf] terminated with exception: The container in the file event `{"backfill":{"bucket":"root@dbstoragelhdp7mflfxe2y","key":"5810201264315799/Data/Temp/test_account.csv","size":1801,"eventTime":1682522202000,"newerThan$default$2":false}}` is different from expected by the source: `my_container@my_storageaccount`.
Any suggestions?
Thanks,

