
How to load structured streaming data into a Delta table whose location is in ADLS Gen2

Harun
Honored Contributor

Hi All,

I am working on streaming data processing. As an initial step, I have read the data from Azure Event Hubs using readStream. Now I want to writeStream this into a Delta table.

My requirement is that the data should live in an external location (ADLS Gen2) and the table should be available in my metastore.

This is what I tried:

Code_Snippet:

ext_table_location = "adls path"

autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .option("mergeSchema", "true") \
    .option("path", ext_table_location) \
    .table(ext_table_location)

It is failing. Is there a standard approach to streaming data in this kind of scenario?

Thanks in Advance!

6 REPLIES

Anonymous
Not applicable

There are a couple of ways to connect to ADLS Gen2; please refer to the docs below. For instance, if you decide to use the service principal method, you need to add the following storage account configuration settings to the cluster or notebook. The same goes for SAS tokens and storage account keys.

service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage

Harun
Honored Contributor

The connection is fine. I need to know how to write the streaming data to an ADLS Gen2 path and, at the same time, have the Delta table registered in the metastore.

-werners-
Esteemed Contributor III

Can you try using <database>.<tablename> for the .table() option instead of a path?
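
A minimal sketch of what that might look like, reusing the variables from the original post (mydb.events_bronze is a placeholder table name; on open-source Spark the equivalent method is toTable rather than table):

# Keep the ADLS Gen2 path in the "path" option and pass a metastore
# table name, not a path, to .table()
autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .option("mergeSchema", "true") \
    .option("path", ext_table_location) \
    .table("mydb.events_bronze")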

Harun
Honored Contributor

Hi @werners, I tried that option as well, but it creates a managed table, and I want an external table. So now I create the external table prior to the streaming part.

What I understand from the research I did is that we cannot write to an external table directly in a writeStream query.
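
For reference, a rough sketch of pre-creating the external table (the database, table, column names, and path here are placeholders, not from my actual job):

# Create the external Delta table on the ADLS Gen2 location ahead of time
spark.sql("""
  CREATE TABLE IF NOT EXISTS mydb.events_bronze (
    body STRING,
    enqueuedTime TIMESTAMP
  )
  USING DELTA
  LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/events'
""")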

-werners-
Esteemed Contributor III

If you write in Delta format to a path and create an unmanaged table on that path, that should work.
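
A rough sketch of that approach, reusing the placeholder names from above (stream in Delta format to the ADLS Gen2 path, then register an unmanaged table on top of that path):

# Stream writes Delta files directly to the ADLS Gen2 path
autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .option("mergeSchema", "true") \
    .start(ext_table_location)

# Once the stream has created the Delta log at that path,
# register an unmanaged (external) table pointing to it
spark.sql(f"""
  CREATE TABLE IF NOT EXISTS mydb.events_bronze
  USING DELTA
  LOCATION '{ext_table_location}'
""")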

Kaniz
Community Manager

Hi @Harun Raseed Basheer, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if his suggestions helped you.

If you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
