Data Engineering

How to load structured streaming data into a Delta table whose location is in ADLS Gen2

Harun
Honored Contributor

Hi All,

I am working on streaming data processing. As an initial step, I read the data from Azure Event Hubs using readStream; now I want to writeStream it into a Delta table.

My requirement is that the data should live in an external location (ADLS Gen2) while the table is registered in my metastore.

I tried the code below:

ext_table_location = "adls path"

autoloader_df.writeStream \
       .format("delta") \
       .outputMode("append") \
       .option("checkpointLocation", checkpoint_directory) \
       .option("mergeSchema", "true") \
       .option("path", ext_table_location) \
       .table(ext_table_location)

It is failing. Is there a standard approach for streaming data in this kind of scenario?

Thanks in Advance!

5 REPLIES

Anonymous
Not applicable

There are a couple of ways to connect to ADLS Gen2; please refer to the docs below. For instance, if you go with the service principal method, you need to add the following storage account configuration to the cluster or notebook. The same applies to SAS tokens and storage account access keys.

service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage
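
For reference, the SAS token and account key methods from those docs follow the same pattern (a sketch; the secret scope and key names are placeholders to replace with your own):

# SAS token
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>"))

# Storage account access key
spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))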

Harun
Honored Contributor

The connection is fine. I need to know how to write the stream data to an ADLS Gen2 path and, at the same time, have the Delta table registered in the metastore.

-werners-
Esteemed Contributor III

Can you try using <database>.<tablename> in .table() instead of a path?
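
A minimal sketch of that suggestion, reusing the names from the question (mydb.events is a hypothetical database.table name):

autoloader_df.writeStream \
       .format("delta") \
       .outputMode("append") \
       .option("checkpointLocation", checkpoint_directory) \
       .option("mergeSchema", "true") \
       .option("path", ext_table_location) \
       .table("mydb.events")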

Harun
Honored Contributor

Hi @werners, I tried that option as well, but it creates a managed table, whereas I want an external table. So for now I create the external table prior to the streaming part.

From the research I did, my understanding is that we cannot write to an external table directly in a writeStream query.
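
A minimal sketch of that workaround, assuming a hypothetical table mydb.events and an example two-column schema:

# Pre-create the external table so it is registered in the metastore
# (example schema; replace with the real Event Hubs columns):
spark.sql("""
CREATE TABLE IF NOT EXISTS mydb.events (
  body STRING,
  enqueuedTime TIMESTAMP
)
USING DELTA
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/events'
""")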

-werners-
Esteemed Contributor III

If you write in Delta format to a path and create an unmanaged table on that path, that should work.
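
A minimal sketch of this approach, again with the hypothetical mydb.events:

autoloader_df.writeStream \
       .format("delta") \
       .outputMode("append") \
       .option("checkpointLocation", checkpoint_directory) \
       .option("mergeSchema", "true") \
       .start(ext_table_location)

# Once the stream has committed its first batch (so a Delta log exists at
# the path), register an unmanaged table over it:
spark.sql(f"CREATE TABLE IF NOT EXISTS mydb.events USING DELTA LOCATION '{ext_table_location}'")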
