How to load structured streaming data into a Delta table whose location is in ADLS Gen2
03-15-2023 08:55 AM
Hi All,
I am working on streaming data processing. As an initial step, I have read the data from Azure Event Hubs using readStream. Now I want to writeStream this into a Delta table.
My requirement is that the data should be present in an external location (ADLS Gen2) and the table should be available in my metastore.
I tried the code below:
Code snippet:
ext_table_location = "adls path"
autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .option("mergeSchema", "true") \
    .option("path", ext_table_location) \
    .table(ext_table_location)
It is failing. Is there a standard approach for streaming data in this kind of scenario?
Thanks in advance!
Labels: Data Processing, Delta table, Stream Data
03-15-2023 09:29 AM
There are a couple of ways to connect to ADLS Gen2; please refer to the docs below. For instance, if you decide to go with the service principal method, you need to add the storage account configuration below to the cluster or notebook. The same goes for SAS tokens and storage account keys.
service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage
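As a quick sanity check once those configs are set, you can list the container to confirm that authentication works (a minimal sketch; the storage account, container, and directory below are placeholders):

# Hypothetical ADLS Gen2 path -- replace the placeholders with your own values.
adls_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/<directory>"
# If the service principal auth is configured correctly, this lists the directory.
display(dbutils.fs.ls(adls_path))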
03-15-2023 08:24 PM
The connection is fine. I need to know how to write the streaming data to an ADLS Gen2 path and, at the same time, have the Delta table registered in the metastore as well.
03-16-2023 01:08 AM
Can you try using <database>.<tablename> for the .table() option instead of a path?
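For reference, a minimal sketch of that suggestion, assuming a hypothetical database my_db and table events_bronze (the ADLS path is also a placeholder). Note that .toTable() is the documented PySpark method since Spark 3.1; keeping the path option alongside the table name is how the table gets registered against the external location:

# Minimal sketch -- my_db.events_bronze and the ADLS path are hypothetical names.
ext_table_location = "abfss://<container>@<storage-account>.dfs.core.windows.net/events_bronze"
autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .option("mergeSchema", "true") \
    .option("path", ext_table_location) \
    .toTable("my_db.events_bronze")  # metastore table name, not a path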
03-16-2023 02:37 AM
Hi @werners, I tried that option as well, but it creates a managed table, and I want an external table. So for now I create an external table prior to the streaming part.
What I understand from the research I did is that we cannot write to an external table in a writeStream query.
03-16-2023 02:55 AM
If you write in Delta format to a path and then create an unmanaged table on that path, that should work.
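A minimal sketch of that pattern (the database and table names are placeholders): stream to the path only, then register an unmanaged table over the same location.

# 1. Stream to the ADLS path only -- no table name in the writeStream call.
autoloader_df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_directory) \
    .start(ext_table_location)

# 2. Register an unmanaged (external) table pointing at that location.
spark.sql(f"CREATE TABLE IF NOT EXISTS my_db.events_bronze USING DELTA LOCATION '{ext_table_location}'")

The CREATE TABLE statement only needs to run once; since the data already lives at that location, the table picks up its schema from the Delta log.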

