07-13-2023 08:07 AM
Hi all,
I hope you can help me figure out what I'm missing.
I'm trying to do a simple thing: read data from the ingestion zone (CSV files saved to an Azure Storage Account) with a Delta Live Tables pipeline and share the resulting table with another Databricks workspace using Delta Sharing.
Here is the code that describes the DLT pipeline.
import dlt

raw_path = "/mnt/ingestion/sensors-readings"

@dlt.table(
    comment="Contains data received from sensors API"
)
def sensors_raw():
    # Auto Loader picks up newly ingested files only.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", True)
        .load(raw_path)
    )
It runs successfully and the table is added to the target schema.
But when I try to create a share, the table is not displayed in the list of tables available for sharing.
The tables that do show up as available for sharing were created from a notebook with df.write.saveAsTable().
Reading the available documentation, I've seen that tables of type STREAMING_TABLE can't be shared via Delta Sharing.
Maybe I'm missing a setting? It would be great if you could help me figure this out.
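One way to confirm this, as a sketch (the catalog, schema, and table names below are assumptions for illustration), is to check the table type that Unity Catalog records for the table:

```python
# Hypothetical names; adjust to your own catalog/schema/table.
# Unity Catalog records DLT-created streaming tables with a distinct
# table type, which explains why they are absent from the share list.
row = spark.sql("""
    SELECT table_type
    FROM system.information_schema.tables
    WHERE table_catalog = 'main'
      AND table_schema  = 'sensors'
      AND table_name    = 'sensors_raw'
""").first()
print(row.table_type)
```

If this reports STREAMING_TABLE rather than MANAGED, the table falls under the Delta Sharing limitation rather than a missing setting.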
Thanks.
07-17-2023 02:39 AM
After reading the documentation carefully, I found that this is indeed mentioned in the current list of DLT and Unity Catalog limitations.
07-18-2023 10:04 AM
@vkuznetsov - As a workaround, could you try converting the streaming table to a regular Delta table with a stand-alone/periodic job, and use that table for Delta Sharing?
(
    spark.readStream.table("<streaming-table>")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/checkpoint_bar_1")
    .partitionBy("<partition-column>")
    .toTable("<delta-table>")
)
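For a periodic rather than continuously running job, the same conversion can be sketched with an availableNow trigger, which processes everything new since the last checkpoint and then stops (table names and the checkpoint path below are placeholders, not from the thread):

```python
# Sketch of a scheduled conversion job: copy a DLT streaming table into a
# regular Delta table that Delta Sharing can expose. Placeholder names.
(
    spark.readStream.table("main.sensors.sensors_raw")
    .writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/sensors_raw_share")
    # Process all available data once, then stop; suitable for a scheduled job.
    .trigger(availableNow=True)
    .toTable("main.sensors.sensors_raw_shared")
)
```

The checkpoint makes the job incremental, so each scheduled run only appends records that arrived since the previous run.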
07-20-2023 12:02 AM
Hi @vkuznetsov
Thanks for posting your question in our community! We're happy to assist.
To make sure you get the most accurate information, please take a moment to review the responses and select the one that best addresses your query.
This will also help other community members who may have similar questions later on. Thank you for participating, and please don't hesitate to reach out if you need any further help!
07-20-2023 04:12 AM
Hi @shan_chandra,
In the end, I did it the way you proposed. Unfortunately, it has to be done outside the Delta Live Tables workflow, so you lose all of its benefits.
It seems strange to me that DLT and Delta Sharing, both powerful tools, can't be combined out of the box. Or perhaps I'm missing something, since I'm new to both.
03-08-2024 10:42 AM
I'm curious whether Databricks plans to address this. We use Delta Live streaming tables extensively and had also planned to use Delta Sharing to get our data out of our production Unity Catalog (in a different region). Duplicating the data as a workaround is not practical for huge tables. What is Databricks' recommended practice for getting data to other Databricks workspaces across regions?