databricks structured streaming external table unity catalog
06-24-2024 01:42 AM
Hi everybody,
I am facing an issue with Spark Structured Streaming on Databricks on GCP.
I use the external table type, but the schemas are managed schemas.
The code below in my notebook works very well.
But if I add column masking to the target table and rerun the notebook, I get an error:
"
[RequestId=xxxx-exxx-xxxx-adf4-86b9b7e82252 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path gs://table overlaps with other external tables or volumes. Conflicting tables/volumes: xxx.xxx.table, xxx.xxx.another_table
"
Here is a sample of my code:
"
df = spark.readStream.load(f"{bronze_table_path}")
df.writeStream \
.format("delta") \
.option("checkpointLocation", f"{silver_checkpoint}") \
.option("mergeSchema", "true") \
.trigger(availableNow=True) \
.outputMode("append") \
.start(path=f"{silver_table_path}")
"
Thank you
06-26-2024 05:38 AM
Is your answer generated by an LLM?
I reply to each of your points below:
"Itโs essential to ensure that your masking configuration doesnโt interfere with the table paths."
=> how to ensure that masking configuration doesn't interfere with table path ?
"Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts."
=> From my point of view, the paths do not overlap. But what exactly does "overlap" mean here? They are siblings in the directory structure (see the sketch below for how I compared the registered locations).
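Here is roughly how I compared the registered storage locations of the tables named in the error. This is only a sketch; DESCRIBE DETAIL returns the location of a Delta table, and the table names are the placeholders from the error message.
"
# Sketch: print the registered storage location of each table named in the error,
# plus the path my stream writes to, so any "overlap" can be checked by eye.
for t in ["xxx.xxx.table", "xxx.xxx.another_table"]:
    loc = spark.sql(f"DESCRIBE DETAIL {t}").select("location").first()[0]
    print(t, "->", loc)
print("silver_table_path ->", silver_table_path)
"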
"Review Column Masking: Double-check your column masking configuration. Ensure that it doesnโt inadvertently affect the table paths or cause conflicts."
=> How to check that ?
"Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If youโre not already using it, consider exploring this approach1."
=> to my knowledge there is no connector with dlt and google pub sub. So i need to use structured streaming
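For context, the bronze layer is fed with something like the snippet below. This is only a sketch: the option names reflect my understanding of the Databricks Pub/Sub connector, the subscription/topic/project values and checkpoint variable are placeholders, and the authentication options are omitted, so please check it against the connector documentation.
"
# Sketch: consuming Google Pub/Sub with Structured Streaming. Connector option names
# are assumptions based on the Databricks Pub/Sub connector; all values are placeholders.
bronze_df = (spark.readStream
    .format("pubsub")
    .option("subscriptionId", "my-subscription")
    .option("topicId", "my-topic")
    .option("projectId", "my-gcp-project")
    .load())

(bronze_df.writeStream
    .option("checkpointLocation", f"{bronze_checkpoint}")
    .trigger(availableNow=True)
    .outputMode("append")
    .start(path=f"{bronze_table_path}"))
"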
"Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. Itโs crucial for maintaining the streaming state and ensuring fault tolerance."
=>yes, checkpointing is well set
Thank you in advance.
07-05-2024 07:09 AM
@Retired_mod did you see my last answer?
Thank you!

