Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Structured Streaming external table Unity Catalog

christian_chong
New Contributor III

Hi everybody,

I am facing an issue with Spark Structured Streaming on Databricks on GCP.

I use external tables, but the schemas are managed schemas.

The code below in my notebook works very well.

But if I add column masking to the target table and rerun the notebook, I get an error:

"
[RequestId=xxxx-exxx-xxxx-adf4-86b9b7e82252 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path gs://table overlaps with other external tables or volumes. Conflicting tables/volumes: xxx.xxx.table, xxx.xxx.another_table
"

Here is a sample of my code:

"
# Read the bronze table from its storage path as a stream
df = spark.readStream.load(bronze_table_path)

# Append the available rows to the silver table path, then stop
df.writeStream \
.format("delta") \
.option("checkpointLocation", silver_checkpoint) \
.option("mergeSchema", "true") \
.trigger(availableNow=True) \
.outputMode("append") \
.start(path=silver_table_path)
"


Thank you

 

2 REPLIES

Is your answer generated by an LLM?

I reply to each of your points below:

"It’s essential to ensure that your masking configuration doesn’t interfere with the table paths."

=> How can I ensure that the masking configuration doesn't interfere with the table path?

"Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts."

=> From my point of view, the paths don't overlap with each other. But what exactly does "overlap" mean? They are more like siblings in the directory structure.
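To make the "overlap" question concrete, here is a minimal sketch of how overlap is usually understood for storage paths: two locations overlap when they are the same location or when one is nested inside the other, while sibling directories do not. This is an illustration under that assumption, not Unity Catalog's actual check; the function names and the gs:// paths in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def _split_uri(uri):
    """Split a storage URI into (scheme, bucket, list of path segments)."""
    parsed = urlparse(uri)
    # Drop empty segments so trailing or doubled slashes do not matter.
    segments = [s for s in parsed.path.split("/") if s]
    return parsed.scheme, parsed.netloc, segments


def paths_overlap(a, b):
    """Return True if the two URIs are equal or one is nested in the other."""
    scheme_a, bucket_a, segs_a = _split_uri(a)
    scheme_b, bucket_b, segs_b = _split_uri(b)
    if (scheme_a, bucket_a) != (scheme_b, bucket_b):
        return False
    # Compare whole segments, not raw string prefixes, so that
    # ".../table" and ".../table_2" are treated as siblings.
    shorter = min(len(segs_a), len(segs_b))
    return segs_a[:shorter] == segs_b[:shorter]
```

With hypothetical paths, `paths_overlap("gs://bucket/silver/table_a", "gs://bucket/silver/table_b")` is False (siblings), while `paths_overlap("gs://bucket/silver", "gs://bucket/silver/table_a")` is True (parent and child).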

"Review Column Masking: Double-check your column masking configuration. Ensure that it doesn’t inadvertently affect the table paths or cause conflicts."

=> How can I check that?
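One thing that may be worth checking here, stated as an assumption and not as a confirmed diagnosis: Unity Catalog tables that carry column masks or row filters are generally expected to be accessed by table name and not by storage path, so the path-based `.start(path=...)` write could be what fails once the mask is added. A minimal sketch of the name-based variant follows; `start_silver_stream` and `silver_table_name` (a three-level name of the form `catalog.schema.table`) are hypothetical and do not come from the original notebook.

```python
def start_silver_stream(df, silver_table_name, silver_checkpoint):
    """Start the same append stream, but target the table by name.

    df is expected to be a streaming DataFrame; silver_table_name is a
    hypothetical three-level Unity Catalog name (catalog.schema.table).
    """
    return (
        df.writeStream
        .format("delta")
        .option("checkpointLocation", silver_checkpoint)
        .option("mergeSchema", "true")
        .trigger(availableNow=True)
        .outputMode("append")
        # toTable() resolves the target through the catalog instead of a path.
        .toTable(silver_table_name)
    )
```

In a notebook this would be called as `start_silver_stream(df, silver_table_name, silver_checkpoint)`; reading the source with `spark.readStream.table(...)` instead of `load(path)` would be the matching change on the bronze side.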

"Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If you’re not already using it, consider exploring this approach."

=> To my knowledge, there is no connector between DLT and Google Pub/Sub, so I need to use Structured Streaming.

"Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. It’s crucial for maintaining the streaming state and ensuring fault tolerance."

=> Yes, checkpointing is set up correctly.

Thank you in advance

@Retired_mod did you see my last answer?

Thank you!
