Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

databricks structured streaming external table unity catalog

christian_chong
New Contributor III

Hi everybody,

I am facing an issue with Spark Structured Streaming, with Databricks on GCP.

I use external tables, but the schemas are managed schemas.

The code below works very well in my notebook.

But if I add column masking to the target table and rerun the notebook, I get this error:

"
[RequestId=xxxx-exxx-xxxx-adf4-86b9b7e82252 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path gs://table overlaps with other external tables or volumes. Conflicting tables/volumes: xxx.xxx.table, xxx.xxx.another_table
"

Here is a sample of my code:

"
df = spark.readStream.load(f"{bronze_table_path}")

df.writeStream \
.format("delta") \
.option("checkpointLocation", f"{silver_checkpoint}") \
.option("mergeSchema", "true") \
.trigger(availableNow=True) \
.outputMode("append") \
.start(path=f"{silver_table_path}")
"


Thank you

 

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @christian_chong, The error message you’re encountering indicates that the input path gs://table overlaps with other external tables or volumes.

Specifically, it mentions conflicting tables or volumes: xxx.xxx.table and xxx.xxx.another_table. You mentioned that the issue arises when you add column masking to the target table. Column masking is a security feature that restricts access to specific columns based on user roles or policies. It’s essential to ensure that your masking configuration doesn’t interfere with the table paths.

Here are some steps to troubleshoot and resolve the issue:

  • Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts (see the sketch after this list for one way to compare the registered locations).

  • Review Column Masking: Double-check your column masking configuration. Ensure that it doesn’t inadvertently affect the table paths or cause conflicts.

  • Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If you’re not already using it, consider exploring this approach.

  • Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. It’s crucial for maintaining the streaming state and ensuring fault tolerance.

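Here is a minimal sketch of one way to compare the storage locations registered for the tables named in the error (the table names are placeholders; substitute the ones from your error message):

"
# Print the storage location Unity Catalog has registered for each Delta table.
# Table names are placeholders; use the ones reported in the error message.
for table_name in ["xxx.xxx.table", "xxx.xxx.another_table"]:
    location = spark.sql(f"DESCRIBE DETAIL {table_name}").select("location").first()[0]
    print(table_name, "->", location)

# Two locations "overlap" when one is the same as, or nested under, the other;
# sibling directories under a common parent do not overlap.
"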

Feel free to provide more context or ask further questions if needed! 😊

 

Is your answer generated by an LLM?

I reply to each of your points below:

"It’s essential to ensure that your masking configuration doesn’t interfere with the table paths."

=> How do I ensure that the masking configuration doesn't interfere with the table path?

"Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts."

=> From my point of view, the paths don't overlap with each other. But what exactly does "overlap" mean? They are more like siblings in the directory structure.
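For example, this is what I understand "overlap" to mean (the paths below are made up for illustration):

"
# Made-up paths to illustrate my understanding of "overlap"
parent  = "gs://bucket/silver/table"          # e.g. location of xxx.xxx.table
nested  = "gs://bucket/silver/table/child"    # overlaps: nested inside parent
sibling = "gs://bucket/silver/another_table"  # does not overlap: sibling, different leaf

def overlaps(a: str, b: str) -> bool:
    """True when one path is equal to, or nested under, the other."""
    a, b = a.rstrip("/") + "/", b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)

print(overlaps(parent, nested))   # True
print(overlaps(parent, sibling))  # False
"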

"Review Column Masking: Double-check your column masking configuration. Ensure that it doesn’t inadvertently affect the table paths or cause conflicts."

=> How can I check that?
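For reference, the only way I found to list the masks attached to the table is something like this (assuming the Unity Catalog information_schema.column_masks view is available; catalog and table names are placeholders):

"
# List the column masks registered on the target table (names are placeholders).
# Assumes the catalog exposes the information_schema.column_masks view.
spark.sql("""
    SELECT *
    FROM my_catalog.information_schema.column_masks
    WHERE table_name = 'silver_table'
""").show(truncate=False)
"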

"Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If you’re not already using it, consider exploring this approach1."

=> To my knowledge there is no DLT connector for Google Pub/Sub, so I need to use Structured Streaming.

"Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. It’s crucial for maintaining the streaming state and ensuring fault tolerance."

=> Yes, the checkpoint location is set correctly.

Thank you in advance.

@Kaniz_Fatma did you see my last answer?

Thank you!
