Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

databricks structured streaming external table unity catalog

christian_chong
New Contributor III

Hi everybody,

I am facing an issue with Spark Structured Streaming on Databricks on GCP.

I use the external table type, but the schemas are managed schemas.

The code below in my notebook works very well.

But if I add column masking to the target table and rerun the notebook, I get an error:

"
[RequestId=xxxx-exxx-xxxx-adf4-86b9b7e82252 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path gs://table overlaps with other external tables or volumes. Conflicting tables/volumes: xxx.xxx.table, xxx.xxx.another_table
"

Here is a sample of my code:

"
df = spark.readStream.load(f"{bronze_table_path}")

df.writeStream \
.format("delta") \
.option("checkpointLocation", f"{silver_checkpoint}") \
.option("mergeSchema", "true") \
.trigger(availableNow=True) \
.outputMode("append") \
.start(path=f"{silver_table_path}")
"


Thank you

 

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @christian_chong, the error message you're encountering indicates that the input path gs://table overlaps with other external tables or volumes.

Specifically, it mentions conflicting tables or volumes: xxx.xxx.table and xxx.xxx.another_table. You mentioned that the issue arises when you add column masking to the target table. Column masking is a security feature that restricts access to specific columns based on user roles or policies. It's essential to ensure that your masking configuration doesn't interfere with the table paths.

Here are some steps to troubleshoot and resolve the issue:

  • Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts.

  • Review Column Masking: Double-check your column masking configuration. Ensure that it doesn't inadvertently affect the table paths or cause conflicts.

  • Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If you're not already using it, consider exploring this approach.

  • Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. It's crucial for maintaining the streaming state and ensuring fault tolerance.

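To make the first check concrete, each table's registered storage location can be compared directly, so overlapping or nested paths become visible. A sketch, assuming a running Databricks/Spark session; the table names are the placeholders from the error message:

```python
# Sketch (requires a Spark session with Unity Catalog access): print the
# storage location of each table named in the conflict. Table names are
# placeholders taken from the error message.
for t in ["xxx.xxx.table", "xxx.xxx.another_table"]:
    location = spark.sql(f"DESCRIBE DETAIL {t}").select("location").first()[0]
    print(t, "->", location)
```

If one printed location is a parent directory of another, that is the overlap the error is complaining about.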

Feel free to provide more context or ask further questions if needed! 😊

 

Is your answer generated by an LLM?

I reply to each of your points below:

"Itโ€™s essential to ensure that your masking configuration doesnโ€™t interfere with the table paths."

=> How to ensure that the masking configuration doesn't interfere with the table path?

"Check External Table Paths: Verify that the paths for your external tables (gs://table, xxx.xxx.table, and xxx.xxx.another_table) do not overlap. If they do, consider renaming or reorganizing the tables to avoid conflicts."

=> From my point of view, the paths don't overlap. But what exactly does "overlap" mean? They are more like siblings in the directory structure.
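For context, "overlap" in this error class generally means that one location equals, or is a parent directory of, the other; true siblings do not overlap. A quick standalone check (the paths are illustrative placeholders):

```python
def paths_overlap(a: str, b: str) -> bool:
    """Return True if one storage path equals or contains the other."""
    # Normalize with a trailing slash so "gs://b/x" does not match "gs://b/xy".
    a = a.rstrip("/") + "/"
    b = b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)

# Sibling directories: no overlap.
print(paths_overlap("gs://bucket/silver/table_a", "gs://bucket/silver/table_b"))  # False
# Parent and child: overlap.
print(paths_overlap("gs://bucket/silver", "gs://bucket/silver/table_a"))  # True
```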

"Review Column Masking: Double-check your column masking configuration. Ensure that it doesnโ€™t inadvertently affect the table paths or cause conflicts."

=> How to check that?

"Delta Lake and Structured Streaming: Databricks recommends using Delta Live Tables for most incremental and streaming workloads. Delta Live Tables leverage Delta tables and Structured Streaming. If youโ€™re not already using it, consider exploring this approach1."

=> To my knowledge there is no connector between DLT and Google Pub/Sub, so I need to use Structured Streaming.

"Checkpoint Location: Confirm that the checkpoint location (silver_checkpoint) is correctly set. Itโ€™s crucial for maintaining the streaming state and ensuring fault tolerance."

=> Yes, checkpointing is set correctly.

Thank you in advance

@Kaniz_Fatma did you see my last answer?

Thank you!
