Databricks Community

christian_chong · ‎06-21-2024

Hi everybody,

I am facing an issue with Spark Structured Streaming.

Here is a sample of my code:

df = spark.readStream.load(f"{bronze_table_path}")
df.writeStream \
    .format("delta") \
    .option("checkpointLocation", f"{silver_checkpoint}") \
    .option("mergeSchema", "true") \
    .trigger(availableNow=True) \
    .outputMode("append") \
    .start(path=f"{silver_table_path}")

The code above works well.

But if I add column masking to the silver table and rerun the notebook, I get the following error:

Exception: Exception in quality process: [RequestId=xxxx-exxx-xxxx-adf4-86b9b7e82252 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path gs://table overlaps with other external tables or volumes. Conflicting tables/volumes: xxx.xxx.table, xxx.xxx.another_table

JVM stacktrace:
com.databricks.sql.managedcatalog.UnityCatalogServiceException
    at com.databricks.managedcatalog.TypeConversionUtils$.toUnityCatalogDeniedException(TypeConversionUtils.scala:2224)
    at com.databricks.managedcatalog.TypeConversionUtils$.toCatalyst(TypeConversionUtils.scala:2181)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$checkPathAccess$1(ManagedCatalogClientImpl.scala:4088)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:4555)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:4554)
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:26)
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:24)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:158)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:4551)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.checkPathAccess(ManagedCatalogClientImpl.scala:4064)
    at com.databricks.sql.managedcatalog.ManagedCatalogCommon.checkPathAccess(ManagedCatalogCommon.scala:1974)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$checkPathAccess$1(ProfiledManagedCatalog.scala:633)
    at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:714)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:62)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:61)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.checkPathAccess(ProfiledManagedCatalog.scala:633)
    at com.databricks.unity.CredentialScopeSQLHelper$.registerShortestParentPath(CredentialScopeSQLHelper.scala:299)
    at com.databricks.unity.CredentialScopeSQLHelper$.register(CredentialScopeSQLHelper.scala:195)
    at com.databricks.unity.CredentialScopeSQLHelper$.registerPathAccess(CredentialScopeSQLHelper.scala:638)
    at org.apache.spark.sql.streaming.DataStreamUtils$.$anonfun$registerSinkPathInUC$1(DataStreamUtils.scala:267)
    at org.apache.spark.sql.streaming.DataStreamUtils$.$anonfun$registerSinkPathInUC$1$adapted(DataStreamUtils.scala:266)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.sql.streaming.DataStreamUtils$.registerSinkPathInUC(DataStreamUtils.scala:266)
    at org.apache.spark.sql.streaming.DataStreamWriter.startInternal(DataStreamWriter.scala:478)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:256)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleWriteStreamOperationStart(SparkConnectPlanner.scala:3218)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2697)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handleCommand(ExecuteThreadRunner.scala:285)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:229)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:167)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:332)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:332)
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
    at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:84)
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:234)
    at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:83)
    at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:331)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:167)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:118)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.$anonfun$run$1(ExecuteThreadRunner.scala:349)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:348)

All tables are external tables, and the schemas are managed schemas.

Is this a known limitation of column masking?

Thank you

christian_chong · ‎06-21-2024

My first message was not well formatted.

I wrote:

df = spark.readStream.load(f"{bronze_table_path}")
df.writeStream \
    .format("delta") \
    .option("checkpointLocation", f"{silver_checkpoint}") \
    .option("mergeSchema", "true") \
    .trigger(availableNow=True) \
    .outputMode("append") \
    .start(path=f"{silver_table_path}")

But if I add column masking to the silver table and rerun the notebook, I get the following error ...
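For anyone trying to reproduce this: the post does not show how the column mask was added, so the following is only a minimal sketch of what that step usually looks like in Unity Catalog (a SQL UDF plus ALTER TABLE ... ALTER COLUMN ... SET MASK). The helper name build_mask_statements, the table and column names, the mask function name, and the group name are all hypothetical placeholders, not taken from the original notebook. The helper only builds the SQL strings, so it can be checked without a cluster.

```python
def build_mask_statements(table, column, mask_fn, group):
    """Build the two SQL statements that define a mask function and attach it.

    All arguments are hypothetical placeholders chosen for illustration;
    substitute the real three-level names from your own workspace.
    """
    # A SQL UDF that returns the real value only to members of `group`.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {mask_fn}(val STRING) "
        f"RETURN CASE WHEN is_account_group_member('{group}') "
        f"THEN val ELSE '***' END"
    )
    # Attach the function to the column as its mask.
    set_mask = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}"
    return [create_fn, set_mask]


# Assumed example names; none of these appear in the original post.
statements = build_mask_statements(
    table="my_catalog.silver.customers",
    column="email",
    mask_fn="my_catalog.silver.mask_email",
    group="admins",
)

for stmt in statements:
    print(stmt)
    # In a Databricks notebook you would run each one, for example:
    # spark.sql(stmt)
```

Once a mask like this is attached, the silver table carries a Unity Catalog policy, which is the only change between the run that works and the run that fails with the path-overlap error above.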


unity catalog with external table and column masking
