Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Data profiling monitoring with foreign catalog

sta_gas
New Contributor

Hi team,

I’m currently working with Azure Databricks and have created a foreign catalog for my source database in Azure SQL. I can successfully run SELECT statements against the Azure SQL database from Databricks.

However, I would like to set up data profiling monitoring using the Quality tab, and I’m running into limitations in both availability and functionality.


The table type is FOREIGN and the catalog type is FOREIGN_CATALOG.

Could you please advise on the best approach, or any recommended steps, to enable this feature for this catalog? I know I could create materialized views or replicate the data into managed tables in another catalog, but I would prefer not to replicate all the data.

Accepted Solution

szymon_dybczak
Esteemed Contributor III

Hi @sta_gas ,

Since data quality monitoring is in beta, I'm fairly sure foreign tables aren't supported yet (they just forgot to mention it in the docs).

The more important question is whether they will ever be supported. As far as I can tell, data quality monitoring applies only to Delta tables. Judging by the docs' description of how it works, it relies on Delta table properties to build this functionality. So I suspect it won't work for foreign tables (or at least not with the same feature parity).

"Databricks creates a background job that monitors tables for freshness and completeness. Databricks uses smart scanning to determine when to scan tables.

Freshness refers to how recently a table has been updated. Data quality monitoring analyzes the history of commits to a table and builds a per-table model to predict the time of the next commit. If a commit is unusually late, the table is marked as stale."
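To make the dependency on commit history concrete, here is a deliberately simple sketch of a freshness check in the spirit of the quote: estimate the usual gap between commits from past commit timestamps and flag the table as stale when the time since the last commit is well beyond that. This is a hypothetical heuristic (the function `is_stale` and the `tolerance` factor are my own assumptions), not Databricks' actual per-table model. The point is that it needs a commit log, which a Delta table provides and a foreign table does not.

```python
from datetime import datetime, timedelta
from statistics import median


def is_stale(commit_times: list[datetime], now: datetime, tolerance: float = 2.0) -> bool:
    """Hypothetical freshness heuristic (NOT Databricks' model): a table is stale
    when the time since its last commit exceeds `tolerance` times the median
    gap between its past commits."""
    if len(commit_times) < 2:
        # Not enough history to learn a commit cadence, so don't flag anything.
        return False
    times = sorted(commit_times)
    # Gaps between consecutive commits, e.g. one hour for an hourly job.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    expected_gap = median(gaps)
    return now - times[-1] > expected_gap * tolerance


# Usage: a table committed hourly from 00:00 to 23:00.
base = datetime(2025, 1, 1)
hourly = [base + timedelta(hours=h) for h in range(24)]
print(is_stale(hourly, now=base + timedelta(hours=24)))  # False: last commit 1 hour ago
print(is_stale(hourly, now=base + timedelta(hours=30)))  # True: last commit 7 hours ago
```

Without a commit history like `hourly`, there is nothing to learn a cadence from, which is why the freshness part in particular looks tied to Delta.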


Hi Szymon,

Thank you for your quick response. I understand that data quality monitoring can be more complex. However, I believe this approach could still work for “Data Profiling” monitoring, because Unity Catalog generates predefined SQL queries to extract statistical and other relevant metrics, and those queries could be pushed down to the source database.
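As a rough sketch of the idea, assuming you compute the profile metrics yourself instead of through the Quality tab: a helper that builds a single aggregate query (row count, plus null count, distinct count, min and max per column) that you can run directly against the foreign table. Lakehouse Federation can push some operations such as filters and aggregates down to the source, but what is actually pushed down depends on the connector, so verify it with `EXPLAIN` before relying on it. The helper names `quote_ident` and `build_profile_query` and the table name `sqlserver_cat.dbo.orders` are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote an identifier with backticks (Databricks SQL style), escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_profile_query(table: str, columns: list[str]) -> str:
    """Hypothetical helper: build one aggregate query with basic profile metrics.
    Assumes `table` is catalog.schema.table with no dots inside the name parts."""
    qualified = ".".join(quote_ident(part) for part in table.split("."))
    exprs = ["COUNT(*) AS row_count"]
    for col in columns:
        c = quote_ident(col)
        exprs += [
            f"COUNT(*) - COUNT({c}) AS {quote_ident(col + '_nulls')}",  # COUNT(col) skips NULLs
            f"COUNT(DISTINCT {c}) AS {quote_ident(col + '_distinct')}",
            f"MIN({c}) AS {quote_ident(col + '_min')}",
            f"MAX({c}) AS {quote_ident(col + '_max')}",
        ]
    return "SELECT\n  " + ",\n  ".join(exprs) + f"\nFROM {qualified}"


# Usage with a hypothetical foreign table in the foreign catalog.
sql = build_profile_query("sqlserver_cat.dbo.orders", ["amount"])
print(sql)
# In a Databricks notebook you could then run: spark.sql(sql).show()
```

Because the result is a single aggregate row per table, only the metrics travel back to Databricks, not the data itself, provided the connector pushes the aggregation down.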
