Databricks Community

Gecofer · ‎06-09-2025

Hi everyone,

I’m running into a strange issue in one of my ETL pipelines using dbt on Databricks, and I’d appreciate any insights or ideas. I have a query that is part of my dbt model. When I run the ETL process, the results from this query are incorrect. However, if I copy exactly the same query and run it directly from the Databricks SQL Editor, it works as expected and returns the correct results.

This inconsistency is puzzling, and I’ve already ruled out the following:

The underlying data is the same (no recent changes in the base tables).
I’ve tested both with and without the model being materialized.
The SQL syntax and logic are identical in both places.
The query is deterministic and doesn’t depend on dynamic runtime values.

I'm starting to wonder if this might be related to:

Caching, session-level context, or temp views?
Permissions or isolation between the dbt runner and my personal SQL workspace?
Some hidden state being introduced by previous dbt models?

Has anyone else experienced something similar? Any ideas on how to debug this or what could be causing the inconsistency?

Thanks in advance!

Isi · ‎06-09-2025

Hey @Gecofer

I’ve seen similar cases before where the SQL logic was perfectly valid, the data unchanged, and the only difference was who was running the query. Sometimes, discrepancies like this can be related to runtime context—specifically:

The identity used to run the dbt model (e.g., a service principal or job user) might not have exactly the same permissions as the one used in the SQL Editor.
If your workspace uses Unity Catalog, there could be data masking policies or fine-grained access controls in place that behave differently depending on the caller.
It’s also worth checking if the field in question is subject to any dynamic filtering or masking, especially in sensitive datasets, where visibility might be altered by design.

If the results differ, the root cause could lie in policy-based behavior rather than the query logic itself.

Hope this helps, 🙂

Isi

View solution in original post

Isi · ‎06-09-2025

Hey @Gecofer

I’ve seen similar cases before where the SQL logic was perfectly valid, the data unchanged, and the only difference was who was running the query. Sometimes, discrepancies like this can be related to runtime context—specifically:

The identity used to run the dbt model (e.g., a service principal or job user) might not have exactly the same permissions as the one used in the SQL Editor.
If your workspace uses Unity Catalog, there could be data masking policies or fine-grained access controls in place that behave differently depending on the caller.
It’s also worth checking if the field in question is subject to any dynamic filtering or masking, especially in sensitive datasets, where visibility might be altered by design.

If the results differ, the root cause could lie in policy-based behavior rather than the query logic itself.

Hope this helps, 🙂

Isi

Gecofer · ‎06-09-2025

Hi @Isi

Thanks so much for your insight!

It turned out to be a combination of the two things you mentioned:

There was a data masking policy applied to one of the columns, and while I had permissions to view the unmasked data, the service principal running the dbt model did not. That’s exactly why I was getting inconsistent results between the dbt run and the SQL Editor.

Appreciate the help! 😊

Databricks Community

Inconsistent query results between dbt ETL run and SQL editor in Databricks

Join Us as a Local Community Builder!

🌟 Community Pulse: Your Weekly Roundup! December 05 – 11, 2025

Jaipur Usergroup First Virtual Meetup: AI/BI Genie + Data Science Careers — 19 Dec | 6 PM IST

Lakehouse, Lagers & Legends — Bangalore Meetup | December 13

Celebrating Our First Brickster Champion: Louis Frolio

⭐ Setup Spark with Hadoop Anywhere : A DBR aligned local Spark+HDFS+Hive stack on Docker⭐