Databricks Community

Dharmin · ‎03-19-2026

While using DQX to generate DQ rules on a table i have used the following prompt.

from databricks.labs.dqx.profiler.generator import DQGenerator

from databricks.labs.dqx.profiler.profiler import DQProfiler

from databricks.labs.dqx.config import InputConfig

from databricks.sdk import WorkspaceClient

ws = WorkspaceClient()

generator = DQGenerator(workspace_client=ws, spark=spark)

# Step 1: Profile the data to generate summary statistics

profiler = DQProfiler(workspace_client=ws, spark=spark)

# Generate rules with input data schema awareness

input_df = spark.read.table(f"{schema}.{tablename}")

summary_stats, profiles = profiler.profile(input_df)

user_input = """

Make sure tid and tdescr are one to one mapped

make sure l and ldesc are one to one mapped

make sure effective startdate and effective enddate are lesser than current date

Make sure the test cases are executable through dbx and should not throw any errors like table or view not found

"""

# Option A: With business context

checks = generator.generate_dq_rules_ai_assisted(

user_input=user_input,

summary_stats=summary_stats

)

print(checks)

For the above code i am getting some valid response but the response is not as directly usable.

{'criticality': 'error', 'check': {'function': 'sql_query', 'arguments': {'query': 'SELECT TID, COUNT(DISTINCT TIDDESCR) as desc_count FROM input_view GROUP BY TID HAVING COUNT(DISTINCT TIDDESCR) > 1', 'name': 'tid_to_description_one_to_one', 'msg': 'CTID must map to exactly one TIDDESCR'}}}

I tried to correct it as below.

- check:

arguments:

merge_columns:

- TID

- TIDDESCR

msg: TID has multiple TIDDESCR values

input_placeholder: input_view

condition_column: desc_count

name: tid_tiddescr_one_to_one

query: with main_cte as (SELECT TID, TIDDESCR, COUNT(DISTINCT TIDDESCR) AS desc_count

FROM {{ input_view }} WHERE TID IS NOT NULL GROUP BY TID, TIDDESCR

HAVING COUNT(DISTINCT TIDDESCR) > 1)select *,case when desc_count > 1 then true else false end as condition from main_cte

function: sql_query

criticality: error

- check:
arguments:
merge_columns:
- l
- lDESC
input_placeholder: input_view
msg: l has multiple lDESC values
name: l_ldesc_one_to_one
condition_column: condition
query: with main_cte as (SElECT l, lDESC, COUNT(DISTINCT lDESC) AS desc_count
FROM {{ input_view }} WHERE l IS NOT NUll GROUP BY l, lDESC
HAVING COUNT(DISTINCT lDESC) > 1)select *,case when desc_count > 1 then true else false end as condition from main_cte
function: sql_query
criticality: error

Still facing the following issue

{"ts": "2026-03-13 06:46:43.193", "level": "ERROR", "logger": "DataFrameQueryContextLogger", "msg": "[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve \"CASE WHEN Tid_Tiddescr_one_to_one_desc_count_de5f40594692464892443d2e0412aaef THEN TID has multiple TIDDESCR values ELSE CAST(NULL AS STRING) END\" due to data type mismatch: The first parameter requires the \"BOOLEAN\" type, however \"Tid_Tiddescr_one_to_one_desc_count_de5f40594692464892443d2e0412aaef\" has the type \"BIGINT\". SQLSTATE: 42K09;\n'Project [TID#1233, TIDDESCR#1234, l#1235, lDESC#1236,

Can some one guide me on what chould be the issue.

I need help on the following things

1.Is there any possibility to get better response from the LLM model. Any better prompt etc.?

2.Why is the sql_query function not working?