09-06-2024 02:23 AM
I need a DLT pipeline to create a materialized view for fetching event logs. All the ways below I tried are failed:
I follow the instructions here to query event log: https://docs.databricks.com/en/delta-live-tables/observability.html#monitor-delta-live-tables-pipeli...
How can I achieve this with the DLT pipeline?
09-06-2024 08:05 AM
Hi @guangyi ,
As @szymon_dybczak informed, .py and .sql formats are supported. The 2 first options you mentioned are not supported: you cannot use SQL in .py files, but SQL commands supported when .sql file is used.
Check the docs:
https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/tutorial-pipelines#sql-example
09-06-2024 02:39 AM
Hi @guangyi ,
It should be sufficient to attach notebook with sql language set as a default. What error did you get?
09-06-2024 08:05 AM
Hi @guangyi ,
As @szymon_dybczak informed, .py and .sql formats are supported. The 2 first options you mentioned are not supported: you cannot use SQL in .py files, but SQL commands supported when .sql file is used.
Check the docs:
https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/tutorial-pipelines#sql-example
09-08-2024 06:34 PM
Hi @szymon_dybczak @filipniziol
Here is how I implement the 3rd option and how it failed:
I create a JSON file with the DLT pipeline definition inside
{
"name": "query_data_quality_event_log_pipeline",
"clusters": [
{
"label": "default",
"spark_conf": {
"spark.databricks.acl.needAdminPermissionToViewLogs": "false"
},
"policy_id": "xxxxxx",
"autoscale": {
"min_workers": 1,
"max_workers": 2,
"mode": "ENHANCED"
}
},
{
"label": "maintenance",
"policy_id": "xxxxxx"
}
],
"development": true,
"continuous": false,
"channel": "PREVIEW",
"edition": "CORE",
"catalog": "xxxxxx",
"target": "xxxxxx",
"libraries": [
{
"notebook": {
"path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"
}
}
]
}
Then create the pipeline via Databricks CLI
databricks pipelines create --json "$(cat single-dlt.json)" -p PID
The pipeline can be created successfully. However, when I clicked the running button, It show me this result:
BAD_REQUEST: Failed to load notebook '/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql'. Only SQL and Python notebooks are supported currently.
09-08-2024 07:00 PM
After just finishing my last reply, I realized what’s wrong with my code: I should use “file” property instead of “notebook” in the libraries section.
It works now. Thank you guys, you are my rubber duck!
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group