09-06-2024 02:23 AM
I need a DLT pipeline to create a materialized view for fetching event logs. All of the approaches I tried, listed below, failed:
I followed the instructions here to query the event log: https://docs.databricks.com/en/delta-live-tables/observability.html#monitor-delta-live-tables-pipeli...
How can I achieve this with the DLT pipeline?
09-06-2024 08:05 AM
Hi @guangyi ,
As @szymon_dybczak mentioned, both .py and .sql formats are supported. The first two options you mentioned are not supported: you cannot use SQL in .py files, but SQL commands are supported when a .sql file is used.
Check the docs:
https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/tutorial-pipelines#sql-example
09-06-2024 02:39 AM
Hi @guangyi ,
It should be sufficient to attach a notebook with SQL set as the default language. What error did you get?
09-08-2024 06:34 PM
Hi @szymon_dybczak @filipniziol
Here is how I implemented the 3rd option and how it failed:
I created a JSON file containing the DLT pipeline definition:
{
  "name": "query_data_quality_event_log_pipeline",
  "clusters": [
    {
      "label": "default",
      "spark_conf": {
        "spark.databricks.acl.needAdminPermissionToViewLogs": "false"
      },
      "policy_id": "xxxxxx",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 2,
        "mode": "ENHANCED"
      }
    },
    {
      "label": "maintenance",
      "policy_id": "xxxxxx"
    }
  ],
  "development": true,
  "continuous": false,
  "channel": "PREVIEW",
  "edition": "CORE",
  "catalog": "xxxxxx",
  "target": "xxxxxx",
  "libraries": [
    {
      "notebook": {
        "path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"
      }
    }
  ]
}
Then I created the pipeline via the Databricks CLI:
databricks pipelines create --json "$(cat single-dlt.json)" -p PID
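The CLI command above passes the contents of single-dlt.json as the request body of the Pipelines API create call (POST /api/2.0/pipelines). If you need to script the same step without the CLI, a minimal standard-library Python sketch could build that request as shown below. The workspace URL and token are placeholders, `build_create_request` is a hypothetical helper name, and it is worth checking the body against the Pipelines REST API reference before relying on it.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, spec: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the Pipelines API create endpoint.

    `spec` is the same pipeline definition the CLI reads from single-dlt.json.
    """
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/pipelines",
        data=body,
        method="POST",
        headers={
            # Personal access token auth; the token value here is a placeholder.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical workspace URL and token: substitute your own.
    spec = {"name": "query_data_quality_event_log_pipeline", "development": True}
    req = build_create_request("https://example.cloud.databricks.com/", "dapi-XXXX", spec)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending it makes it easy to inspect the exact payload before anything is created in the workspace.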
The pipeline was created successfully. However, when I clicked the run button, it showed this error:
BAD_REQUEST: Failed to load notebook '/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql'. Only SQL and Python notebooks are supported currently.
09-08-2024 07:00 PM
Right after finishing my last reply, I realized what was wrong with my code: I should use the "file" property instead of "notebook" in the libraries section.
It works now. Thank you guys, you are my rubber duck!
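The fix amounts to replacing {"notebook": {"path": ...}} with {"file": {"path": ...}} in the libraries list. If a pipeline spec has several such entries, a small Python sketch like the one below could rewrite them before running the CLI command. It assumes that any library path ending in .sql or .py is a plain workspace file rather than a notebook. That is only a heuristic, since notebooks in Git folders can also carry those extensions. `notebook_libraries_to_files` is a hypothetical helper name.

```python
import json


def notebook_libraries_to_files(spec: dict, suffixes=(".sql", ".py")) -> list:
    """Rewrite "notebook" library entries whose path ends in one of `suffixes`
    into "file" entries, modifying `spec` in place. Returns the changed paths."""
    changed = []
    for lib in spec.get("libraries", []):
        nb = lib.get("notebook")
        # Assumption: a .sql/.py suffix means a workspace file, not a notebook.
        if nb and nb.get("path", "").endswith(suffixes):
            lib["file"] = lib.pop("notebook")
            changed.append(lib["file"]["path"])
    return changed


if __name__ == "__main__":
    spec = {
        "name": "query_data_quality_event_log_pipeline",
        "libraries": [
            {"notebook": {"path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"}}
        ],
    }
    print(notebook_libraries_to_files(spec))
    print(json.dumps(spec["libraries"], indent=2))
```

Entries without a matching suffix are left as "notebook", so genuine notebooks referenced by the same pipeline are not touched.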