cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to create a DLT pipeline with SQL statement

guangyi
Contributor III

I need a DLT pipeline to create a materialized view for fetching event logs. All the ways below I tried are failed:

  • Attach a notebook with pure SQL inside: No magic cell like `%sql` are failed
  • Attach a notebook with `spark.sql` python code: Failed because I cannot use the `CREATE VIEW` statement.
  • Attach a file with `.sql` format: Failed because sql format is not supported yet.

I follow the instructions here to query event log: https://docs.databricks.com/en/delta-live-tables/observability.html#monitor-delta-live-tables-pipeli...

How can I achieve this with the DLT pipeline?

1 ACCEPTED SOLUTION

Accepted Solutions

filipniziol
Contributor III

Hi @guangyi ,
As @szymon_dybczak informed, .py and .sql formats are supported. The 2 first options you mentioned are not supported: you cannot use SQL in .py files, but SQL commands supported when .sql file is used.

Check the docs:

filipniziol_0-1725635057151.png

https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/tutorial-pipelines#sql-example

View solution in original post

4 REPLIES 4

szymon_dybczak
Contributor III

Hi @guangyi ,

It should be sufficient to attach notebook with sql language set as a default. What error did you get?

filipniziol
Contributor III

Hi @guangyi ,
As @szymon_dybczak informed, .py and .sql formats are supported. The 2 first options you mentioned are not supported: you cannot use SQL in .py files, but SQL commands supported when .sql file is used.

Check the docs:

filipniziol_0-1725635057151.png

https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/tutorial-pipelines#sql-example

guangyi
Contributor III

Hi @szymon_dybczak @filipniziol 

Here is how I implement the 3rd option and how it failed:

I create a JSON file with the DLT pipeline definition inside

{
    "name": "query_data_quality_event_log_pipeline",
    "clusters": [
        {
            "label": "default",
            "spark_conf": {
                "spark.databricks.acl.needAdminPermissionToViewLogs": "false"
            },
            "policy_id": "xxxxxx",
            "autoscale": {
                "min_workers": 1,
                "max_workers": 2,
                "mode": "ENHANCED"
            }
        },
        {
            "label": "maintenance",
            "policy_id": "xxxxxx"
        }
    ],
    "development": true,
    "continuous": false,
    "channel": "PREVIEW",
    "edition": "CORE",
    "catalog": "xxxxxx",
    "target": "xxxxxx",
    "libraries": [
        {
            "notebook": {
                "path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"
            }
        }
    ]
}

Then create the pipeline via Databricks CLI

databricks pipelines create --json "$(cat single-dlt.json)" -p PID

The pipeline can be created successfully. However, when I clicked the running button, It show me this result:

BAD_REQUEST: Failed to load  notebook '/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql'. Only SQL and Python notebooks are supported currently.

guangyi
Contributor III

After just finishing my last reply, I realized what’s wrong with my code: I should use “file” property instead of “notebook” in the libraries section.

It works now. Thank you guys, you are my rubber duck!




Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group