Re: How to create a DLT pipeline with SQL statemen...

guangyi · ‎09-08-2024

Here is how I implemented the 3rd option and how it failed:

I created a JSON file containing the DLT pipeline definition:

{
    "name": "query_data_quality_event_log_pipeline",
    "clusters": [
        {
            "label": "default",
            "spark_conf": {
                "spark.databricks.acl.needAdminPermissionToViewLogs": "false"
            },
            "policy_id": "xxxxxx",
            "autoscale": {
                "min_workers": 1,
                "max_workers": 2,
                "mode": "ENHANCED"
            }
        },
        {
            "label": "maintenance",
            "policy_id": "xxxxxx"
        }
    ],
    "development": true,
    "continuous": false,
    "channel": "PREVIEW",
    "edition": "CORE",
    "catalog": "xxxxxx",
    "target": "xxxxxx",
    "libraries": [
        {
            "notebook": {
                "path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"
            }
        }
    ]
}

Then I created the pipeline via the Databricks CLI:

databricks pipelines create --json "$(cat single-dlt.json)" -p PID

The pipeline was created successfully. However, when I clicked the run button, it showed me this error:

BAD_REQUEST: Failed to load notebook '/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql'. Only SQL and Python notebooks are supported currently.
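
One possible cause, and this is only a guess on my side: the library is declared as a "notebook", but the path ends in `.sql`, which may mean the object in the workspace is a plain file rather than a notebook. Below is a minimal Python sketch that scans a pipeline definition for notebook paths that look like files. The function name `suspect_notebook_paths` and the extension list are hypothetical, made up for illustration, and the check is only a heuristic, since a notebook can also have a dot in its name.

```python
import json
from pathlib import PurePosixPath

# Extensions that hint at a workspace file rather than a notebook.
# This list is an assumption for illustration, not an official rule.
FILE_LIKE_SUFFIXES = {".sql", ".py"}


def suspect_notebook_paths(definition: dict) -> list[str]:
    """Return "notebook" library paths whose names end in a file-like extension."""
    suspects = []
    for library in definition.get("libraries", []):
        # Libraries of other kinds have no "notebook" key and are skipped.
        path = library.get("notebook", {}).get("path", "")
        if PurePosixPath(path).suffix.lower() in FILE_LIKE_SUFFIXES:
            suspects.append(path)
    return suspects


# Trimmed copy of the definition from this post; only "libraries" matters here.
definition = json.loads("""
{
    "name": "query_data_quality_event_log_pipeline",
    "libraries": [
        {"notebook": {"path": "/Workspace/Users/xxx@xxx/query_data_quality_event_log.sql"}}
    ]
}
""")

for path in suspect_notebook_paths(definition):
    print(f"Check whether this is a notebook or a plain file: {path}")
```

If the path gets flagged, it may be worth checking the object type in the workspace browser. As far as I know the Pipelines API also has a "file" library type for workspace files, which might be what this path needs, but I have not verified that against this setup.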