Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SQL UDFs for DLT pipelines

famous_jt33
New Contributor

I am trying to implement a UDF for a DLT pipeline. The documentation states that this is possible, but I get an error after adding a SQL UDF to a cell in the notebook attached to the pipeline. The aim is to keep the UDF in a separate notebook of its own, but both approaches fail with the same error (see the attached image below).

Here is the UDF:

CREATE FUNCTION IF NOT EXISTS gtin_std(number STRING)
RETURNS STRING
BEGIN
  DECLARE gtin VARCHAR(20);
  DECLARE gtin_std VARCHAR(20);

  SET gtin = REGEXP_REPLACE(number, '[^0-9]', '');
  IF LENGTH(gtin) IN (8, 12, 13, 14) THEN
    SET gtin_std = LPAD(TRIM(gtin), 14, '0');
    RETURN gtin_std;
  ELSE
    RETURN NULL;
  END IF;
END;

2 REPLIES

Anonymous
Not applicable

Hi @Joshua Abiodun-Olojede

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

6502
New Contributor III

You can't.
SQL support on a DLT pipeline cluster is limited compared to a normal notebook, so this kind of SQL UDF isn't available there. You can still define the UDF in Python (in a Python notebook, of course) and, if you need SQL, run it through the spark.sql() function, keeping in mind that the SQL accepted there is a subset of what a regular notebook supports.
