SQL UDFs for DLT pipelines

famous_jt33 — Fri, 16 Jun 2023 23:33:23 GMT

I am trying to implement a UDF for a DLT pipeline. The documentation states that this is possible, but I get an error after adding a SQL UDF to a cell in the notebook attached to the pipeline. The aim is to keep the UDF in a separate notebook of its own. I tried both approaches, and both failed with the same error (see attached image below).

Here is the UDF:

CREATE FUNCTION IF NOT EXISTS gtin_std(number STRING)
RETURNS STRING
BEGIN
    DECLARE gtin VARCHAR(20);
    DECLARE gtin_std VARCHAR(20);
    SET gtin = REGEXP_REPLACE(number, '[^0-9]', '');
    IF LENGTH(gtin) IN (8, 12, 13, 14) THEN
        SET gtin_std = LPAD(TRIM(gtin), 14, '0');
        RETURN gtin_std;
    ELSE
        RETURN NULL;
    END IF;
END;

Re: SQL UDFs for DLT pipelines

Anonymous — Sat, 17 Jun 2023 09:31:57 GMT

Hi @Joshua Abiodun-Olojede

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

Re: SQL UDFs for DLT pipelines

6502 — Tue, 19 Dec 2023 17:52:13 GMT

You can't.
SQL support on a DLT pipeline cluster is limited compared to a regular notebook. You can still define the UDF in Python, which of course requires a Python notebook. In that notebook you can use the spark.sql() function to execute your SQL code, bearing in mind that the SQL a pipeline accepts is supposed to be a subset of what a regular notebook supports.
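
As a rough illustration of the Python route suggested above, here is a minimal sketch that ports the logic of the posted gtin_std function to plain Python. The helper name register_gtin_std is hypothetical, and whether a function registered this way is visible to the SQL in your particular pipeline is an assumption you would need to verify in your own workspace.

```python
import re
from typing import Optional

# Digit counts accepted by the original SQL function.
_VALID_LENGTHS = {8, 12, 13, 14}


def gtin_std(number: Optional[str]) -> Optional[str]:
    """Strip non-digits and left-pad valid GTINs to 14 digits, else return None."""
    if number is None:
        return None  # mirrors SQL NULL-in, NULL-out behaviour
    digits = re.sub(r"[^0-9]", "", number)
    if len(digits) in _VALID_LENGTHS:
        return digits.zfill(14)  # same effect as LPAD(gtin, 14, '0') on a digit string
    return None


def register_gtin_std(spark) -> None:
    """Hypothetical helper: register gtin_std as a Spark UDF on an existing session."""
    # Imported lazily so the pure-Python function above works without PySpark.
    from pyspark.sql.types import StringType

    spark.udf.register("gtin_std", gtin_std, StringType())
```

In a Python notebook attached to the pipeline you would call register_gtin_std(spark) once. Outside Spark, gtin_std("0123-4567") returns "00000001234567", and inputs with any other digit count return None.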