Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SQL UDFs for DLT pipelines

famous_jt33
New Contributor

I am trying to implement a UDF for a DLT pipeline. The documentation states that this is possible, but I get an error after adding a SQL UDF to a cell in the notebook attached to the pipeline. The aim is to keep the UDF in a separate notebook of its own, but both approaches fail with the same error (see the attached image below).

Here is the UDF:

CREATE FUNCTION IF NOT EXISTS gtin_std(number STRING)
RETURNS STRING
BEGIN
  DECLARE gtin VARCHAR(20);
  DECLARE gtin_std VARCHAR(20);

  SET gtin = REGEXP_REPLACE(number, '[^0-9]', '');
  IF LENGTH(gtin) IN (8, 12, 13, 14) THEN
    SET gtin_std = LPAD(TRIM(gtin), 14, '0');
    RETURN gtin_std;
  ELSE
    RETURN NULL;
  END IF;
END;

2 REPLIES

Anonymous
Not applicable

Hi @Joshua Abiodun-Olojede

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

6502
New Contributor III

You can't.
SQL support on a DLT pipeline cluster is limited compared to a normal notebook, so this kind of SQL UDF isn't available there. You can still define the UDF in Python (in a Python notebook, of course) and, if you need SQL, run it through the spark.sql() function, keeping in mind that the SQL accepted there is a subset of what a regular notebook supports.
