Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Tables UDFs and Versions

NotARobot
New Contributor III

Trying to do a url_decode on a column. It works great in development, but fails when run via DLT. I've tried multiple approaches:

1. pyspark.sql.functions.url_decode - This function is new as of PySpark 3.5.0, and isn't supported by whatever version a DLT pipeline provides. I haven't been able to figure out which version of PySpark DLT is actually running. It reports 12.2, but I suspect that is the version of something else:
dlt:12.2-delta-pipelines-dlt-release-2024.04-rc0-commit-24b74

2. Attempted to use a simple UDF that wraps urllib.parse.unquote_plus; however, this appears to be unsupported with Unity Catalog. Given that the documentation states this should be supported in versions greater than 13.1, I'm again guessing the version is why I get this error:
pyspark.errors.exceptions.AnalysisException: [UC_COMMAND_NOT_SUPPORTED] UDF/UDAF functions are not supported in Unity Catalog
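For reference, the body of such a UDF is just a call to the standard library; a minimal sketch (the function and column names are illustrative, and the commented registration assumes a runtime where Python UDFs are permitted):

```python
from urllib.parse import unquote_plus


def url_decode_py(value):
    """Decode percent-escapes and treat '+' as a space, matching URL form decoding."""
    if value is None:
        return None
    return unquote_plus(value)


# On a cluster where Python UDFs are allowed, this could be wrapped as e.g.:
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# decode_udf = F.udf(url_decode_py, StringType())
```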

3. I have also tried using cluster policies to set the version, but regardless of what version the policy attempts to force, the cluster gets the same version as above. I've tried a regex, an explicit version, and auto:latest with no luck.

This leads to three questions:
1. What version of PySpark is DLT running, and how can users reliably find this out to know what is available for use?
2. How do users force versions if cluster policies don't work?
3. Any other recommendations for doing a URL decode via DLT? Since this is where the rest of our ETL pipeline runs, we'd prefer not to fragment tables out into separate workflows.
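One UDF-free workaround worth trying on older Spark runtimes is the built-in `reflect` SQL function, which can call `java.net.URLDecoder` directly (though it may itself be restricted in some Unity Catalog environments). A sketch, with an illustrative column name:

```python
# Build a reflect() expression that calls java.net.URLDecoder.decode on the JVM,
# avoiding the Python UDF that Unity Catalog rejected.
decode_expr = "reflect('java.net.URLDecoder', 'decode', encoded_col, 'UTF-8')"

# Inside the pipeline this would be applied as, e.g.:
# from pyspark.sql import functions as F
# df = df.withColumn("decoded", F.expr(decode_expr))
```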

1 REPLY

NotARobot
New Contributor III

Thanks @Retired_mod. For reference, if anybody finds this, the DLT release notes are here: https://docs.databricks.com/en/release-notes/delta-live-tables/index.html
These show which Spark versions the CURRENT and PREVIEW channels are running. In this case, the pipeline was on the CURRENT channel (Spark 3.3.2), so the PREVIEW channel (Spark 3.5.0) should work for the latest PySpark functions.
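The channel is selected in the pipeline settings; if the pipeline is managed as JSON, a minimal fragment looks like this (the pipeline name is illustrative, and other fields are omitted):

```json
{
  "name": "my-dlt-pipeline",
  "channel": "PREVIEW"
}
```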
