Trying to URL-decode a column. This works great in development, but fails when running via DLT, despite trying multiple approaches.
1. pyspark.sql.functions.url_decode - This function is new as of PySpark 3.5.0, but it isn't supported by whatever version a DLT pipeline provides. I haven't been able to figure out which version of PySpark DLT is actually running. It reports 12.2, but I suspect that might actually be the version of something else (the Databricks Runtime?):
dlt:12.2-delta-pipelines-dlt-release-2024.04-rc0-commit-24b74
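For context, here is a minimal sketch of the version gate involved (assuming url_decode landed in PySpark 3.5.0, per the PySpark docs; the helper name is mine, not an API):

```python
# Sketch: decide whether pyspark.sql.functions.url_decode is available,
# given the runtime's PySpark version string (url_decode landed in 3.5.0).
def supports_url_decode(pyspark_version: str) -> bool:
    major, minor = (int(p) for p in pyspark_version.split(".")[:2])
    return (major, minor) >= (3, 5)

# Inside a pipeline, pyspark.__version__ (or spark.version) yields the string:
#   import pyspark; supports_url_decode(pyspark.__version__)
print(supports_url_decode("3.5.0"))  # True
print(supports_url_decode("3.3.2"))  # False
```

Running the equivalent of this check inside the pipeline is how I'd hoped to pin down the actual PySpark version, but the 12.2 string above doesn't look like a PySpark version at all.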
2. Attempted to use a simple UDF that wraps urllib.parse.unquote_plus; however, this appears to be unsupported with Unity Catalog. Given that the documentation states this should be supported in runtime versions greater than 13.1, again I'm guessing the runtime version is why I get this error:
pyspark.errors.exceptions.AnalysisException: [UC_COMMAND_NOT_SUPPORTED] UDF/UDAF functions are not supported in Unity Catalog
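The UDF attempt looked roughly like this (column names are placeholders). The plain-Python wrapper itself works fine; it's the UDF registration on the Unity-Catalog-enabled DLT pipeline that throws the error above:

```python
from urllib.parse import unquote_plus

def url_decode_py(value):
    # None-safe wrapper around unquote_plus; this is what the UDF
    # would apply per row. "+" and "%20" both decode to a space.
    return unquote_plus(value) if value is not None else None

# On the cluster this gets wrapped as a Spark UDF, which is what triggers
# [UC_COMMAND_NOT_SUPPORTED] under Unity Catalog on this runtime:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   url_decode_udf = udf(url_decode_py, StringType())
#   df = df.withColumn("decoded", url_decode_udf("encoded_url"))

print(url_decode_py("a%20b+c"))  # -> a b c
```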
3. Have also tried using cluster policies to set the runtime version; however, regardless of which version the policy attempts to force, the cluster gets the same version as above. I have tried a regex, an explicit version, and auto:latest, all with no luck.
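The policy variants I tried looked roughly like the following (field names per the Databricks cluster policy schema, as I understand it; the specific version strings are examples, and none of the variants changed the runtime the pipeline actually got):

```json
{ "spark_version": { "type": "fixed", "value": "13.3.x-scala2.12" } }
```

```json
{ "spark_version": { "type": "regex", "pattern": "13\\.[0-9]+\\.x-scala2\\.12" } }
```

```json
{ "spark_version": { "type": "fixed", "value": "auto:latest" } }
```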
This leads to three questions:
1. What version of PySpark is DLT running, and how can users reliably find this out to know what is available for use?
2. How do users force a runtime version if cluster policies don't work?
3. Any other recommendations for doing a URL decode via DLT? Since this is where the rest of our ETL pipeline runs, I would prefer not to fragment tables out into separate workflows to manage.