07-03-2024 10:42 AM
Hi There,
Scenario at work is as described in the subject line:
Is it possible to author a SQL (Python) scalar UDF in a SQL Serverless warehouse that uses a library NOT included in any Databricks Runtime? If so, how would one go about it? (The documentation seems to indicate this is not possible, since the warehouse's compute is self-provisioned by Databricks, though this is not 100% clear to me. Again, resorting to other cluster types is not an option: it has to be done directly in SQL Serverless, otherwise this would have been resolved already.)
For context, the use case is XML validation against nested, very complex and large XSD schemas, which goes beyond what DBR currently offers "out of the box" in terms of built-in functions (I need all validation errors returned, flexible schema sources, etc.). But it could be anything else!! (For example, in Snowflake one can "declare" the necessary libraries within the SQL function body, and the full Anaconda distribution is available!)
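As a rough illustration of the logic such a UDF would need to run, here is a minimal Python sketch of "return all validation errors" using lxml (the library mentioned later in this thread). The function name validate_xml and its error-string format are hypothetical. The import sits inside the function body, the way a Python UDF body would import it, so the function only works where lxml is actually installed on the compute, which is exactly the open question here.

```python
def validate_xml(xml_text, xsd_text):
    """Validate xml_text against xsd_text.

    Returns a list of error strings; an empty list means the document is valid.
    Hypothetical helper for illustration only.
    """
    # Third-party import: must be importable on whatever compute runs the UDF.
    from lxml import etree

    # Encoding to UTF-8 bytes lets lxml accept text that carries an XML
    # declaration (assumes any declared encoding is UTF-8).
    try:
        schema = etree.XMLSchema(etree.fromstring(xsd_text.encode("utf-8")))
    except (etree.XMLSyntaxError, etree.XMLSchemaParseError) as exc:
        return [f"schema error: {exc}"]

    try:
        doc = etree.fromstring(xml_text.encode("utf-8"))
    except etree.XMLSyntaxError as exc:
        return [f"document not well-formed: {exc}"]

    # validate() returns a bool and records every problem libxml2 reports
    # in schema.error_log instead of raising on the first one.
    schema.validate(doc)
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]
```

For example, with a schema declaring an xs:integer element n, validate_xml("<n>5</n>", xsd) returns an empty list, while "<n>abc</n>" yields one message describing the type violation.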
Any help, suggestions, or comments would be greatly appreciated!!
Thank you in advance for your time!
07-04-2024 08:47 AM
Hi @cpelazza,
CREATE FUNCTION statement t....lxml or any other relevant ones.
07-04-2024 06:08 PM
Thank you for your response, @Kaniz_Fatma!!
While the answer makes sense, I haven't been able to figure out "how" one would do that, in particular (from your answer):
I mean, I can import a library within the UDF's body, BUT how do I make that specific library available to the cluster's compute if it is NOT part of a Runtime? Plus, the XML libraries I can find differ by runtime: in 13.3 LTS I only see the defusedxml library, whose deprecated .lxml module now errors out; 14.3 LTS has lxml, though it is not fit for purpose; and from the Runtime 15 series onwards there is NO Python XML library.
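Since what is importable evidently differs between runtimes, one way to check what a given compute actually exposes is to probe it from Python, for instance from inside a UDF body. Below is a minimal stdlib-only sketch; the helper name probe_libraries is hypothetical, and it assumes each package's distribution name matches its import name (true for lxml and defusedxml, not for every package). It only reports what is there; it does not install anything.

```python
import importlib.util
from importlib import metadata


def probe_libraries(names):
    """Map each top-level module name to its installed version.

    Values: a version string; a placeholder if the module is importable but
    has no package metadata (e.g. a stdlib module); None if not importable.
    Hypothetical helper for illustration only.
    """
    report = {}
    for name in names:
        # find_spec returns None for a top-level module that cannot be found.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            # Assumes the distribution name equals the import name.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "importable (no package metadata)"
    return report
```

For example, probe_libraries(["lxml", "defusedxml"]) returns a dict telling you, for the compute it runs on, whether each library can be imported at all and, if so, which version is installed.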
Would you be so kind as to provide some examples of how to bring in a specific library?
Most examples I came across just import built-in libraries... but if I want a library that is not in a Runtime, the only way I know is to declare it as part of a cluster's compute, which is not the same as using Serverless.
Please correct me if I am wrong on any of these, as I am not fully familiar with Serverless warehouses.
An example would be much appreciated!!
Thank you in advance!!
07-04-2024 11:17 PM
Hi @cpelazza,
07-05-2024 06:44 AM
Thank you so much for the clarification @Kaniz_Fatma !!
I believe we are either going with that option, or possibly exploring a Python DBT model.
Best regards,
cpelazza