10-14-2021 01:12 PM
Is it somehow possible to create an SQL external function using Python code? The examples only show how to use JARs. I'm looking for something like:
CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf' USING FILE '/tmp/SimpleUdf.py';
10-15-2021 04:55 AM
I would think the USING FILE clause would work, as long as you follow the class_name requirements: the implementing class should extend one of the base classes listed in the CREATE FUNCTION documentation.
Also, the docs literally state that Python is possible:
In addition to the SQL interface, Spark allows you to create custom user defined scalar and aggregate functions using Scala, Python, and Java APIs. See User-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs) for more information.
So it should be possible; maybe your Python class does not meet the requirements?
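For example, here is a minimal sketch of the Python API route the docs refer to, using spark.udf.register() so the function becomes callable from SQL (the function and column names below are just placeholders):

from pyspark.sql.types import StringType

# a plain Python function
def simple_udf(s):
    return None if s is None else s.upper()

# register it for the current session so SQL can call it
spark.udf.register("simple_temp_udf", simple_udf, StringType())

spark.sql("SELECT simple_temp_udf('hello') AS upper_value").show()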
01-27-2022 02:31 PM
For Python, which class should I extend then? All of the listed parent classes are Java.
01-31-2022 11:08 PM
For PySpark you can use udf(). Here is an example of how to do this:
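A minimal sketch using udf() with the DataFrame API (the function, column, and data names are placeholders):

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

# wrap a plain Python function as a UDF
@udf(returnType=IntegerType())
def str_length(s):
    return None if s is None else len(s)

df = spark.createDataFrame([("spark",), ("databricks",)], ["name"])
df.withColumn("name_length", str_length(col("name"))).show()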
02-01-2022 06:43 AM
Thanks for your response. What I am looking for is to define a view with the UDF. However, a session-level UDF as described in the example you provided does not seem to allow that. Maybe I should clarify my question: I want to define an external UDF like the Hive ones.
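To illustrate the limitation (a sketch with placeholder names): a session-level UDF can back a temporary view, but a permanent view cannot reference it, because the function disappears when the session ends.

spark.udf.register("simple_temp_udf", lambda s: None if s is None else s.upper())

# works: a temporary view can reference the session-level UDF
spark.sql("CREATE OR REPLACE TEMPORARY VIEW v_tmp AS SELECT simple_temp_udf('abc') AS c")

# expected to fail with an AnalysisException:
# a permanent view cannot reference a temporary (session-level) function
spark.sql("CREATE OR REPLACE VIEW v_perm AS SELECT simple_temp_udf('abc') AS c")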
01-31-2022 03:20 PM
@Wugang Xu - My name is Piper, and I'm a moderator here for Databricks. Thanks for coming to us with your question. We'll give the members a bit longer to respond and come back if we need to. Thanks in advance for your patience. 🙂
02-04-2022 06:11 PM
As a user of your code, I'd find it a less pleasant API, because I'd have to call some_module.some_func.some_func() rather than just some_module.some_func().
There's no reason to have "some_func" exist twice in the hierarchy; it's redundant. If some_func is so large that adding any more code to the file seems crazy, maybe some_func itself is too large and you should refactor and simplify it.
Having one file serve one purpose makes sense. Having it contain literally a single function and nothing else is pretty unusual.