10-14-2021 01:12 PM
Is it somehow possible to create an SQL external function using Python code?
The examples only show how to use JARs.
Something like:
CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf' USING FILE '/tmp/SimpleUdf.py';
- Labels: Function, Python, Python Code, Python File, SQL
Accepted Solutions
10-15-2021 04:55 AM
I would think USING FILE would work, as long as you follow the class_name requirements.
The implementing class should extend one of the base classes as follows:
- Should extend UDF or UDAF in org.apache.hadoop.hive.ql.exec package.
- Should extend AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in org.apache.hadoop.hive.ql.udf.generic package.
- Should extend UserDefinedAggregateFunction in org.apache.spark.sql.expressions package.
Also, the docs explicitly state that Python is possible:
In addition to the SQL interface, Spark allows you to create custom user defined scalar and aggregate functions using Scala, Python, and Java APIs. See User-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs) for more information.
So it should be possible. Maybe your Python class does not meet the requirements?
01-27-2022 02:31 PM
For Python, which class should I extend, then? All of the listed parent classes are Java.
01-31-2022 11:08 PM
For PySpark you can use udf().
Here is an example of how to do this.
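As a rough sketch of the udf() route, assuming a working PySpark installation: the function body and the names simple_udf, register_simple_udf and run_demo are hypothetical, and simple_temp_udf is borrowed from the question. The registration applies to the current SparkSession only.

```python
def simple_udf(value):
    """Plain Python logic for the UDF; passes NULL (None) through unchanged."""
    if value is None:
        return None
    return value.strip().upper()


def register_simple_udf(spark):
    """Register simple_udf under a SQL-callable name for this SparkSession."""
    # Imported lazily so the plain function above stays usable without Spark.
    from pyspark.sql.types import StringType

    spark.udf.register("simple_temp_udf", simple_udf, StringType())


def run_demo():
    """Start (or reuse) a session, register the UDF, and call it from SQL."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    register_simple_udf(spark)
    spark.sql("SELECT simple_temp_udf('  hello ') AS result").show()
```

Calling run_demo() in a notebook or script with Spark available registers the function and runs one query against it. The function has to be registered again in every new session.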
02-01-2022 06:43 AM
Thanks for your response. What I am looking for is to define a view that uses the UDF. However, a session-level UDF as described in the example you provided does not seem to allow that. Maybe I should clarify my question: I want to define an external UDF like the Hive ones.

01-31-2022 03:20 PM
@Wugang Xu - My name is Piper, and I'm a moderator here for Databricks. Thanks for coming to us with your question. We'll give the members a bit longer to respond and come back if we need to. Thanks in advance for your patience. 🙂
02-04-2022 06:11 PM
As a user of your code, I'd find it a less pleasant API because I'd have to call some_module.some_func.some_func() rather than just some_module.some_func().
There is no reason for "some_func" to exist twice in the hierarchy; it is redundant. If some_func is so large that adding any more code to the file seems crazy, maybe some_func is too large and you should refactor and simplify it.
Having one file serve one purpose makes sense. Having it literally have only a single function and nothing else is pretty unusual.

