01-17-2023 03:08 AM
I have a pandas_udf that works for 1 row, but when I try it with more than one row I get the error below.
PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 2.'.
Code
from typing import Iterator
import pandas as pd
import pyspark.sql.functions as func
from pyspark.sql.types import StringType

@func.pandas_udf(StringType())
def find_data(inputs: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for input in inputs:
        # ... logic with for loops, if statements, etc. that builds result_json ...
        yield pd.Series(str(result_json))

df = df.withColumn("outData", find_data("inputData"))
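For reference, the error says the Series yielded for a batch was shorter than the batch itself: a scalar iterator pandas UDF must yield one output value per input row of each batch. A minimal sketch of that length contract, using a hypothetical per-row helper process_row in place of the real logic:

# Sketch only: process_row is a hypothetical stand-in for the real parsing logic.
from typing import Iterator
import json
import pandas as pd
import pyspark.sql.functions as func
from pyspark.sql.types import StringType

def process_row(value: str) -> str:
    # placeholder per-row logic; returns one JSON string per input value
    return json.dumps({"input": value})

@func.pandas_udf(StringType())
def find_data(inputs: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for batch in inputs:
        # apply the logic element-wise so the yielded Series has the same
        # length as the incoming batch, which is what the runtime checks
        yield batch.apply(process_row)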
- Labels:
  - Azure databricks
  - Pandas_udf
  - Pyspark
  - Spark
Accepted Solutions
01-17-2023 04:18 AM
I tested it, and your function is correct. So the error must be in the inputData type (is it all strings?) or in result_json. Please also check the runtime version; I was using 11 LTS.
01-17-2023 04:41 AM
Thanks @Hubert Dudek. Let me check the version.
01-17-2023 09:20 PM
Hi @Hubert Dudek
I tried with hard-coded DataFrame input data and it works as expected.
But if I load the same data from a file, I get the error mentioned above. Do you have any idea why?
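One way to narrow this down, along the lines of Hubert's suggestion, is to compare the schema and null/empty counts of the file-loaded DataFrame against the hard-coded one. A rough diagnostic sketch, assuming purely for illustration that the file is JSON at a hypothetical path, and reusing the func alias from above:

# Hypothetical diagnostic; the path and file format are placeholders, not from the thread.
df_file = spark.read.json("/path/to/input.json")

df_file.printSchema()  # confirm inputData really comes in as a string column

# null or empty rows can make per-row logic skip an output value for that row
df_file.filter(func.col("inputData").isNull() | (func.col("inputData") == "")).count()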

