01-17-2023 03:08 AM
I have a pandas_udf that works for 1 row, but when I try it with more than one row I get the error below.
PythonException: 'RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 2.'.
Code
from typing import Iterator
import pandas as pd
import pyspark.sql.functions as func
from pyspark.sql.types import StringType

@func.pandas_udf(StringType())
def find_data(inputs: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for input in inputs:
        # ... logic with for loops, if statements, etc. that builds result_json ...
        yield pd.Series(str(result_json))

df = df.withColumn("outData", find_data("inputData"))
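For reference, the error says the Series yielded for a batch was shorter than the batch itself: a scalar iterator pandas UDF must yield one output value per input row of each batch. A minimal sketch of that length contract, using a hypothetical per-row helper process_row in place of the real logic:

# Sketch only: process_row is a hypothetical stand-in for the real parsing logic.
from typing import Iterator
import json
import pandas as pd
import pyspark.sql.functions as func
from pyspark.sql.types import StringType

def process_row(value: str) -> str:
    # placeholder per-row logic; returns one JSON string per input value
    return json.dumps({"input": value})

@func.pandas_udf(StringType())
def find_data(inputs: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for batch in inputs:
        # apply the logic element-wise so the yielded Series has the same
        # length as the incoming batch, which is what the runtime checks
        yield batch.apply(process_row)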
- Labels:
  - Azure databricks
  - Pandas_udf
  - Pyspark
  - Spark
Accepted Solutions
01-17-2023 04:18 AM
I tested it, and your function is correct. So the error must be in the inputData type (is it all strings?) or in result_json. Please also check the runtime version; I was using 11 LTS.
01-17-2023 04:41 AM
Thanks @Hubert Dudek. Let me check the version.
01-17-2023 09:20 PM
Hi @Hubert Dudek
I tried with hard-coded DataFrame input data and it works as expected.
But if I load the same data from a file, I get the error mentioned above. Do you have any idea why?
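One way to narrow this down, along the lines of Hubert's suggestion, is to compare the schema and null/empty counts of the file-loaded DataFrame against the hard-coded one. A rough diagnostic sketch, assuming purely for illustration that the file is JSON at a hypothetical path, and reusing the func alias from above:

# Hypothetical diagnostic; the path and file format are placeholders, not from the thread.
df_file = spark.read.json("/path/to/input.json")

df_file.printSchema()  # confirm inputData really comes in as a string column

# null or empty rows can make per-row logic skip an output value for that row
df_file.filter(func.col("inputData").isNull() | (func.col("inputData") == "")).count()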

