Data Engineering

mapInPandas not working in serverless compute

chinmay0924
New Contributor III

Using `mapInPandas` in serverless compute (Environment version 2) gives the following error:
```
Py4JError: An error occurred while calling o543.mapInPandas. Trace:
py4j.Py4JException: Method mapInPandas([class org.apache.spark.sql.catalyst.expressions.PythonUDF, class java.lang.Boolean]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:352)
    at py4j.Gateway.invoke(Gateway.java:297)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:197)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:117)
    at java.lang.Thread.run(Thread.java:750)
```
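
For reference, the call is roughly of this shape (a simplified sketch; the pandas function, data, and schema below are placeholders rather than my actual workload):

```
import pandas as pd
from typing import Iterator

# Placeholder data; the real DataFrame comes from a table read.
df = spark.range(10)

def double_id(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Operates on each batch of rows as a pandas DataFrame.
    for pdf in batches:
        pdf["id"] = pdf["id"] * 2
        yield pdf

# This call raises the Py4JError above on serverless (Environment version 2).
result = df.mapInPandas(double_id, schema="id long")
result.show()
```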

4 REPLIES

Khaja_Zaffer
Contributor III

Hello @chinmay0924

Good day

According to the documentation (https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/limitations), this is a limitation of Databricks Connect. Unfortunately, you have to work with spark.sql or DataFrame operations, or switch to a standard (non-serverless) all-purpose cluster or job cluster.

 

I am open to other contributions on this issue. 

 

Hello @Khaja_Zaffer 
The documentation you linked does not mention anywhere that mapInPandas is not supported. It says `Only Spark connect APIs are supported. Spark RDD APIs are not supported`. I have not used Spark RDD APIs; all I am trying to do is call `dataframe.mapInPandas()` on a Spark DataFrame.

Hello @chinmay0924 

Are you using:

  • Serverless compute via the notebook UI, or

  • Serverless compute via Databricks Connect?

mark_ott
Databricks Employee

The error you are seeing when using mapInPandas in serverless compute with Environment version 2 is due to an incompatibility in the environment's supported Spark features. Specifically, Environment version 2 on serverless compute does not support mapInPandas, which triggers the Py4JException indicating that the method does not exist on the JVM side of Spark in your environment.

Why This Happens

  • Serverless Environment Restrictions
    The version of Spark or configuration for serverless pools (especially with certain environments like Databricks Runtime 11.x or higher) may not expose certain Apache Spark features, including mapInPandas, for security and resource isolation reasons.

  • Method Not Available
    The error message means the Spark JVM backend does not recognize or export the mapInPandas method for remote invocation. It's not your code; it's the compute environment not supporting direct Python UDFs with this Spark construct.

What You Can Do

  • Switch to Standard Compute
    If possible, use a non-serverless, standard compute cluster or an environment version (like Databricks Runtime 9.x or below) where mapInPandas is supported.

  • Use Supported APIs
    In environments where mapInPandas is not supported, use alternatives such as the following (see the sketch after this list):

    • applyInPandas (in environments that support only a subset of pandas UDFs)

    • Explicit Spark SQL or DataFrame operations

  • Environment Upgrade/Change
    Check if a newer version of the serverless environment, or a configuration update, supports this method since feature support evolves frequently in managed Spark environments.
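
As a rough illustration of the alternatives listed under Use Supported APIs (a sketch only; the data, function name, and schema are made up for this example, and you should verify that applyInPandas is actually available in your serverless environment version):

```
import pandas as pd
from pyspark.sql import functions as F

# Placeholder data with a grouping column.
df = spark.range(10).withColumn("bucket", F.col("id") % 2)

# Alternative 1: applyInPandas on grouped data.
def double_id(pdf: pd.DataFrame) -> pd.DataFrame:
    pdf["id"] = pdf["id"] * 2
    return pdf

by_group = df.groupBy("bucket").applyInPandas(double_id, schema="id long, bucket long")

# Alternative 2: express the same transformation with native DataFrame operations,
# which avoids Python UDFs entirely and works on any compute type.
native = df.withColumn("id", F.col("id") * 2)
```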