
ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'

jeremy98
New Contributor III

Hello community,
I installed the Databricks extension in my VS Code IDE, created the environment to run my notebooks locally, and selected an available remote cluster to execute them. What else do I need to do, and how can I fix this error?

I get this error: ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'

This is the code snippet:

 

from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()

spark.sql("SELECT * FROM catalog.00_bronze_layer.client_email LIMIT 10")

 

1 ACCEPTED SOLUTION

spiky001
New Contributor II

Hi,

We encountered the same issue when importing sql from pyspark, using the following code snippet.

 

from pyspark import sql

def get_spark_session() -> sql.SparkSession:
    spark = sql.SparkSession.getActiveSession()
    if not spark:
        # No active session: fall back to a Spark Connect session via Databricks Connect
        from databricks.connect import DatabricksSession
        from pyspark.errors.exceptions.connect import SparkConnectGrpcException  # noqa: F401
        spark = DatabricksSession.builder.getOrCreate()
    return spark

 

Error Encountered:

 

from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult  # noqa: F401
ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'

 

Environment Information:

python                   3.11.10
pyspark                  3.5.0
databricks-connect       15.4.4

FYI, occasionally deleting and reinstalling the virtual environment can fix the issue, but it's not a consistent solution.
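For anyone hitting the same thing, a small diagnostic sketch (our addition, not part of the original post) can confirm which pyspark the virtual environment resolves and whether it exposes the symbol that databricks-connect tries to import:

import importlib.util

# Locate the pyspark package without importing it (importing may itself fail
# in a broken environment).
spec = importlib.util.find_spec("pyspark")
print("pyspark resolved from:", spec.origin if spec else "not found")

try:
    from pyspark.sql.udtf import AnalyzeArgument  # noqa: F401
    print("AnalyzeArgument is importable - the environment looks consistent")
except ImportError as exc:
    print("Environment is still inconsistent:", exc)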


10 REPLIES

Alberto_Umana
Databricks Employee

Hi @jeremy98,

The error you are encountering, ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf', is likely due to a version mismatch between the pyspark library and the databricks-connect library: the AnalyzeArgument class is not present in the pyspark version your environment resolves. Could you please let us know which versions of pyspark and databricks-connect you are using?
Can you try: pip install --upgrade databricks-connect
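If it helps, one way (a suggestion on our side, not from the original reply) to report both versions from inside the affected environment is via the package metadata:

from importlib import metadata

# Package names as published on PyPI.
for pkg in ("pyspark", "databricks-connect"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "is not installed")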

Hello Alberto,
Thanks for your help. I have now upgraded databricks-connect to v16.0.0. I was using pyspark, but how can I find which version? All I get is:

 

pyspark -h                
Python 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
usage: pyspark [-h] [--remote REMOTE]

 

In Poetry it should be 3.5.0.

Now, I have another import error: ImportError: cannot import name 'is_remote_only' from 'pyspark.util'
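A quick way (our addition, not something suggested in the thread) to see which pyspark the interpreter actually imports, independent of what Poetry has pinned or which pyspark launcher is on the PATH:

# If this import itself fails, the environment is mixing incompatible
# pyspark / databricks-connect installs.
import pyspark

print(pyspark.__version__)
print(pyspark.__file__)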

Alberto_Umana
Databricks Employee

Hi @jeremy98,

Can you also upgrade pyspark?

pip install --upgrade pyspark

 

  • Check if the is_remote_only function exists in the version of PySpark you are using. You can do this by inspecting the pyspark.util module:

    import pyspark.util
    print(dir(pyspark.util))
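
A slightly more direct check (our suggestion) is to test for the attribute itself, since the new ImportError indicates this is what the upgraded databricks-connect expects to find:

    import pyspark.util

    # False means the pyspark being resolved is too old for databricks-connect 16.x
    print(hasattr(pyspark.util, "is_remote_only"))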

 


Alberto_Umana
Databricks Employee

Hi @spiky001,

Could you please let us know the DBR version of your cluster?

unj1m
New Contributor II

What version of pyspark is required?  I did a clean install and got 3.5.3. I'm running Python 3.11.

$ pip freeze
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.0
databricks-connect==16.0.0
databricks-sdk==0.39.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
grpcio==1.68.1
grpcio-status==1.68.1
idna==3.10
numpy==1.26.4
packaging==24.2
pandas==2.2.3
protobuf==5.29.2
py4j==0.10.9.7
pyarrow==18.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pyspark==3.5.3
python-dateutil==2.9.0.post0
pytz==2024.2
requests==2.32.3
rsa==4.9
six==1.17.0
tzdata==2024.2
urllib3==2.2.3

I get the error just importing databricks.connect, so I don't see how a cluster property can matter.

$ python
Python 3.11.9 (main, Aug 13 2024, 12:21:18) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import databricks.connect
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/__init__.py", line 20, in <module>
    from .session import DatabricksSession
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/session.py", line 28, in <module>
    from .auth import DatabricksChannelBuilder
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/auth.py", line 26, in <module>
    from pyspark.sql.connect.client import ChannelBuilder
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/__init__.py", line 148, in <module>
    from pyspark.sql import SQLContext, HiveContext, Row  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/__init__.py", line 43, in <module>
    from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration, UDTFRegistration
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/context.py", line 39, in <module>
    from pyspark.sql.session import _monkey_patch_RDD, SparkSession
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/session.py", line 48, in <module>
    from pyspark.sql.functions import lit
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/functions/__init__.py", line 20, in <module>
    from pyspark.sql.functions.builtin import *  # noqa: F401,F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/functions/builtin.py", line 50, in <module>
    from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf' (/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/udtf.py)
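For what it's worth, the pip freeze above lists both databricks-connect 16.0.0 and a standalone pyspark 3.5.3 in the same environment; as noted in the replies below, databricks-connect ships its own copy of pyspark, so a separately installed pyspark is the likely culprit. A small check for that situation (our sketch, package names as on PyPI):

from importlib import metadata

def version_or_none(pkg: str):
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# databricks-connect bundles its own pyspark modules, so a separately
# registered "pyspark" distribution on top of it means two installs overlap.
dbc = version_or_none("databricks-connect")
ps = version_or_none("pyspark")
if dbc and ps:
    print(f"databricks-connect {dbc} and pyspark {ps} are both installed - "
          "a likely cause of the ImportError")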

unj1m
New Contributor II

I wonder if I need to install Java. 😁  I bet I do.

Alberto_Umana
Databricks Employee

Hi @unj1m,

databricks-connect comes with pyspark "included", so you don't need to install pyspark separately.
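In other words (an illustration on our side, not part of the original reply): in a clean virtual environment where only databricks-connect is installed, pyspark is still importable because it ships with the client:

# Assumes a fresh environment where only databricks-connect was pip-installed.
import pyspark
from databricks.connect import DatabricksSession

print(pyspark.__version__)  # the pyspark build bundled by databricks-connect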

unj1m
New Contributor II

Right you are! I had actually installed pyspark myself, and that caused the error until I installed Java.

Sorry.

jeremy98
New Contributor III

@unj1m Yes, as Alberto said, you don't need to install pyspark yourself; it already comes bundled with databricks-connect.
