4 weeks ago
Hello community,
I installed the Databricks extension in my VS Code IDE, created an environment to run my notebooks locally, and selected an available remote cluster to execute them. What else do I need to do?
I have this error: ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'
This is the snippet code:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
spark.sql("SELECT * FROM catalog.00_bronze_layer.client_email LIMIT 10")
Accepted Solutions
3 weeks ago
Hi,
We encountered the same issue when importing sql from pyspark in the following code snippet.
from pyspark import sql

def get_spark_session() -> sql.SparkSession:
    spark = sql.SparkSession.getActiveSession()
    if not spark:
        # Try to get a Spark Connect session instead
        from databricks.connect import DatabricksSession
        from pyspark.errors.exceptions.connect import SparkConnectGrpcException  # noqa: F401
        spark = DatabricksSession.builder.getOrCreate()
    return spark
Error Encountered:
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult # noqa: F401
ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'
Environment Information:
Python 3.11.10, pyspark 3.5.0, databricks-connect 15.4.4
FYI, occasionally deleting and reinstalling the virtual environment can fix the issue, but it's not a consistent solution.
4 weeks ago
Hi @jeremy98,
The error you are encountering, ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf', is likely due to a version mismatch between the pyspark library and the databricks-connect library. It arises because the AnalyzeArgument class is not present in the pyspark version you are using. Could you please advise which versions of pyspark and databricks-connect you are using?
Can you try: pip install --upgrade databricks-connect
4 weeks ago - last edited 4 weeks ago
Hello Alberto,
Thanks for your help. I've now upgraded databricks-connect to v16.0.0. I was using pyspark, but how can I find its version? I only get this:
pyspark -h
Python 3.13.0 (main, Oct 7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
usage: pyspark [-h] [--remote REMOTE]
According to Poetry it should be 3.5.0.
Now, I have another import error: ImportError: cannot import name 'is_remote_only' from 'pyspark.util'
4 weeks ago
Hi @jeremy98,
Can you also upgrade pyspark?
pip install --upgrade pyspark
Also, check whether the is_remote_only function exists in the version of PySpark you are using. You can do this by inspecting the pyspark.util module:
import pyspark.util
print(dir(pyspark.util))
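A more defensive variant of that check — a sketch with a hypothetical helper that also copes with the module itself failing to import:

```python
import importlib

def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and defines attr."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# Example: module_has_attr("pyspark.util", "is_remote_only") tells you
# whether the pyspark on your path is new enough for databricks-connect.
```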
3 weeks ago
Hi @spiky001,
Could you please advise what the DBR version of your cluster is?
2 weeks ago
What version of pyspark is required? I did a clean install and got 3.5.3. I'm running Python 3.11.
$ pip freeze
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.0
databricks-connect==16.0.0
databricks-sdk==0.39.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
grpcio==1.68.1
grpcio-status==1.68.1
idna==3.10
numpy==1.26.4
packaging==24.2
pandas==2.2.3
protobuf==5.29.2
py4j==0.10.9.7
pyarrow==18.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pyspark==3.5.3
python-dateutil==2.9.0.post0
pytz==2024.2
requests==2.32.3
rsa==4.9
six==1.17.0
tzdata==2024.2
urllib3==2.2.3
I get the error just importing databricks.connect, so I don't see how a cluster property can matter.
$ python
Python 3.11.9 (main, Aug 13 2024, 12:21:18) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import databricks.connect
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/__init__.py", line 20, in <module>
from .session import DatabricksSession
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/session.py", line 28, in <module>
from .auth import DatabricksChannelBuilder
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/databricks/connect/auth.py", line 26, in <module>
from pyspark.sql.connect.client import ChannelBuilder
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/__init__.py", line 148, in <module>
from pyspark.sql import SQLContext, HiveContext, Row # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/__init__.py", line 43, in <module>
from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration, UDTFRegistration
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/context.py", line 39, in <module>
from pyspark.sql.session import _monkey_patch_RDD, SparkSession
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/session.py", line 48, in <module>
from pyspark.sql.functions import lit
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/functions/__init__.py", line 20, in <module>
from pyspark.sql.functions.builtin import * # noqa: F401,F403
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/functions/builtin.py", line 50, in <module>
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf' (/home/jim/.pyenv/versions/3.11/lib/python3.11/site-packages/pyspark/sql/udtf.py)
2 weeks ago
I wonder if I need to install Java. 😁 I bet I do.
2 weeks ago
Right you are! I had actually installed pyspark myself, and that caused the error until I installed Java.
Sorry.
2 weeks ago
@unj1m yes, as Alberto said, you don't need to install pyspark; it is included in your cluster configuration.
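As a final sanity check (a sketch using only the standard library, not an official Databricks tool), you can list which Spark client distributions are present in the active environment:

```python
from importlib.metadata import distributions

def spark_client_distributions() -> list[str]:
    """List installed distributions that ship Spark client code."""
    targets = {"pyspark", "databricks-connect"}
    found = set()
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name in targets:
            found.add(name)
    return sorted(found)

# Having both in one environment invites the ImportError seen in this
# thread; keeping only databricks-connect avoids the clash.
```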