<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Error when importing PyDeequ package in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15907#M10176</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I want to run some data quality tests, and for that I intend to use &lt;A href="https://github.com/awslabs/python-deequ" alt="https://github.com/awslabs/python-deequ" target="_blank"&gt;PyDeequ&lt;/A&gt; in a Databricks notebook. Keep in mind that I'm very new to Databricks and Spark.&lt;/P&gt;&lt;P&gt;First, I created a cluster with the runtime version "10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)" and added the environment variable&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SPARK_VERSION=3.2&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;as described on the repository's GitHub page.&lt;/P&gt;&lt;P&gt;Since the available &lt;A href="https://pypi.org/project/pydeequ/" alt="https://pypi.org/project/pydeequ/" target="_blank"&gt;PyPI package&lt;/A&gt; is not up to date, I tried installing the package as a notebook-scoped library with the following commands:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%pip install numpy==1.22
%pip install git+https://github.com/awslabs/python-deequ.git&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;(The first line is only there to prevent a conflict between NumPy versions.)&lt;/P&gt;&lt;P&gt;Then, when I run&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import pydeequ&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I get&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
&amp;lt;command-3386600260354339&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 import pydeequ
&amp;nbsp;
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165             # Import the desired module. If you’re seeing this while debugging a failed import,
    166             # look at preceding stack frames for relevant error information.
--&amp;gt; 167             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168 
    169             is_root_import = thread_local._nest_level == 1
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/__init__.py in &amp;lt;module&amp;gt;
     19 from pydeequ.analyzers import AnalysisRunner
     20 from pydeequ.checks import Check, CheckLevel
---&amp;gt; 21 from pydeequ.configs import DEEQU_MAVEN_COORD
     22 from pydeequ.profiles import ColumnProfilerRunner
     23 
&amp;nbsp;
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165             # Import the desired module. If you’re seeing this while debugging a failed import,
    166             # look at preceding stack frames for relevant error information.
--&amp;gt; 167             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168 
    169             is_root_import = thread_local._nest_level == 1
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in &amp;lt;module&amp;gt;
     35 
     36 
---&amp;gt; 37 DEEQU_MAVEN_COORD = _get_deequ_maven_config()
     38 IS_DEEQU_V1 = re.search("com\.amazon\.deequ\:deequ\:1.*", DEEQU_MAVEN_COORD) is not None
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in _get_deequ_maven_config()
     26 
     27 def _get_deequ_maven_config():
---&amp;gt; 28     spark_version = _get_spark_version()
     29     try:
     30         return SPARK_TO_DEEQU_COORD_MAPPING[spark_version[:3]]
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in _get_spark_version()
     21     ]
     22     output = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
---&amp;gt; 23     spark_version = output.stdout.decode().split("\n")[-2]
     24     return spark_version
     25 
&amp;nbsp;
IndexError: list index out of range&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Can you please help me find the reason for this, or suggest an alternative way to get the library without PyPI?&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Mon, 19 Dec 2022 17:01:00 GMT</pubDate>
    <dc:creator>hf_santos</dc:creator>
    <dc:date>2022-12-19T17:01:00Z</dc:date>
    <item>
      <title>Error when importing PyDeequ package</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15907#M10176</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I want to run some data quality tests, and for that I intend to use &lt;A href="https://github.com/awslabs/python-deequ" alt="https://github.com/awslabs/python-deequ" target="_blank"&gt;PyDeequ&lt;/A&gt; in a Databricks notebook. Keep in mind that I'm very new to Databricks and Spark.&lt;/P&gt;&lt;P&gt;First, I created a cluster with the runtime version "10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)" and added the environment variable&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SPARK_VERSION=3.2&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;as described on the repository's GitHub page.&lt;/P&gt;&lt;P&gt;Since the available &lt;A href="https://pypi.org/project/pydeequ/" alt="https://pypi.org/project/pydeequ/" target="_blank"&gt;PyPI package&lt;/A&gt; is not up to date, I tried installing the package as a notebook-scoped library with the following commands:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%pip install numpy==1.22
%pip install git+https://github.com/awslabs/python-deequ.git&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;(The first line is only there to prevent a conflict between NumPy versions.)&lt;/P&gt;&lt;P&gt;Then, when I run&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import pydeequ&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I get&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
&amp;lt;command-3386600260354339&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 import pydeequ
&amp;nbsp;
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165             # Import the desired module. If you’re seeing this while debugging a failed import,
    166             # look at preceding stack frames for relevant error information.
--&amp;gt; 167             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168 
    169             is_root_import = thread_local._nest_level == 1
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/__init__.py in &amp;lt;module&amp;gt;
     19 from pydeequ.analyzers import AnalysisRunner
     20 from pydeequ.checks import Check, CheckLevel
---&amp;gt; 21 from pydeequ.configs import DEEQU_MAVEN_COORD
     22 from pydeequ.profiles import ColumnProfilerRunner
     23 
&amp;nbsp;
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165             # Import the desired module. If you’re seeing this while debugging a failed import,
    166             # look at preceding stack frames for relevant error information.
--&amp;gt; 167             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168 
    169             is_root_import = thread_local._nest_level == 1
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in &amp;lt;module&amp;gt;
     35 
     36 
---&amp;gt; 37 DEEQU_MAVEN_COORD = _get_deequ_maven_config()
     38 IS_DEEQU_V1 = re.search("com\.amazon\.deequ\:deequ\:1.*", DEEQU_MAVEN_COORD) is not None
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in _get_deequ_maven_config()
     26 
     27 def _get_deequ_maven_config():
---&amp;gt; 28     spark_version = _get_spark_version()
     29     try:
     30         return SPARK_TO_DEEQU_COORD_MAPPING[spark_version[:3]]
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-5ccb9322-9b7e-4caf-b370-843c10304472/lib/python3.8/site-packages/pydeequ/configs.py in _get_spark_version()
     21     ]
     22     output = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
---&amp;gt; 23     spark_version = output.stdout.decode().split("\n")[-2]
     24     return spark_version
     25 
&amp;nbsp;
IndexError: list index out of range&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Can you please help me find the reason for this, or suggest an alternative way to get the library without PyPI?&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
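      <!--
      The last frame of the traceback takes element [-2] of the subprocess output split on newlines. Below is a minimal sketch of why that indexing can raise IndexError, assuming the subprocess wrote nothing to stdout. The traceback does not show the command or its output, so that is an assumption, and the helper names are hypothetical and not part of PyDeequ.

```python
def naive_last_line(stdout_text):
    # Mirrors the indexing shown on line 23 of the traceback.
    return stdout_text.split("\n")[-2]


def last_complete_line(stdout_text):
    """Return the last newline-terminated line of the text, or None if there is none."""
    lines = stdout_text.split("\n")
    # "3.2.1\n" splits into ["3.2.1", ""], so element [-2] is the version line.
    # "" splits into [""], a single element, so element [-2] does not exist.
    if len(lines) < 2:
        return None
    return lines[-2]


if __name__ == "__main__":
    print(last_complete_line("3.2.1\n"))
    print(last_complete_line(""))
    try:
        naive_last_line("")
    except IndexError as exc:
        print("IndexError:", exc)
```

      If this assumption holds, the error points at the environment in which the version lookup runs rather than at the import statement itself.
      -->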
      <pubDate>Mon, 19 Dec 2022 17:01:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15907#M10176</guid>
      <dc:creator>hf_santos</dc:creator>
      <dc:date>2022-12-19T17:01:00Z</dc:date>
    </item>
    <item>
      <title>Re: Error when importing PyDeequ package</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15908#M10177</link>
      <description>&lt;P&gt;Yes, this is a legitimate issue; I am facing the same error. I am working on it and will update you soon.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 02:07:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15908#M10177</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2022-12-20T02:07:14Z</dc:date>
    </item>
    <item>
      <title>Re: Error when importing PyDeequ package</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15909#M10178</link>
      <description>&lt;P&gt;Hey @Humberto Santos, I found an answer.&lt;/P&gt;&lt;P&gt;This happens because the NumPy version is not compatible with your PyDeequ installation.&lt;/P&gt;&lt;P&gt;See, it is working:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/982i75BD2B455155A8A1/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;numpy==1.20.1 is compatible with this package.&lt;/P&gt;&lt;P&gt;Please like or upvote this answer; you can also select it as the best answer.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aviral Bhardwaj&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 02:30:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15909#M10178</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2022-12-20T02:30:05Z</dc:date>
    </item>
    <item>
      <title>Re: Error when importing PyDeequ package</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15910#M10179</link>
      <description>&lt;P&gt;I assumed I wouldn't need to add the Deequ library. Apparently, all I had to do was add it via Maven coordinates and it solved the problem.&lt;/P&gt;</description>
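      <!--
      The traceback shows PyDeequ looking up a Deequ Maven coordinate from the first three characters of the Spark version. Below is a minimal sketch of choosing such a coordinate to install as a cluster library. The table contents and the helper name are illustrative assumptions, not copied from PyDeequ; confirm the exact coordinate for your Spark and Scala versions on Maven Central before installing it.

```python
# Hypothetical table: Spark "major.minor" version mapped to a Deequ Maven coordinate.
# The single entry is an assumption for the Spark 3.2 runtime used in this thread.
ASSUMED_DEEQU_COORDS = {
    "3.2": "com.amazon.deequ:deequ:2.0.1-spark-3.2",
}


def deequ_coord_for(spark_version):
    """Return the assumed Deequ coordinate for a Spark version such as "3.2.1"."""
    key = ".".join(spark_version.split(".")[:2])
    try:
        return ASSUMED_DEEQU_COORDS[key]
    except KeyError:
        raise ValueError(f"no assumed Deequ coordinate for Spark {key}") from None


if __name__ == "__main__":
    print(deequ_coord_for("3.2.1"))
```

      On Databricks, a coordinate of this form can be added to the cluster under Libraries by choosing Maven as the library source.
      -->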
      <pubDate>Tue, 20 Dec 2022 18:07:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15910#M10179</guid>
      <dc:creator>hf_santos</dc:creator>
      <dc:date>2022-12-20T18:07:00Z</dc:date>
    </item>
    <item>
      <title>Re: Error when importing PyDeequ package</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15911#M10180</link>
      <description>&lt;P&gt;That was not the problem. I hadn't installed the Deequ library from Maven.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 18:11:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-importing-pydeequ-package/m-p/15911#M10180</guid>
      <dc:creator>hf_santos</dc:creator>
      <dc:date>2022-12-20T18:11:50Z</dc:date>
    </item>
  </channel>
</rss>

