<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Fatal error: The Python kernel is unresponsive when attempting to query data from AWS Redshift within Jupyter notebook in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/fatal-error-the-python-kernel-is-unresponsive-when-attempting-to/m-p/6347#M2521</link>
    <description>Fatal error: The Python kernel is unresponsive (exit code 137, possible OOM) when querying AWS Redshift from a Databricks notebook. The full post appears in the item below.</description>
    <pubDate>Wed, 05 Apr 2023 22:13:10 GMT</pubDate>
    <dc:creator>kll</dc:creator>
    <dc:date>2023-04-05T22:13:10Z</dc:date>
    <item>
      <title>Fatal error: The Python kernel is unresponsive when attempting to query data from AWS Redshift within Jupyter notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/fatal-error-the-python-kernel-is-unresponsive-when-attempting-to/m-p/6347#M2521</link>
      <description>&lt;P&gt;I am running a Jupyter notebook on a cluster with the following configuration:&lt;/P&gt;&lt;P&gt;&lt;I&gt;12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Worker type: i3.xlarge, 30.5 GB memory, 4 cores&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Min 2 and max 8 workers&lt;/I&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pandas import DataFrame

cursor = conn.cursor()

cursor.execute(
    """
    SELECT * FROM (
        SELECT *, RANDOM() AS sample
        FROM tbl
        WHERE type = 'type1'
        AND created_at &amp;gt;= '2022-01-01 00:00:00'
        AND created_at &amp;lt;= '2022-06-30 00:00:00') AS samp
    WHERE sample &amp;lt; .05;  -- return 5% of rows
    """
)

# To pandas DataFrame
df = DataFrame(cursor.fetchall())
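# Note: cursor.fetchall() materializes every returned row in driver memory at
# once; with hundreds of millions of rows that alone can trigger the exit code
# 137 (OOM) kill reported below. A hedged alternative sketch, using the
# standard DB-API cursor.fetchmany() to stream in batches (the batch size and
# the handle() per-batch function are assumptions, not part of the original):
#
#     while True:
#         batch = cursor.fetchmany(100_000)
#         if not batch:
#             break
#         handle(batch)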

# Get column names
field_names = [i[0] for i in cursor.description]
df.columns = field_names&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;Fatal error: The Python kernel is unresponsive.
---------------------------------------------------------------------------
The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.

The last 10 KB of the process's stderr and stdout can be found below. See driver logs for full logs.
---------------------------------------------------------------------------
Last messages on stderr:

ks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1112, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/databricks/python/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1006, in structured_traceback
  File "/databricks/python/lib/python3.9/site-packages/stack_data/core.py", line 649, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/databricks/python/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/databricks/python/lib/python3.9/site-packages/executing/executing.py", line 164, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/redshift_connector/core.py", line 1631, in execute
    ps = cache["ps"][key]
KeyError: ('SELECT * \nFROM \n             SELECT * \n             FROM tbl\n             WHERE event_type = %s\n             AND created_at &amp;gt;= %s\n             AND created_at &amp;lt;= %s \n              \n LIMIT 5', ((&amp;lt;RedshiftOID.UNKNOWN: 705&amp;gt;, 0, &amp;lt;function text_out at 0x7f7a348d3790&amp;gt;), (&amp;lt;RedshiftOID.UNKNOWN: 705&amp;gt;, 0, &amp;lt;function text_out at 0x7f7a348d3790&amp;gt;), (&amp;lt;RedshiftOID.UNKNOWN: 705&amp;gt;, 0, &amp;lt;function text_out at 0x7f7a348d3790&amp;gt;)))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The data is fairly large, probably 300-400 million rows. What configuration do I need to modify? Should I optimize the query, or use parallel processing?&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 22:13:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/fatal-error-the-python-kernel-is-unresponsive-when-attempting-to/m-p/6347#M2521</guid>
      <dc:creator>kll</dc:creator>
      <dc:date>2023-04-05T22:13:10Z</dc:date>
    </item>
    <item>
      <title>Re: Fatal error: The Python kernel is unresponsive when attempting to query data from AWS Redshift within Jupyter notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/fatal-error-the-python-kernel-is-unresponsive-when-attempting-to/m-p/6348#M2522</link>
      <description>&lt;P&gt;Hi, could you please confirm your cluster's resource usage while running this job? You can monitor its performance, with different metrics, here: &lt;A href="https://docs.databricks.com/clusters/clusters-manage.html#monitor-performance" alt="https://docs.databricks.com/clusters/clusters-manage.html#monitor-performance" target="_blank"&gt;https://docs.databricks.com/clusters/clusters-manage.html#monitor-performance&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Also, please tag&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WWwvAAG" alt="https://community.databricks.com/s/profile/0053f000000WWwvAAG" target="_blank"&gt;@Debayan&lt;/A&gt;​&amp;nbsp;in your next response, which will notify me. Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2023 05:41:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/fatal-error-the-python-kernel-is-unresponsive-when-attempting-to/m-p/6348#M2522</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-04-06T05:41:45Z</dc:date>
    </item>
  </channel>
</rss>

