<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Error handling - SQL states in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-handling-sql-states/m-p/108309#M43030</link>
    <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;A few questions, please:&lt;/P&gt;&lt;P&gt;1. Has anyone successfully used the approach below for error handling in PySpark notebooks (for example, notebooks that work with DataFrames) as well as SQL-based notebooks?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.errors import PySparkException

try:
  spark.sql("SELECT * FROM does_not_exist").show()
except PySparkException as ex:
  print("Error Class       : " + ex.getErrorClass())
  print("Message parameters: " + str(ex.getMessageParameters()))
  print("SQLSTATE          : " + ex.getSqlState())
  print(ex)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. With this approach, is it advisable to log errors into tables? I am thinking of an errors table with four columns to capture the date, error class, message parameters, and SQLSTATE.&lt;/P&gt;&lt;P&gt;3. Currently, we log all errors as ".txt" files in an ADLS storage account, with the idea of producing an operational dashboard on top of the errors. I think table-based error logging would be simpler to report on than periodically profiling the ADLS storage account/containers/folders.&lt;/P&gt;&lt;P&gt;4. Also, I have noticed that when we capture and log errors as ".txt" files, the error message is at times very detailed, spanning hundreds of lines; I am not sure whether it is the same on your end.&lt;/P&gt;&lt;P&gt;I would appreciate a fruitful discussion on this.&lt;/P&gt;</description>
    <pubDate>Sat, 01 Feb 2025 17:39:03 GMT</pubDate>
    <dc:creator>noorbasha534</dc:creator>
    <dc:date>2025-02-01T17:39:03Z</dc:date>
    <item>
      <title>Error handling - SQL states</title>
      <link>https://community.databricks.com/t5/data-engineering/error-handling-sql-states/m-p/108309#M43030</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;A few questions, please:&lt;/P&gt;&lt;P&gt;1. Has anyone successfully used the approach below for error handling in PySpark notebooks (for example, notebooks that work with DataFrames) as well as SQL-based notebooks?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.errors import PySparkException

try:
  spark.sql("SELECT * FROM does_not_exist").show()
except PySparkException as ex:
  print("Error Class       : " + ex.getErrorClass())
  print("Message parameters: " + str(ex.getMessageParameters()))
  print("SQLSTATE          : " + ex.getSqlState())
  print(ex)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. With this approach, is it advisable to log errors into tables? I am thinking of an errors table with four columns to capture the date, error class, message parameters, and SQLSTATE.&lt;/P&gt;&lt;P&gt;3. Currently, we log all errors as ".txt" files in an ADLS storage account, with the idea of producing an operational dashboard on top of the errors. I think table-based error logging would be simpler to report on than periodically profiling the ADLS storage account/containers/folders.&lt;/P&gt;&lt;P&gt;4. Also, I have noticed that when we capture and log errors as ".txt" files, the error message is at times very detailed, spanning hundreds of lines; I am not sure whether it is the same on your end.&lt;/P&gt;&lt;P&gt;I would appreciate a fruitful discussion on this.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Feb 2025 17:39:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-handling-sql-states/m-p/108309#M43030</guid>
      <dc:creator>noorbasha534</dc:creator>
      <dc:date>2025-02-01T17:39:03Z</dc:date>
    </item>
    <item>
      <title>Re: Error handling - SQL states</title>
      <link>https://community.databricks.com/t5/data-engineering/error-handling-sql-states/m-p/108328#M43037</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124839"&gt;@noorbasha534&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;The approach you mentioned for error handling in PySpark using &lt;CODE&gt;PySparkException&lt;/CODE&gt; is a valid method. It allows you to catch specific exceptions related to PySpark operations and handle them accordingly.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;Logging errors into tables is advisable, especially if you plan to create an operational dashboard to monitor and analyze errors. Having an errors table with columns for the date, error class, message parameter, and SQL state can simplify the process of querying and reporting errors.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;Transitioning from logging errors as ".txt" files in an ADLS storage account to logging them into tables can indeed simplify reporting. Table-based error logging allows for more straightforward querying and analysis using SQL, which can be more efficient than periodically profiling storage accounts and containers.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
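&lt;P&gt;As a minimal sketch of the table-based logging idea (the table name &lt;CODE&gt;ops.error_log&lt;/CODE&gt; and the column names here are assumptions, not an established schema), the four fields from your question can be collected into one row and appended to a table:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from datetime import datetime, timezone

# Columns for the hypothetical errors table (names are an assumption)
ERROR_COLUMNS = ["event_ts", "error_class", "message_parameters", "sqlstate"]

def error_row(ex):
    """Build one record from a caught PySparkException, using its
    getErrorClass/getMessageParameters/getSqlState accessors."""
    return (
        datetime.now(timezone.utc).isoformat(),
        ex.getErrorClass(),
        str(ex.getMessageParameters()),
        ex.getSqlState(),
    )

# In a notebook, the row would then be appended to the errors table, e.g.:
#   from pyspark.errors import PySparkException
#   try:
#       spark.sql("SELECT * FROM does_not_exist").show()
#   except PySparkException as ex:
#       (spark.createDataFrame([error_row(ex)], ERROR_COLUMNS)
#             .write.mode("append").saveAsTable("ops.error_log"))&lt;/LI-CODE&gt;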
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2025 02:23:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-handling-sql-states/m-p/108328#M43037</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-02T02:23:10Z</dc:date>
    </item>
  </channel>
</rss>

