<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/108101#M42973</link>
    <description>&lt;DIV&gt;Hello,&lt;/DIV&gt;&lt;DIV&gt;I have a federated connection to BigQuery that contains GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows, and load it into another table, but I keep seeing this Simba JDBC exception. I've even chunked the query (using an offset) to fetch and append 5,000 rows at a time, with a sleep in between, but I still see this error:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;SparkException: Job aborted due to stage failure: Task 0 in stage 2947.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2947.0 (TID 15843) (10.21.40.215 executor 20): java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception.&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTDataHandler.retrieveData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.getData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getString(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13(JdbcUtils.scala:484)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13$adapted(JdbcUtils.scala:482)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)&lt;BR /&gt;at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)&lt;BR /&gt;at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)&lt;BR /&gt;at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)&lt;BR /&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;BR /&gt;at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFac...&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &amp;lt;command-6291825545273755&amp;gt;, line 88&lt;BR /&gt;85 df_chunk = df_chunk.withColumn("event_date", lit(event_date))&lt;BR /&gt;87 # Append chunk to Bronze table&lt;BR /&gt;---&amp;gt; 88 df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)&lt;BR /&gt;90 offset += BATCH_SIZE&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Fri, 31 Jan 2025 15:19:40 GMT</pubDate>
    <dc:creator>KristiLogos</dc:creator>
    <dc:date>2025-01-31T15:19:40Z</dc:date>
    <item>
      <title>Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection</title>
      <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/108101#M42973</link>
      <description>&lt;DIV&gt;Hello,&lt;/DIV&gt;&lt;DIV&gt;I have a federated connection to BigQuery that contains GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows, and load it into another table, but I keep seeing this Simba JDBC exception. I've even chunked the query (using an offset) to fetch and append 5,000 rows at a time, with a sleep in between, but I still see this error:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;SparkException: Job aborted due to stage failure: Task 0 in stage 2947.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2947.0 (TID 15843) (10.21.40.215 executor 20): java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception.&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTDataHandler.retrieveData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.getData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getData(Unknown Source)&lt;BR /&gt;at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getString(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13(JdbcUtils.scala:484)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13$adapted(JdbcUtils.scala:482)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)&lt;BR /&gt;at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)&lt;BR /&gt;at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)&lt;BR /&gt;at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)&lt;BR /&gt;at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)&lt;BR /&gt;at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)&lt;BR /&gt;at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFac...&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &amp;lt;command-6291825545273755&amp;gt;, line 88&lt;BR /&gt;85 df_chunk = df_chunk.withColumn("event_date", lit(event_date))&lt;BR /&gt;87 # Append chunk to Bronze table&lt;BR /&gt;---&amp;gt; 88 df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)&lt;BR /&gt;90 offset += BATCH_SIZE&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 31 Jan 2025 15:19:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/108101#M42973</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2025-01-31T15:19:40Z</dc:date>
    </item>
    <item>
      <title>Re: Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection</title>
      <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/108304#M43029</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;The error you are encountering, &lt;CODE&gt;java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception&lt;/CODE&gt;, is a known issue with the Simba JDBC driver for BigQuery. It typically occurs when the driver cannot handle the data being fetched, for example null values or unexpected data types. Could you tell me which JDBC driver version you are using?&lt;/P&gt;
&lt;P&gt;You might need to adjust settings such as &lt;CODE&gt;spark.sql.shuffle.partitions&lt;/CODE&gt; and &lt;CODE&gt;spark.executor.memory&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Feb 2025 16:11:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/108304#M43029</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-01T16:11:17Z</dc:date>
    </item>
    <item>
      <title>Re: Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection</title>
      <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/109659#M43362</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;My cluster's JDBC URL shows: &lt;STRONG&gt;2.6.25 or later&lt;/STRONG&gt;&lt;BR /&gt;Also, where would I adjust &lt;CODE&gt;spark.sql.shuffle.partitions&lt;/CODE&gt; and &lt;CODE&gt;spark.executor.memory&lt;/CODE&gt;? In the notebook?&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2025 17:22:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/109659#M43362</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2025-02-10T17:22:35Z</dc:date>
    </item>
    <item>
      <title>Re: Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection</title>
      <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/109663#M43363</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;In addition to my last comment:&lt;BR /&gt;To adjust &lt;CODE&gt;spark.sql.shuffle.partitions&lt;/CODE&gt; and &lt;CODE&gt;spark.executor.memory&lt;/CODE&gt;, I tried this, but I still saw the same error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;from pyspark.sql import SparkSession&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;spark = (&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; SparkSession.builder&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .appName("GA4 Bronze Table Ingestion")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .config("spark.sql.shuffle.partitions", "100")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .config("spark.executor.memory", "4g")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .config("spark.driver.memory", "4g")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .config("spark.sql.execution.arrow.pyspark.enabled", "true")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .getOrCreate()&lt;/DIV&gt;&lt;DIV&gt;)&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 10 Feb 2025 18:15:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/109663#M43363</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2025-02-10T18:15:50Z</dc:date>
    </item>
    <item>
      <title>Re: Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection</title>
      <link>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/121851#M46574</link>
      <description>&lt;P&gt;I also had this issue, and I resolved it by casting all the RECORD columns in BigQuery to strings before dumping the data.&lt;BR /&gt;&lt;BR /&gt;I first create a view like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;create view xxx as
select
  string_1,
  string_2,
  string_3,
  to_json_string(record_1) as record_1,
  to_json_string(record_2) as record_2,
  .
  .
  .
from yyy&lt;/LI-CODE&gt;&lt;P&gt;I don't know which record column causes the issue, so I cast them all.&lt;BR /&gt;&lt;BR /&gt;Then, in Databricks, I query the data only from the view xxx instead of the original table yyy. With this method, I can dump millions of rows from the BigQuery view xxx at once.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Jun 2025 07:45:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simba-jdbc-exception-when-querying-tables-via-bigquery/m-p/121851#M46574</guid>
      <dc:creator>tsekityam_2</dc:creator>
      <dc:date>2025-06-16T07:45:35Z</dc:date>
    </item>
  </channel>
</rss>

