Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection

KristiLogos
Contributor
Hello,
I have a federated connection to BigQuery that has GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows per day, and load it into another table, but I keep seeing this Simba JDBC exception. I've even chunked the query (using OFFSET) to fetch and append 5,000 rows at a time, with a sleep in between, but I still see this error:
 
SparkException: Job aborted due to stage failure: Task 0 in stage 2947.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2947.0 (TID 15843) (10.21.40.215 executor 20): java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception.
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQHTDataHandler.retrieveData(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.getData(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getData(Unknown Source)
    at bigquery.shaded.com.simba.googlebigquery.jdbc.common.SForwardResultSet.getString(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13(JdbcUtils.scala:484)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13$adapted(JdbcUtils.scala:482)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFac...
File <command-6291825545273755>, line 88
     85 df_chunk = df_chunk.withColumn("event_date", lit(event_date))
     87 # Append chunk to Bronze table
---> 88 df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)
     90 offset += BATCH_SIZE
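
For context, a simplified sketch of the chunked loop described above (the source/target table names, the sort column, and the sleep interval are placeholders; it assumes the notebook's preexisting spark session and a runtime whose SQL dialect supports LIMIT ... OFFSET):

from time import sleep
from pyspark.sql.functions import lit

BATCH_SIZE = 5000                                        # rows fetched per chunk
bronze_table = "main.bronze.ga4_events"                  # placeholder target table
source_table = "bq_catalog.analytics.events_20240601"    # placeholder federated GA table
event_date = "2024-06-01"                                # placeholder date tag

offset = 0
while True:
    # Page through the federated BigQuery table via LIMIT/OFFSET
    df_chunk = spark.sql(
        f"SELECT * FROM {source_table} "
        f"ORDER BY event_timestamp LIMIT {BATCH_SIZE} OFFSET {offset}"
    )
    if df_chunk.isEmpty():
        break

    # Tag the chunk with its source date and append it to the Bronze table
    df_chunk = df_chunk.withColumn("event_date", lit(event_date))
    df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)

    offset += BATCH_SIZE
    sleep(5)  # short pause between chunks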
3 REPLIES

Alberto_Umana
Databricks Employee

Hello @KristiLogos,

The error you are encountering, java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception, is a known issue with the Simba JDBC driver for BigQuery. It typically occurs when there is a problem with the data being fetched, such as null values or unexpected data types that the driver cannot handle. Could you please advise which JDBC driver version you are using?
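
As a quick check of which columns go through that struct-to-string conversion path (the failing frames are in BQHTParser.avroStructToString), here is a minimal sketch that lists the struct-typed columns of the federated table; the table name below is a placeholder:

from pyspark.sql.types import ArrayType, StructType

def has_struct(dt):
    # True for STRUCT columns and arrays of STRUCTs (e.g. GA4's event_params / items)
    if isinstance(dt, StructType):
        return True
    if isinstance(dt, ArrayType):
        return has_struct(dt.elementType)
    return False

df = spark.table("bq_catalog.analytics.events_20240601")  # placeholder federated table
print([f.name for f in df.schema.fields if has_struct(f.dataType)])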

You might also need to adjust Spark settings such as spark.sql.shuffle.partitions and spark.executor.memory.

KristiLogos
Contributor

@Alberto_Umana My cluster's JDBC URL shows: 2.6.25 or later.
Also, where would I adjust spark.sql.shuffle.partitions and spark.executor.memory? In the notebook?

KristiLogos
Contributor

@Alberto_Umana In addition to my last comment:
To adjust spark.sql.shuffle.partitions and spark.executor.memory, I tried the following, but I was still seeing the same error:

 

from pyspark.sql import SparkSession

# Build (or reuse) the session with the adjusted settings
spark = (
    SparkSession.builder
    .appName("GA4 Bronze Table Ingestion")
    .config("spark.sql.shuffle.partitions", "100")
    .config("spark.executor.memory", "4g")
    .config("spark.driver.memory", "4g")
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)
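
A note on where these take effect, with a minimal sketch assuming the preexisting spark session in a Databricks notebook: spark.sql.shuffle.partitions can be changed at runtime per session, while spark.executor.memory is a JVM setting that an already-running cluster will not pick up, so it normally has to go in the cluster's Spark config instead:

# Session-level SQL config: applies to queries run afterwards in this notebook
spark.conf.set("spark.sql.shuffle.partitions", "100")

# spark.executor.memory cannot be changed on a running cluster; set it in the
# cluster's Spark config (cluster UI > Advanced options > Spark config), e.g.:
#   spark.executor.memory 4g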
