Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark read/write JDBC NetSuite TIMESTAMP error: NullPointerException

KristiLogos
Contributor

I've connected to NetSuite via a SuiteAnalytics JDBC connection in a Databricks cluster. I'm seeing a NullPointerException because of the lastmodifieddate column in my NetSuite classification table (i.e. if I do SELECT * FROM classification I see the NullPointerException, and I've narrowed it down to the lastmodifieddate TIMESTAMP column). The exception is raised through this handler:


except Exception as e:
    print("\nERROR: An exception occurred.")
    # The full error will be raised, giving you the complete stack trace.
    raise e

 

Error

SparkException: Job aborted due to stage failure: Task 0 in stage 31.0 failed 4 times, most recent failure: Lost task 0.3 in stage 31.0 (TID 44) (10.21.40.199 executor 0): java.lang.NullPointerException
    at com.netsuite.jdbc.base.hk.a(oajc:654)
    at com.netsuite.jdbc.base.dj.c(oajc:494)
    at com.netsuite.jdbc.base.ax.a(oajc:1926)
    at com.netsuite.jdbc.base.fv.getTimestamp(oajc:5211)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15(JdbcUtils.scala:521)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15$adapted(JdbcUtils.scala:520)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:385)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:366)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
    at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1139)
    at com.databricks.photon.NativeRowBatchIterator.hasNext(NativeRowBatchIterator.java:44)
    at 0xc7c5392 <photon>.HasNext(external/workspace_spark_3_5/photon/jni-wrappers/jni-row-batch-iterator.cc:50)

---> 90 raise e

File  line 74
     66 print(f"\nAttempting to write data to CSV at: {output_path_csv}")
     67 # df_cleaned = df.na.fill('')
     68 # print('cleaned the df')
     70 df.write \
     71     .format("csv") \
     72     .option("header", "true") \
     73     .mode("overwrite") \
---> 74     .save(output_path_csv)
     76 print(f"SUCCESS: Wrote data to CSV.")
     78 # # --- Write to Delta Table ---
     79 # print(f"\nAttempting to write data to Delta table at: {output_path_delta}")
     80 # df.write \
    (...)
     83 #     .save(output_path_delta)
     84 # print(f"SUCCESS: Wrote data to Delta table.")
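The trace shows the NetSuite driver's getTimestamp() throwing inside Spark's JDBC row getter (JdbcUtils.makeGetter). One possible mitigation, sketched below, is Spark's documented customSchema JDBC option, which maps the column to a Spark string so the getString() path is used instead; whether the NetSuite driver behaves cleanly on that path is an assumption to verify. The connection variables are the ones defined in the code later in this thread.

# Sketch: read lastmodifieddate as a string so Spark's row getter calls
# getString() instead of the getTimestamp() that throws above.
df_str = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", driver)
    .option("user", user)
    .option("password", password)
    .option("query", 'SELECT lastmodifieddate FROM "Fl - Accountant".classification')
    .option("customSchema", "lastmodifieddate STRING")  # documented Spark JDBC option
    .load()
)
df_str.show(5, truncate=False)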
3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @KristiLogos ,

Could you check if you have null values in lastmodifieddate? If so, try to replace null values with some fallback value:

COALESCE(lastmodifieddate, '1900-01-01 00:00:00') as lastmodifieddate
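A minimal sketch of that null check, pushed down server-side so the driver never materializes a timestamp value (jdbc_url, driver, user, and password are the connection variables from the code later in this thread):

null_check_query = """
SELECT COUNT(*) AS null_rows
FROM "Fl - Accountant".classification
WHERE lastmodifieddate IS NULL
"""

# Only an integer comes back over JDBC, so the getTimestamp() path is avoided.
spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("driver", driver) \
    .option("user", user) \
    .option("password", password) \
    .option("query", null_check_query) \
    .load() \
    .show()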

 

Also, the TL;DR of my last comment is that it includes the code where I was seeing the error. I had also previously tested with COALESCE and TO_CHAR():

COALESCE(lastmodifieddate, '1900-01-01 00:00:00') as lastmodifieddate

I see the error:

java.sql.SQLException: [NetSuite][OpenAccess SDK JDBC Driver][OpenAccess SDK SQL Engine]

Failed to retrieve data. Error ticket# mcwb7tcx1eia537oa41ma
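For reference, a sketch of the server-side conversion being described; TO_CHAR with an Oracle-style format mask is an assumption about the SuiteAnalytics Connect dialect, and per the post this variant still fails with the driver error above:

# Hypothetical TO_CHAR pushdown; reported above to fail with
# "Failed to retrieve data" on this connection.
to_char_query = """
SELECT TO_CHAR(lastmodifieddate, 'YYYY-MM-DD HH24:MI:SS') AS lastmodifieddate_str
FROM "Fl - Accountant".classification
"""
# This plugs into the same spark.read.format("jdbc")...option("query", ...)
# chain shown in the reply below.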

KristiLogos
Contributor

Hi @szymon_dybczak

I'm not sure why my question looks like this; I thought I had fixed it.

I've connected to NetSuite via a SuiteAnalytics JDBC connection in a Databricks cluster. I'm seeing a NullPointerException because of the lastmodifieddate column in my NetSuite classification table (i.e. if I do SELECT * FROM classification I see the NullPointerException, and I've narrowed it down to the lastmodifieddate TIMESTAMP column). I noticed the error was occurring at this column after writing out each column in the SELECT statement.
I've tested with other tables as well (entity, department…), and this seems to be happening with all TIMESTAMP columns, not just lastmodifieddate.

Also, when I query the classification table in DBeaver I see NO null rows, and I see no errors when using the test queries below. If I add COALESCE, CAST, etc. to the lastmodifieddate field, I see the "Failed to retrieve data" error shown above.

(Note: I have some test code commented out.)

This is my code:

jdbc_url = "jdbc:ns://x.connect.api.netsuite.com:1708;ServerDataSource=NetSuite2.com;Encrypted=1;NegotiateSSLClose=false;CustomProperties=(AccountID=x;RoleID=1030)"
driver = "com.netsuite.jdbc.openaccess.OpenAccessDriver"

secret_scope = "fl-da-kv-app-scope"
secret_key_password = "netsuite-password"

user = "user"
password = dbutils.secrets.get(scope=secret_scope, key=secret_key_password)

# NB: comments inside the SQL string must use SQL's -- syntax;
# a Python # here would be sent to NetSuite as part of the query.
netsuite_query = """
SELECT
lastmodifieddate
-- CAST(lastmodifieddate AS VARCHAR(255)) AS lastmodifieddate_str
-- COALESCE(lastmodifieddate, TO_DATE('1900-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')) AS lastmodifieddate
FROM "Fl - Accountant".classification
"""


# netsuite_query = """
# SELECT
# custrecord_nspbcs_class_planning_cat,
# custrecord_lmry_class_code,
# externalid,
# fullname,
# id,
# includechildren,
# isinactive,
# name,
# parent,
# subsidiary
# FROM "Fl - Accountant".classification
# """

# netsuite_query = """SELECT custrecord_lmry_class_code FROM "Fl- Accountant".classification"""
# netsuite_query = """
# SELECT
# COALESCE(custrecord_lmry_class_code, 'a') AS custrecord_lmry_class_code,
# COALESCE(custrecord_nspbcs_class_planning_cat, 0) AS custrecord_nspbcs_class_planning_cat,

# id as id

# FROM "Fl - Accountant".classification
# """

output_path_csv = "/tmp/netsuite_classification_full1.csv"

# output_path_delta = "/tmp/delta/netsuite_classification"


try:
    print("Attempting to read full table from NetSuite...")
    print(f"Executing query: {netsuite_query}")

    df = spark.read \
        .format("jdbc") \
        .option("url", jdbc_url) \
        .option("query", netsuite_query) \
        .option("user", user) \
        .option("password", password) \
        .option("driver", driver) \
        .load()

    print("\nSUCCESS: Read data from NetSuite into a DataFrame.")

    # If there's an error with the count, that means the query is wrong.
    row_count = df.count()
    print(f"SUCCESS: Read {row_count} rows from NetSuite... ")
    # df.show(5)

    print(f"\nAttempting to write data to CSV at: {output_path_csv}")
    # df_cleaned = df.na.fill('')
    # print('cleaned the df')

    df.write \
        .format("csv") \
        .option("header", "true") \
        .mode("overwrite") \
        .save(output_path_csv)

    print("SUCCESS: Wrote data to CSV.")

    # # --- Write to Delta Table ---
    # print(f"\nAttempting to write data to Delta table at: {output_path_delta}")
    # df.write \
    #     .format("delta") \
    #     .mode("overwrite") \
    #     .save(output_path_delta)
    # print(f"SUCCESS: Wrote data to Delta table.")

except Exception as e:
    print("\nERROR: An exception occurred.")
    # The full error will be raised, giving you the complete stack trace.
    raise e
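A sketch that automates the column-by-column narrowing described above: probe each column with its own pushdown query and report which ones fail. The column list comes from the commented-out query in this post; the connection variables are the ones defined above.

# Probe each column separately; count() forces full materialization,
# so fetch-time driver errors (like the getTimestamp NPE) surface.
columns_to_probe = [
    "custrecord_nspbcs_class_planning_cat",
    "custrecord_lmry_class_code",
    "externalid", "fullname", "id", "includechildren",
    "isinactive", "name", "parent", "subsidiary",
    "lastmodifieddate",
]

for col_name in columns_to_probe:
    probe_query = f'SELECT {col_name} FROM "Fl - Accountant".classification'
    try:
        spark.read.format("jdbc") \
            .option("url", jdbc_url) \
            .option("driver", driver) \
            .option("user", user) \
            .option("password", password) \
            .option("query", probe_query) \
            .load() \
            .count()
        print(f"OK:   {col_name}")
    except Exception as err:
        print(f"FAIL: {col_name} -> {type(err).__name__}")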
