Databricks Community

Maxi1693 · ‎01-26-2024

Hi!

I am pulling data from Blob storage into Databricks using Autoloader. This process works well for almost 10 resources, but for one specific resource I am getting a java.lang.NullPointerException.

It looks like the issue occurs when I connect to the blob storage, but when I read this resource using spark.read.parquet("/mnt/path/to/files/*.parquet") the process works well.

So the issue occurs when I run Structured Streaming with the format "cloudFiles".

Below is the code used:

downtimeuptime_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", f"/mnt/hist_data_delta/hist_data_delta.db/checkpoints/table_name_data_hmc")
    .option("cloudFiles.schemaEvolutionMode", None)
    .load(f'/mnt/source_data_bu/table_name_data/')
    .select(
        "*",
        lit(_bu).alias("_bu"),
        col("_metadata.file_path").alias("_source_file"),
        current_timestamp().alias("_processing_time"),
    )
)

Error description:

Py4JJavaError: An error occurred while calling o2702.load.
: java.lang.NullPointerException
    at com.databricks.sql.cloudfiles.options.CloudFilesOptionsBase.$anonfun$userProvidedEvolutionMode$1(CloudFilesOptionsBase.scala:162)
    at scala.Option.map(Option.scala:230)
    at com.databricks.sql.cloudfiles.options.CloudFilesOptionsBase.<init>(CloudFilesOptionsBase.scala:162)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceOptions.<init>(CloudFilesSourceOptions.scala:45)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.sourceSchema(CloudFilesSourceProvider.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:266)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:150)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:150)
    at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:40)
    at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:223)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:267)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
    at java.lang.Thread.run(Thread.java:750)

shan_chandra · ‎01-26-2024

@Maxi1693 - The value for schemaEvolutionMode should be a string. Could you please try changing the line below from

 .option("cloudFiles.schemaEvolutionMode", None)

to

 .option("cloudFiles.schemaEvolutionMode", "none")

and let us know.

Reference: https://docs.databricks.com/en/ingestion/auto-loader/schema.html#how-does-auto-loader-schema-evoluti...
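To catch this kind of mistake before the reader is built, one option is to collect the options in a dict and check that none of the values is a Python None. The sketch below is only an illustration: check_reader_options and autoloader_options are hypothetical names, not part of any Databricks or Spark API, and the paths are copied from the question.

```python
def check_reader_options(options):
    """Raise ValueError if any option value is None; otherwise return the dict."""
    # Hypothetical helper: collects the keys whose value is a Python None.
    none_keys = [key for key, value in options.items() if value is None]
    if none_keys:
        raise ValueError(f"Option(s) set to None instead of a string: {none_keys}")
    return options


# Option names and paths copied from the question above.
autoloader_options = check_reader_options({
    "cloudFiles.format": "parquet",
    "cloudFiles.schemaLocation": "/mnt/hist_data_delta/hist_data_delta.db/checkpoints/table_name_data_hmc",
    "cloudFiles.schemaEvolutionMode": "none",  # the string "none", not Python None
})

# On a Databricks cluster the checked options could then be passed in one call:
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**autoloader_options)
#     .load("/mnt/source_data_bu/table_name_data/")
# )
```

With the original value None the helper raises a ValueError that names the offending option, which is easier to read than the NullPointerException coming back through Py4J.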



Error java.lang.NullPointerException using Autoloader
