
How to load single line mode json file?

zmsoft — Tue, 15 Oct 2024 08:30:54 GMT

Hi there,

The activity log stored in an ADLS Gen2 container is a single-line mode JSON file.

How can I load a single-line mode JSON file and save the data to a Delta table?

Thanks & Regards,

zmsoft
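For context, "single-line mode" is the layout Spark's JSON reader expects by default: every line of the file is a complete, self-contained JSON object (often called JSON Lines). The sketch below is not Spark code. It uses only Python's standard `json` module to illustrate that layout by parsing each line independently. The `parse_json_lines` helper and the sample records are hypothetical stand-ins for the real activity log contents.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text: one complete JSON object per non-blank line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            records.append(json.loads(line))
    return records


# Hypothetical sample: two records, each on its own line
sample = (
    '{"time": "2024-10-15T08:00:00Z", "operationName": "Read"}\n'
    '{"time": "2024-10-15T08:05:00Z", "operationName": "Write"}\n'
)

rows = parse_json_lines(sample)
print(len(rows), rows[1]["operationName"])  # prints: 2 Write
```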

Re: How to load single line mode json file?

zmsoft — Tue, 15 Oct 2024 08:43:59 GMT

My code:

import datetime
from pyspark.sql.functions import lit

now = datetime.datetime.now()
tempTableName = "xxx.xxx.xxxx"

# Read the activity log file (one JSON object per line)
stageDf = spark.read.format("json").load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')

# Stamp every row with the load time
stageDf = stageDf.withColumn("LastUpdateTime_", lit(now))

# Overwrite the target table in Delta format
stageDf.write.format("delta").mode("overwrite").saveAsTable(tempTableName)

Error message:

[DELTA_INVALID_FORMAT] Incompatible format detected.

Re: How to load single line mode json file?

Panda — Thu, 17 Oct 2024 01:01:03 GMT

@zmsoft
Since the JSON is a single-line file, make sure it is being read in single-line mode. Try setting the multiLine option to false explicitly. It already defaults to false, so this does not change the reader's behavior, but it makes the intended mode explicit:

stageDf = (
    spark.read.format("json")
    .option("multiLine", "false")
    .load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
)
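To see why the mode matters, the rough stdlib analogy below parses text line by line, the way single-line mode does. It is not Spark's implementation: the `parse_single_line_mode` helper and both samples are hypothetical, and setting aside unparseable lines only loosely mirrors how Spark's default PERMISSIVE mode routes malformed records to a `_corrupt_record` column. A file with one object per line parses cleanly, while the same kind of data pretty-printed across several lines yields nothing but fragments, which is the situation where multiLine should be true instead.

```python
import json


def parse_single_line_mode(text):
    """Parse each line on its own and set aside lines that are not valid JSON.

    A loose analogy to single-line mode with PERMISSIVE handling,
    not a reimplementation of Spark's JSON reader.
    """
    good, corrupt = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            corrupt.append(line)  # fragment of a record, not a full object
    return good, corrupt


# One object per line: every line is a complete record
jsonl = '{"a": 1}\n{"a": 2}\n'
# One object pretty-printed across lines: every line is a fragment
pretty = '{\n  "a": 1,\n  "b": 2\n}\n'

print(parse_single_line_mode(jsonl))  # prints: ([{'a': 1}, {'a': 2}], [])
good, corrupt = parse_single_line_mode(pretty)
print(len(good), len(corrupt))  # prints: 0 4
```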

If you still encounter the issue after applying the setting above, check for schema mismatches between the DataFrame and the existing table. If there are any, set the overwriteSchema option to allow the table schema to be updated:

# Inspect the schema of the loaded DataFrame to make sure it is correct
stageDf.printSchema()
stageDf.show(truncate=False)

# Overwrite the table, allowing its schema to be replaced by the DataFrame's
(
    stageDf.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(tempTableName)
)
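Before reaching for overwriteSchema, it can help to see exactly where the schemas differ. In PySpark the two sides could be taken from `stageDf.schema` and the existing table's schema (for example `spark.table(tempTableName).schema`); the stdlib sketch below assumes both have already been reduced to plain `{column: type}` dicts. The `diff_schemas` helper and the sample columns are hypothetical, chosen only to show added, missing, and retyped columns.

```python
def diff_schemas(incoming, existing):
    """Compare two {column_name: type_string} mappings.

    Returns (added, missing, retyped): columns only in the incoming data,
    columns only in the existing table, and (name, old_type, new_type)
    for columns present in both whose types differ.
    """
    added = sorted(set(incoming) - set(existing))
    missing = sorted(set(existing) - set(incoming))
    retyped = sorted(
        (name, existing[name], incoming[name])
        for name in set(incoming) & set(existing)
        if incoming[name] != existing[name]
    )
    return added, missing, retyped


# Hypothetical schemas; in PySpark one could build such a dict with
# {f.name: f.dataType.simpleString() for f in stageDf.schema.fields}
existing = {"time": "string", "operationName": "string", "level": "string"}
incoming = {"time": "timestamp", "operationName": "string", "LastUpdateTime_": "timestamp"}

added, missing, retyped = diff_schemas(incoming, existing)
print(added)    # prints: ['LastUpdateTime_']
print(missing)  # prints: ['level']
print(retyped)  # prints: [('time', 'string', 'timestamp')]
```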