How to load a single-line mode JSON file?
10-15-2024 01:30 AM
Hi there,
The activity log stored in an ADLS Gen2 container is a single-line mode JSON file.
How can I load a single-line mode JSON file and save the data to a Delta table?
Thanks & Regards,
zmsoft
10-15-2024 01:43 AM
My code:
import datetime
from pyspark.sql.functions import lit

# Timestamp stamped onto every loaded row
now = datetime.datetime.now()
tempTableName = "xxx.xxx.xxxx"
# Read the activity log JSON file into a DataFrame
stageDf = spark.read.format("json").load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
stageDf = stageDf.withColumn("LastUpdateTime_", lit(now))
# Overwrite the target Delta table with the loaded data
stageDf.write.format("delta").mode("overwrite").saveAsTable(tempTableName)
Error msg:
[DELTA_INVALID_FORMAT] Incompatible format detected.
10-16-2024 05:56 PM - edited 10-16-2024 06:01 PM
@zmsoft
Since the JSON file is in single-line mode (one JSON object per line), make sure it is read that way. Try setting the multiLine option to false explicitly; false is already the default, but setting it makes the intent clear.
stageDf = (
    spark.read.format("json")
    .option("multiLine", "false")
    .load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
)
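For reference, here is a minimal plain-Python sketch of what single-line mode means: each line of the file is parsed as its own JSON document. The sample records and the parse_json_lines helper are hypothetical and only for illustration; they are not taken from the actual activity log.

```python
import json

# Hypothetical sample: two activity-log-style records, one JSON object per line.
sample = (
    '{"time": "2024-10-15T01:00:00Z", "operationName": "op-a"}\n'
    '{"time": "2024-10-15T01:05:00Z", "operationName": "op-b"}\n'
)


def parse_json_lines(text):
    """Parse every non-empty line as an independent JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(sample)
print(len(records))
```

Passing the whole sample to json.loads in one call would fail instead, because the text as a whole is not a single JSON document.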
If you still encounter the issue after applying the settings above, check for schema mismatches. If the schema has changed, set the overwriteSchema option so the table schema can be updated:
# Inspect the schema of the loaded DataFrame to ensure it is correct
stageDf.printSchema()
stageDf.show(truncate=False)
stageDf.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(tempTableName)
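To see where two schemas actually differ before overwriting, a small helper like the one below may help. It is only a sketch: schema_diff is a hypothetical function, and it assumes you pass it lists of (column name, type string) pairs, which is the shape that DataFrame.dtypes returns in PySpark. The column names in the example are made up.

```python
def schema_diff(existing, incoming):
    """Compare two lists of (column, type) pairs and report the differences."""
    existing_map = dict(existing)
    incoming_map = dict(incoming)
    shared = set(existing_map) & set(incoming_map)
    return {
        # Columns present only in the incoming data
        "added": sorted(set(incoming_map) - set(existing_map)),
        # Columns present only in the existing table
        "removed": sorted(set(existing_map) - set(incoming_map)),
        # Columns present in both but with a different type
        "changed": sorted(c for c in shared if existing_map[c] != incoming_map[c]),
    }


# Hypothetical schemas, in the same shape as DataFrame.dtypes
existing_cols = [("time", "string"), ("level", "string")]
incoming_cols = [("time", "timestamp"), ("level", "string"), ("LastUpdateTime_", "timestamp")]
print(schema_diff(existing_cols, incoming_cols))
```

In a notebook you could call it as schema_diff(spark.table(tempTableName).dtypes, stageDf.dtypes), assuming the target table already exists.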

