Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to load a single-line-mode JSON file?

zmsoft
Contributor

Hi there,

The activity log stored in an ADLS Gen2 container is a single-line-mode JSON file.

How can I load this single-line-mode JSON file and save the data to a Delta table?

 

Thanks & Regards,

zmsoft

2 REPLIES

zmsoft
Contributor

My code:

import datetime
from pyspark.sql.functions import lit

now = datetime.datetime.now()
tempTableName = "xxx.xxx.xxxx"
stageDf = spark.read.format("json").load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')

# Stamp every row with the load time, then overwrite the target Delta table
stageDf = stageDf.withColumn("LastUpdateTime_", lit(now))
stageDf.write.format("delta").mode("overwrite").saveAsTable(tempTableName)

Error message:

[DELTA_INVALID_FORMAT] Incompatible format detected.

Panda
Valued Contributor

@zmsoft 
Since the JSON is a single-line file, first make sure Spark is reading it in the matching mode. The multiLine option already defaults to false, so setting it explicitly does not change behavior, but it makes the intended read mode clear:

 

stageDf = (
    spark.read.format("json")
    .option("multiLine", "false")
    .load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
)
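
As a quick sanity check before settling on the multiLine setting, a small sample of the file can be tested in plain Python to see whether every line is a complete JSON record on its own. This is only a sketch: is_json_lines and the two sample strings are hypothetical names and made-up data, not part of Spark and not taken from a real activity log.

```python
import json


def is_json_lines(text: str) -> bool:
    """Return True if every non-blank line parses as a standalone JSON value."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True


# Hypothetical sample records, not taken from a real activity log.
json_lines_sample = (
    '{"time": "t1", "level": "Information"}\n'
    '{"time": "t2", "level": "Warning"}\n'
)
pretty_printed_sample = '{\n  "time": "t1",\n  "level": "Information"\n}\n'

# One record per line: the default multiLine=false reader should fit.
print(is_json_lines(json_lines_sample))
# One record spread over several lines: multiLine=true would be needed instead.
print(is_json_lines(pretty_printed_sample))
```

If the check fails on a sample of the real file, the file is probably not in single-line mode and the multiLine option would need to be true instead.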

 

If you still encounter the issue after applying the setting above, check for schema mismatches between the DataFrame and the existing table. Setting the overwriteSchema option allows the table schema to be replaced during the overwrite:

# Inspect the schema and contents of the loaded DataFrame to ensure they are correct
stageDf.printSchema()
stageDf.show(truncate=False)

stageDf.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(tempTableName)