<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to load single line mode json file? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94356#M38881</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103629"&gt;@zmsoft&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Since the file is in single-line mode (one JSON record per line), make sure Spark reads it that way. The multiLine option defaults to false, so setting it explicitly does not change the behavior; it only documents the intent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;stageDf = (
    spark.read.format("json")
    .option("multiLine", "false")
    .load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If the issue persists after applying the settings above:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Inspect the schema of the loaded DataFrame. If it does not match the schema of the existing table, set the overwriteSchema option so that the overwrite can replace the table schema:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Inspect the schema of the loaded DataFrame to ensure it is correct
stageDf.printSchema()
stageDf.show(truncate=False)

stageDf.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(tempTableName)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 17 Oct 2024 01:01:03 GMT</pubDate>
    <dc:creator>Panda</dc:creator>
    <dc:date>2024-10-17T01:01:03Z</dc:date>
    <item>
      <title>How to load single line mode json file?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94016#M38803</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;The activity log stored in an ADLS Gen2 container is a single-line JSON file.&lt;/P&gt;&lt;P&gt;How can I load a single-line JSON file and save the data to a Delta table?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;zmsoft&lt;/P&gt;</description>
      <pubDate>Tue, 15 Oct 2024 08:30:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94016#M38803</guid>
      <dc:creator>zmsoft</dc:creator>
      <dc:date>2024-10-15T08:30:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to load single line mode json file?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94028#M38804</link>
      <description>&lt;P&gt;My code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import datetime
from pyspark.sql.functions import lit

now = datetime.datetime.now()
tempTableName = "xxx.xxx.xxxx"

# Read the activity log file as JSON
stageDf = spark.read.format("json").load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')

# Stamp each row with the load time, then overwrite the target Delta table
stageDf = stageDf.withColumn("LastUpdateTime_", lit(now))
stageDf.write.format("delta").mode("overwrite").saveAsTable(tempTableName)&lt;/LI-CODE&gt;&lt;P&gt;Error message:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[DELTA_INVALID_FORMAT] Incompatible format detected.&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 15 Oct 2024 08:43:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94028#M38804</guid>
      <dc:creator>zmsoft</dc:creator>
      <dc:date>2024-10-15T08:43:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to load single line mode json file?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94356#M38881</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103629"&gt;@zmsoft&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Since the file is in single-line mode (one JSON record per line), make sure Spark reads it that way. The multiLine option defaults to false, so setting it explicitly does not change the behavior; it only documents the intent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;stageDf = (
    spark.read.format("json")
    .option("multiLine", "false")
    .load('https://xxxx.blob.core.xxxx.xx/insights-activity-logs/xxxx/PT1H.json')
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If the issue persists after applying the settings above:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Inspect the schema of the loaded DataFrame. If it does not match the schema of the existing table, set the overwriteSchema option so that the overwrite can replace the table schema:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Inspect the schema of the loaded DataFrame to ensure it is correct
stageDf.printSchema()
stageDf.show(truncate=False)

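# Hedged sketch added in editing, not part of the original reply.
# Assumption: the reader uses Spark's default PERMISSIVE mode and the default
# corrupt-record column name (_corrupt_record). In that case, rows that could
# not be parsed as JSON show up under that column in the inferred schema.
# The helper name assert_json_parsed is hypothetical.
def assert_json_parsed(columns, corrupt_column="_corrupt_record"):
    """Raise ValueError if the corrupt-record column is among the given column names."""
    if corrupt_column in columns:
        raise ValueError(
            f"Found column {corrupt_column!r}: some rows were not parsed as JSON; "
            "check the file layout and the multiLine option"
        )

# Possible usage before writing: assert_json_parsed(stageDf.columns)
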
stageDf.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(tempTableName)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Oct 2024 01:01:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-load-single-line-mode-json-file/m-p/94356#M38881</guid>
      <dc:creator>Panda</dc:creator>
      <dc:date>2024-10-17T01:01:03Z</dc:date>
    </item>
  </channel>
</rss>

