"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

hare — Thu, 19 May 2022 12:40:47 GMT

Hi All,

We receive JSON files in an Azure Blob Storage container, and their "Blob Type" is "Append Blob".

When we try to read them with the script below, we get the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually."

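# Read the single JSON file that matches the glob filter
df = spark.read.json(source_location, multiLine=True, pathGlobFilter='2022-05-18T02_50_01_914Z_student.json')
# Register the DataFrame as a temporary view so it can be queried with SQL
df.createOrReplaceTempView('v_df')
# Count the rows that were read
spark.sql("select count(*) from v_df").display()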

Can anyone please let me know whether there is any option to read JSON files whose blob type is "Append Blob"? We are using Databricks with PySpark.

Re: "Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

User16856839485 — Thu, 13 Oct 2022 16:25:54 GMT

There does not currently appear to be direct support for reading append blobs. However, converting the append blob to a block blob (and then to Parquet, Delta, etc.) is a viable option:

https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga=2.258782666.1514035379.1665677010-653321784.1587659507
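For illustration, here is a minimal sketch of the conversion step, assuming the azure-storage-blob (v12) Python SDK and two ContainerClient objects that you create yourself. The function names, the target container, and the prefix argument are hypothetical and not taken from the linked article. It downloads each append blob fully into memory and re-uploads it as a block blob, so it only suits files of modest size.

```python
from typing import List, Optional


def is_append_blob(blob_props) -> bool:
    """Return True when a listed blob reports the Append Blob type."""
    # With azure-storage-blob v12, blob_type is a str-based enum whose value
    # is "AppendBlob"; a plain string is handled as well.
    blob_type = getattr(blob_props, "blob_type", None)
    value = getattr(blob_type, "value", blob_type)
    return str(value) == "AppendBlob"


def copy_append_blobs_as_block_blobs(
    source_container,
    target_container,
    name_prefix: Optional[str] = None,
) -> List[str]:
    """Copy every append blob from source_container into target_container
    as a block blob, keeping the blob name. Returns the copied names.

    Both arguments are expected to behave like
    azure.storage.blob.ContainerClient (an assumption of this sketch).
    """
    copied = []
    for props in source_container.list_blobs(name_starts_with=name_prefix):
        if not is_append_blob(props):
            continue  # block and page blobs are left alone
        # Download the whole append blob into memory.
        data = source_container.get_blob_client(props.name).download_blob().readall()
        # Re-upload the same bytes under the same name as a block blob.
        target_container.get_blob_client(props.name).upload_blob(
            data, blob_type="BlockBlob", overwrite=True
        )
        copied.append(props.name)
    return copied


# Hypothetical usage (connection string and container names are placeholders):
#
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string("<connection-string>")
#   src = service.get_container_client("<source-container>")
#   dst = service.get_container_client("<block-blob-container>")
#   copy_append_blobs_as_block_blobs(src, dst, name_prefix="2022-05-18")
```

Once the files exist as block blobs, pointing source_location at the target container should let the original spark.read.json call read them, and from there the data can be written out as Parquet or Delta.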
