<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: read json files on unity catalog in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130865#M48929</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/65591"&gt;@seefoods&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;That's what I suspected. When you use the display()&amp;nbsp;&lt;SPAN&gt;method in Azure Databricks to view a DataFrame, the number of rows displayed is limited to prevent browser crashes.&lt;BR /&gt;The same applies to notebook cell outputs: table results are limited to 10,000 rows or 2 MB, whichever is lower.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/notebooks/notebook-limitations#notebook-cell-outputs" target="_blank"&gt;Known limitations of Databricks notebooks | Databricks on AWS&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So a more reliable way of checking is, for example, to perform a count() operation on the DataFrame.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 04 Sep 2025 15:56:46 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-09-04T15:56:46Z</dc:date>
    <item>
      <title>read json files on unity catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130791#M48901</link>
      <description>&lt;P&gt;Hello guys,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an issue when I load several JSON files that share the same schema on Databricks. The files are:&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json&amp;nbsp;516.13 KB&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json&amp;nbsp;516.13 KB&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;2025_08_10_21_55_00_2025_08_24_21_55_00_17Q1D_alice_out.json&amp;nbsp;514.13 KB&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;2025_08_10_21_55_00_2025_08_24_21_55_00_17Q51D_bob_out.json&amp;nbsp;418.13 KB&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;options = {
    "multiLine": True,
    "inferSchema": True,
    "allowUnquotedFieldNames": True,
    "allowSingleQuotes": True,
    "allowBackslashEscapingAnyCharacter": True,
    "recursiveFileLookup": True,
}

df = spark.read.format("json").options(**options).load("Volumes/folder/dir1")&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;It randomly picks up only two of the files.&lt;BR /&gt;&lt;BR /&gt;Does anyone know how to solve this issue?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Cordially,&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
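A hedged sketch of the reader call above, with plain-Python assertions only (the Spark lines are commented out since they need a Databricks notebook where spark is predefined). The path is an illustrative assumption; note that Unity Catalog Volume paths are normally absolute, starting with /Volumes/ rather than Volumes/:

```python
# Sketch only: assumes a Databricks notebook where `spark` is predefined.
# The path below is hypothetical; note the leading slash, since Unity
# Catalog Volume paths are normally written as "/Volumes/...".
path = "/Volumes/folder/dir1"

options = {
    "multiLine": True,            # each file holds a multi-line JSON document
    "recursiveFileLookup": True,  # descend into subdirectories
}

# df = spark.read.format("json").options(**options).load(path)
# df.count()  # counts rows across every matched file, unlike display()
```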
      <pubDate>Thu, 04 Sep 2025 09:09:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130791#M48901</guid>
      <dc:creator>seefoods</dc:creator>
      <dc:date>2025-09-04T09:09:30Z</dc:date>
    </item>
    <item>
      <title>Re: read json files on unity catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130794#M48903</link>
      <description>&lt;P&gt;Sometimes the DataFrame returns nothing. To force the file selection I added the option below, but it still doesn't load all the files present on the Volume:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;pathGlobFilter="*{*alice*}*.json"&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;</description>
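As a side note, the glob basics can be sanity-checked in plain Python with fnmatch (an assumption for illustration: fnmatch shares the asterisk-wildcard behavior of pathGlobFilter, though not the Hadoop-style brace alternation used in the pattern above). A simpler pattern such as *alice*.json already selects the alice files by name:

```python
from fnmatch import fnmatch

# Hypothetical file names modeled on the post. fnmatch mirrors the basic
# asterisk wildcards used by pathGlobFilter (it does not support the
# Hadoop-style {a,b} alternation seen in the original pattern).
files = [
    "2025_07_17_alice_out.json",
    "2025_07_17_bob_out.json",
]
matches = [f for f in files if fnmatch(f, "*alice*.json")]
print(matches)  # only the alice file
```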
      <pubDate>Thu, 04 Sep 2025 09:19:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130794#M48903</guid>
      <dc:creator>seefoods</dc:creator>
      <dc:date>2025-09-04T09:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: read json files on unity catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130807#M48913</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/65591"&gt;@seefoods&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Could you also share with us how you check whether some JSON files are not being loaded?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Sep 2025 09:49:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130807#M48913</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-04T09:49:10Z</dc:date>
    </item>
    <item>
      <title>Re: read json files on unity catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130863#M48928</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;It's OK, I have checked the history of the table. I was confused by the display() output versus the actual output of the write operation.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 04 Sep 2025 15:32:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130863#M48928</guid>
      <dc:creator>seefoods</dc:creator>
      <dc:date>2025-09-04T15:32:26Z</dc:date>
    </item>
    <item>
      <title>Re: read json files on unity catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130865#M48929</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/65591"&gt;@seefoods&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;That's what I suspected. When you use the display()&amp;nbsp;&lt;SPAN&gt;method in Azure Databricks to view a DataFrame, the number of rows displayed is limited to prevent browser crashes.&lt;BR /&gt;The same applies to notebook cell outputs: table results are limited to 10,000 rows or 2 MB, whichever is lower.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/notebooks/notebook-limitations#notebook-cell-outputs" target="_blank"&gt;Known limitations of Databricks notebooks | Databricks on AWS&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So a more reliable way of checking is, for example, to perform a count() operation on the DataFrame.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
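The row cap described above can be illustrated without Spark; the numbers below are assumptions chosen to match the documented 10,000-row limit. A truncated preview can look complete while a full count still sees every row:

```python
# Plain-Python illustration (no Spark): a display()-style preview is capped
# at 10,000 rows, so it can hide data that a count() would still report.
rows = list(range(25_000))    # stand-in for rows of a loaded DataFrame
preview = rows[:10_000]       # what a truncated table preview would show
full_count = len(rows)        # what df.count() would report

print(len(preview), full_count)
```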
      <pubDate>Thu, 04 Sep 2025 15:56:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-files-on-unity-catalog/m-p/130865#M48929</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-04T15:56:46Z</dc:date>
    </item>
  </channel>
</rss>

