Re: read json files on unity catalog

seefoods · ‎09-04-2025

Hello guys,

I have an issue when I load several JSON files that share the same schema on Databricks. Given these files and the read below:

2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json 516.13 KB

2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json 516.13 KB

2025_08_10_21_55_00_2025_08_24_21_55_00_17Q1D_alice_out.json 514.13 KB

2025_08_10_21_55_00_2025_08_24_21_55_00_17Q51D_bob_out.json 418.13 KB

options = {
    "multiLine": True,
    "inferSchema": True,
    "allowUnquotedFieldNames": True,
    "allowSingleQuotes": True,
    "allowBackslashEscapingAnyCharacter": True,
    "recursiveFileLookup": True,
}

df = spark.read.format("json").options(**options).load("Volumes/folder/dir1")

It seems to pick up only two of the files, at random.

Does anyone know how to solve this issue?

Cordially,

seefoods · ‎09-04-2025

Sometimes the DataFrame returns nothing. To force it, I added the option below, but it still doesn't load all the files present on the Volume:

pathGlobFilter="*{*alice*}*.json"
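
A side note on the filter above: pathGlobFilter is matched against file names, so a simpler pattern such as *alice*.json should select the same files. The sketch below is a local check only. It uses Python's fnmatch as a stand-in for Spark's glob matching (an assumption, since the two are not guaranteed to agree on every pattern) to show which of the listed files that pattern would keep.

```python
from fnmatch import fnmatchcase

# File names copied from the listing in the first post.
file_names = [
    "2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json",
    "2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json",
    "2025_08_10_21_55_00_2025_08_24_21_55_00_17Q1D_alice_out.json",
    "2025_08_10_21_55_00_2025_08_24_21_55_00_17Q51D_bob_out.json",
]

# Simplified pattern (assumption: equivalent in intent to "*{*alice*}*.json").
pattern = "*alice*.json"

# Case-sensitive match on the file name only, not the full path.
kept = [name for name in file_names if fnmatchcase(name, pattern)]
print(kept)
```

With a filter like this, only the alice files are expected in the DataFrame, so seeing two files would not by itself mean that files were skipped.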

szymon_dybczak · ‎09-04-2025

Hi @seefoods ,

Could you also share how you check whether some JSON files are not loaded?

seefoods · ‎09-04-2025

Hello @szymon_dybczak ,

It's OK, I have checked the history of the table. I was confused by the difference between the display() output and the actual output of the write operation.

Thanks

szymon_dybczak · ‎09-04-2025

Hi @seefoods ,

That's what I suspected. When you use the display() method in Azure Databricks to view a DataFrame, the number of rows displayed is limited to prevent browser crashes.
The same applies to notebook cell outputs. Table results are limited to 10,000 rows or 2 MB, whichever is lower.

Known limitations Databricks notebooks | Databricks on AWS

So a more reliable way of checking is, for example, to perform a count operation on the DataFrame.
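
A minimal sketch of that check, assuming the DataFrame was read from files on a runtime that exposes the hidden _metadata.file_path column. The helper names loaded_file_report and missing_files are hypothetical, not part of any Databricks API. The first counts rows in total and per source file; the second compares the files you expected against the files that actually contributed rows.

```python
import os


def loaded_file_report(df):
    """Return (total_rows, {file_path: row_count}) for a file-based DataFrame."""
    # Imported lazily so the pure-Python helper below works without Spark.
    from pyspark.sql import functions as F

    per_file = (
        df.select(F.col("_metadata.file_path").alias("file"))
        .groupBy("file")
        .count()
    )
    counts = {row["file"]: row["count"] for row in per_file.collect()}
    return df.count(), counts


def missing_files(expected_paths, loaded_paths):
    """Return the expected files that contributed no rows, compared by file name."""
    loaded_names = {os.path.basename(p) for p in loaded_paths}
    return sorted(
        p for p in expected_paths if os.path.basename(p) not in loaded_names
    )


# Usage in a notebook (not executed here; needs a Spark session):
# total, per_file = loaded_file_report(df)
# print(total, per_file)
# print(missing_files(expected_paths, per_file.keys()))
```

Unlike display(), count() and the per-file counts are computed over the whole DataFrame, so they are not affected by the notebook output limit.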
