Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

read json files on unity catalog

seefoods
Valued Contributor

Hello Guys,

I have an issue when I load several JSON files that share the same schema on Databricks. The volume contains the following files:

2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json 516.13 KB

2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json 516.13 KB

2025_08_10_21_55_00_2025_08_24_21_55_00_17Q1D_alice_out.json 514.13 KB

2025_08_10_21_55_00_2025_08_24_21_55_00_17Q51D_bob_out.json 418.13 KB

 

When I do:

options = {
    "multiLine": True,
    "inferSchema": True,
    "allowUnquotedFieldNames": True,
    "allowSingleQuotes": True,
    "allowBackslashEscapingAnyCharacter": True,
    "recursiveFileLookup": True,
}

df = spark.read.format("json").options(**options).load("Volumes/folder/dir1")
It picks up only two of the files, seemingly at random.

Does anyone know how to solve this issue?


Cordially, 
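
As a quick sanity check for an issue like this, one way to see which files the read actually picked up is Spark's hidden _metadata column; a minimal sketch (the volume path is a placeholder, and note that Unity Catalog volume paths normally start with a leading slash, /Volumes/<catalog>/<schema>/<volume>/...):

# List the files the DataFrame was actually read from (placeholder path)
df = (
    spark.read.format("json")
    .options(**options)  # same options dict as above
    .load("/Volumes/<catalog>/<schema>/<volume>/dir1")  # note the leading slash
)
df.select("_metadata.file_path").distinct().show(truncate=False)
print(df.count())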

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @seefoods,

That's what I suspected. When you use the display() method in Azure Databricks to view a DataFrame, the number of rows displayed is limited to prevent browser crashes. The same applies to notebook cell outputs: table results are limited to 10,000 rows or 2 MB, whichever is lower.

Known limitations of Databricks notebooks | Databricks on AWS

So a more reliable way of checking is, for example, to perform a count operation on the DataFrame.
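
For example, a minimal sketch (the table name is a placeholder, assuming the data is then written to a Unity Catalog table):

# display(df) truncates large results, so compare explicit counts instead
rows_read = df.count()
print(f"rows read from the volume: {rows_read}")

# hypothetical target table -- compare with what was actually written
rows_written = spark.table("my_catalog.my_schema.my_table").count()
print(f"rows in the target table: {rows_written}")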


4 REPLIES

seefoods
Valued Contributor

Sometimes the DataFrame returns nothing. To force the match I added the option below, but it still doesn't load all the files present on the Volume:

pathGlobFilter="*{*alice*}*.json"
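
In case it helps, the option is spelled pathGlobFilter and it matches file names against a single glob; a minimal sketch that keeps only the alice files (path and pattern are illustrative):

df_alice = (
    spark.read.format("json")
    .option("multiLine", True)
    .option("recursiveFileLookup", True)
    .option("pathGlobFilter", "*alice*.json")  # plain glob; braces are not needed for a single pattern
    .load("/Volumes/<catalog>/<schema>/<volume>/dir1")
)
df_alice.select("_metadata.file_path").distinct().show(truncate=False)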

szymon_dybczak
Esteemed Contributor III

Hi @seefoods,

Could you also share with us how you check whether some JSON files were not loaded?

seefoods
Valued Contributor

Hello @szymon_dybczak,

It's OK, I have checked the history of the table. I was confused about the difference between the display() command output and the actual output of the write operation.

Thanks
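
For reference, assuming the target is a Delta table, its history also records per-write metrics such as numOutputRows, which is another way to compare a write with what display() showed (the table name is a placeholder):

# Each write operation appears as a version with operationMetrics (e.g. numOutputRows)
history = spark.sql("DESCRIBE HISTORY my_catalog.my_schema.my_table")
history.select("version", "operation", "operationMetrics").show(truncate=False)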
