Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

json file existing in volume but not showing in UI

amirabedhiafi
Contributor

I have some JSON files in a specific volume. When I try to search for them they don't appear, but when I query the volume using Python I am able to get them and read their content.

Any help?

If this answer resolves your question, could you please mark it as "Accept as Solution"? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

Accepted Solutions

Ashwin_DSA
Databricks Employee

Hi @amirabedhiafi,

Unity Catalog volumes are a storage layer for files, so it's normal that you can read JSON files from /Volumes/... with Python or SQL but not see those same files show up as searchable content in the workspace search experience. Databricks documents volume file access separately from workspace search.

The fact that the files are present and readable in the volume does not automatically mean their contents are indexed for search. JSON is fully supported as a file format in volumes and can be read programmatically, including with read_files or standard Spark/Python workflows. However, the built-in document-ingestion path for files in volumes, such as Knowledge Assistant, currently supports only txt, pdf, md, ppt/pptx, and doc/docx files, which is why raw JSON files won't behave like indexed documents there.
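As a rough illustration of that point, the sketch below sorts file paths by extension against the list quoted above (txt, pdf, md, ppt/pptx, doc/docx). It is a hypothetical pre-check, not a Databricks API: the extension set is copied from this answer and may change over time, and the example paths are made up.

```python
from pathlib import PurePosixPath

# Extensions quoted in the answer above for the files-in-volume
# document-ingestion path (e.g. Knowledge Assistant). Assumption: this
# list may change; check the current Databricks documentation.
SUPPORTED_DOC_EXTENSIONS = {".txt", ".pdf", ".md", ".ppt", ".pptx", ".doc", ".docx"}


def split_by_ingestion_support(paths):
    """Split file paths into (supported, unsupported) lists by extension.

    Matching is case-insensitive on the last suffix only, so
    "report.PDF" counts as supported while "events.json" does not.
    """
    supported, unsupported = [], []
    for p in paths:
        suffix = PurePosixPath(p).suffix.lower()
        if suffix in SUPPORTED_DOC_EXTENSIONS:
            supported.append(p)
        else:
            unsupported.append(p)
    return supported, unsupported


if __name__ == "__main__":
    # Hypothetical volume paths, for illustration only.
    files = [
        "/Volumes/main/default/docs/guide.md",
        "/Volumes/main/default/docs/report.PDF",
        "/Volumes/main/default/docs/events.json",
    ]
    ok, skipped = split_by_ingestion_support(files)
    print("would be ingested:", ok)
    print("would be skipped:", skipped)
```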

If the goal is to make the JSON content searchable, the pattern is to first load the JSON into a Delta table, for example, using read_files(..., format => 'json'), and then, if needed, build a Databricks AI Search index on that table. AI Search indexes are created from Delta tables rather than directly from raw files in a volume.
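A minimal sketch of the first step, assuming a Databricks notebook where spark is available: the helper below (hypothetical, not part of any Databricks library) builds a CREATE OR REPLACE TABLE ... AS SELECT statement around read_files(..., format => 'json'). It assumes plain dotted table names without embedded dots, and the path and table name in the usage line are placeholders.

```python
def build_json_to_delta_sql(volume_path: str, target_table: str, multiline: bool = False) -> str:
    """Return a CTAS statement that loads JSON files from a volume into a table.

    Hypothetical helper: it only builds the SQL string around the
    read_files table-valued function; running it is up to the caller.
    """
    # Escape single quotes so the path is a valid SQL string literal.
    path_literal = volume_path.replace("'", "''")
    # Backtick-quote each part of a dotted name (catalog.schema.table),
    # doubling any embedded backticks. Assumes no dots inside a part.
    quoted = ".".join("`" + part.replace("`", "``") + "`" for part in target_table.split("."))
    opts = "format => 'json'"
    if multiline:
        # Assumption: needed when each file is one JSON document spread
        # over several lines rather than one JSON object per line.
        opts += ", multiLine => true"
    return f"CREATE OR REPLACE TABLE {quoted} AS SELECT * FROM read_files('{path_literal}', {opts})"
```

In a notebook you would then run something like spark.sql(build_json_to_delta_sql("/Volumes/main/default/landing/json/", "main.default.raw_json")). On Databricks, CTAS creates a Delta table by default, and that table is what a search index would then be built on.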

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***


szymon_dybczak
Esteemed Contributor III

Hi @amirabedhiafi ,

The workspace search bar simply doesn't crawl volume file contents or filenames; it's scoped to registered Unity Catalog metadata objects (tables, models) and notebook text.
Use Catalog Explorer to browse the files visually, or keep using Python to enumerate and read them programmatically. So nothing is broken; it's working as designed.
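For the programmatic route, here is a minimal sketch using only the Python standard library. It assumes volume paths are reachable through local file APIs (as /Volumes/... paths are on Unity Catalog-enabled compute) and that each file holds a single JSON document; load_json_files is a hypothetical helper name.

```python
import json
import os


def load_json_files(root):
    """Walk a directory tree and parse every *.json file found.

    Returns a dict mapping file path -> parsed JSON value. Assumes each
    file contains exactly one JSON document (not JSON Lines).
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Case-insensitive match so "data.JSON" is picked up too.
            if name.lower().endswith(".json"):
                path = os.path.join(dirpath, name)
                with open(path, "r", encoding="utf-8") as f:
                    results[path] = json.load(f)
    return results
```

For example, call load_json_files("/Volumes/<catalog>/<schema>/<volume>") with your own catalog, schema, and volume names. Files in JSON Lines format (one object per line) would need line-by-line parsing instead of json.load.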

If my answer was helpful, please consider marking it as accepted solution.


ShamenParis
New Contributor II

Hi @amirabedhiafi ,

Catalog Explorer search won't return these files. This is likely because raw files in Volumes can change rapidly and aren't tracked in the system tables in the same way structured data is.

Instead, I would suggest using a Genie Space for this. Check out the example below:

[Screenshot: ShamenParis_2-1780867579183.png]

