
Automatic schema rendering of files in unity catalog

dspatil
Visitor

Hi team,

Can anyone please confirm whether Unity Catalog supports automatic schema rendering for CSV, JSON, PDF, and other structured or unstructured files?

In other words, if I create a volume whose path/location points to a folder (or S3 bucket) containing such files, can Unity Catalog infer the schema and show it in the UI automatically?

Please advise.

Thanks,

Deepak

2 REPLIES

gchandra
Contributor III

UC is a Governance tool.

If you have structured or semi-structured data in a volume, you can view it with a SELECT statement.

 

select * from csv.`/Volumes/folder/file.csv`;  -- note the backticks around the path
select * from json.`/Volumes/folder/file.json`;

Once you are satisfied with the result, you can create a table:

create table <> as select * from csv.`/Volumes/folder/file.csv`; 
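
To check which column names and types would be inferred before creating a table, something along these lines may help. This is only a sketch: it assumes a Databricks SQL or recent Databricks Runtime environment where the `read_files` function and `DESCRIBE QUERY` are available, it reuses the placeholder path from above, and the `header => true` option is an assumption about the file layout.

```sql
-- Show the columns and data types inferred for the file, without creating a table.
DESCRIBE QUERY
SELECT *
FROM read_files(
  '/Volumes/folder/file.csv',  -- same placeholder path as above
  format => 'csv',
  header => true               -- assumption: the first row holds the column names
);
```

The output lists one row per inferred column. Once a table has been created from the file, its schema is stored in Unity Catalog and appears in the UI like that of any other table.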

It's not possible to infer a schema from unstructured data.




dspatil
Visitor

OK, thanks a lot @gchandra.

I am new to Unity Catalog and am particularly interested in (and evaluating) the open-source version of Unity Catalog (https://www.unitycatalog.io/).

I know that we can create volumes that point to CSV, JSON, PDF, and other files, and that we can read the contents of such files using the volume read commands you mentioned.

I was wondering whether Unity Catalog performs schema discovery on such files on its own, i.e. automatically scans CSV and JSON files for structure, saves metadata about the columns, and displays it in the UI. That was my assumption; please correct me if I am wrong.

If it can't auto-detect file schemas, what is the best way to detect and save file schemas in Unity Catalog?

Separately, how is data lineage tracked and shown (for example, for file changes under a volume's path)?

Thanks a lot in advance.
