Understanding Spark Architecture during Table Creation

Ramakrishnan83
New Contributor III

Team,

I am trying to understand how the Parquet data files and the JSON files under the _delta_log folder store data behind the scenes.

Step 1: Table creation

from delta.tables import DeltaTable

# Create an empty Delta table with an explicit schema at a fixed location
DeltaTable.create(spark) \
    .tableName("employee") \
    .addColumn("id", "INT") \
    .addColumn("name", "STRING") \
    .addColumn("dept", "STRING") \
    .addColumn("salary", "INT") \
    .location("/FileStore/tables/delta/demo2") \
    .execute()
Step 2: 
%sql
INSERT INTO employee VALUES (100, 'Ram', 'CSE', 1000)
Step 3:
%sql
SELECT * FROM delta.`/FileStore/tables/delta/demo2`
[screenshot: query output showing the inserted rows]

Note: I made two inserts, so there are two Parquet files in the table directory.
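A minimal sketch for inspecting this, assuming a Databricks notebook (where dbutils is available) and the same /FileStore/tables/delta/demo2 location used above:

# List the Parquet data files Delta wrote for the table
for f in dbutils.fs.ls("/FileStore/tables/delta/demo2"):
    print(f.path, f.size)

# List the transaction log: one numbered JSON commit file
# (plus a .crc checksum companion) per write
for f in dbutils.fs.ls("/FileStore/tables/delta/demo2/_delta_log"):
    print(f.path, f.size)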

Challenge:

I am trying to read the JSON, CRC, and Parquet files directly to see what they contain, but I am getting errors.

[screenshot: command and the resulting error]

The output of this command only gives me the structure of the JSON, not the actual data stored.

[screenshot: error thrown when reading the Parquet file]

Reading the Parquet file directly throws this error.

Note: My cluster is running DBR 12.2 LTS.
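A minimal sketch of how these files can be read, assuming the first commit file is named 00000000000000000000.json (Delta numbers commits with zero-padded versions) and the same table location as above:

# Each commit under _delta_log is newline-delimited JSON of actions
# (commitInfo, protocol, metaData, add/remove) -- metadata about the write,
# not the table rows themselves. The rows live in the Parquet data files.
log = spark.read.json("/FileStore/tables/delta/demo2/_delta_log/00000000000000000000.json")
log.show(truncate=False)

# Or view the raw text of a commit. The .crc file is a checksum/stats
# companion to the commit, not row data.
print(dbutils.fs.head("/FileStore/tables/delta/demo2/_delta_log/00000000000000000000.json"))

# The data files are plain Parquet, so they need the Parquet reader, not
# spark.read.json. Reading them directly bypasses the transaction log; the
# supported way to read the table is through delta.`<path>`, as in Step 3.
spark.read.parquet("/FileStore/tables/delta/demo2/*.parquet").show()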

Accepted Solution

shan_chandra
Honored Contributor III

@Ramakrishnan83 - Kindly go through the blog post https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html, which discusses Delta's transaction log in detail.
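As a quick complement to the blog post, a minimal sketch (not from the original reply) that surfaces the transaction log through Delta's own interface, using the table location from the question:

# DESCRIBE HISTORY reads the _delta_log and returns one row per commit,
# including the operation (CREATE TABLE, WRITE, ...) and its metrics
history = spark.sql("DESCRIBE HISTORY delta.`/FileStore/tables/delta/demo2`")
history.select("version", "timestamp", "operation", "operationMetrics").show(truncate=False)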
