Team,
I am trying to understand how the parquet files and the JSON files under the _delta_log folder store the data behind the scenes.
Step 1: Table creation
from delta.tables import *

DeltaTable.create(spark) \
    .tableName("employee") \
    .addColumn("id", "INT") \
    .addColumn("name", "STRING") \
    .addColumn("dept", "STRING") \
    .addColumn("salary", "INT") \
    .location("/FileStore/tables/delta/demo2") \
    .execute()
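For context, right after the create the table location should contain only the transaction log folder with the first commit. A minimal sketch of how I am listing it in a Databricks notebook, assuming the table location above:

# List the transaction log written by DeltaTable.create();
# the first commit should be 00000000000000000000.json plus its .crc checksum
display(dbutils.fs.ls("/FileStore/tables/delta/demo2/_delta_log"))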
Step 2:
%sql
INSERT INTO employee VALUES (100, "Ram", "CSE", 1000)
Step 3:
%sql
SELECT * FROM delta.`/FileStore/tables/delta/demo2`
Note: I ran the INSERT twice, so there are 2 parquet files in the table folder.
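To confirm that count, this is the kind of listing I use on the table folder (same path as above):

# Each INSERT commit writes one part-*.parquet data file here,
# plus a matching numbered .json commit (and .crc) under _delta_log
display(dbutils.fs.ls("/FileStore/tables/delta/demo2"))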
Challenge:
I am trying to read the JSON, CRC, and parquet files directly to see their contents, but I am running into problems (a sketch of the reads I am attempting is below).
The JSON read only gives me the structure of the JSON, not the actual data stored.
Reading the parquet file directly throws an error.
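For reference, a minimal sketch of the kind of direct reads I am attempting. The commit number and the part file name below are placeholders; the actual parquet file name is auto-generated:

# 1) Read one commit file from the transaction log.
#    Commit files are newline-delimited JSON of actions (commitInfo, add,
#    metaData, protocol), so this shows the log structure, not table rows.
log_df = spark.read.json(
    "/FileStore/tables/delta/demo2/_delta_log/00000000000000000001.json"
)
log_df.printSchema()
log_df.show(truncate=False)

# 2) Read one data file directly as parquet -- this is the read that errors.
#    "part-00000-xxxx.snappy.parquet" is a placeholder file name.
data_df = spark.read.parquet(
    "/FileStore/tables/delta/demo2/part-00000-xxxx.snappy.parquet"
)
data_df.show()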
Note: My cluster is running DBR 12.2 LTS.