Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reading snappy.parquet

Hritik_Moon
New Contributor II

I stored a DataFrame as Delta in the catalog. It created multiple folders containing snappy.parquet files. Is there a way to read these snappy.parquet files directly?

They read fine with pandas, but with Spark I get an "incompatible format" error.

2 REPLIES

Khaja_Zaffer
Contributor III

Hello, good day @Hritik_Moon 

The "incompatible format" error is expected: the table was written in Delta format (which follows ACID principles), so the folder contains a _delta_log directory, and when you try to read it as plain Parquet, Spark detects that log and raises an AnalysisException.

The recommended approach is to read the data in Delta format.

Alternatively, copy the .snappy.parquet files into a separate folder and read them from there.

Let me share a medium article I found for this issue: 
https://medium.com/%40ishanpradhan/how-to-read-a-snappy-parquet-file-in-databricks-696538cd0efc

 

Thank you. 
I am waiting for solutions from other contributors as well, so they can share their approaches. 

Prajapathy_NKR
New Contributor

@Hritik_Moon 

Try to read the file as delta. 

path/delta_file_name/
- parquet data files
- _delta_log/

Since you are using Spark, read it with: spark.read.format("delta").load("path/delta_file_name").

Delta internally stores the data as Parquet, and the _delta_log directory contains the transaction metadata. You don't need to touch these files unless you are experimenting. 🙂
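To see that layout concretely, here is a small stdlib-only sketch that inspects a Delta table folder and separates the Parquet data files from the transaction log. The function name and path are illustrative, not part of any Databricks API.

```python
import os

def describe_delta_folder(path):
    """Return (parquet_files, log_files) found under a Delta table folder.

    parquet_files: the .snappy.parquet data files at the top level.
    log_files:     the transaction metadata inside _delta_log/ (JSON
                   commit files, plus occasional checkpoint files).
    """
    parquet_files = sorted(
        f for f in os.listdir(path) if f.endswith(".parquet")
    )
    log_dir = os.path.join(path, "_delta_log")
    log_files = sorted(os.listdir(log_dir)) if os.path.isdir(log_dir) else []
    return parquet_files, log_files
```

The presence of a non-empty `log_files` list is exactly what makes Spark treat the folder as a Delta table rather than plain Parquet.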

For more info, please go through this, https://docs.databricks.com/aws/en/delta/tutorial.

Hope this solved your issue.
