Hive Serde table vs Delta table

herry
New Contributor III

This might be a stupid question. Does a Hive SerDe table have the same features (e.g. transactions) as a Delta table?

I tried to find this in the Databricks documentation but could not find a clear answer.

I create the Hive SerDe table using this SQL statement:

CREATE EXTERNAL TABLE mydb.mytable (col1 string, col2 boolean)
ROW FORMAT serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/to/table/';

and I create the Delta table with:

CREATE TABLE mydb.mytable
  (col1 string, col2 boolean)
  USING DELTA
  LOCATION 's3://path/to/table'

5 REPLIES

Hubert-Dudek
Esteemed Contributor III

SerDe stands for serializer/deserializer; in this case it just reads and writes the Parquet format. Delta is based on Parquet (the data files of a Delta table are regular Parquet files) but in addition keeps commits etc. in separate files, the transaction log. So the first statement defines a table over plain Parquet files, whereas CREATE TABLE ... USING DELTA with a LOCATION gives you an unmanaged Delta table.
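One way to see the difference described above is to ask the metastore what it thinks the table is. This is a minimal sketch, assuming the Delta table `mydb.mytable` from the question exists on Databricks; the exact output columns may differ between runtime versions.

```sql
-- Shows table metadata; for a Delta table the `format` column reads "delta"
-- and `location` points at the S3 path given in CREATE TABLE.
DESCRIBE DETAIL mydb.mytable;

-- Lists the commits recorded in the table's transaction log
-- (one row per version). This is only available for Delta tables.
DESCRIBE HISTORY mydb.mytable;
```

On storage, the commits live in a `_delta_log` directory next to the Parquet data files under the table location.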

herry
New Contributor III

What does "unmanaged Delta table" mean compared to "managed Delta table"?

-werners-
Esteemed Contributor III

It refers to where the actual data is stored: in your Databricks account (managed: you let Databricks handle the data) or in external storage (a data lake, S3, etc.) where you define how the data is stored.
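As a rough illustration of the distinction, the two variants differ only in whether a LOCATION is given. The table names `mydb.managed_tbl` and `mydb.external_tbl` are hypothetical, chosen for this sketch; the S3 path is the placeholder from the question.

```sql
-- Managed: no LOCATION, so Databricks decides where the files are stored.
CREATE TABLE mydb.managed_tbl (col1 STRING, col2 BOOLEAN)
  USING DELTA;

-- Unmanaged (external): you choose the storage path yourself.
CREATE TABLE mydb.external_tbl (col1 STRING, col2 BOOLEAN)
  USING DELTA
  LOCATION 's3://path/to/table';

-- The "Type" row in the output reads MANAGED or EXTERNAL.
DESCRIBE EXTENDED mydb.external_tbl;
```

The practical consequence shows up on DROP TABLE: dropping a managed table also removes its data, while dropping an external table only removes the metastore entry and leaves the files at the location.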

herry
New Contributor III

What about ACID transactions (commits) and Z-Ordering? Are they available for a Hive SerDe table?

-werners-
Esteemed Contributor III

AFAIK a Hive SerDe is just a serializer and deserializer (it writes and reads data to/from storage).

Hive uses a SerDe (and a FileFormat) to read and write table rows. So it is not an actual file format like Parquet, ORC or Delta Lake (which I consider a separate file format even though it is Parquet on steroids).

So the comparison with delta lake is kinda awkward.

A better comparison would be Delta Lake vs Iceberg or Hudi.

https://databricks.com/session_na20/a-thorough-comparison-of-delta-lake-iceberg-and-hudi
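To make the answer to the ACID and Z-Ordering question concrete, here is a short sketch of statements that rely on the Delta transaction log. It assumes the Delta table `mydb.mytable` from the question; the filter value `'some-key'` is made up for illustration. To my knowledge these statements are rejected on a table that is not in Delta format, such as the Hive SerDe table above.

```sql
-- Row-level change committed atomically as a new table version.
UPDATE mydb.mytable
SET col2 = false
WHERE col1 = 'some-key';

-- Compacts small files and co-locates rows by col1 (Z-Ordering).
OPTIMIZE mydb.mytable
ZORDER BY (col1);
```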
