Databricks Community

herry · ‎12-15-2021

This might be a stupid question. Does a Hive SerDe table have the same features (e.g. transactions) as a Delta table?

I tried to find this in the Databricks documentation, but I could not find a clear answer.

I created the Hive SerDe table using this SQL statement:

CREATE EXTERNAL TABLE mydb.mytable (col1 string, col2 boolean)
ROW FORMAT serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/to/table/';

and I created the Delta table with:

CREATE TABLE mydb.mytable
  (col1 string, col2 boolean)
  USING DELTA
  LOCATION 's3://path/to/table'

Hubert-Dudek · ‎12-15-2021

SerDe stands for serializer/deserializer; in this case it just reads and writes the Parquet format. Delta is based on Parquet (a snapshot in Delta is just regular Parquet files) but in addition it keeps commits etc. in separate files. So you saved the files as plain Parquet, but by using CREATE TABLE ... USING DELTA you converted them to a Delta unmanaged table.
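
A quick way to see those extra commit files from SQL, as a minimal sketch: this assumes the Delta table mydb.mytable from the question exists and that you are on Databricks (or Spark with Delta Lake), where the commits are kept in a _delta_log directory next to the Parquet files.

```sql
-- Table-level metadata; for a Delta table the "format" column reads "delta".
DESCRIBE DETAIL mydb.mytable;

-- One row per commit in the transaction log (version, timestamp, operation, ...).
DESCRIBE HISTORY mydb.mytable;
```

Both commands read the Delta transaction log, so they are not expected to work on a table that was only declared through a Hive SerDe.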

My blog: https://databrickster.medium.com/

herry · ‎12-15-2021

What does "Delta unmanaged table" mean compared to "Delta managed table"?

-werners- · ‎12-15-2021

It refers to where the actual data is stored: in your Databricks account (managed, where you let Databricks handle the data) or in external storage (data lake, S3 etc.) where you define how the data is stored.
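
In SQL the difference mostly comes down to whether you pass a LOCATION. A minimal sketch, where the table names managed_example and external_example are made up for illustration and the S3 path is the placeholder from the question:

```sql
-- Managed: no LOCATION, so Databricks picks and owns the storage path.
CREATE TABLE mydb.managed_example (col1 string, col2 boolean)
  USING DELTA;

-- Unmanaged (external): you supply the storage path yourself.
CREATE TABLE mydb.external_example (col1 string, col2 boolean)
  USING DELTA
  LOCATION 's3://path/to/table';

-- The "Type" row in the output reports MANAGED or EXTERNAL.
DESCRIBE EXTENDED mydb.external_example;
```

The practical consequence shows up on DROP TABLE: dropping a managed table also deletes its data files, while dropping an external table only removes the metastore entry and leaves the files in place.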

herry · ‎12-15-2021

How about the ACID transactions (commits) and Z-Ordering features? Are they available in the Hive Serde table?

-werners- · ‎12-16-2021

AFAIK a Hive SerDe is just a serializer and deserializer (it writes and reads data to/from storage).

Hive uses a SerDe (and a FileFormat) to read and write table rows. So it is not an actual file format like Parquet, ORC, or Delta Lake (which I consider a separate file format even though it is Parquet on steroids).

So the comparison with delta lake is kinda awkward.

A better comparison would be Delta Lake vs Iceberg or Hudi.

https://databricks.com/session_na20/a-thorough-comparison-of-delta-lake-iceberg-and-hudi
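
To tie this back to the question about transactions and Z-Ordering: on Databricks those are exposed as Delta commands. A rough sketch, assuming the Delta table mydb.mytable from the question has at least one commit; the choice of col1 as the Z-Order column is only for illustration:

```sql
-- Compact small files and co-locate rows by col1 (a Delta Lake command).
OPTIMIZE mydb.mytable ZORDER BY (col1);

-- Every write is recorded as a commit, so earlier versions stay queryable.
SELECT * FROM mydb.mytable VERSION AS OF 0;
```

Both statements depend on the Delta transaction log, so as far as I know they have no equivalent on a table defined only through a Hive SerDe.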