12-15-2021 04:47 AM
This might be a stupid question. Does a Hive SerDe table have the same features (e.g. transactions) as a Delta table?
I tried to find the information in the Databricks documentation, but I couldn't find a clear answer.
I created the Hive SerDe table using this SQL statement:
CREATE EXTERNAL TABLE mydb.mytable (col1 string, col2 boolean)
ROW FORMAT serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/to/table/';
and I created the Delta table like this:
CREATE TABLE mydb.mytable
(col1 string, col2 boolean)
USING DELTA
LOCATION 's3://path/to/table';
12-15-2021 05:01 AM
SerDe stands for serializer/deserializer; in this case it just handles the Parquet format. Delta is based on Parquet (the data in a Delta snapshot is stored as regular Parquet files), but it additionally keeps commits etc. in separate files (the transaction log). So your first statement only describes the files as plain Parquet, while CREATE TABLE ... USING DELTA registers the location as an unmanaged Delta table.
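For illustration, a minimal sketch in Databricks SQL of how plain Parquet files become a Delta table. It assumes that s3://path/to/table/ holds plain Parquet files with no _delta_log/ directory yet (if the path already is a Delta table, skip the CONVERT step), and that the question's Hive SerDe table is dropped first because both tables use the name mydb.mytable. This is a sketch under those assumptions, not a verified recipe for this exact setup.

```sql
-- Drop only the Hive SerDe table definition; for an EXTERNAL table
-- the Parquet files under its LOCATION are left in place
DROP TABLE IF EXISTS mydb.mytable;

-- Write a Delta transaction log (_delta_log/) for the existing Parquet files;
-- the data files themselves are converted in place, not rewritten
CONVERT TO DELTA parquet.`s3://path/to/table/`;

-- Register the path as an unmanaged (external) Delta table;
-- the schema is read from the transaction log
CREATE TABLE mydb.mytable
USING DELTA
LOCATION 's3://path/to/table/';

-- Each commit recorded in _delta_log/ appears as one row (version, timestamp, operation, ...)
DESCRIBE HISTORY mydb.mytable;
```

The commit history returned by DESCRIBE HISTORY is exactly the "commits etc. in separate files" that a plain Parquet SerDe table does not have.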
12-15-2021 05:21 AM
What does "Delta unmanaged table" mean compared to "Delta managed table"?
12-15-2021 06:08 AM
It refers to where the actual data is stored: for a managed table, Databricks handles the data and decides where it lives in your Databricks account's storage; for an unmanaged (external) table, the data lives in external storage (a data lake, S3, etc.) at a location you define yourself.
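A short sketch of the difference in Databricks SQL. The table names mydb.managed_demo and mydb.external_demo and the path s3://path/to/external_demo/ are hypothetical, chosen only for illustration; the only syntactic difference is whether a LOCATION clause is given.

```sql
-- Managed: no LOCATION clause, so Databricks decides where the files live
-- (dropping a managed table also deletes its data)
CREATE TABLE mydb.managed_demo (col1 STRING, col2 BOOLEAN)
USING DELTA;

-- Unmanaged (external): you choose the storage LOCATION
-- (dropping it removes only the metastore entry, the files stay)
CREATE TABLE mydb.external_demo (col1 STRING, col2 BOOLEAN)
USING DELTA
LOCATION 's3://path/to/external_demo/';

-- The "Type" row in the detailed output reads MANAGED or EXTERNAL
DESCRIBE TABLE EXTENDED mydb.managed_demo;
DESCRIBE TABLE EXTENDED mydb.external_demo;
```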
12-15-2021 06:28 AM
What about ACID transactions (commits) and Z-Ordering? Are those features available for the Hive SerDe table?
12-16-2021 01:35 AM
AFAIK a Hive SerDe is just a serializer and deserializer (it writes and reads data to/from storage).
Hive uses a SerDe (and a FileFormat) to read and write table rows. So it is not an actual file format like Parquet, ORC, or Delta Lake (which I consider a separate file format even though it is essentially Parquet on steroids).
So the comparison with Delta Lake is kind of awkward.
A better comparison would be Delta Lake vs. Iceberg or Hudi.
https://databricks.com/session_na20/a-thorough-comparison-of-delta-lake-iceberg-and-hudi
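One way to check the ACID and Z-Ordering question in practice is a sketch like the one below. Since the question reuses one name for both tables, it assumes hypothetical distinct names, mydb.mytable_delta for the Delta table and mydb.mytable_serde for the Hive SerDe table. OPTIMIZE ... ZORDER BY and DESCRIBE HISTORY rely on the Delta transaction log, so on the SerDe table one would expect an error saying the operation is only supported for Delta tables (the exact message is not verified here).

```sql
-- Delta table: compact small files and co-locate rows by col1 (Z-Ordering)
OPTIMIZE mydb.mytable_delta ZORDER BY (col1);

-- Delta table: list the ACID commits recorded in _delta_log/
DESCRIBE HISTORY mydb.mytable_delta;

-- Hive SerDe table: expected to fail, since there is no transaction log to work with
OPTIMIZE mydb.mytable_serde ZORDER BY (col1);
```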