
Hubert-Dudek
Databricks MVP

You can use Apache Hudi in Databricks without a problem:

- In the cluster settings, install the Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 (for Databricks 12.2 LTS).

- In the cluster's Spark config, add these three lines:

spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.serializer org.apache.spark.serializer.KryoSerializer
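If you manage clusters as code instead of through the UI, the same two steps can be expressed as API payloads. The sketch below assumes the Databricks Libraries API (`POST /api/2.0/libraries/install`, with a `{"maven": {"coordinates": ...}}` library entry) and the `spark_conf` key/value map that cluster specs accept. The cluster ID is hypothetical; the coordinate and config values are the ones from the answer above.

```python
import json

# Maven coordinate from the answer above (targets Databricks 12.2 LTS / Spark 3.3).
HUDI_MAVEN_COORDINATES = "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0"

# The three Spark config lines from the answer, as a spark_conf-style mapping.
HUDI_SPARK_CONF = {
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def libraries_install_payload(cluster_id: str) -> dict:
    """Request body for the Libraries API install call (assumed endpoint shape)."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": HUDI_MAVEN_COORDINATES}}],
    }


def spark_conf_text(conf: dict) -> str:
    """Render the conf as 'key value' lines, the form pasted into the UI's Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # "1234-567890-abcde123" is a hypothetical cluster ID.
    print(json.dumps(libraries_install_payload("1234-567890-abcde123"), indent=2))
    print(spark_conf_text(HUDI_SPARK_CONF))
```

`HUDI_SPARK_CONF` can go directly into the `spark_conf` field of a cluster spec. Remember that the cluster still needs a restart for the extensions and catalog settings to take effect.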

Happy streaming with Hudi!



My blog: https://databrickster.medium.com/

pvignesh92
Honored Contributor

Thanks @Hubert Dudek. Is there any documentation comparing the Hudi and Delta Lake table formats?

ros
New Contributor III

I installed the library, set the Spark configs, and restarted the cluster. Then I ran the CREATE TABLE command in a notebook, but it fails with this error:

java.io.FileNotFoundException: No such file or directory: s3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl

My command in a Python notebook:

%sql
create table hudi_cow_pt_tbl (
id bigint,
name string,
ts bigint,
dt string,
hh string
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts'
)
partitioned by (dt, hh)
location 's3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl';

This also doesn't work; it fails with ModuleNotFoundError: No module named 'org.apache.hudi':

import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
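The `ModuleNotFoundError` is consistent with these being Scala imports: in a Python cell, `import org.apache.hudi...` is parsed as a Python import, and no such Python module exists. They can run in a `%scala` cell instead. From PySpark, the usual approach is to pass Hudi options as plain string keys. The sketch below assumes Hudi 0.13.x config names. It mirrors the columns of the SQL table above, and the helper names `hudi_write_options` and `write_hudi` are hypothetical. The options would need to match the properties of any existing table you write to.

```python
# Hypothetical helpers: Hudi write options as string keys, the PySpark counterpart
# of the Scala DataSourceWriteOptions constants. Assumes Hudi 0.13.x option names.

def hudi_write_options(table_name, record_key, precombine_field, partition_fields,
                       table_type="COPY_ON_WRITE", operation="upsert"):
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Multiple partition columns are given as a comma-separated list.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": operation,
    }
    if len(partition_fields) > 1:
        # Composite partition paths (e.g. dt + hh) need a key generator that supports them.
        opts["hoodie.datasource.write.keygenerator.class"] = (
            "org.apache.hudi.keygen.ComplexKeyGenerator"
        )
    return opts


def write_hudi(df, path, options):
    # Needs a Spark session with the Hudi bundle installed; not executed in this sketch.
    df.write.format("hudi").options(**options).mode("append").save(path)


# Same columns as the SQL table: key "id", precombine "ts", partitions "dt" and "hh".
cow_opts = hudi_write_options("hudi_cow_pt_tbl", "id", "ts", ["dt", "hh"])
```

In a notebook you would then call `write_hudi(df, "<table path>", cow_opts)` on a DataFrame that has those columns.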

Attachments: Library, configs, 12.2 LTS