Re: You can use apache hudi in databricks without ...

Hubert-Dudek · ‎03-28-2023

You can use Apache Hudi in Databricks without a problem:

- In the cluster settings, install the Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 (for Databricks 12.2 LTS).

- In the cluster's Spark config, add these three lines:

spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.serializer org.apache.spark.serializer.KryoSerializer
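
After restarting the cluster, it can help to confirm that the three settings were actually picked up. The following is only a minimal sketch: the helper name `find_conf_problems` is hypothetical, and the commented notebook usage assumes `spark` is the notebook's SparkSession and that `spark.conf.get` exposes these keys on your runtime.

```python
# Expected values, copied verbatim from the three Spark config lines above.
REQUIRED_HUDI_CONF = {
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def find_conf_problems(actual):
    """Return {key: (expected, found)} for every setting that is missing or different.

    `actual` is a plain dict of config key -> value. A value may hold several
    comma-separated entries (spark.sql.extensions can list more than one class),
    so the expected class only needs to be one of them.
    """
    problems = {}
    for key, expected in REQUIRED_HUDI_CONF.items():
        found = actual.get(key)
        entries = [part.strip() for part in (found or "").split(",")]
        if expected not in entries:
            problems[key] = (expected, found)
    return problems


# Hypothetical usage in a Databricks notebook cell (assumes `spark` exists):
# actual = {key: spark.conf.get(key, "") for key in REQUIRED_HUDI_CONF}
# print(find_conf_problems(actual))
```

An empty result means all three settings are present; any key that is returned points at a line that was mistyped or not applied.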

Happy streaming with Hudi!

My blog: https://databrickster.medium.com/

pvignesh92 · ‎03-28-2023

Thanks @Hubert Dudek. Is there any documentation available comparing the Hudi and Delta Lake table formats?

ros · ‎05-15-2023

I installed the library, set the Spark configs, and restarted the cluster. When I then ran the create command in a notebook, it failed with this error:

java.io.FileNotFoundException: No such file or directory: s3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl

My command in a Python notebook:

%sql
create table hudi_cow_pt_tbl (
id bigint,
name string,
ts bigint,
dt string,
hh string
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts'
)
partitioned by (dt, hh)
location 's3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl';

This also doesn't work and gives the error: ModuleNotFoundError: No module named 'org.apache.hudi'

import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
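
A note on the import error: the three lines above use Scala import syntax, and ModuleNotFoundError is a Python exception, so they were run in a Python cell, where Python looks for a Python package named org. From Python, Hudi write options are normally passed as plain string keys instead. Below is a minimal sketch under stated assumptions: the helper name `hudi_write_options` is hypothetical, the key generator line is an assumption for tables with more than one partition column, and the commented write call assumes a DataFrame `df` and a target path of your own.

```python
def hudi_write_options(table_name, record_key, precombine_field,
                       partition_fields, table_type="COPY_ON_WRITE"):
    """Build Hudi datasource write options as a plain dict of string keys."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": "upsert",
    }
    if len(partition_fields) > 1:
        # Assumption: several partition columns need the complex key generator.
        options["hoodie.datasource.write.keygenerator.class"] = (
            "org.apache.hudi.keygen.ComplexKeyGenerator"
        )
    return options


# Mirrors the table definition from the post: key id, precombine ts, partitions dt and hh.
opts = hudi_write_options("hudi_cow_pt_tbl", "id", "ts", ["dt", "hh"])

# Hypothetical usage in a notebook cell (assumes `df` and `table_path` exist):
# df.write.format("hudi").options(**opts).mode("append").save(table_path)
```

The Scala imports themselves should work unchanged in a %scala cell once the Hudi bundle is installed on the cluster.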
