03-28-2023 04:46 AM
You can use Apache Hudi on Databricks without a problem:
- In the cluster settings, install the Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 (for Databricks Runtime 12.2 LTS).
- In the cluster's Spark config, add these three lines:
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.serializer org.apache.spark.serializer.KryoSerializer
Happy streaming with Hudi!
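To confirm that the three settings above actually reached the cluster, a small check like the following may help. It is only a sketch: the helper name `find_hudi_conf_mismatches` is hypothetical, the expected values are copied from the config lines in this post, and the comparison is an exact string match, so it will flag a cluster where `spark.sql.extensions` lists several extensions.

```python
# Expected values are copied from the three Spark config lines in the post above.
EXPECTED_HUDI_CONF = {
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def find_hudi_conf_mismatches(actual_conf):
    """Return {key: (expected, actual)} for each setting that is missing or different.

    Hypothetical helper for illustration; it is not part of Hudi or Databricks.
    """
    mismatches = {}
    for key, expected in EXPECTED_HUDI_CONF.items():
        actual = actual_conf.get(key)  # None when the key is not set at all
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


# In a Databricks notebook, where `spark` is predefined, this could be used as:
#   actual = dict(spark.sparkContext.getConf().getAll())
#   print(find_hudi_conf_mismatches(actual))
```

An empty result means all three settings match; anything else shows the expected value next to what the cluster reports.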
Labels: Hudi
03-28-2023 09:04 AM
Thanks @Hubert Dudek. Is there any documentation comparing the Hudi and Delta Lake table formats?
05-15-2023 11:34 PM
I installed the library, set the Spark configs, and restarted the cluster. When I then ran the create command in a notebook, it failed with this error:
java.io.FileNotFoundException: No such file or directory: s3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl
My command in a Python notebook:
%sql
create table hudi_cow_pt_tbl (
id bigint,
name string,
ts bigint,
dt string,
hh string
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts'
)
partitioned by (dt, hh)
location 's3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl';
This also doesn't work; it fails with the error: ModuleNotFoundError: No module named 'org.apache.hudi'
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
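The three import lines above are Scala syntax. In a Python cell the interpreter looks for a Python module named `org.apache.hudi`, which is the likely source of the ModuleNotFoundError. From PySpark, Hudi options are normally passed as plain string keys instead. Below is a minimal sketch of that approach, assuming the table layout from the create statement above (record key `id`, precombine field `ts`, partitions `dt` and `hh`). The helper name `build_hudi_write_options` is hypothetical, and setting the key generator explicitly is a precaution for multi-field partitioning that may not be required in every Hudi version.

```python
def build_hudi_write_options(table_name, record_key, precombine_field, partition_fields):
    """Build Hudi datasource write options as plain string key/value pairs.

    Hypothetical helper for illustration; the option keys are Hudi config names.
    """
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
    }
    if len(partition_fields) > 1:
        # Assumption: set the complex key generator explicitly when partitioning
        # on more than one field, rather than relying on version-specific defaults.
        options["hoodie.datasource.write.keygenerator.class"] = (
            "org.apache.hudi.keygen.ComplexKeyGenerator"
        )
    return options


options = build_hudi_write_options("hudi_cow_pt_tbl", "id", "ts", ["dt", "hh"])

# In a Databricks Python notebook with the Hudi bundle installed, and `df` being
# a DataFrame with the table's columns, a write could then look like:
#   df.write.format("hudi").options(**options).mode("append").save(table_path)
# where `table_path` is the table location.
```

No Python import of Hudi is needed for this; the installed Maven bundle supplies the `hudi` data source on the JVM side.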

