I installed hudi maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0
in Dbricks Runtime Ver : 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
with spark config :
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
I ran this python cmd :
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
which gave me error :
ModuleNotFoundError: No module named 'org.apache.hudi'
And then i ran sql command in notebook :
create table hudi_cow_pt_tbl (
id bigint,
name string,
ts bigint,
dt string,
hh string
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts'
)
partitioned by (dt, hh)
location 's3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl';
which gives me error :
java.io.FileNotFoundException: No such file or directory: s3://incred-databricks-data/hudi_dms_data/hudi_cow_pt_tbl
where as this path exists : s3://incred-databricks-data/hudi_dms_data/