Thanks @Shua42.
I am using the JDBC URL of a running SQL warehouse to write data directly to a Databricks table from my local machine, but the write performance is poor. I tried adding `batchsize` and `numPartitions`, but throughput did not improve at all. Below is the snippet I am using.
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

// Local Spark session; the actual write goes to Databricks over JDBC.
SparkSession spark = SparkSession.builder()
        .appName("JsonToDatabricksLocalJDBC")
        .master("local[*]")
        .config("spark.driver.memory", driverMemory)
        .config("spark.sql.warehouse.dir", "spark-warehouse-" + System.currentTimeMillis())
        .getOrCreate();

// Read the JSON input with an explicit schema, then flatten the nested
// event fields; struct columns are serialized back to JSON strings.
Dataset<Row> rawDf = spark.read().schema(expectedSchema).json(inputJsonPath);
Dataset<Row> transformedDf = rawDf.select(
        coalesce(col("app_name"), lit("UnknownApp")).alias("APP_NAME"),
        coalesce(col("event.event_name"), lit("UnknownEvent")).alias("EVENT_NAME"),
        coalesce(col("event.event_code"), lit("")).alias("EVENT_CODE"),
        to_json(col("event.event_attributes")).alias("EVENT_ATTRIBUTES"),
        to_json(col("event.user_attributes")).alias("USER_ATTRIBUTES"),
        to_json(col("event.device_attributes")).alias("DEVICE_ATTRIBUTES")
);

// Personal-access-token auth: the user name is the literal string "token".
Properties dfWriteJdbcProperties = new Properties();
dfWriteJdbcProperties.put("user", "token");
dfWriteJdbcProperties.put("password", dbToken);

// Append to the Databricks table through the SQL warehouse JDBC endpoint.
transformedDf.write()
        .mode(SaveMode.Append)
        .option("batchsize", String.valueOf(jdbcBatchSize))
        .option("numPartitions", String.valueOf(jdbcNumPartitionsForWrite))
        .jdbc(jdbcUrl, fullTableNameInDb, dfWriteJdbcProperties);
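One variation I was considering (just a sketch on my side, not something I have benchmarked) is repartitioning the DataFrame explicitly before the write, so that each partition opens its own JDBC connection and inserts batches in parallel:

// Untested sketch: force the write-side parallelism by repartitioning
// first; jdbcNumPartitionsForWrite is the same variable as above.
transformedDf
        .repartition(jdbcNumPartitionsForWrite)
        .write()
        .mode(SaveMode.Append)
        .option("batchsize", String.valueOf(jdbcBatchSize))
        .jdbc(jdbcUrl, fullTableNameInDb, dfWriteJdbcProperties);

I am not sure this would help, though, since the bottleneck may be the INSERT path on the warehouse side rather than client parallelism.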
Please suggest how I can improve the write performance, since I have to insert a large volume of data into the Databricks table. I have also attached a screenshot of the SQL warehouse.