Re: Speed Up JDBC Write from Databricks Notebook t...

NandiniN · ‎01-01-2025

https://docs.databricks.com/en/connect/external-systems/jdbc.html#control-parallelism-for-jdbc-queri...

When writing to databases using JDBC, Apache Spark uses the number of partitions in memory to control parallelism: each partition is written over its own database connection, so the partition count sets how many inserts run concurrently. You can repartition data before writing to control parallelism. Avoid a high number of partitions on large clusters, since too many concurrent connections can overwhelm your remote database. The following example demonstrates repartitioning to eight partitions before writing:

(employees_table.repartition(8)
  .write
  .format("jdbc")
  .option("url", "<jdbc-url>")
  .option("dbtable", "<new-table-name>")
  .option("user", "<username>")
  .option("password", "<password>")
  .save()
)
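
If the DataFrame already has more partitions than the database can comfortably handle, one option is to cap the partition count instead of hard-coding it, and to raise the JDBC batch size at the same time. The sketch below is only an illustration, not something from the linked docs page: the helper names `target_partitions` and `write_with_capped_parallelism`, and the defaults `max_connections=8` and `batch_size=10000`, are hypothetical and should be tuned for your database. `batchsize` is a standard Spark JDBC writer option (rows per insert round trip, default 1000).

```python
def target_partitions(current: int, max_connections: int) -> int:
    """Cap write parallelism at the number of connections the database can absorb.

    Hypothetical helper: never increases the partition count, only limits it.
    """
    if current < 1 or max_connections < 1:
        raise ValueError("partition and connection counts must be positive")
    return min(current, max_connections)


def write_with_capped_parallelism(df, url, table, user, password,
                                  max_connections=8, batch_size=10000):
    """Write a Spark DataFrame over JDBC with at most max_connections writers.

    Assumes df is a pyspark.sql.DataFrame; the default values are illustrative.
    """
    current = df.rdd.getNumPartitions()
    target = target_partitions(current, max_connections)

    # coalesce() merges existing partitions without a full shuffle, which is
    # usually cheaper than repartition() when only reducing the count.
    # It can leave partitions unevenly sized; use repartition(target) instead
    # if the data is heavily skewed.
    if target < current:
        df = df.coalesce(target)

    (df.write
       .format("jdbc")
       .option("url", url)
       .option("dbtable", table)
       .option("user", user)
       .option("password", password)
       .option("batchsize", batch_size)  # rows sent per JDBC batch insert
       .save())
```

In a notebook this would be called as, for example, `write_with_capped_parallelism(employees_table, "<jdbc-url>", "<new-table-name>", "<username>", "<password>")`. Whether a larger batch size actually helps depends on the target database and driver, so it is worth measuring before and after.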