Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-01-2025 11:32 AM
When writing to databases using JDBC, Apache Spark uses the number of partitions in memory to control parallelism. You can repartition data before writing to control parallelism. Avoid high number of partitions on large clusters to avoid overwhelming your remote database. The following example demonstrates repartitioning to eight partitions before writing:
(employees_table.repartition(8)
.write
.format("jdbc")
.option("url", "<jdbc-url>")
.option("dbtable", "<new-table-name>")
.option("user", "<username>")
.option("password", "<password>")
.save()
)