Hello everyone,
I am running into a performance issue writing 100–500 million rows (partitioned by a column) into a newly created Delta table. The cluster has 256 GB of memory and 64 cores, yet the following code takes a considerable amount of time even for around 70 million rows:
df.repartition(num_partitions * 4, partition_col).write \
    .format("delta") \
    .mode("overwrite") \
    .partitionBy(partition_col) \
    .option("mergeSchema", "true") \
    .option("optimizeBucket", "true") \
    .option("maxRecordsPerFile", "1000000") \
    .option("autoOptimize.optimizeWrite", "true") \
    .option("autoOptimize.autoCompact", "true") \
    .option("autoOptimize.autoRepartition", "true") \
    .saveAsTable(f"{bronze_layer}.{table_name}")
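For reference, a stripped-down version of the same write (using the same df, num_partitions, partition_col, bronze_layer, and table_name as above) would look like this; I am not even sure the extra options above are recognized by the writer rather than silently ignored:
# Bare-bones variant for comparison: same repartitioning and partition layout,
# but without the additional writer options.
(df.repartition(num_partitions * 4, partition_col)
    .write
    .format("delta")
    .mode("overwrite")
    .partitionBy(partition_col)
    .saveAsTable(f"{bronze_layer}.{table_name}"))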
What do I need to adjust to speed up this write step?