Databricks Community

Shiva3 · 10-29-2024

We are in the process of upgrading our notebooks to Unity Catalog. Previously, I was able to write data to an external Delta table using df.repartition(8).write.save('path'), which correctly created multiple files. However, during the testing phase of the upgrade, this approach no longer produces the expected output.

I attempted to disable auto-compaction with spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false"), but the write still produces only one Parquet file in S3 instead of the intended 8. How can I get the expected number of output files after the Unity Catalog upgrade?
Please help.

agallard · 10-29-2024

Hi @Shiva3,

Delta Lake in Unity Catalog may have optimized writes (optimizeWrite) enabled by default, which can reduce the number of files by automatically coalescing partitions during writes. You could try disabling it along with auto-compaction:

# Disable auto-compaction and optimized writes
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")

With both settings set to false, Delta Lake should no longer combine files or coalesce partitions during the write, so df.repartition(8) should produce 8 distinct files. Once the write has finished, you can restore the original settings.
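
If you want the settings put back automatically after the write, something like the sketch below might help. It is only an illustration, not an official Databricks pattern: the helper names (delta_write_settings_disabled, write_fixed_file_count) are made up for this post, and it assumes that spark is the notebook's SparkSession and that the target is written with format("delta").

```python
from contextlib import contextmanager

# The two session settings discussed in this thread.
AUTO_COMPACT = "spark.databricks.delta.autoCompact.enabled"
OPTIMIZE_WRITE = "spark.databricks.delta.optimizeWrite.enabled"


@contextmanager
def delta_write_settings_disabled(spark):
    """Hypothetical helper: set both flags to "false", then restore the old values."""
    keys = (AUTO_COMPACT, OPTIMIZE_WRITE)
    # conf.get(key, None) returns None when the key is not set in this session.
    previous = {key: spark.conf.get(key, None) for key in keys}
    try:
        for key in keys:
            spark.conf.set(key, "false")
        yield
    finally:
        # Restore each key to its earlier value, or unset it if it had none.
        for key, value in previous.items():
            if value is None:
                spark.conf.unset(key)
            else:
                spark.conf.set(key, value)


def write_fixed_file_count(spark, df, path, num_files=8):
    """Hypothetical helper: repartition and write while both flags are disabled."""
    with delta_write_settings_disabled(spark):
        # Assumption: the target is a Delta table at `path`, as in the question.
        df.repartition(num_files).write.format("delta").save(path)
```

In a notebook you would call write_fixed_file_count(spark, df, path) with your own DataFrame and path. Whether you actually end up with 8 files still depends on the cluster and table configuration, so check the output location after the write.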

Give it a try and let us know how it goes!

Regards

Alfonso Gallardo
