Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Issue with the repartition method in Unity Catalog

Shiva3
New Contributor III

We are in the process of upgrading our notebooks to Unity Catalog. Previously, I was able to write data to an external Delta table using df.repartition(8).write.save('path'), which correctly created multiple files. However, during the testing phase of the upgrade, this approach no longer produces the expected output.

I attempted to disable auto-compaction with spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false"), but the operation still results in only one Parquet file being created in S3, rather than the intended 8. I need assistance to resolve this issue with partitioning and file output after the Unity Catalog upgrade.
Please help.

1 REPLY

agallard
Contributor

Hi @Shiva3,

Delta Lake in Unity Catalog may have optimized writes (optimizeWrite) enabled by default, which can reduce the number of files by automatically coalescing partitions during writes. You could try disabling that option together with auto-compaction:

 

# Disable auto-compaction and optimized writes
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")

 

Setting both configurations to false should stop Delta Lake from automatically combining files or reducing partitions during the write, so df.repartition(8) should again produce 8 distinct files. Once the write has finished, you can restore the previous settings.
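If it helps, here is a minimal sketch of how the "disable, write, restore" sequence could be wrapped so the session settings go back to their previous values automatically. The helper names temporary_spark_conf and count_delta_files are hypothetical (they are not Databricks APIs), and spark, df, and the path in the usage comment are placeholders for your own session, DataFrame, and external location. The file count relies on the numFiles column returned by DESCRIBE DETAIL, and I am assuming the path is readable with your Unity Catalog permissions.

```python
from contextlib import contextmanager


@contextmanager
def temporary_spark_conf(spark, overrides):
    """Apply Spark conf overrides inside a `with` block, then restore them.

    Hypothetical helper: `spark` is an active SparkSession and `overrides`
    maps conf keys to string values.
    """
    # Remember the current values; None means the key was not set.
    previous = {key: spark.conf.get(key, None) for key in overrides}
    for key, value in overrides.items():
        spark.conf.set(key, value)
    try:
        yield
    finally:
        # Put every key back the way it was, even if the write failed.
        for key, old_value in previous.items():
            if old_value is None:
                spark.conf.unset(key)
            else:
                spark.conf.set(key, old_value)


def count_delta_files(spark, path):
    """Return the number of data files in the current Delta table version."""
    detail = spark.sql(f"DESCRIBE DETAIL delta.`{path}`").first()
    return detail["numFiles"]


# Settings suggested in this reply.
NO_COALESCE = {
    "spark.databricks.delta.autoCompact.enabled": "false",
    "spark.databricks.delta.optimizeWrite.enabled": "false",
}

# Usage in a notebook (placeholders: df and the path):
# with temporary_spark_conf(spark, NO_COALESCE):
#     df.repartition(8).write.format("delta").mode("overwrite").save("path")
# print(count_delta_files(spark, "path"))
```

If the reported file count is still 1 after this, the session-level settings are probably not what is coalescing the output, and it would be worth checking the table properties as well.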

Give it a try and let us know how it goes!

Regards

 

 

Alfonso Gallardo
-------------------
 I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark
