Repartition method issue in Unity Catalog
10-29-2024 04:44 AM
We are in the process of upgrading our notebooks to Unity Catalog. Previously, I was able to write data to an external Delta table using df.repartition(8).write.save('path'), which correctly created multiple files. However, during the testing phase of the upgrade, this approach no longer produces the expected output.
I attempted to disable auto-compaction with spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false"), but the operation still results in only one Parquet file being created in S3, rather than the intended 8. I need assistance to resolve this issue with partitioning and file output after the Unity Catalog upgrade.
Please help.
10-29-2024 09:46 AM
Hi @Shiva3,
Maybe you can try this option: Delta Lake in Unity Catalog may have optimized writes (optimizeWrite) enabled by default, which can reduce the number of files by automatically coalescing partitions during writes.
# Disable auto-compaction and optimized writes
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
Setting both configurations to false should stop Delta Lake from automatically combining files or reducing partitions, so that df.repartition(8) produces 8 distinct files. After the write, you can set the configs back to their previous values.
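If you want the "change the config, write, then change it back" step in one place, here is a minimal sketch. It assumes a notebook where `spark` and `df` already exist; the helper names `temporary_spark_conf` and `write_with_fixed_file_count` are hypothetical (not part of Databricks or PySpark), and `.format("delta")` is my assumption about the target format. It only uses `spark.conf.get`, `spark.conf.set` and `spark.conf.unset`.

```python
from contextlib import contextmanager


@contextmanager
def temporary_spark_conf(spark, overrides):
    """Apply session-level Spark settings, then restore the previous values.

    Hypothetical helper: `spark` is an existing SparkSession and `overrides`
    maps config keys to the values to use inside the `with` block.
    """
    # Remember the current values; None means the key was not set at all.
    previous = {key: spark.conf.get(key, None) for key in overrides}
    try:
        for key, value in overrides.items():
            spark.conf.set(key, value)
        yield
    finally:
        # Put every key back the way it was, even if the write failed.
        for key, old_value in previous.items():
            if old_value is None:
                spark.conf.unset(key)
            else:
                spark.conf.set(key, old_value)


def write_with_fixed_file_count(spark, df, path, num_files=8):
    """Write `df` to `path` with both Delta file-merging features turned off."""
    overrides = {
        "spark.databricks.delta.autoCompact.enabled": "false",
        "spark.databricks.delta.optimizeWrite.enabled": "false",
    }
    with temporary_spark_conf(spark, overrides):
        # Assumption: the target is a Delta table stored at `path`.
        df.repartition(num_files).write.format("delta").save(path)
```

In a notebook this would be called as write_with_fixed_file_count(spark, df, 'path'), after which both settings are back to whatever they were before the call.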
Give it a try and let me know how it goes!
Regards
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark