Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-31-2023 07:02 AM
I had the same problem when starting with databricks. As outlined above, it is the shuffle partitions setting that results in number of files equal to number of partitions. Thus, you are writing low data volume but get taxed on the amount of write (and subsequent sequentialread) operations. Lowering amount of shuffle partitions helps solve this. On top of that, consider using spark.sql.streaming.noDataMicroBatches.enabled so that empty microbatches are ignored.