Default maximum Spark Structured Streaming chunk size for Delta files in each batch?
04-02-2023 09:20 AM
When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch?
How do I identify this type of Spark configuration in Databricks?
#[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark
04-03-2023 07:26 AM
Hello @KARTHICK N,
The default value for spark.sql.files.maxPartitionBytes is 128 MB, unless it has been overridden on your cluster. These defaults are documented in the Apache Spark performance-tuning guide: https://spark.apache.org/docs/latest/sql-performance-tuning.html
To verify a configuration, navigate to the Environment tab of the Spark UI and search for it there.
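You can also check it straight from a notebook cell. A minimal sketch, assuming `spark` is the ambient SparkSession that Databricks notebooks provide, and that nothing has overridden the config:

```python
# Print the effective value of the partition-size config; this returns the
# default unless the cluster or session has overridden it
# (typically "134217728b", i.e. 128 MB).
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))

# It can also be overridden at the session level if needed:
spark.conf.set("spark.sql.files.maxPartitionBytes", "64m")
```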
Hope that helps.
Thanks & Regards,
Nandini
04-04-2023 10:03 PM
Thanks for the reply, @Nandini N.
I couldn't find this configuration parameter in the Databricks job-cluster Spark UI. We use a job cluster for our streaming jobs, and the configuration doesn't appear in the Environment tab of the Spark UI page.
Is this applicable to streaming (we use streaming with foreachBatch in our project)?
Could you help me figure it out?
#[Databricks SQL] #[Azure databricks]
08-29-2024 08:24 AM
@NandiniN, I wasn't able to find a setting that controls the read-stream batch size while processing data with foreachBatch.
Is it possible to limit the read stream to a record count per batch in Structured Streaming?
10-31-2024 03:00 AM
- maxFilesPerTrigger: This option specifies how many new files should be considered in every micro-batch. The default value is 1000.
- maxBytesPerTrigger: This option sets a soft maximum on the amount of data processed in each micro-batch. It is not set by default, but can be configured to limit the data processed per batch.
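A minimal sketch of how these options are passed when reading a Delta table as a stream (the path and the numbers are placeholders, not recommendations):

```python
# Rate-limit the Delta source: consider at most ~500 new files and ~1 GB of
# data per micro-batch. Both are soft caps applied when planning the batch.
stream_df = (
    spark.readStream
    .format("delta")
    .option("maxFilesPerTrigger", 500)    # defaults to 1000 when unset
    .option("maxBytesPerTrigger", "1g")   # unset by default; soft maximum
    .load("/path/to/delta/table")         # placeholder path
)
```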
10-31-2024 03:02 AM
doc - https://docs.databricks.com/en/structured-streaming/delta-lake.html
Also, what challenge are you facing while using foreachBatch?
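For reference, a minimal foreachBatch sketch combined with the rate-limit option above (the paths and the sink logic are placeholders; note that batch size is governed by the trigger options on the source, not by a row count):

```python
def process_batch(batch_df, batch_id):
    # batch_df is a static DataFrame holding just this micro-batch.
    batch_df.write.format("delta").mode("append").save("/path/to/sink")  # placeholder sink

(spark.readStream
    .format("delta")
    .option("maxFilesPerTrigger", 500)                    # caps files per micro-batch
    .load("/path/to/delta/source")                        # placeholder source path
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/path/to/checkpoint")  # placeholder checkpoint path
    .start())
```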