Data Engineering

Forum Posts

by sanjay • Valued Contributor II

03-29-2023 11:59:29 PM

36153 Views
21 replies
18 kudos

Resolved! How to limit number of files in each batch in streaming batch processing

Hi, I am running a batch job that processes incoming files. I am trying to limit the number of files in each batch, so I added the maxFilesPerTrigger option, but it's not working: it processes all incoming files at once. (spark.readStream.format("delta").lo...

Latest Reply

mjedy7
New Contributor II

11-24-2024 10:50:17 PM

18 kudos

Hi @Sandeep, can we use spark.readStream.format("delta").option("maxBytesPerTrigger", "50G").load(silver_path).writeStream.option("checkpointLocation", gold_checkpoint_path).trigger(availableNow=True).foreachBatch(foreachBatchFunction).start()?
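
For readers following the thread, here is a minimal sketch of the same pattern with the file-count limit from the original question. The function name, its parameters, and the default of 100 files are illustrative assumptions, not something posted in the thread. Spark's documentation describes the availableNow trigger as processing the backlog in multiple micro-batches that honor source rate limits, while a one-time trigger (once=True) handles everything in a single batch, so the choice of trigger is worth checking when a limit appears to be ignored.

```python
def start_rate_limited_stream(spark, source_path, checkpoint_path, batch_fn,
                              max_files_per_trigger=100):
    """Start a Delta streaming read that caps each micro-batch by file count.

    `spark` is an existing SparkSession; every other name here is a
    placeholder chosen for illustration.
    """
    stream = (
        spark.readStream.format("delta")
        # Upper bound on how many new files one micro-batch may pick up.
        .option("maxFilesPerTrigger", max_files_per_trigger)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Drain everything currently available, in rate-limited batches, then stop.
        .trigger(availableNow=True)
        .foreachBatch(batch_fn)
        .start()
    )
```

To limit by data volume instead, as in the reply above, the same option call could pass "maxBytesPerTrigger" with a size value in place of "maxFilesPerTrigger".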

by kk007 • New Contributor III

04-07-2023 10:19:36 AM

4962 Views
4 replies
4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83 MB JSON file using spark.read.json(storage_path). When I display the data, it seems to display fine, but when I try a command-line count, it complains that the file size is more than 400 MB, which is not true. Photon JSON reader erro...

Latest Reply

Anonymous
Not applicable

04-09-2023 8:47:33 AM

4 kudos

@Kamal Kumar: The error message suggests that the JSON document size exceeds the maximum allowed size of 400 MiB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...
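
One way to check the "one or more documents larger than the limit" explanation is to measure the largest single record directly. The sketch below is only a rough diagnostic under two assumptions that do not come from the thread: the file is newline-delimited JSON (one document per line), and it is readable from a local path. The helper name is hypothetical.

```python
import os

# 400.0 MiB, the figure quoted in the error message.
PHOTON_LIMIT_BYTES = 400 * 1024 * 1024


def largest_record_bytes(path):
    """Return (file_size, largest_line_size) in bytes for a newline-delimited file."""
    largest = 0
    with open(path, "rb") as fh:
        for line in fh:  # each line is treated as one JSON document
            largest = max(largest, len(line))
    return os.path.getsize(path), largest
```

If the largest line is far below PHOTON_LIMIT_BYTES, the file is probably not being parsed as one document per line, and the reader's multi-line setting would be the next thing to look at.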

by User16826992666 • Databricks Employee

06-16-2021 3:28:54 PM

2504 Views
1 reply
0 kudos

Resolved! Is there any file size overhead when I save models using MLflow?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 12:56:25 PM

0 kudos

There shouldn't be. Generally speaking, models will be serialized according to their 'native' format for well-known libraries like TensorFlow, XGBoost, sklearn, etc. Custom models will be saved with pickle. The files exist on distributed storage as ar...
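
For the pickle case, a quick way to see how large the serialized form is, before any tracking tool is involved, is to pickle the object in memory and count the bytes. This is a plain standard-library sketch that does not call MLflow. The dictionary below is a hypothetical stand-in for a custom model's state, and the helper name is made up for illustration.

```python
import pickle


def pickled_size_bytes(obj):
    """Return the number of bytes pickle produces for `obj`."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical stand-in for a custom model's learned state.
model_state = {"weights": [0.0] * 1000, "bias": 0.5}
print(pickled_size_bytes(model_state), "bytes when pickled")
```

Comparing this number with the size of the saved artifact on storage gives a rough idea of whether anything beyond the serialized model is being written.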

Databricks Community
