Data Engineering

by sanjay • Valued Contributor II

03-29-2023 11:59:29 PM

31764 Views
21 replies
18 kudos

Resolved! How to limit number of files in each batch in streaming batch processing

Hi, I am running a batch job which processes incoming files. I am trying to limit the number of files in each batch, so I added the maxFilesPerTrigger option, but it's not working: it processes all incoming files at once. (spark.readStream.format("delta").lo...

Latest Reply

mjedy7
New Contributor II

11-24-2024 10:50:17 PM

18 kudos

Hi @Sandeep, can we use spark.readStream.format("delta").option("maxBytesPerTrigger", "50G").load(silver_path).writeStream.option("checkpointLocation", gold_checkpoint_path).trigger(availableNow=True).foreachBatch(foreachBatchFunction).start()?
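
The one-liner in the reply above is easier to read when the rate-limit options are built separately. The following is only a minimal sketch, not the accepted answer from the thread: the helper names rate_limit_options and start_limited_stream are hypothetical, spark is assumed to be an existing SparkSession, and the paths and batch function stand in for silver_path, gold_checkpoint_path, and foreachBatchFunction from the post. To my understanding, the availableNow trigger honors maxFilesPerTrigger and maxBytesPerTrigger, while a once=True trigger processes everything in a single batch, which would match the behavior described in the question.

```python
def rate_limit_options(max_files=None, max_bytes=None):
    """Build Delta streaming source options that cap the size of each micro-batch."""
    options = {}
    if max_files is not None:
        # Upper bound on the number of new files considered per micro-batch.
        options["maxFilesPerTrigger"] = str(max_files)
    if max_bytes is not None:
        # Soft limit on the amount of data per micro-batch, e.g. "50g".
        options["maxBytesPerTrigger"] = str(max_bytes)
    return options


def start_limited_stream(spark, source_path, checkpoint_path, batch_fn, **limits):
    """Start an availableNow stream over a Delta table with rate limits applied.

    `spark` is an existing SparkSession; `source_path`, `checkpoint_path`
    and `batch_fn` are placeholders for the values used in the thread.
    """
    return (
        spark.readStream.format("delta")
        .options(**rate_limit_options(**limits))
        .load(source_path)
        .writeStream.option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # drain the backlog in several limited batches
        .foreachBatch(batch_fn)
        .start()
    )
```

A call might look like start_limited_stream(spark, silver_path, gold_checkpoint_path, foreachBatchFunction, max_bytes="50g"). Keeping the options in a plain dict makes it easy to log or unit-test them without a running cluster.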

20 More Replies

by Fred_F • New Contributor III

01-09-2023 6:57:28 AM

9898 Views
5 replies
5 kudos

JDBC connection timeout on workflow cluster

Hi there, I have a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply

RKNutalapati
Valued Contributor

01-09-2023 11:51:10 AM

5 kudos

Hi @Fred Foucart, the above code looks good to me. Can you try the code below as well?
spark.read \
    .format("jdbc") \
    .option("url", f"jdbc:postgresql://{host}/{database}") \
    .option("driver", "org.postgresql.Driver") \
    .option("user", username) ...
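
Since the reply is cut off, here is a rough sketch of how such a read could carry explicit timeouts. It is not the code from the thread. The helper names postgres_jdbc_options and read_table are hypothetical, spark is assumed to be an existing SparkSession, and the default timeout values are arbitrary. It also assumes that connectTimeout and socketTimeout (both in seconds) are passed through by Spark to the PostgreSQL JDBC driver as connection properties; worth checking against the driver version in use.

```python
def postgres_jdbc_options(host, database, username, password,
                          connect_timeout_s=30, socket_timeout_s=300):
    """Build JDBC reader options for Postgres, including driver-level timeouts."""
    return {
        "url": f"jdbc:postgresql://{host}/{database}",
        "driver": "org.postgresql.Driver",
        "user": username,
        "password": password,
        # Assumed PostgreSQL JDBC connection properties, values in seconds.
        "connectTimeout": str(connect_timeout_s),
        "socketTimeout": str(socket_timeout_s),
    }


def read_table(spark, table, **connection):
    """Load one table through JDBC using the options built above."""
    return (
        spark.read.format("jdbc")
        .options(**postgres_jdbc_options(**connection))
        .option("dbtable", table)
        .load()
    )
```

A call might look like read_table(spark, "public.orders", host=host, database=database, username=username, password=password), where public.orders is a made-up table name. If the timeout only shows up on the workflow cluster, comparing these values and the network route between the two clusters may help narrow it down.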

4 More Replies

Databricks Community
