by sanjay • Valued Contributor II
- 26181 Views
- 21 replies
- 18 kudos
Hi, I am running a batch job which processes incoming files. I am trying to limit the number of files in each batch, so I added the maxFilesPerTrigger option. But it's not working; it processes all incoming files at once. (spark.readStream.format("delta").lo...
Latest Reply
Hi @Sandeep, Can we use spark.readStream.format("delta").option("maxBytesPerTrigger", "50G").load(silver_path).writeStream.option("checkpointLocation", gold_checkpoint_path).trigger(availableNow=True).foreachBatch(foreachBatchFunction).start()
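A minimal runnable sketch of that reply's pipeline (silver_path, gold_checkpoint_path, gold_path, and foreachBatchFunction are placeholders from the thread, not verified names). One relevant detail: rate-limit options such as maxFilesPerTrigger and maxBytesPerTrigger are honored by trigger(availableNow=True) but ignored by trigger(once=True), which would explain all files being processed at once.

```python
# Sketch only; silver_path, gold_checkpoint_path, and gold_path are
# placeholders from the thread.
def foreachBatchFunction(batch_df, batch_id):
    # Process one rate-limited micro-batch of the backlog.
    batch_df.write.format("delta").mode("append").save(gold_path)

(spark.readStream.format("delta")
    .option("maxBytesPerTrigger", "50G")  # soft cap on data read per micro-batch
    .load(silver_path)
    .writeStream
    .option("checkpointLocation", gold_checkpoint_path)
    .trigger(availableNow=True)  # respects rate limits; trigger(once=True) does not
    .foreachBatch(foreachBatchFunction)
    .start())
```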
20 More Replies
- 8408 Views
- 6 replies
- 3 kudos
Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: Incrementally load data from Table A as a batch. If the pipeline has previously...
Latest Reply
I totally agree that this is a gap in the Databricks solution. The gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases like it) is that I have slowly changing data coming into structured folders via ...
5 More Replies
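For the incremental-batch part of this pattern, a minimal sketch of a Delta Live Tables streaming table (table names are hypothetical): a streaming table reads only the rows added to its source since the last pipeline update, even when the pipeline itself runs triggered as a batch.

```python
import dlt

# Hypothetical table names; each triggered (batch) pipeline update
# processes only new rows from the source table.
@dlt.table(name="table_b")
def table_b():
    return spark.readStream.table("table_a")
```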
- 1224 Views
- 1 reply
- 2 kudos
Hi, I am facing an issue where one of my jobs has been taking very long since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate and load into a Delta table, but since June 2nd it has become so long that...
Latest Reply
Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.
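For reference, a minimal sketch of the truncate-and-load pattern described in the question (the path and table name are placeholders, not taken from the poster's job):

```python
# Placeholders throughout; this only illustrates the pattern from the
# question, not the actual job.
df = spark.read.json("/landing/json/")

(df.write.format("delta")
   .mode("overwrite")  # replaces the Delta table's contents on each run
   .saveAsTable("my_delta_table"))
```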
- 8365 Views
- 2 replies
- 0 kudos
I am trying to read messages from a Kafka topic using spark.readStream. I am using the following code to read it. My code: df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx").option("subscr...
Latest Reply
You can try this approach: https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733 readStream runs in a background thread, so there's no easy equivalent of df.show().
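A minimal sketch of the console-sink approach from that answer (the bootstrap server and topic name are placeholders, since the originals are masked in the question):

```python
# Placeholders for the masked server and topic; show() does not work on a
# streaming DataFrame, so write the stream to the console sink instead.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "my_topic")
      .load())

query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")    # prints each micro-batch to stdout
         .outputMode("append")
         .start())
query.awaitTermination()
```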
1 More Reply
by huyd • New Contributor III
- 1439 Views
- 0 replies
- 4 kudos
I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
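Spill on a single executor is the usual symptom of an unpartitioned JDBC read, which lands all rows in one partition. A minimal sketch of the JDBC parallel-read options (URL, credentials, table, column, and bounds are all placeholders):

```python
# Placeholders throughout. partitionColumn must be a numeric, date, or
# timestamp column; lowerBound/upperBound only shape the partition ranges,
# they do not filter rows.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db")
      .option("dbtable", "schema.my_table")
      .option("user", "user")
      .option("password", "password")
      .option("partitionColumn", "id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")  # 8 concurrent reads spread across executors
      .load())
```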