Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

iptkrisna
by New Contributor III
  • 802 Views
  • 1 replies
  • 2 kudos

Jobs Data Pipeline Runtime Increase Significantly

Hi, I am facing an issue where one of my jobs has recently started taking much longer to run. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so long that...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

Valentin1
by New Contributor III
  • 4760 Views
  • 5 replies
  • 2 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: Incrementally load data from Table A as a batch. If the pipeline has previously...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Valentin Rosca, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

4 More Replies
sanjay
by Valued Contributor II
  • 10729 Views
  • 20 replies
  • 17 kudos

Resolved! How to limit number of files in each batch in streaming batch processing

Hi, I am running a batch job that processes incoming files. I am trying to limit the number of files in each batch, so I added the maxFilesPerTrigger option, but it's not working: it processes all incoming files at once. (spark.readStream.format("delta").lo...
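
One possible cause, which the truncated post above cannot confirm: a stream started with the one-shot trigger (`trigger(once=True)`) ignores rate-limit options such as `maxFilesPerTrigger`, whereas `trigger(availableNow=True)` (PySpark 3.3 and later) honors them and works through the backlog in several micro-batches. A minimal sketch under that assumption; the function name, the paths, and the Delta sink are hypothetical and not taken from the thread:

```python
def start_limited_stream(spark, source_path, target_path, checkpoint_path, max_files=10):
    """Drain the current backlog in micro-batches of at most `max_files` files.

    `spark` is an existing SparkSession; all paths are placeholders.
    """
    source = (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", max_files)  # cap on files per micro-batch
        .load(source_path)
    )
    return (
        source.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        # availableNow respects the cap; once=True would read everything in one batch
        .trigger(availableNow=True)
        .start(target_path)
    )
```

The returned query stops on its own once the data available at start time has been processed, so it still fits a scheduled batch job.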

Latest Reply
Anonymous
Not applicable
  • 17 kudos

Hi @Sanjay Jain, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

19 More Replies
SRK
by Contributor III
  • 5518 Views
  • 2 replies
  • 0 kudos

How to get the count of dataframe rows when reading through spark.readstream using batch jobs?

I am trying to read messages from a Kafka topic using spark.readStream, with the following code. My code: df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx").option("subscr...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

You can try this approach: https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733 readStream runs a thread in the background, so there's no easy equivalent of df.show().
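
Separately from the linked answer, one way to get row counts is `foreachBatch`: inside the callback each micro-batch arrives as an ordinary (non-streaming) DataFrame, so `count()` works. A minimal sketch; `batch_counts`, `record_batch_count`, and `attach_counter` are hypothetical names, and it assumes the callback runs in the driver process (classic, non-Spark-Connect execution):

```python
batch_counts = {}  # batch_id -> rows seen in that micro-batch (driver-side state)


def record_batch_count(batch_df, batch_id):
    """foreachBatch callback: batch_df is a static DataFrame for one micro-batch."""
    batch_counts[batch_id] = batch_df.count()
    print(f"batch {batch_id}: {batch_counts[batch_id]} rows")


def attach_counter(stream_df, checkpoint_path):
    """Start a query on a streaming DataFrame that only records per-batch counts."""
    return (
        stream_df.writeStream
        .foreachBatch(record_batch_count)
        .option("checkpointLocation", checkpoint_path)  # placeholder path
        .start()
    )
```

Calling `attach_counter(df, "/tmp/chk")` on the Kafka stream from the question would start the query; the counts then accumulate in `batch_counts` as batches complete.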

1 More Replies
huyd
by New Contributor III
  • 907 Views
  • 0 replies
  • 4 kudos

Optimizing a batch load process, reading with the JDBC driver

I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
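
Spill confined to one executor often means the read landed in a single partition, though the truncated post does not confirm that. For reference, a partitioned JDBC read in Spark is driven by four options that must be set together: `partitionColumn` (a numeric, date, or timestamp column), `lowerBound`, `upperBound`, and `numPartitions`. A minimal sketch; the helper names, the default fetch size, and any URL or table passed in are hypothetical:

```python
def jdbc_parallel_options(url, table, partition_column, lower, upper,
                          num_partitions, fetch_size=10000):
    """Build the option map for a partitioned JDBC read.

    lowerBound/upperBound only set the partition stride; they do not filter rows,
    so values outside the range end up in the first or last partition.
    """
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),  # rows fetched per round trip
    }


def read_jdbc_parallel(spark, options):
    """Issue the read; Spark opens up to numPartitions concurrent connections."""
    return spark.read.format("jdbc").options(**options).load()
```

If the bounds are far from the column's real minimum and maximum, most rows fall into one partition and the skew remains, so it helps to query the actual range first.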
