Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nolanlavender00
by New Contributor
  • 4217 Views
  • 4 replies
  • 1 kudos

Resolved! How to stop a Streaming Job based on time of the week

I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.

Latest Reply
mroy
Contributor
  • 1 kudos

You could also use the "Available-now micro-batch" trigger. It only processes one batch at a time, and you can do whatever you want in between batches (sleep, shut down, vacuum, etc.)

3 More Replies
ArindamHalder
by New Contributor II
  • 1565 Views
  • 3 replies
  • 3 kudos

Resolved! Is there any performance result available for Delta Lake?

Specifically for writing and reading streaming data to HDFS or S3, etc. For an IoT-specific scenario, how does it perform on time-series transactional data? Can we consider a Delta table as a time-series table?

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Arindam Halder, how is it going? Were you able to resolve your problem?

2 More Replies
RajaLakshmanan
by New Contributor
  • 2504 Views
  • 3 replies
  • 1 kudos

Resolved! Spark StreamingQuery not processing all data from source directory

Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...

Latest Reply
User16763506586
Contributor
  • 1 kudos

If it helps, you can try running a left anti join between the source and the sink to identify the missing records, and then check whether each missing record matches the schema provided.

2 More Replies
MithuWagh
by New Contributor
  • 5836 Views
  • 1 replies
  • 0 kudos

How to deal with a column name containing a .(dot) in a PySpark DataFrame?

We are streaming data from a Kafka source as JSON, but some columns have a .(dot) in their names. Streaming JSON data: df1 = df.selectExpr("CAST(value AS STRING)") {"pNum":"A14","from":"telecom","payload":{"TARGET":"1","COUNTRY":"India"...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Mithu Wagh, you can use backticks to enclose the column name: df.select("`col0.1`")
