- 5639 Views
- 2 replies
- 1 kudos
I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.
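A minimal sketch of the stop-on-condition idea (all paths and the weekly check are hypothetical). Rather than calling stop() from inside foreachBatch, which risks deadlocking the stream, this polls a condition from the driver and stops the query gracefully:

import datetime
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def maintenance_due():
    # Hypothetical condition: pause the stream every Sunday.
    return datetime.datetime.now().weekday() == 6

query = (
    spark.readStream.format("delta").load("/mnt/source/events")    # hypothetical source
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events")   # hypothetical checkpoint
        .start("/mnt/target/events")                               # hypothetical sink
)

# Poll from the driver and stop gracefully; the checkpoint preserves
# progress, so the stream resumes where it left off after maintenance.
while query.isActive:
    if maintenance_due():
        query.stop()
    time.sleep(60)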
Latest Reply
You could also use the "Available-now micro-batch" trigger. It processes whatever data is available when the query starts and then stops on its own, so you can do whatever you want between runs (sleep, shut down, vacuum, etc.)
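A minimal sketch of that trigger (paths are hypothetical); the query drains the available backlog and then terminates on its own, leaving a window for maintenance before the next scheduled run:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

query = (
    spark.readStream.format("delta").load("/mnt/source/events")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events")
        .trigger(availableNow=True)   # process all currently available data, then stop
        .start("/mnt/target/events")
)
query.awaitTermination()  # returns once the backlog is processed
# Run maintenance here (e.g. OPTIMIZE / VACUUM), then schedule the next run.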
- 3792 Views
- 2 replies
- 1 kudos
Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them into a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...
Latest Reply
If it helps, you can try running a left-anti join between source and sink to identify missing records, and check whether each record matches the provided schema.
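A minimal sketch of that check, assuming both sides can be read as batch DataFrames and share a key column (the paths and the record_id column are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_df = spark.read.json("/staging/input")     # hypothetical staging directory
sink_df = spark.read.parquet("/target/output")    # hypothetical target location

# Left-anti join: rows present in the source that never reached the sink.
missing = source_df.join(sink_df, on="record_id", how="left_anti")
print(missing.count())
missing.show(truncate=False)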
- 2276 Views
- 1 reply
- 3 kudos
Specifically for writing and reading streaming data to HDFS or S3, etc. For an IoT-specific scenario, how does it perform on time-series transactional data? Can we consider a Delta table as a time-series table?
Latest Reply
Hi @Arindam Halder, Delta Lake is more performant than a regular Parquet table. Please check the link below for some stats on the performance:
https://docs.azuredatabricks.net/_static/notebooks/delta/optimize-python.html
Yes, you can use it for time series...
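As a sketch of the time-series pattern (column names and paths are hypothetical), IoT events are commonly appended to a Delta table partitioned by a date column derived from the event timestamp, so time-range reads only scan the matching partitions:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

readings = spark.read.json("/mnt/iot/raw")   # hypothetical raw readings with an event_time column

(
    readings
        .withColumn("event_date", F.to_date("event_time"))
        .write
        .format("delta")
        .partitionBy("event_date")   # enables partition pruning on time-range queries
        .mode("append")
        .save("/mnt/iot/delta/readings")
)

# A time-range query now touches only the relevant partitions.
week = (
    spark.read.format("delta").load("/mnt/iot/delta/readings")
        .where("event_date BETWEEN '2023-01-01' AND '2023-01-07'")
)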
- 8147 Views
- 1 reply
- 0 kudos
We are streaming data from a Kafka source with JSON, but in some columns we are getting a dot (.) in the column names. Streaming JSON data:
df1 = df.selectExpr("CAST(value AS STRING)")
{"pNum":"A14","from":"telecom","payload":{"TARGET":"1","COUNTRY":"India"...
Latest Reply
Hi @Mithu Wagh, you can use backticks to enclose the column name:
df.select("`col0.1`")
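A minimal sketch of the backtick escaping, using a hypothetical frame whose column names contain literal dots (as happens after flattening the payload above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Column names with literal dots, as produced by flattening the nested payload.
df2 = spark.createDataFrame([("1", "India")], ["payload.TARGET", "payload.COUNTRY"])

# Without backticks Spark parses the dot as struct field access and errors;
# backticks make Spark treat the whole string as one column name.
df2.select("`payload.TARGET`", "`payload.COUNTRY`").show()

# Renaming once up front avoids backticks in every later reference.
df2 = df2.withColumnRenamed("payload.TARGET", "payload_TARGET")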