When should I use ".start()" with writeStream?

Mado — Thu, 20 Oct 2022 07:44:25 GMT

Hi,

I am practicing with Databricks. In the sample notebooks, I have seen writeStream used both with and without the ".start()" method. Examples are below:

Without .start()

  (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", source_format)
        .option("cloudFiles.schemaLocation", checkpoint_directory)
        .load(data_source)
        .writeStream
        .option("checkpointLocation", checkpoint_directory)
        .option("mergeSchema", "true")
        .table(table_name)
  )

With .start()

(myDF
 .writeStream
 .format("delta")
 .option("checkpointLocation", checkpointPath)
 .outputMode("append")
 .start(path)
)

With .start() (no path argument)

query = (streaming_df.writeStream
                         .foreachBatch(streaming_merge.upsert_to_delta)
                         .outputMode("update")
                         .option("checkpointLocation", f"{DA.paths.checkpoints}/recordings")
                         .trigger(availableNow=True)
                         .start())
query.awaitTermination()

1) I don't understand when I should or shouldn't use the ".start()" method. I would appreciate any guidance on this.

2) If I don't pass "path" to "start()", where will the data files be written?

Thanks for your help.

Re: When should I use ".start()" with writeStream?

Anonymous — Sun, 27 Nov 2022 13:46:57 GMT

Hi @Mohammad Saber

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon.

Thanks

Re: When should I use ".start()" with writeStream?

Mado — Sun, 27 Nov 2022 20:39:09 GMT

Thanks for your message. I am still looking for the answer.
