Hi,
I am practicing with Databricks. In the sample notebooks, I have seen writeStream used both with and without the ".start()" method. Examples are below:
Without .start()
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", source_format)
    .option("cloudFiles.schemaLocation", checkpoint_directory)
    .load(data_source)
    .writeStream
    .option("checkpointLocation", checkpoint_directory)
    .option("mergeSchema", "true")
    .table(table_name)
)
With .start(path)
(myDF
    .writeStream
    .format("delta")
    .option("checkpointLocation", checkpointPath)
    .outputMode("append")
    .start(path)
)
With .start() and no path
query = (streaming_df.writeStream
    .foreachBatch(streaming_merge.upsert_to_delta)
    .outputMode("update")
    .option("checkpointLocation", f"{DA.paths.checkpoints}/recordings")
    .trigger(availableNow=True)
    .start())
query.awaitTermination()
1) I don't understand when I should or shouldn't use the ".start()" method. I would appreciate any guidance on this.
2) If I don't pass a "path" to "start()", where will the data files be written?
Thanks for your help.