Hi,
I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting a lot of strange errors when trying to make it work.
If anyone has managed to get something similar to work, please let me know.
Short version: I am trying to use structured streaming to read from a table and write it to file(s).
Long version: I am using Lakehouse Federation to manage a connection to Google BigQuery, where I have daily event data tables named "catalog.schema.events_xxxx". I am trying to see whether I can use structured streaming to copy the data from those tables across to files on an Azure storage account, which I access through a Volume set up on an external location.
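For context, the federation and Volume setup looks roughly like the following (just a sketch with placeholder names; credentials are omitted and the exact OPTIONS may differ slightly from what I actually ran):

# Rough sketch of the setup, placeholder names only, credentials omitted.
# Lakehouse Federation connection to BigQuery:
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS bq_conn TYPE bigquery
    OPTIONS (GoogleServiceAccountKeyJson '<service-account-key-json>')
""")

# Foreign catalog exposing the BigQuery datasets as catalog.schema.*:
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS catalog USING CONNECTION bq_conn
""")

# External Volume on the Azure storage account, backing the
# /Volumes/test/data_test/raw_data/... paths used further down:
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS test.data_test.raw_data
    LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/raw_data'
""")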
Currently I am getting a StreamingQueryException complaining that a key was not found, but the key it references doesn't exist anywhere in the data, which makes me think the error is misleading and the real problem is elsewhere.
Below is a rough example of how I'm trying to use this.
# Read one daily table through the federated catalog as a stream
tbl = spark.readStream.table("catalog.schema.`events_20221224`")
# Wildcard alternative for matching several daily tables at once:
# tbl = spark.readStream.table("catalog.schema.`events_2022122*`")

(
    tbl.writeStream.format("delta")  # write out as Delta files on the Volume
    .partitionBy("event_date")
    .trigger(availableNow=True)  # process everything available, then stop
    .option("path", "/Volumes/test/data_test/raw_data/events")
    .option(
        "checkpointLocation",
        "/Volumes/test/data_test/raw_data/events/_checkpoint",
    )
    .start()
    .awaitTermination()
)
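For reference, the one-off batch version of the same copy would look something like this (same placeholder names as above), in case that helps pin down whether the issue is specific to readStream:

# Batch equivalent of the copy above (sketch, same placeholder names)
df = spark.read.table("catalog.schema.`events_20221224`")

(
    df.write.format("delta")
    .mode("append")
    .partitionBy("event_date")
    .save("/Volumes/test/data_test/raw_data/events")
)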