Data Engineering
Structured Streaming - Table(s) to File(s) - Is it possible?

DavidMooreZA
New Contributor II

Hi,

I'm trying to do something that's probably considered a no-no. The documentation suggests it should be possible, but I'm getting some odd errors when trying to make it work.

If anyone has managed to get something similar to work, please let me know.

Short version: I am trying to use structured streaming to read from a table and write it to file(s).

Long version: I am using Lakehouse Federation to manage a connection to Google BigQuery where I have daily event data tables "catalog.schema.event_xxxx". I am trying to see if I can use structured streaming to manage copying the data across from those tables to files on an Azure storage account, which I'm accessing through a Volume set up on an external location.

Currently I am getting a StreamingQueryException complaining about a key not being found, but the referenced key isn't a field in the data at all. That makes me think the error is misleading and the problem is actually elsewhere.

Below is a rough example of how I'm trying to use this.

 

 

tbl = spark.readStream.table('catalog.schema.`events_20221224`')
# tbl = spark.readStream.table('catalog.schema.`events_2022122*`')

(
    tbl.writeStream.format("delta")
    .partitionBy('event_date')
    .trigger(availableNow=True)
    .option("path", "/Volumes/test/data_test/raw_data/events")
    .option(
        "checkpointLocation",
        "/Volumes/test/data_test/raw_data/events/_checkpoint",
    )
    .start()
    .awaitTermination()
)
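In case the federated BigQuery tables turn out not to support streaming reads at all, one workaround worth trying is a plain batch copy per daily table, which sidesteps the streaming source entirely. This is only a sketch, not a verified fix; the `daily_table_names` helper, the name prefix, and the destination path are assumptions modeled on the question.

```python
# Hedged sketch: batch-copy each daily federated table to the Volume path.
# Helper names, the table prefix, and the destination are assumptions.
from datetime import date, timedelta


def daily_table_names(start: date, end: date,
                      prefix: str = "catalog.schema.events_") -> list[str]:
    """Build fully qualified names like 'catalog.schema.events_20221224'."""
    names = []
    d = start
    while d <= end:
        names.append(f"{prefix}{d.strftime('%Y%m%d')}")
        d += timedelta(days=1)
    return names


def copy_tables(spark, names,
                dest: str = "/Volumes/test/data_test/raw_data/events") -> None:
    """Batch-read each daily table and append it to one Delta location."""
    for name in names:
        (spark.read.table(name)          # batch read: no streaming source needed
              .write.format("delta")
              .mode("append")
              .partitionBy("event_date")
              .save(dest))
```

A batch loop like this loses the checkpoint-based exactly-once bookkeeping of the streaming version, so you'd track which daily tables have been copied yourself.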

 

3 Replies

-werners-
Esteemed Contributor III

The sink is a Delta Lake table. Do you have it defined somewhere in Unity Catalog as a table? Because a location cannot be both a registered table and part of a Volume at the same time.
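To illustrate the point: a streaming Delta writer targets exactly one sink kind, either a UC-registered table (via `toTable`) or a bare path such as a Volume directory (via `start`). A minimal sketch of the two mutually exclusive choices; the `configure_sink` helper, the table name, and the paths are illustrative assumptions, not anything from the thread.

```python
def configure_sink(write_stream, as_managed_table: bool):
    """Route one streaming writer to exactly one Delta sink.

    A location cannot be both a UC-registered table and part of a Volume,
    so the two branches below are mutually exclusive. All names here are
    illustrative assumptions.
    """
    w = (write_stream.format("delta")
                     .partitionBy("event_date")
                     .trigger(availableNow=True)
                     .option("checkpointLocation",
                             "/Volumes/test/data_test/_checkpoint"))
    if as_managed_table:
        return w.toTable("catalog.schema.events_copy")       # UC-managed table
    return w.start("/Volumes/test/data_test/raw_data/events")  # plain path
```

Note the checkpoint here lives outside the output directory; keeping `_checkpoint` inside the Delta output path, as in the question, is another thing worth ruling out.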

DavidMooreZA
New Contributor II

I created a new schema and volume specifically for this testing and the names are all distinct from any other objects in the catalog.

I did a quick double check just in case I somehow missed a duplicate and there were none.

Kaniz
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help , we're here for you! 

Keep being awesome! 😊🚀

 
