<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Structure Streaming - Table(s) to File(s) - Is it possible? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59589#M31452</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting lots of weird errors when trying to make it work.&lt;/P&gt;&lt;P&gt;If anyone has managed to get something similar to work, please let me know.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Short version:&lt;/STRONG&gt; I am trying to use Structured Streaming to read from a table and write it to file(s).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Long version:&lt;/STRONG&gt; I am using Lakehouse Federation to manage a connection to Google BigQuery, where I have daily event data tables named "catalog.schema.events_xxxx". I am trying to see if I can use Structured Streaming to copy the data from those tables to files on an Azure storage account, which I'm accessing through a Volume set up on an external location.&lt;/P&gt;&lt;P&gt;Currently I am getting a StreamingQueryException complaining that a key was not found, but the referenced key does not exist in the data. That makes me think the error is misleading and the real problem is elsewhere.&lt;/P&gt;&lt;P&gt;Below is a rough example of what I'm running.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;tbl = spark.readStream.table('catalog.schema.`events_20221224`')
# tbl = spark.readStream.table('catalog.schema.`events_2022122*`')

(
    tbl.writeStream.format("delta")
    .partitionBy('event_date')
    .trigger(availableNow=True)
    .option("path", "/Volumes/test/data_test/raw_data/events")
    .option(
        "checkpointLocation",
        "/Volumes/test/data_test/raw_data/events/_checkpoint",
    )
    .start()
    .awaitTermination()
)&lt;/LI-CODE&gt;</description>
    <pubDate>Wed, 07 Feb 2024 12:18:30 GMT</pubDate>
    <dc:creator>DavidMooreZA</dc:creator>
    <dc:date>2024-02-07T12:18:30Z</dc:date>
    <item>
      <title>Structure Streaming - Table(s) to File(s) - Is it possible?</title>
      <link>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59589#M31452</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting lots of weird errors when trying to make it work.&lt;/P&gt;&lt;P&gt;If anyone has managed to get something similar to work, please let me know.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Short version:&lt;/STRONG&gt; I am trying to use Structured Streaming to read from a table and write it to file(s).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Long version:&lt;/STRONG&gt; I am using Lakehouse Federation to manage a connection to Google BigQuery, where I have daily event data tables named "catalog.schema.events_xxxx". I am trying to see if I can use Structured Streaming to copy the data from those tables to files on an Azure storage account, which I'm accessing through a Volume set up on an external location.&lt;/P&gt;&lt;P&gt;Currently I am getting a StreamingQueryException complaining that a key was not found, but the referenced key does not exist in the data. That makes me think the error is misleading and the real problem is elsewhere.&lt;/P&gt;&lt;P&gt;Below is a rough example of what I'm running.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;tbl = spark.readStream.table('catalog.schema.`events_20221224`')
# tbl = spark.readStream.table('catalog.schema.`events_2022122*`')

(
    tbl.writeStream.format("delta")
    .partitionBy('event_date')
    .trigger(availableNow=True)
    .option("path", "/Volumes/test/data_test/raw_data/events")
    .option(
        "checkpointLocation",
        "/Volumes/test/data_test/raw_data/events/_checkpoint",
    )
    .start()
    .awaitTermination()
)&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 07 Feb 2024 12:18:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59589#M31452</guid>
      <dc:creator>DavidMooreZA</dc:creator>
      <dc:date>2024-02-07T12:18:30Z</dc:date>
    </item>
    <item>
      <title>Re: Structure Streaming - Table(s) to File(s) - Is it possible?</title>
      <link>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59592#M31454</link>
      <description>&lt;P&gt;The sink is a Delta Lake table. Do you also have it defined somewhere in Unity Catalog as a table? It is not possible to define the same thing both as a table and inside a volume at the same time.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 13:03:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59592#M31454</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-02-07T13:03:05Z</dc:date>
    </item>
    <item>
      <title>Re: Structure Streaming - Table(s) to File(s) - Is it possible?</title>
      <link>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59655#M31466</link>
      <description>&lt;P&gt;I created a new schema and volume specifically for this test, and the names are all distinct from any other objects in the catalog.&lt;BR /&gt;&lt;BR /&gt;I double-checked in case I had somehow missed a duplicate, and there were none.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 07:51:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/59655#M31466</guid>
      <dc:creator>DavidMooreZA</dc:creator>
      <dc:date>2024-02-08T07:51:42Z</dc:date>
    </item>
    <item>
      <title>Re: Structure Streaming - Table(s) to File(s) - Is it possible?</title>
      <link>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/101420#M40654</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Can you please share what the error stack trace looks like?&lt;/P&gt;
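&lt;P&gt;In case it helps with collecting that trace, here is a minimal sketch, not a definitive recipe. It uses only the Python standard library to render an exception together with any chained causes. The names format_full_trace, run_and_report and start_query are hypothetical helpers made up for this illustration; start_query is assumed to be a zero-argument function that returns whatever writeStream...start() returns.&lt;/P&gt;

```python
import traceback


def format_full_trace(exc):
    """Render exc and any chained causes as one printable string."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def run_and_report(start_query):
    """Start the query, wait for it, and print the full trace if it fails.

    start_query is a hypothetical zero-argument callable (an assumption of
    this sketch) that returns the object produced by writeStream...start().
    """
    try:
        start_query().awaitTermination()
    except Exception as exc:
        # Print everything Python knows about the failure, then re-raise
        # so the notebook or job still reports the error.
        print(format_full_trace(exc))
        raise
```

&lt;P&gt;This only covers what the Python side sees. The driver logs may hold more detail, so it is worth attaching those as well if they are available.&lt;/P&gt;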
&lt;P&gt;&lt;SPAN&gt;One possible cause of this error is a mismatch between the schema of the table you are reading from and the schema of the data already at the location you are writing to.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 07:36:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structure-streaming-table-s-to-file-s-is-it-possible/m-p/101420#M40654</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2024-12-09T07:36:18Z</dc:date>
    </item>
  </channel>
</rss>

