<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DLT create_table vs create_streaming_table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-create-table-vs-create-streaming-table/m-p/90769#M37993</link>
    <description>&lt;P&gt;What is the difference between the create_table and create_streaming_table functions in dlt?&lt;BR /&gt;&lt;BR /&gt;For example, this is how I created a table that streams data from Kafka, written as JSON files to a volume:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table(
    name="raw_orders",
    table_properties={"quality": "bronze", "pipelines.reset.allowed": "false"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        ...&lt;/LI-CODE&gt;&lt;P&gt;But I see the following in the documentation a lot and don't really understand when to use each:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;dlt.create_streaming_table("raw_orders")&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 17 Sep 2024 17:36:18 GMT</pubDate>
    <dc:creator>ggsmith</dc:creator>
    <dc:date>2024-09-17T17:36:18Z</dc:date>
    <item>
      <title>DLT create_table vs create_streaming_table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-create-table-vs-create-streaming-table/m-p/90769#M37993</link>
      <description>&lt;P&gt;What is the difference between the create_table and create_streaming_table functions in dlt?&lt;BR /&gt;&lt;BR /&gt;For example, this is how I created a table that streams data from Kafka, written as JSON files to a volume:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table(
    name="raw_orders",
    table_properties={"quality": "bronze", "pipelines.reset.allowed": "false"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        ...&lt;/LI-CODE&gt;&lt;P&gt;But I see the following in the documentation a lot and don't really understand when to use each:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;dlt.create_streaming_table("raw_orders")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 17 Sep 2024 17:36:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-create-table-vs-create-streaming-table/m-p/90769#M37993</guid>
      <dc:creator>ggsmith</dc:creator>
      <dc:date>2024-09-17T17:36:18Z</dc:date>
    </item>
    <item>
      <title>Re: DLT create_table vs create_streaming_table</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-create-table-vs-create-streaming-table/m-p/90785#M37995</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115999"&gt;@ggsmith&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;If you check the examples, you will notice that dlt.create_streaming_table is more specialized: you can think of it as declaring your target table.&lt;BR /&gt;As per the &lt;A href="https://docs.databricks.com/en/delta-live-tables/python-ref.html#create-target-fn" target="_self"&gt;documentation&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_1-1726597509582.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11293i2D770990393E9D11/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_1-1726597509582.png" alt="filipniziol_1-1726597509582.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Check this example:&lt;BR /&gt;&lt;A href="https://www.reddit.com/r/databricks/comments/1b9jg3t/deduping_a_table_created_via_delta_live/" target="_blank" rel="noopener"&gt;https://www.reddit.com/r/databricks/comments/1b9jg3t/deduping_a_table_created_via_delta_live/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;What you can observe:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;@dlt.table / @dlt.create_table -&amp;gt; used to define your source (this is where readStream is used)&lt;/LI&gt;&lt;LI&gt;dlt.create_streaming_table -&amp;gt; used to define your target, after which you run&amp;nbsp;dlt.apply_changes, specifying the source and the target&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In general, @dlt.table / @dlt.create_table is the more general-purpose option, whereas dlt.create_streaming_table is a form of syntactic sugar that makes it easier to define streaming targets.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Sep 2024 18:37:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-create-table-vs-create-streaming-table/m-p/90785#M37995</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-09-17T18:37:53Z</dc:date>
    </item>
  </channel>
</rss>

