09-17-2024 10:34 AM - edited 09-17-2024 10:36 AM
What is the difference between the create_table and create_streaming_table functions in DLT?
For example, this is how I have created a table that streams data from kafka written as json files to a volume.
@dlt.table(
    name="raw_orders",
    table_properties={"quality": "bronze", "pipelines.reset.allowed": "false"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        ...
But I see this in the documentation a lot and don't really understand when to use each.
dlt.create_streaming_table("raw_orders")
- Labels: Delta Lake
Accepted Solutions
09-17-2024 11:37 AM - edited 09-17-2024 11:37 AM
Hi @ggsmith ,
If you check the examples, you will notice that dlt.create_streaming_table is more specialized: you can think of it as declaring your target table.
Check this example:
https://www.reddit.com/r/databricks/comments/1b9jg3t/deduping_a_table_created_via_delta_live/
What you can observe:
- @dlt.table / @dlt.create_table -> used for your source (where readStream is used)
- dlt.create_streaming_table -> defines your target, after which you run dlt.apply_changes specifying the source and target
In general, @dlt.table / @dlt.create_table is the more general-purpose API, whereas dlt.create_streaming_table is a form of syntactic sugar designed to make it easier to define streaming targets.
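To make the pattern concrete, here is a minimal sketch of how the two APIs fit together in one pipeline. The table names, the volume path, and the key/sequence columns (`order_id`, `ingest_ts`) are assumptions for illustration, and the code only runs inside a Databricks Delta Live Tables pipeline, where the `dlt` module and the `spark` session are provided by the runtime:

```python
import dlt
from pyspark.sql.functions import col

# Source: a streaming table defined with the @dlt.table decorator,
# reading raw JSON files via Auto Loader (as in the question's snippet).
@dlt.table(name="raw_orders")
def raw_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/catalog/schema/orders/")  # assumed path
    )

# Target: declared with dlt.create_streaming_table, then populated
# by dlt.apply_changes (CDC / upsert semantics).
dlt.create_streaming_table("orders_silver")

dlt.apply_changes(
    target="orders_silver",
    source="raw_orders",
    keys=["order_id"],             # assumed business key
    sequence_by=col("ingest_ts"),  # assumed ordering column
)
```

Note that the target is only declared here; apply_changes is what actually writes into it, which is why you see create_streaming_table paired with apply_changes in the docs.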

