Databricks Community

ggsmith · ‎09-17-2024

What is the difference between the create_table and create_streaming_table functions in dlt?

For example, this is how I created a table that streams data from Kafka, written as JSON files to a volume.

@dlt.table(
    name="raw_orders",
    table_properties={"quality": "bronze", "pipelines.reset.allowed": "false"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        ...

But I see this in the documentation a lot and don't really understand when to use each.

dlt.create_streaming_table("raw_orders")

filipniziol · ‎09-17-2024

Hi @ggsmith ,

If you check the examples, you will notice that dlt.create_streaming_table is more specialized: you can think of it as the definition of your target table.
As per documentation:

Check this example:
https://www.reddit.com/r/databricks/comments/1b9jg3t/deduping_a_table_created_via_delta_live/

What you can observe:

@dlt.table / @dlt.create_table -> used to define your source (the decorated function reads with readStream)
dlt.create_streaming_table -> used to define your target, after which you run dlt.apply_changes, specifying the source and the target

In general, @dlt.table / @dlt.create_table is more robust, whereas dlt.create_streaming_table is a form of syntactic sugar that makes it easier to define streaming targets.
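
To make the source/target split above concrete, here is a minimal sketch of the pattern, not a definitive implementation. The target name "orders", the columns "order_id" and "order_ts", the source_path argument and the define_orders_pipeline wrapper are all hypothetical, and the Auto Loader options are only an assumption about how the JSON files would be read. The dlt module can only be imported inside a pipeline, which is why the import sits inside the function.

```python
def define_orders_pipeline(spark, source_path):
    """Register a streaming source table and an apply_changes target."""
    # dlt is importable only inside a Delta Live Tables pipeline,
    # so it is imported lazily here.
    import dlt

    # Source: the decorated function supplies the query that fills the table.
    @dlt.table(name="raw_orders", table_properties={"quality": "bronze"})
    def raw_orders():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")  # assumed file format
            .load(source_path)                    # hypothetical path
        )

    # Target: declared with no query attached; a separate flow writes into it.
    dlt.create_streaming_table(name="orders")

    # Flow: upsert rows from the source into the target.
    dlt.apply_changes(
        target="orders",
        source="raw_orders",
        keys=["order_id"],       # hypothetical key column
        sequence_by="order_ts",  # hypothetical ordering column
    )
```

In a pipeline notebook you would call define_orders_pipeline(spark, "<path to your volume>") once at the top level. The point of the sketch is that @dlt.table couples the table to its query, while dlt.create_streaming_table only declares the table and leaves the writing to dlt.apply_changes.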
