DLT create_table vs create_streaming_table

ggsmith — Tue, 17 Sep 2024 17:36:18 GMT

What is the difference between the create_table and create_streaming_table functions in dlt?

For example, this is how I created a table that streams data from Kafka, written as JSON files to a volume.

@dlt.table(
    name="raw_orders",
    table_properties={"quality": "bronze", "pipelines.reset.allowed": "false"},
    temporary=False,
)
def create_table():
    query = (
        spark.readStream.format("cloudFiles")
        ...

But I see this in the documentation a lot and don't really understand when to use each.

dlt.create_streaming_table("raw_orders")

Re: DLT create_table vs create_streaming_table

filipniziol — Tue, 17 Sep 2024 18:37:53 GMT

Hi @ggsmith ,

If you check the examples, you will notice that dlt.create_streaming_table is more specialized: think of it as the way to declare your target table.
As per documentation:

Check this example:
https://www.reddit.com/r/databricks/comments/1b9jg3t/deduping_a_table_created_via_delta_live/

What you can observe:

@dlt.table / @dlt.create_table -> used to define your source (the function where readStream is used)
dlt.create_streaming_table -> used to define your target; you then run dlt.apply_changes, specifying the source and the target

In general, @dlt.table / @dlt.create_table is more robust, whereas dlt.create_streaming_table is a form of syntactic sugar designed to make it easier to define streaming targets.
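
To make the source/target split concrete, here is a minimal sketch of the pattern, not a definitive implementation. The names orders_clean, order_id, event_ts and landing_path are hypothetical, as is the wrapper function define_orders_pipeline: in a real pipeline notebook you would simply `import dlt` and use the ambient `spark` at top level. Passing them in as arguments only keeps the sketch importable outside a pipeline.

```python
def define_orders_pipeline(dlt, spark, landing_path):
    """Register a streaming source and an apply_changes target.

    Assumption: `dlt` is the Delta Live Tables Python module and `spark`
    is the pipeline's SparkSession. `landing_path` is a hypothetical
    volume path holding the JSON files.
    """

    # Source: a decorated function that returns a streaming DataFrame.
    @dlt.table(
        name="raw_orders",
        table_properties={"quality": "bronze"},
    )
    def raw_orders():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(landing_path)
        )

    # Target: declared empty, with no query attached to it.
    dlt.create_streaming_table("orders_clean")

    # The flow from source to target is defined separately.
    # `order_id` and `event_ts` are hypothetical column names.
    dlt.apply_changes(
        target="orders_clean",
        source="raw_orders",
        keys=["order_id"],
        sequence_by="event_ts",
    )
```

The point of the split: with @dlt.table the query that fills the table is the decorated function itself, whereas dlt.create_streaming_table only declares the table and leaves it to something else (here dlt.apply_changes) to write into it.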
