Data is not loaded when creating two different streaming tables from one Delta Live Tables pipeline

New Contributor III

I am trying to create two streaming tables in one DLT pipeline. Both read JSON data from different locations and both have different schemas. The pipeline executes, but no data is inserted into either table, whereas when I run each table individually they execute perfectly.

Is it because DLT cannot process two different streaming tables at once?

DF = spark.readStream.format("json") \
      .schema(schema) \
      .option("header", True) \
      .option("nullValue", "") \
      .load(source_path + "/*.json")

Community Manager

Hi @zero234, it seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where you’re trying to create two streaming tables from different sources with distinct schemas.

Let’s dive into this!

DLT is a powerful feature in Databricks for building managed streaming pipelines; its APPLY CHANGES API additionally handles change data capture (CDC) and can maintain historical versions of your data. However, there are some considerations when dealing with multiple streaming tables in a single pipeline.

  1. DLT and Multiple Streaming Tables:

    • DLT can indeed process multiple streaming tables simultaneously, but there are certain rules to follow.
    • Each streaming table in your pipeline should adhere to either SCD Type 1 (overwrite) or SCD Type 2 (historical tracking) behavior.
    • If you’re using SCD Type 1, the target table will be overwritten with the latest data from all sources.
    • If you’re using SCD Type 2, historical changes are tracked, and new records are appended.
    • Identity columns are not supported for tables that are the target of APPLY CHANGES INTO. Additionally, recomputation during updates for materialized views might occur.
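
The practical difference between the two SCD modes can be illustrated with plain Python (a toy model only, not the DLT API; the records and key names are made up):

```python
# Toy illustration of SCD semantics; "id" is the key, "seq" the change order.
changes = [{"id": 1, "city": "NYC"}, {"id": 1, "city": "LA"}]

# SCD Type 1: each change overwrites the previous value for the key
scd1 = {}
for rec in changes:
    scd1[rec["id"]] = rec["city"]

# SCD Type 2: every change is appended, so history is preserved
scd2 = []
for seq, rec in enumerate(changes):
    scd2.append({**rec, "seq": seq})

print(scd1)       # → {1: 'LA'}  (only the latest value per key survives)
print(len(scd2))  # → 2  (both historical versions are retained)
```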
  2. Your Code and Possible Issue:

    • In your code snippet, you’re reading JSON data from different locations and loading it into a streaming DataFrame (DF).
    • You mentioned that running each table individually works fine, but combining them in the same pipeline doesn’t insert data into both tables.
    • Let’s troubleshoot this:
      • First, try removing one of the apply_changes declarations (either unified_events_pv_raw or unified_events_wc_raw). Test with only one declaration to see if the issue persists.

      • Second, consider combining both sources into a single streaming view and declaring one apply_changes call against that view. apply_changes takes a single source, so union the streams first (note that this requires the two sources to have compatible schemas).

      • Here’s a sketch of how you can structure your code (table names, the key column, and the sequencing column are placeholders):

        import dlt
        from pyspark.sql.functions import col

        @dlt.view
        def combined_source():
            return dlt.read_stream("source1").unionByName(dlt.read_stream("source2"))

        dlt.create_streaming_table("target")

        dlt.apply_changes(
            target="target",
            source="combined_source",
            keys=["id"],                  # placeholder key column
            sequence_by=col("event_ts"),  # placeholder ordering column
        )
  3. Append Flow Method:

    • Another approach is to use the @dlt.append_flow decorator. This allows you to combine data from multiple streams into a single streaming table.

    • Example (table names are placeholders; the union again assumes compatible schemas):

      import dlt

      dlt.create_streaming_table("combined_target")

      @dlt.append_flow(target="combined_target")
      def my_combined_stream():
          return dlt.read_stream("source1").union(dlt.read_stream("source2"))
  4. Further Troubleshooting:

    • Check your schema definitions and ensure that column names and data types match between sources.
    • Verify that your streaming sources are producing data consistently.
    • Monitor the logs for any errors or warnings during pipeline execution.
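
On the schema point above, a quick field-by-field comparison can surface drift between two sources before the pipeline runs (plain Python sketch; the field lists are hypothetical):

```python
# Hypothetical (name, type) field lists for the two sources
schema_a = [("user_id", "string"), ("event_ts", "timestamp"), ("payload", "string")]
schema_b = [("user_id", "string"), ("event_ts", "string"), ("payload", "string")]

# Report fields whose name or type differ between the two schemas
mismatches = [(a, b) for a, b in zip(schema_a, schema_b) if a != b]
for a, b in mismatches:
    print(f"mismatch: {a} vs {b}")
```

A mismatch like `event_ts` being a timestamp in one source and a string in the other is exactly the kind of silent incompatibility that can leave a combined pipeline running but empty.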

Remember that DLT is a powerful tool, but understanding its behavior and following best practices will help you achieve the desired results. Happy coding! 🚀🔍

For more details, refer to the official Databricks documentation on Delta Live Tables.
