Data Engineering

Data is not loaded when creating two different streaming tables from one Delta Live Tables pipeline

zero234
New Contributor III

I am trying to create two streaming tables in one DLT pipeline. Both read JSON data from different locations and have different schemas. The pipeline executes, but no data is inserted into either table,
whereas when I run each table individually, it executes perfectly.


Is it because DLT cannot process two different streaming tables at once?

DF = spark.readStream.format("json") \
      .schema(schema) \
      .option("header", True) \
      .option("nullValue", "") \
      .load(source_path + "/*.json")
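
For context, here is a minimal, hypothetical sketch of the kind of two-table pipeline described above; every table name, schema variable, and path below is a placeholder, not the actual pipeline code:

import dlt

# Hypothetical sketch: two independent streaming tables in one DLT pipeline,
# each reading JSON from its own placeholder path with its own placeholder schema
@dlt.table(name="events_a")
def events_a():
    return (
        spark.readStream.format("json")
        .schema(schema_a)                 # placeholder schema
        .load(source_path_a + "/*.json")  # placeholder path
    )

@dlt.table(name="events_b")
def events_b():
    return (
        spark.readStream.format("json")
        .schema(schema_b)
        .load(source_path_b + "/*.json")
    )
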
1 REPLY

Kaniz_Fatma
Community Manager

Hi @zero234, it seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where you’re trying to create two streaming tables from different sources with distinct schemas.

Let’s dive into this!

DLT is a powerful framework in Databricks for building declarative pipelines; among other things, it lets you manage change data capture (CDC) with APPLY CHANGES and maintain historical versions of your data. However, there are some considerations when defining multiple streaming tables in a single pipeline.

  1. DLT and Multiple Streaming Tables:

    • DLT can process multiple streaming tables in one pipeline, but there are rules to follow.
    • A streaming table that is the target of APPLY CHANGES INTO uses either SCD Type 1 (no history) or SCD Type 2 (historical tracking) behavior (see the apply_changes sketch after this list).
    • With SCD Type 1, matching records in the target are updated in place with the latest values; no history is kept.
    • With SCD Type 2, changes are tracked over time by appending a new version of each changed record.
    • Identity columns are not supported for tables that are the target of APPLY CHANGES INTO, and identity columns on materialized views might be recomputed during updates.
  2. Your Code and Possible Issue:

    • In your code snippet, you’re reading JSON data from different locations and loading it into a streaming DataFrame (DF).
    • You mentioned that running each table individually works fine, but combining them in the same pipeline doesn’t insert data into both tables.
    • Let’s troubleshoot this:
      • First, try removing one of the apply_changes declarations (either unified_events_pv_raw or unified_events_wc_raw). Test with only one declaration to see if the issue persists.

      • Second, consider combining the two raw sources into a single view (for example, with a union) and feeding that view to a single apply_changes call as its source parameter.

      • Here’s a sketch of how you could structure your code (the sequence_by column below is a placeholder; replace it with a real ordering column from your data):

        import dlt

        # Union both raw sources into one streaming view
        @dlt.view
        def combined_source():
            return dlt.read_stream("unified_events_pv_raw").unionByName(
                dlt.read_stream("unified_events_wc_raw"), allowMissingColumns=True
            )

        # Declare the target streaming table, then apply changes into it
        dlt.create_streaming_table("unified_events_test_11")

        dlt.apply_changes(
            target="unified_events_test_11",
            source="combined_source",
            keys=["event_id"],
            sequence_by="event_timestamp",  # placeholder ordering column
        )
        
  3. Append Flow Method:

    • Another approach is the @dlt.append_flow decorator: create one target streaming table with dlt.create_streaming_table() and write a separate flow for each source that appends into it. This lets you combine data from multiple streams into a single Delta table.

    • Example (table and flow names are placeholders):

      dlt.create_streaming_table("my_combined_stream")

      @dlt.append_flow(target="my_combined_stream")
      def source1_flow():
          return dlt.read_stream("source1")

      @dlt.append_flow(target="my_combined_stream")
      def source2_flow():
          return dlt.read_stream("source2")
      
  4. Further Troubleshooting:

    • Check your schema definitions and ensure that column names and data types match the JSON files each table reads.
    • Verify that your streaming sources are producing data consistently.
    • Monitor the logs for any errors or warnings during pipeline execution.
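
As a follow-up to point 1 above, here is a minimal, hypothetical sketch of an apply_changes call with the SCD type made explicit; every table and column name in it is a placeholder rather than something taken from your pipeline:

    import dlt

    # The target streaming table must be declared before apply_changes writes to it
    dlt.create_streaming_table("events_scd")

    dlt.apply_changes(
        target="events_scd",
        source="events_raw",            # a streaming table or view defined in the same pipeline
        keys=["event_id"],
        sequence_by="event_timestamp",  # placeholder ordering column
        stored_as_scd_type=1,           # 1 = update in place (no history), 2 = keep history
    )

Whichever pattern you use, the pipeline event log for each update records how many rows each flow processed, which can help confirm whether your sources are actually producing data.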

Remember that DLT is a powerful tool, but understanding its behavior and following best practices will help you achieve the desired results. Happy coding! 🚀🔍

For more details, refer to the official Databricks documentation on Delta Live Tables.
