<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error updating tables in DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110184#M43515</link>
    <description>&lt;P&gt;Thank you for your explanation! That makes a lot of sense. Since DLT manages streaming tables and checkpointing, I now understand why I can’t just update an existing table outside the pipeline.&lt;BR /&gt;Given this, what would you recommend in my case?&lt;BR /&gt;Would it be best to modify those records within the same DLT pipeline that initially creates and populates the table? Or is there any workaround that would allow me to load and update the table in a separate DLT pipeline? I appreciate your insights! Thanks again for your help.&lt;/P&gt;</description>
    <pubDate>Fri, 14 Feb 2025 08:49:25 GMT</pubDate>
    <dc:creator>Radix95</dc:creator>
    <dc:date>2025-02-14T08:49:25Z</dc:date>
    <item>
      <title>Error updating tables in DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110121#M43493</link>
      <description>&lt;P&gt;I'm working on a Delta Live Tables (DLT) pipeline in Databricks Serverless mode.&lt;/P&gt;&lt;P&gt;I receive a stream of data from &lt;STRONG&gt;Event Hubs&lt;/STRONG&gt;, where each incoming record contains a unique identifier (uuid) along with some attributes (code1, code2).&lt;/P&gt;&lt;P&gt;My goal is to update an existing table (data_event) using these records: if a record with the same &lt;STRONG&gt;uuid&lt;/STRONG&gt; already exists (matching the incoming record's &lt;STRONG&gt;uuid&lt;/STRONG&gt; against the table record's &lt;STRONG&gt;uuid&lt;/STRONG&gt;), I want to update its values (code1, code2) and fill the column timestamp_update with the current time of the update.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
from pyspark.sql.functions import col, from_json, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType

schema_update = StructType([
    StructField("uuid", StringType(), False),
    StructField("code1", StringType(), True),
    StructField("code2", StringType(), True)
])

@dlt.view
def raw_data_view():
    df_raw = (
        spark.readStream
             .format("kafka")
             .options(**KAFKA_OPTIONS)
             .load()
    )

    df_parsed = (
        df_raw.selectExpr("CAST(value AS STRING) AS json_data")
              .select(from_json(col("json_data"), schema_update).alias("data"))
              .select("data.*")
              .withColumn("timestamp_update", current_timestamp())
    )
    return df_parsed

dlt.apply_changes(
    target="data_event",
    source="raw_data_view",
    keys=["uuid"],
    sequence_by="timestamp_update",
    stored_as_scd_type=1
)&lt;/LI-CODE&gt;&lt;P&gt;I get this error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;com.databricks.pipelines.common.errors.DLTSparkException: [STREAMING_TARGET_NOT_DEFINED] Cannot found target table `test`.`testtables`.`data_event` for the APPLY CHANGES command. Target table `test`.`testtables`.`data_event` is not defined in the pipeline.&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 13 Feb 2025 14:15:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110121#M43493</guid>
      <dc:creator>Radix95</dc:creator>
      <dc:date>2025-02-13T14:15:17Z</dc:date>
    </item>
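    <!-- Editor's sketch: the STREAMING_TARGET_NOT_DEFINED error above is typically raised because dlt.apply_changes requires its target table to be declared in the same pipeline. A minimal sketch of the missing declaration, assuming the table name from the error message:

```python
import dlt

# The APPLY CHANGES target must be declared inside this same pipeline.
# The name passed here must match the `target` given to dlt.apply_changes.
dlt.create_streaming_table(
    name="data_event",
    comment="SCD type 1 target kept up to date from the Event Hubs stream"
)
```

    With this declaration in place, the dlt.apply_changes call in the question can resolve its target. -->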
    <item>
      <title>Re: Error updating tables in DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110177#M43510</link>
      <description>&lt;P&gt;All tables that DLT writes to or updates need to be managed by DLT. The reason is that these are streaming tables, so DLT must manage their checkpointing; it also handles optimization for such tables. So in your scenario, you cannot simply update another existing table. It must be defined and updated in the same pipeline.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Feb 2025 08:02:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110177#M43510</guid>
      <dc:creator>Edthehead</dc:creator>
      <dc:date>2025-02-14T08:02:20Z</dc:date>
    </item>
    <item>
      <title>Re: Error updating tables in DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110184#M43515</link>
      <description>&lt;P&gt;Thank you for your explanation! That makes a lot of sense, since DLT manages streaming tables and checkpointing, I now understand why I can’t just update an existing table outside the pipeline.&lt;BR /&gt;Given this, what would you recommend in my case?&lt;BR /&gt;Would it be best to modify those records within the same DLT pipeline that initially creates and populates the table? Or is there any workaround that would allow me to load and update the table in a separate DLT pipeline? I appreciate your insights! Thanks again for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Feb 2025 08:49:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110184#M43515</guid>
      <dc:creator>Radix95</dc:creator>
      <dc:date>2025-02-14T08:49:25Z</dc:date>
    </item>
    <item>
      <title>Re: Error updating tables in DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110187#M43517</link>
      <description>&lt;P&gt;You should only update the table in the same pipeline (DLT1) that creates it, since that is the pipeline that maintains it. Assuming the table data_event is indeed created and maintained in another DLT pipeline, then based on what you've shared, you should simply move the read of the Kafka source into that pipeline, so that everything lives in the same DLT pipeline. In cases where you must keep separate pipelines, you need to trigger one after the other (in batch mode) or run them continuously (in real-time mode), and DLT1 should create a view with a read stream from the event table and finally update the data_event table. Hope this helps.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Feb 2025 09:02:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-updating-tables-in-dlt/m-p/110187#M43517</guid>
      <dc:creator>Edthehead</dc:creator>
      <dc:date>2025-02-14T09:02:18Z</dc:date>
    </item>
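    <!-- Editor's sketch: if the pipelines must stay separate as described in the reply above, the pipeline that owns data_event (DLT1) could look roughly like this. The table name `test.testtables.raw_events` is hypothetical; it is assumed to be a streaming table produced by the upstream pipeline:

```python
import dlt
from pyspark.sql.functions import current_timestamp

# Hypothetical upstream table written by the other pipeline (DLT2).
# This pipeline (DLT1) owns data_event and applies the updates itself.
@dlt.view
def updates_view():
    return (
        spark.readStream
             .table("test.testtables.raw_events")
             .withColumn("timestamp_update", current_timestamp())
    )

# Declare the target in this pipeline so APPLY CHANGES can resolve it.
dlt.create_streaming_table(name="data_event")

dlt.apply_changes(
    target="data_event",
    source="updates_view",
    keys=["uuid"],
    sequence_by="timestamp_update",
    stored_as_scd_type=1,
)
```

    In batch mode, DLT2 would be triggered first and DLT1 after it; in continuous mode, both run at the same time and DLT1 picks up new rows as the upstream table grows. -->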
  </channel>
</rss>

