<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Handling Changing Schema in CDC DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12004#M6877</link>
    <description>&lt;P&gt;Yes. On closer inspection, the SQL in the apply_changes() call is not escaping column names that contain "-". When we replaced that character, it worked. It feels like a Databricks bug.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Jul 2022 16:00:01 GMT</pubDate>
    <dc:creator>pmt</dc:creator>
    <dc:date>2022-07-29T16:00:01Z</dc:date>
    <item>
      <title>Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12002#M6875</link>
      <description>&lt;P&gt;We are building a DLT pipeline, and Auto Loader is handling schema evolution fine. However, further down the pipeline we try to load that streamed data into a new table with the apply_changes() function, and it does not seem to handle row updates that carry a new schema. During "Setting Up Tables" it fails with an "org.apache.spark.sql.catalyst.parser.ParseException" error. The only explanation I can think of is that it doesn't like replacing a column of type "Null" with one of type "Struct".&lt;/P&gt;&lt;P&gt;Here is the code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;    import dlt
    from pyspark.sql import functions as F

    @dlt.view(
      name = "authenticators_stream"
    )
    @dlt.expect_all_or_drop({"valid_doc": "doc IS NOT NULL"})
    def stream_table():
        # Auto Loader (cloudFiles) stream over the JSON files, with schema evolution enabled.
        return (
            spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.useNotifications", "true")
                .option("cloudFiles.queueUrl", "https://sqs.us-east-1.amazonaws.com/********/mongo-data-queue-testing")
                .option("cloudFiles.includeExistingFiles", "true")
                .option("cloudFiles.format", "json")
                .option("cloudFiles.inferColumnTypes", "true")
                .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
                .option("multiline", "false")
                .option("cloudFiles.schemaHints", "_id STRING, ot STRING, ts TIMESTAMP, year INT, month INT, day INT")
                .load(json_path)
        )

    dlt.create_streaming_live_table(
        name = "authenticators_raw",
        spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}
    )

    dlt.apply_changes(
      target = "authenticators_raw",
      source = "authenticators_stream",
      keys = ["_id"],
      sequence_by = F.col("ts"),
      stored_as_scd_type = 2
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And here is the full error message:&lt;/P&gt;&lt;P&gt;&lt;I&gt;org.apache.spark.sql.catalyst.parser.ParseException: &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;[PARSE_SYNTAX_ERROR] Syntax error at or near '&amp;lt;'(line 1, pos 6)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;== SQL ==&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;struct&amp;lt;__v:bigint,_id:string,buttonlabel:string,company:string,configuration:struct&amp;lt;parameters:struct&amp;lt;company-id:string,cyberarkurl:string,duo-sso-url:string,email:string,google-oauth-url:string,login-success-text:string,login-url:string,microsofturl:string,okta-url:string,oktasubdomain:string,onelogin-url:string,password:string,payroll-cookies-wait-for-url:string,payroll-provider-selector:string,ping-identity-url:string,request-id:string,secureid-url:string,subdomain:string,target-computing-resources-url:string,username:string,usersname:string,wait-for-milliseconds-param-key:string,wait-for-xpath-after-navigate:string,workday-organization-group-name:string&amp;gt;&amp;gt;,connector:string,createdat:string,optional:boolean,updatedat:string&amp;gt;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;------^^^&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2022 18:44:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12002#M6875</guid>
      <dc:creator>pmt</dc:creator>
      <dc:date>2022-07-28T18:44:47Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12003#M6876</link>
      <description>&lt;P&gt;If you leave only the authenticators_stream table, does the code run OK?&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 09:54:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12003#M6876</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-29T09:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12004#M6877</link>
      <description>&lt;P&gt;Yes. On closer inspection, the SQL in the apply_changes() call is not escaping column names that contain "-". When we replaced that character, it worked. It feels like a Databricks bug.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 16:00:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12004#M6877</guid>
      <dc:creator>pmt</dc:creator>
      <dc:date>2022-07-29T16:00:01Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12005#M6878</link>
      <description>&lt;P&gt;Ahh, it is a Hive metastore limitation. It will be solved with the migration to Unity Catalog soon.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 18:15:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12005#M6878</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-29T18:15:57Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12006#M6879</link>
      <description>&lt;P&gt;Really? That is great news. Do you know if it will also help with Auto Loader schema evolution? Our current pipeline runtime is ridiculously long because the cluster is forced to restart every time a schema change is detected.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 18:23:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12006#M6879</guid>
      <dc:creator>pmt</dc:creator>
      <dc:date>2022-07-29T18:23:19Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12007#M6880</link>
      <description>&lt;P&gt;Yes, it should solve that issue. It was mentioned at the last Data + AI Summit a month ago.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 18:26:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12007#M6880</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-29T18:26:03Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12008#M6881</link>
      <description>&lt;P&gt;That would be great. I'm going to look for that video.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 18:46:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12008#M6881</guid>
      <dc:creator>pmt</dc:creator>
      <dc:date>2022-07-29T18:46:55Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Changing Schema in CDC DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12009#M6882</link>
      <description>&lt;P&gt;Hey there @Palani Thangaraj&lt;/P&gt;&lt;P&gt;Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 11:48:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/handling-changing-schema-in-cdc-dlt/m-p/12009#M6882</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-09-06T11:48:19Z</dc:date>
    </item>
  </channel>
</rss>

