<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Handling Schema Mismatch in DLT Pipeline with from_protobuf Function in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50718#M6092</link>
    <description>&lt;P&gt;I finally found the right way to configure PERMISSIVE mode: pass it in the options dictionary, the fourth argument to from_protobuf. With this setup, corrupted protobuf messages are processed as null results instead of throwing an exception.&lt;/P&gt;&lt;DIV&gt;.withColumn("msg", from_protobuf(col("value"), message_name, desc_file_path, {"mode": "PERMISSIVE"}))&lt;/DIV&gt;</description>
    <pubDate>Thu, 09 Nov 2023 09:03:31 GMT</pubDate>
    <dc:creator>serelk</dc:creator>
    <dc:date>2023-11-09T09:03:31Z</dc:date>
    <item>
      <title>Handling Schema Mismatch in DLT Pipeline with from_protobuf Function</title>
      <link>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50567#M6090</link>
      <description>&lt;P&gt;Hello Databricks Community,&lt;/P&gt;&lt;P&gt;I'm working with a DLT pipeline where I consume Protobuf-serialized messages and attempt to decode them using the Spark "from_protobuf" function. My function kafka_msg_dlt_view is outlined as follows:&lt;/P&gt;&lt;P&gt;def kafka_msg_dlt_view():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;desc_file_path = "xxxxxx"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;message_name = "yyyyyyyy"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df = spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dfr = df.select(from_protobuf(df.value, message_name, desc_file_path).alias("msg"))&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;logging.info("Finished parsing")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return dfr.withColumn("p_status", lit("ok"))&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except Exception as e:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;logging.error(f"Got exception {e}")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df.withColumn("p_status", lit("parsing_error"))&lt;/P&gt;&lt;P&gt;The challenge arises when there is a schema mismatch: the DLT pipeline fails with a Spark exception that does not appear to be caught by the Python try-except block. The error message suggests switching the mode to PERMISSIVE, but when I tried this, it appeared to have no effect on the behavior of from_protobuf.&lt;/P&gt;&lt;P&gt;org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 1cd0bea3-22d3-4567-975b-9f71efd54c64, runId = 8667b3c9-6665-47fb-8e51-cb7563e6819c] terminated with exception: Job aborted due to stage failure: Task 0 in stage 286.0 failed 4 times, most recent failure: Lost task 0.3 in stage 286.0 (TID 419) (10.0.128.12 executor 0): org.apache.spark.SparkException: [MALFORMED_PROTOBUF_MESSAGE] Malformed Protobuf messages are detected in message deserialization. Parse Mode: FAILFAST. To process malformed protobuf message as null result, try setting the option 'mode' as 'PERMISSIVE'.&lt;/P&gt;&lt;P&gt;Has anyone encountered this issue, or does anyone have insight into handling schema mismatches gracefully within a DLT pipeline when using from_protobuf? Any advice on making the error handling work as intended would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thank you in advance for your help!&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2023 13:31:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50567#M6090</guid>
      <dc:creator>serelk</dc:creator>
      <dc:date>2023-11-07T13:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Schema Mismatch in DLT Pipeline with from_protobuf Function</title>
      <link>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50625#M6091</link>
      <description>&lt;P&gt;One possible solution is to handle the deserialization of the Protobuf messages differently. Instead of using a Protobuf deserializer, you could use a ByteArrayDeserializer to receive the raw bytes and convert them in your listener. Then, you could use a ByteArraySerializer. This approach might allow you to handle schema mismatches more gracefully.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2023 10:44:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50625#M6091</guid>
      <dc:creator>Faisal</dc:creator>
      <dc:date>2023-11-08T10:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: Handling Schema Mismatch in DLT Pipeline with from_protobuf Function</title>
      <link>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50718#M6092</link>
      <description>&lt;P&gt;I finally found the right way to configure PERMISSIVE mode: pass it in the options dictionary, the fourth argument to from_protobuf. With this setup, corrupted protobuf messages are processed as null results instead of throwing an exception.&lt;/P&gt;&lt;DIV&gt;.withColumn("msg", from_protobuf(col("value"), message_name, desc_file_path, {"mode": "PERMISSIVE"}))&lt;/DIV&gt;</description>
      <pubDate>Thu, 09 Nov 2023 09:03:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/handling-schema-mismatch-in-dlt-pipeline-with-from-protobuf/m-p/50718#M6092</guid>
      <dc:creator>serelk</dc:creator>
      <dc:date>2023-11-09T09:03:31Z</dc:date>
    </item>
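<!--
A minimal sketch of how the PERMISSIVE setting from the last reply could stand in for the try/except in the original question. Spark builds the query lazily, so the parse failure is raised on the executors when the stream runs, not when from_protobuf is called; a Python try/except around the select therefore never sees it. The helper name add_parse_status and the null check used to derive p_status are assumptions made for illustration, not something confirmed in the thread.

```python
# Sketch only: assumes Spark 3.4+ with the spark-protobuf package available.
# PySpark imports are deferred into the function so this module can be
# imported (and the options inspected) without a Spark installation.

# Options dictionary from the thread: malformed messages decode to null.
PROTOBUF_OPTIONS = {"mode": "PERMISSIVE"}


def add_parse_status(df, message_name, desc_file_path):
    """Decode df.value and tag each row as "ok" or "parsing_error".

    add_parse_status is a hypothetical helper name, not part of any library.
    """
    from pyspark.sql.functions import col, lit, when
    from pyspark.sql.protobuf.functions import from_protobuf

    decoded = df.withColumn(
        "msg",
        from_protobuf(col("value"), message_name, desc_file_path, PROTOBUF_OPTIONS),
    )
    # Assumption: a null msg means the payload could not be parsed. A Kafka
    # record whose value is itself null would be tagged the same way.
    return decoded.withColumn(
        "p_status",
        when(col("msg").isNull(), lit("parsing_error")).otherwise(lit("ok")),
    )
```

Inside kafka_msg_dlt_view this would be called as add_parse_status(df, message_name, desc_file_path) on the DataFrame returned by load(), so that rows with bad payloads stay in the stream with p_status set to "parsing_error" rather than failing the pipeline.
-->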
  </channel>
</rss>

