<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading a protobuf file in a Databricks notebook in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38529#M26662</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85231"&gt;@Fiona&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;To use Protobuf with a descriptor file, reference a descriptor file that is accessible from your compute cluster. Here are the steps:&lt;/P&gt;&lt;P&gt;1. Import the necessary functions:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.protobuf.functions import to_protobuf, from_protobuf&lt;/PRE&gt;&lt;P&gt;2. Specify the path to the descriptor file (for example, one generated with protoc --include_imports --descriptor_set_out):&lt;/P&gt;&lt;PRE&gt;descriptor_file = "/path/to/proto_descriptor.desc"&lt;/PRE&gt;&lt;P&gt;3. Use from_protobuf() to cast a binary column (here, input_df.value) to a struct:&lt;/P&gt;&lt;PRE&gt;proto_events_df = input_df.select(from_protobuf(input_df.value, "BasicMessage", descFilePath=descriptor_file).alias("proto"))&lt;/PRE&gt;&lt;P&gt;4. Use to_protobuf() to cast a struct column back to binary:&lt;/P&gt;&lt;PRE&gt;proto_binary_df = proto_events_df.select(to_protobuf(proto_events_df.proto, "BasicMessage", descFilePath=descriptor_file).alias("bytes"))&lt;/PRE&gt;&lt;P&gt;Source:&lt;BR /&gt;-&amp;nbsp;&lt;A href="https://docs.databricks.com/structured-streaming/protocol-buffers.html" target="_blank" rel="noopener noreferrer"&gt;https://docs.databricks.com/structured-streaming/protocol-buffers.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2023 22:27:02 GMT</pubDate>
    <dc:creator>Priyanka_Biswas</dc:creator>
    <dc:date>2023-07-26T22:27:02Z</dc:date>
    <item>
      <title>Reading a protobuf file in a Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38295#M26595</link>
      <description>&lt;P&gt;I have protobuf data files in offline storage that I'd like to read from a Databricks notebook. I found this documentation (&lt;A href="https://docs.databricks.com/structured-streaming/protocol-buffers.html" target="_blank"&gt;https://docs.databricks.com/structured-streaming/protocol-buffers.html&lt;/A&gt;), but it only covers how to read the protobuf data once the binary is already in a DataFrame. How do I get the binary data into a DataFrame in the first place?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jul 2023 16:01:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38295#M26595</guid>
      <dc:creator>Fiona</dc:creator>
      <dc:date>2023-07-24T16:01:11Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a protobuf file in a Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38529#M26662</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85231"&gt;@Fiona&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;To use Protobuf with a descriptor file, reference a descriptor file that is accessible from your compute cluster. Here are the steps:&lt;/P&gt;&lt;P&gt;1. Import the necessary functions:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.protobuf.functions import to_protobuf, from_protobuf&lt;/PRE&gt;&lt;P&gt;2. Specify the path to the descriptor file (for example, one generated with protoc --include_imports --descriptor_set_out):&lt;/P&gt;&lt;PRE&gt;descriptor_file = "/path/to/proto_descriptor.desc"&lt;/PRE&gt;&lt;P&gt;3. Use from_protobuf() to cast a binary column (here, input_df.value) to a struct:&lt;/P&gt;&lt;PRE&gt;proto_events_df = input_df.select(from_protobuf(input_df.value, "BasicMessage", descFilePath=descriptor_file).alias("proto"))&lt;/PRE&gt;&lt;P&gt;4. Use to_protobuf() to cast a struct column back to binary:&lt;/P&gt;&lt;PRE&gt;proto_binary_df = proto_events_df.select(to_protobuf(proto_events_df.proto, "BasicMessage", descFilePath=descriptor_file).alias("bytes"))&lt;/PRE&gt;&lt;P&gt;Source:&lt;BR /&gt;-&amp;nbsp;&lt;A href="https://docs.databricks.com/structured-streaming/protocol-buffers.html" target="_blank" rel="noopener noreferrer"&gt;https://docs.databricks.com/structured-streaming/protocol-buffers.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2023 22:27:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38529#M26662</guid>
      <dc:creator>Priyanka_Biswas</dc:creator>
      <dc:date>2023-07-26T22:27:02Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a protobuf file in a Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38671#M26708</link>
      <description>&lt;P&gt;Hi! Yeah, I think I understand all of that, but I don't know how to create "input_df" from a file that contains multiple protobuf records, if that makes sense.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2023 13:36:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/38671#M26708</guid>
      <dc:creator>Fiona</dc:creator>
      <dc:date>2023-07-28T13:36:50Z</dc:date>
    </item>
    <item>
      <title>Re: Reading a protobuf file in a Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/44438#M27651</link>
      <description>&lt;P&gt;If you have proto files in offline data storage, you should be able to read them with the binaryFile data source:&lt;/P&gt;&lt;PRE&gt;input_df = spark.read.format("binaryFile").load(data_path)&lt;/PRE&gt;&lt;P&gt;Each file becomes one row, and its raw bytes are in the content column (not value), so pass input_df.content to from_protobuf().&lt;/P&gt;</description>
      <pubDate>Mon, 11 Sep 2023 20:00:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-protobuf-file-in-a-databricks-notebook/m-p/44438#M27651</guid>
      <dc:creator>StephanK</dc:creator>
      <dc:date>2023-09-11T20:00:36Z</dc:date>
    </item>
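    <!--
Combining the binaryFile reader with the from_protobuf step gives a minimal end-to-end sketch. It assumes each file contains exactly one serialized message. binaryFile loads a whole file as a single row, so a file with several concatenated messages would not come out as separate records. read_single_message_files is a hypothetical helper name, and data_path, descriptor_file and message_name are placeholders. The raw bytes arrive in the content column and are renamed to value here only so the call mirrors the earlier reply.

```python
def read_single_message_files(spark, data_path, descriptor_file, message_name):
    """Hypothetical helper: decode files that each hold exactly one serialized message."""
    # Imported lazily so the definition itself does not require PySpark.
    from pyspark.sql.protobuf.functions import from_protobuf

    # binaryFile yields one row per file with columns path, modificationTime, length, content.
    files_df = spark.read.format("binaryFile").load(data_path)
    input_df = files_df.select("path", files_df.content.alias("value"))
    proto_df = input_df.select(
        "path",
        from_protobuf(input_df.value, message_name, descFilePath=descriptor_file).alias("proto"),
    )
    # Expand the decoded struct into one top-level column per message field.
    return proto_df.select("path", "proto.*")
```

In a notebook this might be called as read_single_message_files(spark, "/path/to/data", "/path/to/proto_descriptor.desc", "BasicMessage"), with the paths and message name replaced by your own.
    -->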
  </channel>
</rss>

