Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error during deserializing protobuf data

Sambit_S
New Contributor III

I am receiving protobuf data in a JSON attribute, along with a descriptor file.

I am using from_protobuf to deserialize the data as shown below:

[Image: Sambit_S_0-1713966940987.png]
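The screenshot is not reproduced here. As a rough sketch only, a call of this kind might look like the following in PySpark. The column name `payload` and the output column `decoded` are hypothetical, and `protobuf_options` and `deserialize_payload` are illustrative helpers rather than anything taken from the original post. The sketch assumes a Spark version that ships `pyspark.sql.protobuf.functions` and has the spark-protobuf package available.

```python
from typing import Dict, Optional


def protobuf_options(max_depth: Optional[int] = None) -> Dict[str, str]:
    """Build the options dict passed to from_protobuf (hypothetical helper)."""
    if max_depth is None:
        # Leave the option unset so Spark applies its own default.
        return {}
    if max_depth > 10:
        # The Spark docs state that values greater than 10 are not allowed.
        raise ValueError("recursive.fields.max.depth greater than 10 is not allowed")
    # Option values are passed to Spark as strings.
    return {"recursive.fields.max.depth": str(max_depth)}


def deserialize_payload(df, message_name, desc_path, max_depth=None,
                        payload_col="payload"):
    """Add a 'decoded' struct column parsed from a binary protobuf column.

    Assumes payload_col already holds the raw protobuf bytes and desc_path
    points at the descriptor file that accompanies the data.
    """
    # Imported lazily so this module loads even where pyspark is absent.
    from pyspark.sql.protobuf.functions import from_protobuf

    decoded = from_protobuf(
        payload_col,
        message_name,
        descFilePath=desc_path,
        options=protobuf_options(max_depth),
    )
    return df.withColumn("decoded", decoded)
```

If the protobuf bytes arrive as an encoded string inside the JSON attribute, they would first need to be converted to a binary column; how to do that depends on the encoding, which the post does not state.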

It works most of the time but fails with an error when the protobuf contains recursive fields.

I tried the option suggested in the Spark documentation: https://spark.apache.org/docs/latest/sql-data-sources-protobuf.html#:~:text=A%20recursive.,issues%20....

Even when I set the property recursive.fields.max.depth to its maximum value of 10, and my protobuf data contains 4 levels of recursive data, it still errors out.
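For readers unfamiliar with this option, the sketch below illustrates how the depth setting is described in recent Spark documentation for a message such as `message Person { string name = 1; Person bff = 2; }`: a depth of 1 drops the recursive field, 2 allows one level of recursion, and so on. This is an assumption based on that documentation. Older releases may count the levels differently, so check the documentation for the version in use. `person_schema` is a hypothetical helper that only builds a schema string; it does not call Spark.

```python
def person_schema(max_depth: int) -> str:
    """Return the struct type expected for Person at a given depth setting.

    Mirrors the documented behaviour where depth 1 keeps only the
    non-recursive fields and each further level nests one more 'bff'.
    """
    if not 1 <= max_depth <= 10:
        raise ValueError("this sketch only covers depths 1 through 10")
    # Depth 1: the recursive field 'bff' is dropped entirely.
    schema = "struct<name: string>"
    # Each additional level wraps the previous schema in one more 'bff'.
    for _ in range(max_depth - 1):
        schema = f"struct<name: string, bff: {schema}>"
    return schema


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, person_schema(depth))
```

Under this reading, the setting counts nested occurrences of the message itself, so it is worth checking how the levels in the actual data map onto the configured value.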

Any help here will be appreciated.

 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Sambit_S, handling recursive fields in Protobuf can indeed be tricky, especially when deserializing data.

Let’s explore some potential solutions to address this issue:

  1. Casting Issue with Recursive Fields: The error you’re encountering might be related to casting issues when dealing with recursive fields. When the object being treated changes from the class generated through protoc to a RepeatedField, the recursive mechanism fails. One approach to address this is to handle repeated fields (and maps) as separate branches in your ge... [1]

  2. Delta Table Error: If you’re using Delta tables, there’s an issue related to recursive field handling. When writing to a Delta table, you might encounter an error related to nested NullType in a column. The proposed fix is to drop the recursive field when the limit is reached instead of using NullType [2].

  3. Workaround for Protobuf Import Error in Python 3.6: If you’re facing import errors after compilation in Python 3.6, there’s a workaround. After running protoc, use a sed script to add relative imports to the generated Python files [3].

  4. GRPC Build Issue with MSVC: If you encounter build issues related to GRPC and error C2370, there’s an open issue on GitHub. You can follow the repro steps provided by the community to address this problem [4].

Remember to adapt these solutions to your specific use case and verify whether they resolve your issue. If you need further assistance, feel free to ask! 😊


 