Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PySpark serialization

yusufd
New Contributor III

Hi,

I was looking for comprehensive documentation on implementing serialization in PySpark; most of the material I have found is about serialization with Scala. Could you point me to a detailed explanation of it?


9 REPLIES

Kaniz_Fatma
Community Manager

Hi @yusufd, PySpark supports custom serializers for transferring data, which can significantly impact performance. Let me guide you through the available serializers and how to choose the right one for your use case.

  1. PickleSerializer:

    • By default, PySpark uses PickleSerializer to serialize objects using Python's cPickle serializer. This serializer can handle nearly any Python object.
    • It's a good choice when you need flexibility and compatibility with various data types.
    • Example:
      from pyspark.context import SparkContext
      from pyspark.serializers import PickleSerializer
      
      sc = SparkContext('local', 'test', serializer=PickleSerializer())
      # take(10) runs the job and returns a plain Python list, not an RDD
      result = sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10)
      
  2. MarshalSerializer:

    • If you're looking for faster serialization, consider using MarshalSerializer.
    • It supports fewer data types but can be more efficient (see the sketch after this list).
    • Example:
      from pyspark.context import SparkContext
      from pyspark.serializers import MarshalSerializer
      
      sc = SparkContext('local', 'test', serializer=MarshalSerializer())
      # take(10) runs the job and returns a plain Python list, not an RDD
      result = sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10)
      
  3. Batch Serialization:

    • PySpark serializes objects in batches. The default batch size is chosen based on object size but can be configured using SparkContext's batchSize parameter.
    • Example:
      sc = SparkContext('local', 'test', batchSize=2)
      rdd = sc.parallelize(range(16), 4).map(lambda x: x)
      # Behind the scenes, this creates a JavaRDD with four partitions, each containing two batches of two objects.
      
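As a quick aside on the "fewer data types" point for MarshalSerializer: the sketch below uses plain Python (no Spark required, purely illustrative) to show that marshal only handles simple built-in types, while pickle also copes with custom classes. The Point class is just a hypothetical stand-in.

      # Plain-Python illustration of the type-support difference behind the two serializers.
      import marshal
      import pickle

      class Point:                               # hypothetical example class
          def __init__(self, x, y):
              self.x, self.y = x, y

      marshal.dumps([1, "a", {"k": 2.5}])        # fine: simple built-in types
      pickle.dumps(Point(1, 2))                  # fine: pickle handles custom classes
      try:
          marshal.dumps(Point(1, 2))             # marshal cannot serialize this
      except ValueError as err:
          print("marshal refused the object:", err)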

For more details, you can refer to the PySpark 3.0.1 documentation. It covers these serializers and their usage in depth. Happy coding! 😊🚀 If you're interested in Pandas API on Spark, you can explore the Databricks documentation as well.

 

yusufd
New Contributor III

This is awesome. Thank you for replying. 

I want to ask one more thing before we close this: in Scala Spark, Java serialization is the default, and Kryo serialization is also available as a faster option. If I understand correctly, these are not applicable in PySpark. Kindly confirm.

Hi @yusufd, you're correct! In PySpark, the serialization mechanism is different from Scala Spark. While Scala Spark uses Java serialization by default and also offers Kryo serialization as an option, PySpark uses a different approach.

On the Python side, PySpark serializes objects with pickle-based serializers, and it relies on Pyrolite, a library that implements Python's pickle protocol on the JVM, to exchange that data between the Python workers and the JVM. This machinery is optimized for compatibility with Python data types and integrates seamlessly with PySpark.

So you don't need to choose between Java serialization and Kryo serialization for your Python objects; the pickle-based machinery takes care of serialization for you, allowing you to focus on your data-processing tasks.
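To make that concrete, here is a minimal plain-Python sketch (purely illustrative, no Spark required) of the kind of pickling PySpark performs before data crosses the Python/JVM boundary:

      # PySpark pickles Python objects; Pyrolite reads the same pickle format on the JVM side.
      import pickle

      payload = {"id": 1, "values": [2 * x for x in range(5)]}
      blob = pickle.dumps(payload)               # bytes ready to hand to the JVM
      print(len(blob), "bytes")
      print(pickle.loads(blob))                  # round-trips back to the original object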

If you have any more questions or need further clarification, feel free to ask! 😊

yusufd
New Contributor III

This is great to know!

Thank you for the explanation.

yusufd
New Contributor III

This is awesome. Thank you for replying. 

I want to ask one more thing before we close this: in Scala Spark, Java serialization is the default, and Kryo serialization is also available as a faster option. So, can we use them in PySpark as well?

Another important thing: the code below creates a SparkContext on local, which doesn't work on Databricks. When I try to change the SparkContext arguments, I get an error (screenshot attached). How can we resolve this? Ultimately I don't want to run Spark locally, but on Databricks. I would appreciate it if you could answer this.

Thanks for the support.

yusufd
New Contributor III

@Kaniz_Fatma Could you please respond to my query? Eagerly awaiting your response.

Hi @yusufd, let's address both of your questions:

  1. Serialization in PySpark:

    • Java and Kryo serialization are JVM-side settings. You can still set spark.serializer (for example, to Kryo) when running PySpark, and it affects how Spark serializes JVM objects for shuffles and caching; your Python objects, however, are always serialized with pickle by PySpark. A sketch of applying such a setting follows this list.

  2. Databricks and SparkContext:

    • When working with Databricks, you don't need to explicitly create a SparkContext. Databricks automatically provides a pre-configured Spark session.
    • To access the current Spark context settings in PySpark, you can use:
      spark.sparkContext.getConf().getAll()
      
    • This returns a list of (key, value) pairs with all configured settings.
    • Avoid creating a new SparkContext in Databricks; instead, use the existing one provided by the platform (a second sketch follows below).
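A minimal sketch of applying a JVM-side Kryo setting from PySpark, assuming a standalone script where you build the session yourself (on Databricks this would normally go in the cluster's Spark config instead; the appName is just illustrative):

      # Sketch for a standalone PySpark script; spark.serializer affects JVM-side
      # serialization only, while Python objects are still pickled by PySpark.
      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("kryo-demo")
          .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .getOrCreate()
      )
      print(spark.sparkContext.getConf().get("spark.serializer"))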

Feel free to explore Kryo serialization and leverage the existing Spark session in Databricks! 😊
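And a short sketch of leaning on the session Databricks already provides, assuming a notebook where the spark variable is predefined by the platform:

      # Sketch for a Databricks notebook, where `spark` is predefined by the platform.
      sc = spark.sparkContext                                    # reuse it; don't create a new one
      doubled = sc.parallelize(range(1000)).map(lambda x: 2 * x)
      print(doubled.take(10))
      print(dict(sc.getConf().getAll()).get("spark.app.name"))   # peek at existing settings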

 

Kaniz_Fatma
Community Manager

Hi @yusufd, PySpark provides custom serializers for transferring data, which can significantly improve performance. By default, PySpark uses the PickleSerializer, which leverages Python's cPickle serializer to serialize almost any Python object. However, there are other serializers available, such as the MarshalSerializer, which supports fewer data types but can be more efficient.

If you're interested in exploring these serializers further, you can refer to the PySpark 3.0.1 documentation. It covers these serializers and their usage in depth.

Feel free to explore and experiment with different serializers to find the one that best suits your specific use case! 😊
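If it helps, here is a rough local-mode sketch (not a rigorous benchmark; the serializer class names follow the PySpark 3.0.x docs referenced above, and newer releases may rename them) for comparing the two serializers on a small job:

      # Rough local-mode comparison; run outside Databricks, one SparkContext at a time.
      import time
      from pyspark import SparkContext
      from pyspark.serializers import MarshalSerializer, PickleSerializer

      for name, ser in [("pickle", PickleSerializer()), ("marshal", MarshalSerializer())]:
          sc = SparkContext("local[2]", "serializer-" + name, serializer=ser)
          start = time.time()
          counted = (sc.parallelize(range(1_000_000), 8)
                       .map(lambda x: (x % 10, x))
                       .reduceByKey(lambda a, b: a + b)
                       .count())
          print(name, counted, "groups in", round(time.time() - start, 2), "s")
          sc.stop()                              # stop before creating the next context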

 

yusufd
New Contributor III

Thank you @Kaniz_Fatma for the prompt reply. This clears things up and also distinguishes between Scala Spark and PySpark. I appreciate your explanation. I will apply this and share any findings that might help the community!
