cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

KrishZ
by Contributor
  • 10041 Views
  • 4 replies
  • 3 kudos

[Pyspark.Pandas] PicklingError: Could not serialize object (this error is happening only for large datasets)

Context: I am using pyspark.pandas in a Databricks jupyter notebook and doing some text manipulation within the dataframe..pyspark.pandas is the Pandas API on Spark and can be used exactly the same as usual PandasError: PicklingError: Could not seria...

  • 10041 Views
  • 4 replies
  • 3 kudos
Latest Reply
ryojikn
New Contributor III
  • 3 kudos

@Krishna Zanwar​ , i'm receiving the same error.​For me, the behavior is when trying to broadcast a random forest (sklearn 1.2.0) recently loaded from mlflow, and using Pandas UDF to predict a model.​However, the same code works perfectly on Spark 2....

  • 3 kudos
3 More Replies
parthibsg
by New Contributor II
  • 709 Views
  • 2 replies
  • 2 kudos

When to use Dataframes API over Spark SQL

Hello Experts,I am new to Databricks. Building data pipelines, I have both batch and streaming data.Should I use Dataframes API to read csv files then convert to parquet format then do the transformation? orwrite to table using CSV then use Spark SQL...

  • 709 Views
  • 2 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Parthib Rathnam​, Thank you for reaching out!Let us look into this for you, and we'll follow up with an update.

  • 2 kudos
1 More Replies
Labels