cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Callum
by New Contributor II
  • 4829 Views
  • 3 replies
  • 2 kudos

Pyspark Pandas column or index name appears to persist after being dropped or removed.

So, I have this code for merging dataframes with pyspark pandas. And I want the index of the left dataframe to persist throughout the joins. So following suggestions from others wanting to keep the index after merging, I set the index to a column bef...

  • 4829 Views
  • 3 replies
  • 2 kudos
Latest Reply
Serlal
New Contributor III
  • 2 kudos

Hi!I tried debugging your code and I think that the error you get is simply because the column exists in two instances of your dataframe within your loop.I tried adding some extra debug lines in your merge_dataframes function:and after executing that...

  • 2 kudos
2 More Replies
Mado
by Valued Contributor II
  • 3021 Views
  • 4 replies
  • 2 kudos

Resolved! Pandas API on Spark, Does it run on a multi-node cluster?

Hi, I have a few questions about "Pandas API on Spark". Thanks for your time to read my questions1) Input to these functions are Pandas DataFrame or PySpark DataFrame?2) When I use any pandas function (like isna, size, apply, where, etc ), does it ru...

  • 3021 Views
  • 4 replies
  • 2 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi @Mohammad Saber​ , Pandas dataset lives in the single machine, and is naturally iterable locally within the same machine. However, pandas-on-Spark dataset lives across multiple machines, and they are computed in a distributed manner. It is difficu...

  • 2 kudos
3 More Replies
Labels