Working with PySpark DataFrames and machine learning / statistical modeling libraries
02-11-2025 10:10 PM
Hi Team,
I am working with a large volume of data (50 GB), and I decompose the time series data using statsmodels.
The major challenge I am facing is the compatibility of the PySpark DataFrame with machine learning algorithms. pyspark.ml is available, but many of the models I need are not included in it.
My question is at a broader level: how can we handle this compatibility issue?
Converting the PySpark DataFrame to pandas has not proven very efficient in my case.

