Hi Team,
I am working with a large volume of data (50 GB), and I decompose the time series data using statsmodels.
The major challenge I am facing is compatibility between PySpark DataFrames and machine learning libraries. pyspark.ml is available, but many models are missing from it.
My question is on a broader level: how can we handle this compatibility issue?
Converting the PySpark DataFrame to pandas has not proven to be very efficient.