The DataFrame `big_df` looks like this:

| id| index| timestamp|
|:---- |:------:| -----:|
| abc| 1| 11:00:00|
| abc| 1| 11:00:10|
| abc| 1| 11:00:20|
| abc| 1| 11:00:30|
| abc| 1| 11:00:40|
| abc| 1| 11:00:50|
| abc| 2| 11:01:00|
| abc| 2| 11:01:10|
| abc| 2| 11:01:20|
| def| 1| 23:00:00|
| def| 1| 23:01:00|
| xyz| 1| 15:00:00|
| xyz| 1| 15:01:00|
| xyz| 1| 15:02:00|
| xyz| 1| 15:03:00|
| xyz| 1| 15:04:00|
| xyz| 1| 15:05:00|
| xyz| 2| 15:06:00|
| xyz| 2| 15:07:00|
| xyz| 3| 15:10:00|

There is a function `fun1` that takes a DataFrame as input.
Each unique combination of the columns `id` and `index` in `big_df` defines a small DataFrame that needs to be passed to `fun1`.
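
The grouping can be made concrete with a plain-Python sketch. This is not Spark code. It copies the sample rows from the table as `(id, index, timestamp)` tuples and splits them by `(id, index)`. Each resulting list stands in for one of the small DataFrames that would be handed to `fun1`. The helper name `split_into_groups` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the big_df table above, as (id, index, timestamp) tuples.
ROWS = [
    ("abc", 1, "11:00:00"), ("abc", 1, "11:00:10"), ("abc", 1, "11:00:20"),
    ("abc", 1, "11:00:30"), ("abc", 1, "11:00:40"), ("abc", 1, "11:00:50"),
    ("abc", 2, "11:01:00"), ("abc", 2, "11:01:10"), ("abc", 2, "11:01:20"),
    ("def", 1, "23:00:00"), ("def", 1, "23:01:00"),
    ("xyz", 1, "15:00:00"), ("xyz", 1, "15:01:00"), ("xyz", 1, "15:02:00"),
    ("xyz", 1, "15:03:00"), ("xyz", 1, "15:04:00"), ("xyz", 1, "15:05:00"),
    ("xyz", 2, "15:06:00"), ("xyz", 2, "15:07:00"),
    ("xyz", 3, "15:10:00"),
]


def split_into_groups(rows):
    """Hypothetical helper: map each (id, index) key to its list of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)
    return dict(groups)


groups = split_into_groups(ROWS)
for key, members in groups.items():
    # One line per small DataFrame: its (id, index) key and row count.
    print(key, len(members))
```

The sample data yields six groups: `("abc", 1)`, `("abc", 2)`, `("def", 1)`, `("xyz", 1)`, `("xyz", 2)` and `("xyz", 3)`.
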
How can `fun1` be applied to many of these small DataFrames in parallel?
Can this be achieved with `foreachPartition`, and if so, how?
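
The following is a minimal sketch of a `foreachPartition` approach. It assumes `big_df` is a PySpark DataFrame.

The idea is to call `big_df.repartition("id", "index")` first. This hash-partitions on those columns, so every row of a given group lands in the same partition. A handler is then passed to `foreachPartition`. A single partition can still hold several groups, so the handler has to split its row iterator by key before calling `fun1`.

The code below runs in plain Python so it can be tried without a cluster:

* `fun1` is a hypothetical one-argument stand-in that records each group's size.
* The three partitions are a hypothetical split of a few rows from the table.
* `ThreadPoolExecutor` only simulates executors processing partitions concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical stand-in for fun1: records the size of each group it receives.
# Inside a real foreachPartition handler, fun1 would get plain rows (or a pandas
# DataFrame built from them), since a Spark DataFrame cannot be created on an executor.
results = {}
results_lock = threading.Lock()


def fun1(group_rows):
    key = (group_rows[0][0], group_rows[0][1])
    with results_lock:
        results[key] = len(group_rows)


def handle_partition(rows):
    """Shape of a handler one might pass to big_df.foreachPartition(...).

    `rows` iterates over one partition. After repartition("id", "index") a group
    never spans two partitions, but a partition may contain several groups.
    """
    groups = defaultdict(list)
    for row in rows:  # Spark Row objects are tuples, so row[0] / row[1] also work there
        groups[(row[0], row[1])].append(row)
    for group_rows in groups.values():
        fun1(group_rows)


# Hypothetical partitions: each (id, index) group sits entirely in one partition.
partitions = [
    [("abc", 1, "11:00:00"), ("abc", 1, "11:00:10"),
     ("def", 1, "23:00:00"), ("def", 1, "23:01:00")],
    [("abc", 2, "11:01:00"), ("xyz", 3, "15:10:00")],
    [("xyz", 2, "15:06:00"), ("xyz", 2, "15:07:00")],
]

# Local simulation of Spark running handle_partition on each partition in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(handle_partition, (iter(p) for p in partitions)))

print(sorted(results.items()))
```

`foreachPartition` is an action that returns nothing, so with this design `fun1` would have to write its results out as a side effect.

An alternative exists if `fun1` can be rewritten to take and return a pandas DataFrame. In that case, `big_df.groupBy("id", "index").applyInPandas(fun1, schema)` (Spark 3.0+) lets Spark do the per-group split itself and returns a DataFrame. Here `schema` describes `fun1`'s output.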