Filter not using partition
I have the following code:

```python
spark.sparkContext.setCheckpointDir("dbfs:/mnt/lifestrategy-blob/checkpoints")
result_df.repartitionByRange(200, "IdStation")
result_df_checked = result_df.checkpoint(eager=True)
unique_stations = result_df.select("IdStation...
```
It seems a filter is being applied, according to this: `Filter (isnotnull(IdStation#2678) AND (IdStation#2678 = 1119844))`. I would like to share the following notebook, which covers this topic in detail, in case you would like to check it out: h...