Part of the problem here is that .gz files are not splittable. If you have 1 huge 100GB .gz file, it can only be processed by one task. Can you change your input to use a splittable compression like .bz2? it'll work much better.
Welcome to Databricks Community: Lets learn, network and celebrate together
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.