Part of the problem here is that .gz files are not splittable. If you have one huge 100 GB .gz file, it can only be processed by a single task. Can you change your input to use a splittable compression format like .bz2? It will work much better.
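A minimal PySpark sketch illustrating the difference, assuming a Databricks notebook where the `spark` session is already provided; the file paths are hypothetical placeholders for your own data:

```python
# Spark cannot split a gzip file, so the entire file is read by one task.
# The same data compressed with bzip2 can be split across many tasks.

# Reading a single large .gz file: typically yields exactly one partition.
gz_df = spark.read.text("dbfs:/data/huge_file.csv.gz")
print(gz_df.rdd.getNumPartitions())   # usually 1 for a single .gz file

# Reading the same data recompressed as .bz2: Spark can split it,
# so the partition count scales with file size and parallelism improves.
bz2_df = spark.read.text("dbfs:/data/huge_file.csv.bz2")
print(bz2_df.rdd.getNumPartitions())  # typically many partitions
```

Checking `getNumPartitions()` like this is a quick way to confirm whether your input is actually being parallelized before the job runs at scale.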