Hello Databricks Community,
In an IoT context, we plan to ingest a large number of JSON files (~2 million per day). The files are in JSON Lines format and need to be compressed on the IoT devices. We are able to suggest which compression type the devices should use, so we are looking for the one that is optimal for ingesting these files.
The online resources we found suggest different compression formats, each with its own pros and cons. So far we have looked at gzip and bzip2, and it looks like bzip2 could perform better than gzip.
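To make the comparison concrete, here is a minimal sketch of how the two formats could be measured side by side using only the Python standard library. The sample records (`device_id`, `ts`, `temp`) and the helper names `make_sample` and `compare_codecs` are made up for illustration and are not our real payload or tooling; the numbers it produces only say something about this synthetic data.

```python
import bz2
import gzip
import json
import time


def make_sample(n_records: int = 5000) -> bytes:
    """Build a synthetic JSON Lines payload (hypothetical fields, not real device data)."""
    lines = (
        json.dumps(
            {
                "device_id": f"dev-{i % 50}",
                "ts": 1700000000 + i,
                "temp": 20.0 + (i % 10) * 0.5,
            }
        )
        for i in range(n_records)
    )
    return ("\n".join(lines) + "\n").encode("utf-8")


def compare_codecs(raw: bytes) -> dict:
    """Compress and decompress `raw` with gzip and bzip2, recording size and timing."""
    results = {}
    for name, codec in (("gzip", gzip), ("bzip2", bz2)):
        start = time.perf_counter()
        packed = codec.compress(raw)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = codec.decompress(packed)
        decompress_s = time.perf_counter() - start

        results[name] = {
            "raw_bytes": len(raw),
            "compressed_bytes": len(packed),
            "ratio": len(raw) / len(packed),
            "compress_s": compress_s,
            "decompress_s": decompress_s,
            "round_trip_ok": unpacked == raw,
        }
    return results


if __name__ == "__main__":
    for codec_name, stats in compare_codecs(make_sample()).items():
        print(codec_name, stats)
```

Timings and ratios from a sketch like this depend heavily on the hardware and on how repetitive the real payloads are, so it would need to be run against representative files from the devices before drawing conclusions.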
Does anyone have experience with such a use case and could provide some arguments in favor of a certain compression format or recommend other compression formats?
Thanks in advance!