Hi, since you have many files, I have a suggestion: do not use Spark to read them all in at once, as it will slow down greatly. Instead, use boto3 for the file listing, distribute the list across the cluster, and again use boto3 to fetch the files and compact...
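
Here is a rough sketch of what I mean, not a drop-in solution. The function names (`list_keys`, `fetch_partition`, `read_small_files`), the bucket/prefix arguments, and the default of 200 slices are all my own placeholders, and I am assuming the files live in S3 and that boto3 is installed on the executors. The compaction step is left out; the sketch stops once the file contents are in an RDD.

```python
from typing import Iterable, Iterator, List, Tuple


def list_keys(bucket: str, prefix: str, client=None) -> List[str]:
    """Driver side: list every object key under a prefix with boto3's paginator."""
    if client is None:
        import boto3  # lazy import so a stub client can be passed in for testing
        client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent on empty pages, hence the default
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def fetch_partition(
    bucket: str, keys: Iterable[str], client=None
) -> Iterator[Tuple[str, bytes]]:
    """Executor side: one boto3 client per partition, one GET per key."""
    if client is None:
        import boto3
        client = boto3.client("s3")
    for key in keys:
        body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
        yield key, body


def read_small_files(spark, bucket: str, prefix: str, num_slices: int = 200):
    """List with boto3, spread the keys over the cluster, fetch in parallel.

    `spark` is an existing SparkSession; `num_slices` is an assumed tuning knob.
    """
    keys = list_keys(bucket, prefix)
    slices = max(1, min(num_slices, len(keys)))
    rdd = spark.sparkContext.parallelize(keys, slices)
    # The boto3 client is created inside the partition function because
    # clients cannot be pickled and shipped from the driver.
    return rdd.mapPartitions(lambda part: fetch_partition(bucket, part))
```

The point of doing it this way is that Spark never has to list or plan over the individual files itself; it only sees a plain list of strings, and each executor pulls its own share of the objects.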