Hello,

In the past I used rdd.mapPartitions(lambda ...) to call functions that access third-party APIs such as Azure AI Translator text translation, batching the API calls and returning the batched data. How would one do this now?
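For context, the pattern I was using looked roughly like this (a minimal sketch: `df` is an existing DataFrame with a `text` column, and `translate_batch` is a stand-in for the real Azure AI Translator request):

```python
# Sketch of the old RDD approach: one batched API call per partition
# instead of one call per row.

def translate_batch(texts):
    # Placeholder for the actual Azure AI Translator REST call that
    # accepts a list of strings and returns the translations.
    return ["<translated> " + t for t in texts]  # dummy result for illustration

def translate_partition(rows):
    rows = list(rows)
    texts = [r["text"] for r in rows]
    translations = translate_batch(texts)  # single API call for the whole partition
    for row, translated in zip(rows, translations):
        yield (row["text"], translated)

translated_rdd = df.rdd.mapPartitions(translate_partition)
```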
Hi,

As you have many files, I have a suggestion: do not use Spark to read them all in at once, as that will slow things down greatly. Instead, use boto3 for the file listing, distribute the list across the cluster, and again use boto3 to fetch the files and compact...
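Roughly, the idea looks like this (a minimal sketch, assuming an active SparkSession named `spark`; the bucket, prefix, partition count, and output path are placeholders, and error handling is omitted):

```python
import boto3

def list_keys(bucket, prefix):
    # List object keys on the driver with boto3 instead of letting Spark
    # enumerate thousands of small files.
    s3 = boto3.client("s3")
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys

def fetch_partition(keys, bucket):
    # Runs on the executors: each task downloads its share of keys with boto3.
    s3 = boto3.client("s3")
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        yield (key, body.decode("utf-8"))

bucket, prefix = "my-bucket", "incoming/"  # placeholders
keys = list_keys(bucket, prefix)

# Distribute the key list across the cluster, fetch the files in parallel,
# then compact them into fewer, larger output files.
keys_rdd = spark.sparkContext.parallelize(keys, numSlices=64)
contents_df = keys_rdd.mapPartitions(lambda ks: fetch_partition(ks, "my-bucket")) \
                      .toDF(["key", "content"])
contents_df.coalesce(8).write.mode("overwrite").parquet("s3://my-bucket/compacted/")
```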