But isn't that a hard disadvantage compared to YARN clusters? And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run tasks in parallel. If I were to run spark-sub...
Hello,

In the past I used rdd.mapPartitions(lambda ...) to call functions that access third-party APIs such as Azure AI Translator, batching up records so I could call the API once per batch and return the batched results. How would one do this now?
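For context, here is a minimal sketch of that mapPartitions pattern; the translate_batch helper and the batch size are hypothetical stand-ins for a real Azure AI Translator call:

```python
from pyspark.sql import SparkSession

# Hypothetical helper: a real implementation would send the whole list of
# texts to the Azure AI Translator endpoint in one request and return the
# translations in order.
def translate_batch(texts):
    return texts  # placeholder: echo the inputs

def translate_partition(rows):
    """Accumulate records into fixed-size batches so each partition makes
    a handful of API calls instead of one call per record."""
    batch_size = 100  # assumed batch limit; tune to the API's quota
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from translate_batch(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield from translate_batch(batch)

spark = SparkSession.builder.getOrCreate()
rdd = spark.sparkContext.parallelize(["hello", "world"] * 500)
translated = rdd.mapPartitions(translate_partition).collect()
```

If you have moved off the RDD API, DataFrame.mapInPandas offers a similar per-batch hook: it hands your function an iterator of pandas DataFrames, so you can make one API call per chunk in the same way.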
Hi,

As you have many files, I have a suggestion: do not use Spark to read them all in at once, as that will slow things down greatly. Instead, use boto3 for the file listing, distribute that list across the cluster, and again use boto3 to fetch the files and compact...
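A minimal sketch of that approach (bucket name, prefix, and partition count are hypothetical, adjust to your data):

```python
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

BUCKET = "my-bucket"    # hypothetical bucket
PREFIX = "raw/events/"  # hypothetical prefix

# 1) List the keys on the driver with boto3 -- much faster than having
#    Spark enumerate and open every small object itself.
s3 = boto3.client("s3")
keys = []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        keys.append(obj["Key"])

# 2) Distribute the key list across the cluster.
keys_rdd = spark.sparkContext.parallelize(keys, numSlices=64)

# 3) Fetch the file bodies with boto3 on the executors; the client is
#    created inside the function because boto3 clients are not picklable.
def fetch_partition(part_keys):
    client = boto3.client("s3")
    for key in part_keys:
        body = client.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        yield key, body

contents = keys_rdd.mapPartitions(fetch_partition)
# From here you can parse the fetched bodies and write them back out
# as fewer, larger files.
```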