Thank you for the thorough response -- I'll look into some of these. For extra context, here are some constraints on the data:
All of the tables should be _mostly_ the same size with the same keys, since they're generated from processes that emit time-series data. That is, the source is a set of Parquet tables in S3, each sorted by time, and each timestamp will _usually_ exist in all N tables, but not always. The tables are sorted by timestamp before ingesting, but Spark shuffles on ingest because I cast the unix timestamp to an ISO timestamp (I'd really like to avoid this shuffle if possible, but I'm not sure how aside from the `bucketBy` and `sortBy` you suggested earlier).
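For concreteness, here's roughly the shape of the ingest plus the bucketed write you suggested -- a minimal sketch, where the paths, column names, and bucket count are all placeholders, not my real schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# One of the N time-series tables; it arrives already sorted by unix timestamp.
df = spark.read.parquet("s3://bucket/table_a/")  # path is illustrative

# The cast in question: unix epoch seconds -> ISO timestamp.
df = df.withColumn("ts", F.from_unixtime(F.col("unix_ts")).cast("timestamp"))

# The bucketBy/sortBy approach from earlier in the thread, so downstream
# joins on ts can hopefully avoid a shuffle and sort.
(df.write
   .bucketBy(16, "ts")   # bucket count is a guess; would need tuning
   .sortBy("ts")
   .mode("overwrite")
   .saveAsTable("table_a_bucketed"))
```

(As I understand it, `bucketBy`/`sortBy` only work through `saveAsTable`, so this assumes a catalog/metastore is available.)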
Unfortunately, I don't have control over how these tables are generated right now, but in the future I intend to generate them pre-joined rather than separate, since there's no value in keeping them apart.
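The eventual pre-joined output would look something like this sketch (paths and the `ts` key are hypothetical; a full outer join because a timestamp may be missing from some of the N tables, and this assumes the non-key columns have distinct names across tables):

```python
from functools import reduce

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths; one Parquet table per generating process.
paths = [f"s3://bucket/table_{i}/" for i in range(3)]
tables = [spark.read.parquet(p) for p in paths]

# Keep every timestamp, filling nulls where a table is missing that row.
joined = reduce(lambda a, b: a.join(b, on="ts", how="full_outer"), tables)
joined.write.mode("overwrite").parquet("s3://bucket/joined/")
```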