Okay, so there are a couple of options I could find:
1. Create a stored procedure to use when setting up CDC tables: it builds an intermediary "clean" table and enables CDC on that table instead of the original one. Triggers on the original table keep the clean table in sync (better suited to lower volumes). See the first sketch after this list.
2. If you create the pipeline through DABs (Databricks Asset Bundles) or the Pipelines API, you get finer-grained control over the compute configuration. I have not tested this since going with option #1, but you might be able to enable column mapping in the spark_conf section and run with that. See the second sketch below.
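For option 1, here's a minimal sketch of the idea, assuming a SQL Server source. The connection string, table names (dbo.orders, dbo.orders_clean), and column names are all hypothetical, and CDC has to already be enabled at the database level (sys.sp_cdc_enable_db):

```python
import pyodbc

# Hypothetical connection to the source SQL Server database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword;TrustServerCertificate=yes"
)
cur = conn.cursor()

# 1. Intermediary "clean" table with CDC-friendly names/types (hypothetical schema).
cur.execute("""
    IF OBJECT_ID('dbo.orders_clean', 'U') IS NULL
    CREATE TABLE dbo.orders_clean (
        order_id    INT PRIMARY KEY,
        order_total DECIMAL(18, 2),
        updated_at  DATETIME2
    )
""")

# 2. Trigger on the original table keeps the clean table in sync.
#    Inserts/updates only here; deletes would need similar handling.
cur.execute("""
    CREATE OR ALTER TRIGGER dbo.trg_orders_to_clean
    ON dbo.orders
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        MERGE dbo.orders_clean AS t
        USING (SELECT [Order ID]       AS order_id,
                      [Order Total]    AS order_total,
                      SYSUTCDATETIME() AS updated_at
               FROM inserted) AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN
            UPDATE SET order_total = s.order_total, updated_at = s.updated_at
        WHEN NOT MATCHED THEN
            INSERT (order_id, order_total, updated_at)
            VALUES (s.order_id, s.order_total, s.updated_at);
    END
""")

# 3. Enable CDC on the clean table instead of the original.
#    Requires db_owner/sysadmin and sys.sp_cdc_enable_db already run on the DB.
cur.execute("""
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'orders_clean',
         @role_name     = NULL
""")

conn.commit()
conn.close()
```

Wrapping those three steps in an actual stored procedure on the source side is just a matter of taste; the point is that the pipeline only ever sees the clean table.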
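For option 2, a rough sketch of creating the pipeline through the Pipelines REST API (POST /api/2.0/pipelines) with a spark_conf entry on the pipeline cluster. The workspace URL, token, notebook path, and pipeline/schema names are placeholders, and the spark.databricks.delta.properties.defaults.columnMapping.mode setting is exactly the part I haven't verified:

```python
import requests

host = "https://<workspace-host>"   # hypothetical workspace URL
token = "<personal-access-token>"   # hypothetical PAT

payload = {
    "name": "cdc_pipeline_with_column_mapping",   # hypothetical pipeline name
    "target": "cdc_schema",                        # hypothetical target schema
    "libraries": [
        {"notebook": {"path": "/pipelines/cdc_notebook"}}  # hypothetical notebook
    ],
    "clusters": [
        {
            "label": "default",
            # Assumption: default new Delta tables created by this pipeline
            # to column mapping mode "name".
            "spark_conf": {
                "spark.databricks.delta.properties.defaults.columnMapping.mode": "name"
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

If you're going the DABs route instead of the raw API, the same spark_conf block should be able to sit under the pipeline's cluster definition in the bundle's YAML.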