A proven pattern is to first land your new/changed data into a data lake (ADLS, S3, ...), e.g. in Parquet format. For this you need an ETL tool that can either read from the CDC tool or pull the changes with queries (based on a timestamp or date column). The main goal is to get the new data into your data lake as fast as possible. This can be done with multiple tools (Databricks being one of them, or AWS Glue).
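As a rough sketch, a timestamp-based incremental landing job in Databricks could look like the snippet below. The source table, the `updated_at` column, the JDBC connection and the paths are all assumptions for illustration, not a fixed recipe:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source table and watermark; adjust to your own source system.
source_table = "sales.orders"
raw_path = "s3://my-lake/raw/orders"        # or abfss://... for ADLS
last_load_ts = "2023-01-01 00:00:00"        # normally read from a control table

# Pull only the rows that changed since the previous run (timestamp-based extraction).
changed = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://source-db:5432/sales")   # assumed JDBC source
    .option("dbtable", f"(SELECT * FROM {source_table} "
                       f"WHERE updated_at > '{last_load_ts}') AS incr")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Land the raw increment in the lake, partitioned by load date for easy reprocessing.
(
    changed
    .withColumn("load_date", F.current_date())
    .write
    .mode("append")
    .partitionBy("load_date")
    .parquet(raw_path)
)
```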
So I would advise having one pipeline that lands the raw data first, and then a second one (and maybe even more) that processes that data. You could put the whole flow into a single pipeline, but the downside is that you lose the clear distinction between extracting the raw data and actually processing it.
A good guideline is to think about which parts of the whole data flow should be able to run separately: multiple pipelines might use the same data, but extracting that data should not run multiple times, so the extraction belongs in its own step.
Because you use incremental updates, Delta Lake is certainly a good option!
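A minimal sketch of what the incremental processing step could look like with Delta Lake, assuming the raw increment from above and a business key `order_id` (both assumptions on my side):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw_path = "s3://my-lake/raw/orders"            # raw Parquet landing zone (assumed)
delta_path = "s3://my-lake/curated/orders"      # curated Delta table (assumed)

increment = spark.read.parquet(raw_path)

if DeltaTable.isDeltaTable(spark, delta_path):
    # Upsert the increment into the existing Delta table on the business key.
    target = DeltaTable.forPath(spark, delta_path)
    (
        target.alias("t")
        .merge(increment.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    # First run: create the Delta table from the initial load.
    increment.write.format("delta").save(delta_path)
```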
Concerning the PII data: IMO the easiest approach is to tackle this as early as possible in the data flow, e.g. directly in the ETL tool (by means of a select query or a view that does not expose the PII columns).
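For instance, dropping or masking the PII columns right at extraction could look like this (the column names `email`, `phone` and `customer_name` are made-up examples):

```python
from pyspark.sql import functions as F

# Assume 'changed' is the extracted DataFrame from the landing step above
# and that 'email' and 'phone' are PII columns in this particular source.
no_pii = (
    changed
    .drop("email", "phone")                      # drop columns you never need
    .withColumn("customer_name",                 # or hash what you must keep
                F.sha2(F.col("customer_name"), 256))
)

# Only the PII-free DataFrame is written to the lake.
no_pii.write.mode("append").parquet("s3://my-lake/raw/orders")
```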
If you have no need for streaming data, I'd start with batch processing and get the hang of it (Databricks/Spark). If the need for streaming arises later on, migrating from batch to streaming is possible.
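To illustrate how small that migration can be in Spark, a batch read of the raw zone and its streaming counterpart differ mainly in the read/write calls. The paths and checkpoint location below are assumptions (in a Databricks notebook, `spark` is already available):

```python
# Batch version: process whatever is in the raw zone right now.
batch_df = spark.read.parquet("s3://my-lake/raw/orders")
batch_df.write.format("delta").mode("append").save("s3://my-lake/curated/orders")

# Streaming version: the same logic, but Spark picks up new files as they arrive.
stream_df = (
    spark.readStream
    .schema(batch_df.schema)                     # streaming file sources need a schema
    .parquet("s3://my-lake/raw/orders")
)
(
    stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-lake/_checkpoints/orders")
    .outputMode("append")
    .start("s3://my-lake/curated/orders")
)
```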
Redshift or not... it depends. You can certainly serve data from Databricks itself (especially with the new SQL analytics environment). So I think it comes down to price/performance.
I myself use Azure Synapse (comparable to Redshift), for example, but I am quite sure the same could be achieved with Databricks, so we might change this in the future. Again, this will depend on price/performance.
Hope this gives you something to work with.