Two Issues:
1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?
2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloudFiles.inferColumnTypes?
Discussion:
1. I see example notebooks from Databricks that use cloudFiles.inferColumnTypes both WITH cloudFiles.inferSchema: delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick... and WITHOUT cloudFiles.inferSchema: delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick...
What is the use case for using both, or only one of them? I would have thought that using both together is redundant and only adds compute overhead, but my own exploration of how these options behave suggests that is not necessarily true.
2. Schema checkpoints: are they necessary or not?
All the documentation I find on cloudFiles.inferColumnTypes says that when using it, you must also define a schema checkpoint: Configure schema inference and evolution in Auto Loader - Azure Databricks | Microsoft Learn
However, I see some example notebooks from Databricks that set cloudFiles.inferColumnTypes = True without ever defining a schema checkpoint:
- delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick...
- delta-live-tables-notebooks/change-data-capture-example/notebooks/2-Retail_DLT_CDC_Python.py at ma...
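
To make the configurations being compared concrete, here is a minimal sketch of the option sets in question. The helper name autoloader_options and the /tmp/example paths are hypothetical, used only for illustration. cloudFiles.inferSchema is included only because the linked notebooks set it; I have not confirmed what effect it has. cloudFiles.schemaLocation is assumed to be what the documentation means by the schema checkpoint.

```python
from typing import Dict, Optional


def autoloader_options(
    file_format: str,
    infer_column_types: bool,
    infer_schema: Optional[bool] = None,
    schema_location: Optional[str] = None,
) -> Dict[str, str]:
    """Build the Auto Loader option map for one configuration under comparison.

    Hypothetical helper: it only assembles option strings and does not touch Spark.
    """
    opts = {
        "cloudFiles.format": file_format,
        "cloudFiles.inferColumnTypes": str(infer_column_types).lower(),
    }
    # Option name as it appears in the linked notebooks; its behavior is the open question.
    if infer_schema is not None:
        opts["cloudFiles.inferSchema"] = str(infer_schema).lower()
    # Assumed to correspond to the "schema checkpoint" the documentation asks for.
    if schema_location is not None:
        opts["cloudFiles.schemaLocation"] = schema_location
    return opts


# The three configurations discussed above (paths are placeholders).
with_both = autoloader_options("json", True, infer_schema=True)
types_only = autoloader_options("json", True)
with_location = autoloader_options(
    "json", True, schema_location="/tmp/example/_schemas"
)

# On a Databricks cluster, each map could be passed to the stream reader and the
# resulting schemas compared, for example:
# df = spark.readStream.format("cloudFiles").options(**with_location).load("/tmp/example/input")
# df.printSchema()
```

Printing the schema for each configuration against the same input files would show directly whether the inferred column types differ between them.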