Using cloudFiles.inferColumnTypes with inferSchema and without defining a schema checkpoint

BF7
Contributor

Two Issues:

1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?

2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint?  How does that affect the behavior of cloudFiles.inferColumnTypes?

Discussion:

1. I see example notebooks from Databricks that use inferColumnTypes both WITH inferSchema (delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick...) and WITHOUT inferSchema (delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick...).

What is the use case for using both, or only one of them? I would have thought that using both together is redundant and just adds unnecessary compute overhead, but from my own exploration of how these options behave, that is not necessarily true.
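For concreteness, here is a minimal sketch of the option combinations being compared. The helper names (`autoloader_options`, `read_stream`) and the default `csv` format are hypothetical, chosen only for illustration. The key for the second option is left as a parameter because this post writes it both as `inferSchema` and as `cloudFiles.inferSchema`; which key Auto Loader honors, if any, is part of the question.

```python
def autoloader_options(infer_column_types, infer_schema=None,
                       infer_schema_key="inferSchema", file_format="csv"):
    """Build the reader options for one variant being compared.

    infer_schema=None means the inferSchema option is not set at all.
    infer_schema_key is an assumption: the post uses both "inferSchema"
    and "cloudFiles.inferSchema" for this option.
    """
    options = {
        "cloudFiles.format": file_format,
        "cloudFiles.inferColumnTypes": str(infer_column_types).lower(),
    }
    if infer_schema is not None:
        options[infer_schema_key] = str(infer_schema).lower()
    return options


def read_stream(spark, path, options):
    """Apply the options to an Auto Loader stream.

    Needs a live SparkSession on Databricks, so it is defined here
    but not called.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(path)


# The variants in question:
both = autoloader_options(True, infer_schema=True)   # inferColumnTypes WITH inferSchema
column_types_only = autoloader_options(True)         # inferColumnTypes WITHOUT inferSchema
neither = autoloader_options(False)                  # baseline for comparison
```

Reading the same sample files with each variant and comparing `df.schema` should show whether the second option changes anything.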

2. Schema checkpoints: are they necessary or not?

All the documentation I find on cloudFiles.inferColumnTypes says that when using it, you must also define a schema checkpoint: Configure schema inference and evolution in Auto Loader - Azure Databricks | Microsoft Learn

However, I see some example notebooks from Databricks that use cloudFiles.inferColumnTypes = True without ever defining a schema checkpoint:

- delta-live-tables-notebooks/dms-dlt-cdc-demo/resources/dlt/dms-mysql-cdc-demo.py at main · databrick...
- delta-live-tables-notebooks/change-data-capture-example/notebooks/2-Retail_DLT_CDC_Python.py at ma...
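To pin down the two configurations in question 2, here is a small sketch. It assumes that the "schema checkpoint" above corresponds to the `cloudFiles.schemaLocation` option. The helper name `with_schema_location` and the path are hypothetical.

```python
def with_schema_location(options, schema_location=None):
    """Return a copy of the options, adding cloudFiles.schemaLocation if given."""
    merged = dict(options)  # copy so the base options are not mutated
    if schema_location is not None:
        merged["cloudFiles.schemaLocation"] = schema_location
    return merged


base = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
}

# What the documentation describes: inference plus an explicit schema location.
# The path is a made-up example.
documented = with_schema_location(base, "/tmp/example/_schemas")

# What the example notebooks appear to do: inference with no schema location set.
as_in_notebooks = with_schema_location(base)
```

Either dictionary can be passed to `spark.readStream.format("cloudFiles").options(**opts)`. The open question is how the second configuration behaves.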