Monday
I have a few questions about Auto Loader behavior and capabilities. Please correct me wherever my assumptions or understanding are wrong; it would be much appreciated. My specific scenario and code are below.
Example scenario:
The target managed Delta table (type widening enabled) has a field 'Quantity' of type int.
My code:
Let's say on our first run we ingest a file in which every row has an int value in the quantity field, so my assumption is that our stored schema (version 0) should have quantity inferred as int.
On our next run we load a file which has float values in the quantity column. In this case, will our target Delta table automatically widen to accommodate the new float data? My understanding is that the mismatch occurs before the writeStream, in the readStream itself: since the latest stored schema has quantity as int, the readStream will try to parse the float values against the int type and fail, so these values will end up in the rescued data column and the target table's quantity field will never see them. Please confirm this behavior.
As an additional question, does Auto Loader ever parse floats to int (e.g. 4.5 -> 4 or 2.0 -> 2), and if not, why?
Thanks,
Parth
yesterday
yesterday
Thanks for the response. As a final verification, my understanding is that Auto Loader schema inference and schema evolution only track columns being added to the schema, which then get stored as a new schema version. In all other cases, such as column drops or type mismatches (e.g. float values in an int column), the schema remains the same and the data gets rescued or ignored depending on our rescued data column settings. Therefore, even if our target Delta table has type widening enabled, since the Auto Loader schema doesn't widen itself, the target table will not get widened via Auto Loader unless we manually update the column's data type in our schema to allow accurate parsing.
1) Could you tell me if there is any configuration, option, or setting to override this behavior and force data type widening for columns?
2) Is the same behavior seen when working with Delta Live Tables?
3) As a source of inspiration, is there any approach you would suggest for handling such schema changes when the upstream data (100s of columns) keeps changing, so that providing an explicit schema or schema hints is counterproductive, and our initial inference may produce types that are too narrow and cause mismatches in the future?
Apologies for the long questions. I am trying to understand Auto Loader's functions and capabilities for handling such edge cases when upstream is not stable and clean.
Thanks,
Parth
2 hours ago
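1. No, Auto Loader does not provide an option to automatically widen column types. Its schema evolution modes (set with cloudFiles.schemaEvolutionMode) include:
addNewColumns → Fails the stream when new columns appear, adds them to the schema as nullable columns, and picks them up on restart.
rescue → Never evolves the schema; unexpected fields are captured in _rescued_data.
failOnNewColumns → Fails the stream when new columns appear and leaves the schema unchanged.
none → Ignores new columns without evolving the schema; data is rescued only if rescuedDataColumn is set.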
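For reference, here is a minimal sketch of how one of these modes could be selected on the reader. The helper names `autoloader_options` and `read_stream`, the default JSON format, and any paths passed in are assumptions made for illustration, not something from the original post. The helper only builds the option map, so it can be checked without a cluster.

```python
# Mode names accepted by cloudFiles.schemaEvolutionMode ("none" disables evolution).
EVOLUTION_MODES = ("addNewColumns", "rescue", "failOnNewColumns", "none")


def autoloader_options(schema_location, mode="addNewColumns", file_format="json"):
    """Build Auto Loader reader options for one schema evolution mode."""
    if mode not in EVOLUTION_MODES:
        raise ValueError(f"unknown schema evolution mode: {mode!r}")
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema versions.
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": mode,
    }


def read_stream(spark, source_path, options):
    """Open the stream. Needs a Databricks SparkSession, so it is not called here."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

On a cluster this would be used roughly as `read_stream(spark, "<source path>", autoloader_options("<schema path>", mode="rescue"))`. Note that none of the modes changes the type of an existing column.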
Type widening is a Delta Lake feature, not an Auto Loader feature. You can enable it on the target table with:
ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');
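Since the table property by itself does not change how Auto Loader parses incoming files, the wider type also has to be made visible on the reader side, for example through a schema hint. The sketch below is only an illustration under assumptions: the helper names, the choice of `DOUBLE` for `Quantity` (taken from the scenario in the question), and the CSV default are all hypothetical, and whether a given type change is supported should be verified against your runtime.

```python
def schema_hints(hints):
    """Render {"column": "TYPE"} as a comma-separated schema hint string."""
    return ", ".join(f"{column} {sql_type}" for column, sql_type in hints.items())


def reader_options(schema_location, hints, file_format="csv"):
    """Auto Loader options that pin selected columns to wider types."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Hints override the inferred type for the named columns only.
        "cloudFiles.schemaHints": schema_hints(hints),
    }


def widen_statements(table, column, new_type):
    """SQL to enable type widening, then widen one column manually."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')",
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}",
    ]


def apply_widening(spark, table, column, new_type):
    """Run the statements. Needs a Databricks SparkSession, so it is not called here."""
    for statement in widen_statements(table, column, new_type):
        spark.sql(statement)
```

With a hint such as `{"Quantity": "DOUBLE"}`, float values would be parsed into the column rather than landing in `_rescued_data`. The hint only covers columns you name, so it does not remove the need to know which columns are at risk.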
2. Yes, Delta Live Tables (DLT) uses Auto Loader under the hood for streaming ingestion. The default behavior is the same:
3. Suggested approach for unstable upstream schemas (100s of columns)
When explicit schemas or hints are impractical, consider these strategies:
2 hours ago
This provides a lot of clarity. Thank you.
an hour ago
Thank you @nayan_wylde for the details. This is really useful.