01-02-2023 05:12 AM
When using Delta Live Tables with SQL-syntax ignoreMissingFiles-option is not working and you are getting error anyway.(See picture below)
Link to feature:
https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/options#generic-options
Here is the SQL –code for Bronze -layer
CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE bronze_InventSite
AS
SELECT
_c0 as LastProcessedChange_DateTime
,_c1 as DataLakeModified_DateTime
,_c2 as Start_LSN
,_c3 as End_LSN
,_c4 as DML_Action
,_c5 as Seq_Val
,_c6 as Update_Mask
,_c7 as RECID
,_c8 as DefaultDimension
,_c9 as DefaultInventStatusId
,_c10 as Name
,_c11 as OrderEntryDeadlineGroupId
,_c12 as SiteId
,_c13 as TaxBranchRefRecId
,_c14 as Timezone
,_c15 as IsReceivingWarehouseOverrideAllowed
,_c16 as DataAreaId
,_c17 as PARTITION
,_c18 as RECVERSION
,_c19 as SYSROWVERSIONNUMBER
FROM cloud_files("/mnt/d365fscm/ChangeFeed/InventSite", "csv",map("delimiter", ",", "header", "false", "ignoreMissingFiles", "true"))
And here is SQL- code for SCD2 table:
CREATE OR REFRESH STREAMING LIVE TABLE silver_InventSite
;
APPLY CHANGES INTO
live.silver_InventSite
FROM
stream(live.bronze_InventSite)
KEYS
(RECID)
IGNORE NULL UPDATES
SEQUENCE BY
Start_LSN /*Start_LSN*/
STORED AS
SCD TYPE 2;
01-02-2023 07:28 AM
You could also use inferSchema. ignoreMissingFiles option is to handle files that were accidentally deleted before being fully processed, so it has nothing related to the schema.
01-02-2023 05:41 AM
It looks like AutoLoader needs to have schema even if ignoreMissingFiles is set to true (if there's no file then it can't read one).
Try to specify schema using schemaLocation.
01-02-2023 05:51 AM
Thank you, I get it work like this:
CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE bronze_InventSite
AS
SELECT
_c0 as LastProcessedChange_DateTime
,_c1 as DataLakeModified_DateTime
,_c2 as Start_LSN
,_c3 as End_LSN
,_c4 as DML_Action
,_c5 as Seq_Val
,_c6 as Update_Mask
,_c7 as RECID
,_c8 as DefaultDimension
,_c9 as DefaultInventStatusId
,_c10 as Name
,_c11 as OrderEntryDeadlineGroupId
,_c12 as SiteId
,_c13 as TaxBranchRefRecId
,_c14 as Timezone
,_c15 as IsReceivingWarehouseOverrideAllowed
,_c16 as DataAreaId
,_c17 as PARTITION
,_c18 as RECVERSION
,_c19 as SYSROWVERSIONNUMBER
FROM cloud_files("/mnt/d365fscm/ChangeFeed/InventSite", "csv",map("delimiter", ",", "header", "false","schema","_c0 STRING,_c1 STRING,_c2 STRING,_c3 STRING,_c4 STRING,_c5 STRING,_c6 STRING,_c7 STRING,_c8 STRING,_c9 STRING,_c10 STRING,_c11 STRING,_c12 STRING,_c13 STRING,_c14 STRING,_c15 STRING,_c16 STRING,_c17 STRING,_c18 STRING,_c19 STRING"))
01-02-2023 07:28 AM
You could also use inferSchema. ignoreMissingFiles option is to handle files that were accidentally deleted before being fully processed, so it has nothing related to the schema.