cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Bug in Delta Live Tables when missing files option?

Juhani
New Contributor II

When using Delta Live Tables with SQL-syntax ignoreMissingFiles-option is not working and you are getting error anyway.(See picture below)

imageLink to feature:

https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/options#generic-options

Here is the SQL –code for Bronze -layer

CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE bronze_InventSite
 
 
 
AS 
 
 
 
SELECT 
 
 
 
_c0 as LastProcessedChange_DateTime
 
,_c1 as DataLakeModified_DateTime
 
,_c2 as Start_LSN
 
,_c3 as End_LSN
 
,_c4 as DML_Action
 
,_c5 as Seq_Val
 
,_c6 as Update_Mask
 
,_c7 as RECID
 
,_c8 as DefaultDimension
 
,_c9 as DefaultInventStatusId
 
,_c10 as Name
 
,_c11 as OrderEntryDeadlineGroupId
 
,_c12 as SiteId
 
,_c13 as TaxBranchRefRecId
 
,_c14 as Timezone
 
,_c15 as IsReceivingWarehouseOverrideAllowed
 
,_c16 as DataAreaId
 
,_c17 as PARTITION
 
,_c18 as RECVERSION
 
,_c19 as SYSROWVERSIONNUMBER
 
FROM cloud_files("/mnt/d365fscm/ChangeFeed/InventSite", "csv",map("delimiter", ",", "header", "false", "ignoreMissingFiles", "true"))

And here is SQL- code for SCD2 table:

CREATE OR REFRESH STREAMING LIVE TABLE silver_InventSite
 
;
 
 
 
APPLY CHANGES INTO
 
 live.silver_InventSite
 
FROM
 
 stream(live.bronze_InventSite)
 
KEYS
 
 (RECID)
 
IGNORE NULL UPDATES
 
SEQUENCE BY
 
 Start_LSN /*Start_LSN*/
 
STORED AS
 
 SCD TYPE 2;

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

You could also use inferSchema. ignoreMissingFiles option is to handle files that were accidentally deleted before being fully processed, so it has nothing related to the schema.

View solution in original post

3 REPLIES 3

daniel_sahal
Esteemed Contributor

It looks like AutoLoader needs to have schema even if ignoreMissingFiles is set to true (if there's no file then it can't read one).

Try to specify schema using schemaLocation.

Juhani
New Contributor II

Thank you, I get it work like this:

CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE bronze_InventSite
 
 
 
AS 
 
 
 
SELECT 
 
 
 
_c0 as LastProcessedChange_DateTime
 
,_c1 as DataLakeModified_DateTime
 
,_c2 as Start_LSN
 
,_c3 as End_LSN
 
,_c4 as DML_Action
 
,_c5 as Seq_Val
 
,_c6 as Update_Mask
 
,_c7 as RECID
 
,_c8 as DefaultDimension
 
,_c9 as DefaultInventStatusId
 
,_c10 as Name
 
,_c11 as OrderEntryDeadlineGroupId
 
,_c12 as SiteId
 
,_c13 as TaxBranchRefRecId
 
,_c14 as Timezone
 
,_c15 as IsReceivingWarehouseOverrideAllowed
 
,_c16 as DataAreaId
 
,_c17 as PARTITION
 
,_c18 as RECVERSION
 
,_c19 as SYSROWVERSIONNUMBER
 
FROM cloud_files("/mnt/d365fscm/ChangeFeed/InventSite", "csv",map("delimiter", ",", "header", "false","schema","_c0 STRING,_c1 STRING,_c2 STRING,_c3 STRING,_c4 STRING,_c5 STRING,_c6 STRING,_c7 STRING,_c8 STRING,_c9 STRING,_c10 STRING,_c11 STRING,_c12 STRING,_c13 STRING,_c14 STRING,_c15 STRING,_c16 STRING,_c17 STRING,_c18 STRING,_c19 STRING"))

Hubert-Dudek
Esteemed Contributor III

You could also use inferSchema. ignoreMissingFiles option is to handle files that were accidentally deleted before being fully processed, so it has nothing related to the schema.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.