Hi Team,
We recently created a new Databricks project (based on the Medallion architecture) with Bronze, Silver, and Gold layer tables. For the Bronze layer, we built a Delta Live Tables (DLT) pipeline. The source files are Parquet files on an ADLS external location. The DLT pipeline reads the Parquet files from this external location and loads the data into _RAW and _APPEND_RAW streaming tables.
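For context, our Bronze table definitions follow roughly this pattern (the path is a placeholder, and the bronze_landing_date derivation shown here is a simplified stand-in for our actual logic):

CREATE OR REFRESH STREAMING TABLE abc_raw
AS SELECT
  *,
  current_date() AS bronze_landing_date  -- simplified; our real derivation may differ
FROM cloud_files(
  'abfss://<container>@<account>.dfs.core.windows.net/<source-path>/',
  'parquet'
);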
What we found is that Parquet files are being created serially at the external location, but the Bronze job (a DLT pipeline running in continuous mode) is not importing the data from the new Parquet files into the _RAW tables.
To verify this, I ran a row count on the _RAW table, as shown below, and found that records exist only for the date on which we turned on the Bronze DLT pipeline (which has been running continuously since then).
SELECT bronze_landing_date, COUNT(*) AS row_count
FROM abc_raw
GROUP BY bronze_landing_date
Since the job has been running for the last 10 days, I would expect 10 rows (one per date), but I am getting only 1 row (the date on which the job was started).
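To rule out the source side, I am also planning to count rows directly from the Parquet files with read_files (the path is a placeholder) and compare against the table:

-- Count rows directly from the source Parquet files; if this number keeps
-- growing while abc_raw does not, files are landing but the pipeline is
-- not picking them up.
SELECT COUNT(*) AS source_row_count
FROM read_files(
  'abfss://<container>@<account>.dfs.core.windows.net/<source-path>/',
  format => 'parquet'
);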
So I would like to know: how can I find out whether a given Parquet file was imported into the Bronze layer?
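If I understand the Auto Loader docs correctly, cloud_files_state exposes the file-level discovery state of a stream; something like the following might show whether a specific file was ingested, though I have not verified that it applies to our DLT streaming tables:

-- Query Auto Loader's file-discovery state for the streaming table.
-- A non-null commit_time should mean the file's contents were committed
-- to the table; a null commit_time means the file was discovered but
-- not yet processed.
SELECT path, discovery_time, commit_time
FROM cloud_files_state(TABLE(abc_raw))
WHERE path LIKE '%<file-name>.parquet';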
Also, is there anything we might be missing in the settings of the Bronze DLT pipeline?
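I also intend to check the pipeline's event log for errors or skipped flows. Assuming the event_log table-valued function works for our pipeline (a sketch, not yet verified in our workspace), something like this:

-- Surface recent errors from the DLT event log for the table's pipeline.
SELECT timestamp, event_type, message
FROM event_log(TABLE(abc_raw))
WHERE level = 'ERROR'
ORDER BY timestamp DESC;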
Any pointers would be greatly appreciated.