
How to find out whether a given Parquet file was imported into the Bronze layer?

Devsql
New Contributor

Hi Team,

We recently created a new Databricks solution based on the Medallion architecture, with Bronze, Silver, and Gold layer tables, and built a Delta Live Tables (DLT) pipeline for the Bronze layer. The source files are Parquet files in an ADLS location (an External Location). The DLT pipeline reads the Parquet files from this External Location and loads the data into the _RAW and _APPEND_RAW streaming tables.

We found that Parquet files keep arriving one after another at the External Location, but the Bronze job (a DLT pipeline running in Continuous mode) is not importing data from the new Parquet files into the _RAW tables.

To check this, I ran a row count on the _RAW table, as shown below, and found records only for the date on which we turned on the Bronze DLT pipeline (which has been running continuously since then).

SELECT bronze_landing_date, COUNT(*)
FROM abc_raw
GROUP BY bronze_landing_date

Since the job has been running for the last 10 days, we should get 10 rows, one per date, but I am getting only 1 row (the date on which the job started).

So I would like to know how to find out whether a given Parquet file was imported into the Bronze layer.

Also, is there anything we are missing in the settings of the Bronze DLT pipeline?

Any pointers would be greatly appreciated.

1 REPLY

raphaelblg
New Contributor III

Hello @Devsql ,

It appears that you are creating the DLT bronze tables with a standard batch spark.read operation rather than an incremental streaming read. This may explain why the DLT tables do not pick up new files during a REFRESH operation.

For incremental ingestion of bronze layer data into your DLT pipeline and tables, we recommend using Auto Loader. You can find more information in the following documents:

- DLT Update Modes (Full Refresh/Refresh): https://docs.databricks.com/en/delta-live-tables/updates.html
- Auto Loader: https://docs.databricks.com/en/ingestion/auto-loader/index.html#what-is-auto-loader
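
As a rough illustration of the Auto Loader approach, here is a minimal sketch of a bronze streaming table in DLT SQL. It assumes your pipeline supports the CREATE OR REFRESH STREAMING TABLE and STREAM read_files syntax (older pipelines may use STREAMING LIVE TABLE with cloud_files instead). The abfss path is a placeholder for your External Location, and source_file is a hypothetical column name, filled from the file metadata column, that records which Parquet file each row came from. Your existing columns, such as bronze_landing_date, would be derived as they are today.

```sql
-- Sketch only: the path is a placeholder and source_file is a hypothetical column name.
CREATE OR REFRESH STREAMING TABLE abc_raw
AS SELECT
  *,
  _metadata.file_path AS source_file  -- full path of the Parquet file each row was read from
FROM STREAM read_files(
  'abfss://<container>@<storage-account>.dfs.core.windows.net/<landing-path>/',
  format => 'parquet'
);
```

With a column like that in place, checking whether a given Parquet file reached the bronze table could look like the query below. If it returns no rows, the file has not been ingested yet, or it contained no rows.

```sql
-- Replace <file-name> with the name of the Parquet file you want to check.
SELECT source_file, COUNT(*) AS row_count
FROM abc_raw
WHERE source_file LIKE '%/<file-name>.parquet'
GROUP BY source_file;
```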

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks