Description:
As a data engineer, I need to implement an automated process to ingest data from multiple files in a subdirectory and create corresponding bronze tables. This process should handle full file refreshes and consider strategies to limit the growth of the bronze streaming tables.
Acceptance Criteria:
Identify Files: The process can identify all files in a specified subdirectory, handling both CSV
Build Table Definitions: The process can automatically generate table creation SQL statements based on the schema inferred from the input files.
Implement Bronze Table Ingestion: The process can ingest data from each file into a corresponding bronze table, handling full file refreshes (streaming then Materialized views)
Optimize Bronze Table Growth: The process includes a strategy to limit the growth of the bronze tables, such as materialized views, bronze table with truncate/merge, or partitioning.
Provide Reusable and Maintainable Code: The ingestion process is implemented as a reusable Python script Can someone help on this