Databricks Academy Learners

Help needed with an automated file ingestion process in full refresh mode, preferably using Auto Loader

DB_Learner_3
New Contributor

Description:
As a data engineer, I need to implement an automated process to ingest data from multiple files in a subdirectory and create corresponding bronze tables. This process should handle full file refreshes and consider strategies to limit the growth of the bronze streaming tables.

Acceptance Criteria:

  1. Identify Files: The process can identify all files in a specified subdirectory, handling both CSV

  2. Build Table Definitions: The process can automatically generate table creation SQL statements based on the schema inferred from the input files.

  3. Implement Bronze Table Ingestion: The process can ingest data from each file into a corresponding bronze table, handling full file refreshes (streaming tables first, then materialized views).

  4. Optimize Bronze Table Growth: The process includes a strategy to limit the growth of the bronze tables, such as materialized views, bronze table with truncate/merge, or partitioning.

  5. Provide Reusable and Maintainable Code: The ingestion process is implemented as a reusable Python script.

Can someone help with this?

1 REPLY

BigRoux
Databricks Employee

What do you mean by "full file refreshes"?  Does this refer to the fact that file names will be reused?
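
If it does mean that the same file names get overwritten with complete new contents, here is a rough sketch of how criteria 1, 3 and 5 could be approached. It is not a definitive implementation. The names `BronzePlan`, `table_name_for`, `build_plans` and `start_bronze_stream` are hypothetical helpers made up for illustration, and so are the `bronze_` prefix and the directory layout. It assumes the source directory is reachable as a normal file path (for example a Unity Catalog volume) and that only CSV files are handled. `cloudFiles.allowOverwrites` asks Auto Loader to reprocess a file when it is overwritten, but the rows are still appended, so limiting table growth (criterion 4) would need a separate step such as a materialized view that keeps only the latest load per file.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BronzePlan:
    """Hypothetical container: one source file mapped to one bronze table."""
    source_file: str
    table: str
    checkpoint: str
    options: dict = field(default_factory=dict)


def table_name_for(path: Path, prefix: str = "bronze_") -> str:
    # Turn "Sales Data.csv" into "bronze_sales_data" (assumed naming rule).
    stem = re.sub(r"[^0-9a-zA-Z]+", "_", path.stem).strip("_").lower()
    return f"{prefix}{stem}"


def build_plans(source_dir: str, state_root: str, suffixes=(".csv",)):
    """List matching files in source_dir and build Auto Loader options for each."""
    plans = []
    for path in sorted(Path(source_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        table = table_name_for(path)
        fmt = path.suffix.lower().lstrip(".")
        options = {
            "cloudFiles.format": fmt,
            "cloudFiles.schemaLocation": f"{state_root}/{table}/_schema",
            "cloudFiles.inferColumnTypes": "true",
            # Reprocess a file when it is overwritten under the same name.
            "cloudFiles.allowOverwrites": "true",
            # Restrict this stream to the single file behind this table.
            "pathGlobFilter": path.name,
        }
        if fmt == "csv":
            options["header"] = "true"  # assumes the CSV files have a header row
        plans.append(
            BronzePlan(path.name, table, f"{state_root}/{table}/_checkpoint", options)
        )
    return plans


def start_bronze_stream(spark, source_dir: str, plan: BronzePlan):
    """Run one incremental batch for a plan; needs a Databricks Spark session."""
    df = spark.readStream.format("cloudFiles").options(**plan.options).load(source_dir)
    return (
        df.writeStream
        .option("checkpointLocation", plan.checkpoint)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(plan.table)
    )
```

In a notebook this could be driven with a loop such as `for plan in build_plans(src, state): start_bronze_stream(spark, src, plan)`, where `src` and `state` are placeholder paths you would supply. Keeping the planning step free of Spark calls makes it easy to test outside a cluster.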
