Data Engineering

Best practices for setting up Medallion architecture pipelines inside a Databricks Asset Bundle (DAB)

jeremy98
Contributor III

Hi Community,

My team and I are refactoring our repository's folder structure. Currently, the pipelines that follow the Medallion architecture (Bronze, Silver, and Gold layers) live in a folder named notebook/. However, I believe they should be moved to src/, since we have also developed other pipelines that do not follow the Medallion architecture.

My main point is that some pipelines transform data from an old architecture into this new Medallion-based architecture, while others are developed exclusively within Databricks.

What do you think? Does this folder restructuring make sense?

 

src/
  - green_inference_pipeline/ (exists only in the new architecture)
  - water_inference_pipeline/ (exists only in the new architecture)
  - 00_bronze_to_01_silver_pt1.py (ingests data from the old architecture into our new structure; how should it be refactored?)
  - 00_bronze_to_01_silver_pt2.py (ingests data from the old architecture into our new structure; how should it be refactored?)
  - etc., with silver->gold and gold->portal
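To make the current file naming convention explicit, here is a minimal Python sketch that parses names of the form `NN_layer_to_NN_layer_ptK.py` into their source layer, target layer, and part number. The names `LayerTransition` and `parse_transition` are hypothetical and exist only for illustration; the pattern is inferred from the two file names listed above and may not cover every file in the repository.

```python
import re
from typing import NamedTuple, Optional


class LayerTransition(NamedTuple):
    """Hypothetical container for the pieces encoded in a pipeline file name."""
    source_order: int
    source_layer: str
    target_order: int
    target_layer: str
    part: Optional[int]  # None when the file name has no _ptK suffix


# Assumed pattern: two-digit order, layer name, "_to_", two-digit order,
# layer name, optional "_pt<number>", then ".py".
_PATTERN = re.compile(
    r"^(?P<src_no>\d{2})_(?P<src>[a-z]+)"
    r"_to_(?P<dst_no>\d{2})_(?P<dst>[a-z]+)"
    r"(?:_pt(?P<part>\d+))?\.py$"
)


def parse_transition(filename: str) -> Optional[LayerTransition]:
    """Return the layer transition encoded in filename, or None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    part = match.group("part")
    return LayerTransition(
        source_order=int(match.group("src_no")),
        source_layer=match.group("src"),
        target_order=int(match.group("dst_no")),
        target_layer=match.group("dst"),
        part=int(part) if part is not None else None,
    )
```

For example, `parse_transition("00_bronze_to_01_silver_pt1.py")` yields a Bronze-to-Silver transition with part 1, while a folder name such as `green_inference_pipeline` does not match and returns `None`. That difference is what separates the Medallion pipelines from the others in the listing.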

 

What would be intuitive names for pipelines that start from the Bronze layer but process only one table on a scheduled basis, with different schedules for different Bronze pipelines?

For example:

  • 00_bronze_fir_data_pipeline.py (runs daily at 1 AM)
  • 00_bronze_tiny_data_pipeline.py (runs daily at 2 AM)
  • 00_bronze_huge_data_pipeline.py (runs daily at 4 AM)

Do these naming conventions make sense, or would you suggest a more intuitive approach?
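To make the scheduling side of the question concrete, here is a minimal Python sketch that pairs each of the three example files with its daily run hour and derives a Quartz cron expression, which is the format Databricks job schedules use as far as I know (the `quartz_cron_expression` field, together with `timezone_id`). The `DailySchedule` class and the `UTC` time zone are assumptions for illustration only; the actual time zone is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailySchedule:
    """Hypothetical pairing of a pipeline file with its daily run hour."""
    pipeline_file: str
    hour: int                 # hour of day, 0-23
    timezone_id: str = "UTC"  # assumption: the real time zone is not given

    def quartz_cron(self) -> str:
        """Quartz cron fields: seconds minutes hours day-of-month month day-of-week."""
        if not 0 <= self.hour <= 23:
            raise ValueError(f"hour must be in 0..23, got {self.hour}")
        return f"0 0 {self.hour} * * ?"


# The three Bronze pipelines from the example list above.
SCHEDULES = [
    DailySchedule("00_bronze_fir_data_pipeline.py", hour=1),
    DailySchedule("00_bronze_tiny_data_pipeline.py", hour=2),
    DailySchedule("00_bronze_huge_data_pipeline.py", hour=4),
]

if __name__ == "__main__":
    for schedule in SCHEDULES:
        print(schedule.pipeline_file, schedule.quartz_cron(), schedule.timezone_id)
```

Keeping the schedule in one mapping like this means the file names only need to identify the table, not the run time, so a schedule change would not force a rename.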

