
Source to Bronze Organization + Partition

ChristianRRL
Valued Contributor

Hi there, I hope this is effectively a simple question. I'd like a bit of guidance on whether I am structuring my source-to-bronze Auto Loader data properly. Here's what I have currently:

/adls_storage/<data_source_name>/<category>/autoloader/<oem_shortname>/<linted_database_shortname>/<linted_table_name>/source/<project_id_partition>/<full_file_name>.csv

For this example, I'm trying to set up a source-to-*bronze* pipeline. Some examples I've seen online use a layout closer to the following:

.../autoloader/<oem>/<database>/<table>/source/project_id={x}/*.csv (aka: original raw data)
.../autoloader/<oem>/<database>/<table>/bronze (aka: the ingested bronze data)
.../autoloader/<oem>/<database>/<table>/checkpoint (still a bit unfamiliar with this one)
.../autoloader/<oem>/<database>/<table>/schema (i.e. keep track of current or evolving schema)
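
For illustration, here is a minimal sketch of how the four sibling folders above might map onto Auto Loader options. It is a sketch under assumptions, not a definitive setup: `TablePaths`, `table_paths`, and `start_bronze_stream` are hypothetical helper names, the CSV files are assumed to have a header row, and the stream function only works on a Databricks runtime where the `cloudFiles` source is available. The checkpoint folder is where the stream records its progress (including which files were already ingested), so a restarted stream does not reprocess old files; the schema folder is where Auto Loader stores the inferred schema and its later versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TablePaths:
    """Sibling folders for one <oem>/<database>/<table> (hypothetical helper)."""
    source: str      # original raw CSV files
    bronze: str      # ingested bronze Delta data
    checkpoint: str  # streaming progress and file-tracking state
    schema: str      # inferred and evolving schema


def table_paths(base: str, oem: str, database: str, table: str) -> TablePaths:
    """Build the four folder paths under <base>/autoloader/<oem>/<database>/<table>."""
    root = f"{base.rstrip('/')}/autoloader/{oem}/{database}/{table}"
    return TablePaths(
        source=f"{root}/source",
        bronze=f"{root}/bronze",
        checkpoint=f"{root}/checkpoint",
        schema=f"{root}/schema",
    )


def start_bronze_stream(spark, paths: TablePaths):
    """Start an Auto Loader stream from source to bronze.

    Needs a Databricks runtime; it is defined here but not executed.
    """
    reader = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")  # assumption: the CSV files have a header row
        .option("cloudFiles.schemaLocation", paths.schema)
        .load(paths.source)
    )
    return (
        reader.writeStream.format("delta")
        .option("checkpointLocation", paths.checkpoint)
        .trigger(availableNow=True)  # process what is available, then stop
        .start(paths.bronze)
    )


# Hypothetical values, for illustration only.
paths = table_paths("/adls_storage/my_source/my_category", "oem1", "db1", "tbl1")
print(paths.source)
print(paths.checkpoint)
```

With Hive-style `project_id={x}` directories under the source folder, Auto Loader attempts to infer `project_id` as a partition column during schema inference, so the value should be available as a column in bronze without parsing the file path by hand.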

At the end of the day, I essentially want a clean database.table format that looks like the following:

{data_source_name}_{category}.bronze_{oem}_{database}_{table}
...
{data_source_name}_{category}.silver_{database}_{table}
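
To make the naming pattern concrete, the small sketch below builds both names from their parts. The function names and the sample values are hypothetical and only illustrate the pattern; they are not part of any Databricks API.

```python
def bronze_table(data_source_name: str, category: str,
                 oem: str, database: str, table: str) -> str:
    """Return {data_source_name}_{category}.bronze_{oem}_{database}_{table}."""
    return f"{data_source_name}_{category}.bronze_{oem}_{database}_{table}"


def silver_table(data_source_name: str, category: str,
                 database: str, table: str) -> str:
    """Return {data_source_name}_{category}.silver_{database}_{table}."""
    return f"{data_source_name}_{category}.silver_{database}_{table}"


# Hypothetical values, for illustration only.
print(bronze_table("my_source", "my_category", "oem1", "db1", "tbl1"))
print(silver_table("my_source", "my_category", "db1", "tbl1"))
```

Note that the silver name drops the OEM segment, so under this pattern bronze tables from several OEMs with the same database and table names would map to a single silver table.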

Does this seem right or am I missing anything?

