Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why aren't my Delta Live Tables stored in the expected folder structure in ADLS?

AR3
New Contributor

I set up an Azure Data Lake Storage (ADLS) account with containers named metastore, bronze, silver, gold, and source. I created a Unity Catalog metastore in Databricks via the admin console, backed by the metastore container in my data lake. I defined external locations for each container (e.g., abfss://bronze@<storage_account>.dfs.core.windows.net/) and created a catalog without specifying a location, assuming it would use the metastore's default location. I also created schemas (bronze, silver, gold) and assigned each schema to the corresponding container's external location (e.g., the bronze schema mapped to the bronze container).
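
Roughly, that setup corresponds to DDL along these lines (a simplified sketch of what I did; the external location and storage credential names are just placeholders):

-- External location per container (repeated for silver, gold, source)
CREATE EXTERNAL LOCATION bronze_loc
URL 'abfss://bronze@<storage_account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL <my_credential>);

-- Catalog created without MANAGED LOCATION, so it falls back to the metastore default
CREATE CATALOG my_catalog;

-- Each schema pointed at the matching container (repeated for silver and gold)
CREATE SCHEMA my_catalog.bronze
MANAGED LOCATION 'abfss://bronze@<storage_account>.dfs.core.windows.net/';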

In my source container, I have a folder structure: customers/customers.csv.

I built a Delta Live Tables (DLT) pipeline with the following configuration:

-- Bronze table
CREATE OR REFRESH STREAMING TABLE my_catalog.bronze.customers
AS
SELECT *, current_timestamp() AS ingest_ts, _metadata.file_name AS source_file
FROM STREAM read_files(
  'abfss://source@<storage_account>.dfs.core.windows.net/customers',
  format => 'csv'
);

-- Silver table
CREATE OR REFRESH STREAMING TABLE my_catalog.silver.customers
AS
SELECT *, current_timestamp() AS process_ts
FROM STREAM my_catalog.bronze.customers
WHERE email IS NOT NULL;

-- Gold materialized view
CREATE OR REFRESH MATERIALIZED VIEW my_catalog.gold.customers
AS
SELECT count(*) AS total_customers
FROM my_catalog.silver.customers
GROUP BY country;

  • Why are my tables stored under a unity/schemas/<schema_id>/tables/<table_id> structure instead of directly in the respective containers, e.g., a customers/ folder containing the Parquet files and a _delta_log folder?

  • How can I configure my DLT pipeline or Unity Catalog setup so that the tables are stored in the bronze, silver, and gold containers with a folder structure like customers/<parquet_files> plus _delta_log?

  • In industry-level projects, how do teams typically manage table storage locations and folder structures in ADLS when using Unity Catalog and Delta Live Tables? Are there best practices or common configurations that ensure a clean, predictable folder structure for the bronze, silver, and gold layers?

1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @AR3 ,

I think DLT, up until recently, supported only managed tables. Databricks has since rebranded it to Lakeflow Declarative Pipelines and added an option called Lakeflow Declarative Pipelines sinks.

Lakeflow Declarative Pipelines sinks are targets for Lakeflow Declarative Pipelines flows. By default, flows emit data to either a streaming table or a materialized view target, both of which are Azure Databricks managed Delta tables. Sinks are an alternative target that lets you write transformed data to destinations such as event streaming services like Apache Kafka or Azure Event Hubs, as well as external tables managed by Unity Catalog.

But as of now, only the Python API is supported.
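
As a rough, untested sketch of what that could look like for your case (the sink name, flow name, and paths below are just examples based on your post, not something specific to your pipeline):

import dlt

# Define a sink that writes to an external Delta location instead of a managed table.
# The path here reuses your bronze container as an example.
dlt.create_sink(
    name="bronze_customers_sink",
    format="delta",
    options={"path": "abfss://bronze@<storage_account>.dfs.core.windows.net/customers"}
)

# An append flow that streams the source CSV files into the sink via Auto Loader.
@dlt.append_flow(name="customers_to_sink", target="bronze_customers_sink")
def customers_to_sink():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("abfss://source@<storage_account>.dfs.core.windows.net/customers")
    )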


Use sinks to stream records to external services with Lakeflow Declarative Pipelines - Azure Databri...