<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lakeflow SDP (DLT) produce external tables, or only UC-managed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159664#M54814</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Streaming Tables &amp;amp; Materialized Views supported in SDP are a form of Managed Tables. More details &lt;A href="https://docs.databricks.com/aws/en/ldp/concepts#datasets" target="_self"&gt;here&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;create_streaming_table()&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;has a&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;path&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;parameter, but specifying it creates a managed table with a custom &lt;STRONG&gt;storage location&lt;/STRONG&gt; (not an external table). If the path is not set in the code, the table uses the managed storage location of the schema that contains it.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SDP is designed exclusively for managed table output. To get external tables you need a post-processing step,&amp;nbsp;or you can use one of the options below.&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Downstream Job with CREATE EXTERNAL TABLE - &lt;/STRONG&gt;You can create an external table on top of the gold table at the designated external location.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Schema Level Managed Storage -&amp;nbsp;&lt;/STRONG&gt;You can use preconfigured schema-level paths if the goal is a predictable storage location. The tables are still managed, but they are stored at the location you designate at the catalog or schema level.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;SDP to Staging with External Final Write -&amp;nbsp;&lt;/STRONG&gt;You can use SDP to create managed staging tables, then use a separate job that reads from staging and writes to external tables, with appropriate changes.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Gold SCD 2 Dimension&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can&amp;nbsp;&lt;STRONG&gt;stay with managed tables&lt;/STRONG&gt; unless you have specific requirements, as it's the best option. 
It gives you automatic optimization, faster queries &amp;amp; UC-managed maintenance.&lt;BR /&gt;Use external tables only if you have &lt;SPAN&gt;legacy systems requiring direct file access from cloud storage paths or data shared with non-UC systems. You can use the&amp;nbsp;Downstream Job approach&amp;nbsp;if external tables are required.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 18 Jun 2026 09:17:39 GMT</pubDate>
    <dc:creator>balajij8</dc:creator>
    <dc:date>2026-06-18T09:17:39Z</dc:date>
    <item>
      <title>Lakeflow SDP (DLT) produce external tables, or only UC-managed</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159656#M54813</link>
      <description>&lt;P&gt;As I understand it, streaming tables and materialized views produced by Lakeflow Spark Declarative Pipelines (DLT) are always Unity Catalog managed tables; there's no LOCATION/path option on create_streaming_table or apply_changes.&lt;/P&gt;&lt;P&gt;Is that correct? A few questions:&lt;/P&gt;&lt;P&gt;Is there any supported way to make an SDP output a true external table?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If not, what are the recommended design patterns when you need the data as external, e.g. a designated managed storage location at the catalog/schema level, a pipeline sink writing to an external path, or a downstream job that re-writes to CREATE TABLE ... LOCATION?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Any reason to prefer one pattern over another for a Gold SCD2 dimension?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2026 08:02:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159656#M54813</guid>
      <dc:creator>nidhin</dc:creator>
      <dc:date>2026-06-18T08:02:48Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP (DLT) produce external tables, or only UC-managed</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159664#M54814</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Streaming Tables &amp;amp; Materialized Views supported in SDP are a form of Managed Tables. More details &lt;A href="https://docs.databricks.com/aws/en/ldp/concepts#datasets" target="_self"&gt;here&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;create_streaming_table()&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;has a&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;path&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;parameter, but specifying it creates a managed table with a custom &lt;STRONG&gt;storage location&lt;/STRONG&gt; (not an external table). If the path is not set in the code, the table uses the managed storage location of the schema that contains it.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SDP is designed exclusively for managed table output. To get external tables you need a post-processing step,&amp;nbsp;or you can use one of the options below.&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Downstream Job with CREATE EXTERNAL TABLE - &lt;/STRONG&gt;You can create an external table on top of the gold table at the designated external location.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Schema Level Managed Storage -&amp;nbsp;&lt;/STRONG&gt;You can use preconfigured schema-level paths if the goal is a predictable storage location. The tables are still managed, but they are stored at the location you designate at the catalog or schema level.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;SDP to Staging with External Final Write -&amp;nbsp;&lt;/STRONG&gt;You can use SDP to create managed staging tables, then use a separate job that reads from staging and writes to external tables, with appropriate changes.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Gold SCD 2 Dimension&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can&amp;nbsp;&lt;STRONG&gt;stay with managed tables&lt;/STRONG&gt; unless you have specific requirements, as it's the best option. 
It gives you automatic optimization, faster queries &amp;amp; UC-managed maintenance.&lt;BR /&gt;Use external tables only if you have &lt;SPAN&gt;legacy systems requiring direct file access from cloud storage paths or data shared with non-UC systems. You can use the&amp;nbsp;Downstream Job approach&amp;nbsp;if external tables are required.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2026 09:17:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159664#M54814</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-06-18T09:17:39Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP (DLT) produce external tables, or only UC-managed</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159687#M54815</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/18143"&gt;@nidhin&lt;/a&gt;,&lt;/P&gt;
&lt;P data-pm-slice="1 1 []"&gt;What you’re saying is basically correct for a Unity Catalog-enabled Lakeflow Spark Declarative Pipelines setup. In that model, pipelines publish streaming tables and materialized views into the target catalog and schema, the data is stored using the managed storage location for the containing schema or catalog, and the docs explicitly say that the LOCATION property is not supported when defining the table. That means there isn’t a supported way to make the pipeline-managed output itself a true Unity Catalog external table with its own LOCATION clause. See &lt;A href="https://docs.databricks.com/aws/en/ldp/unity-catalog" rel="noopener noreferrer nofollow" target="_blank"&gt;Use Unity Catalog with pipelines&lt;/A&gt;.&lt;/P&gt;
&lt;P class="wnfdntt _1ibi0s3f5 _1ibi0s3ce _1ibi0s3ea"&gt;So if the requirement is really "I want control over where the managed data lives," the recommended pattern is to set a managed storage location at the schema or catalog level and let the pipeline write there. That keeps you on the happy path for SDP while still giving you predictable storage placement. The same Unity Catalog pipelines doc above is the key reference for that behaviour.&lt;/P&gt;
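&lt;P&gt;As a minimal sketch of that pattern, the schema-level managed location could be set with a CREATE SCHEMA ... MANAGED LOCATION statement before the pipeline first publishes into the schema. The catalog, schema, and bucket names below are hypothetical placeholders, and the helper function exists only for illustration; it just builds the SQL text.&lt;/P&gt;

```python
# Hypothetical names: swap in your own catalog, schema, and storage URL.
def managed_schema_ddl(catalog, schema, managed_location):
    """Build a CREATE SCHEMA statement that pins the schema's managed storage location."""
    return (
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema} "
        f"MANAGED LOCATION '{managed_location}'"
    )


ddl = managed_schema_ddl("main", "gold", "s3://example-bucket/managed/gold")
print(ddl)
# In a Databricks notebook or job you would then run: spark.sql(ddl)
```

&lt;P&gt;The path generally has to sit under an external location that Unity Catalog already knows about, so treat the URL above as illustrative only.&lt;/P&gt;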
&lt;P class="wnfdntt _1ibi0s3f5 _1ibi0s3ce _1ibi0s3ea"&gt;If the requirement is "I need external systems to consume this dataset," I would usually not jump straight to rewriting the table as an external table. Databricks now recommends either &lt;A href="https://docs.databricks.com/aws/en/external-access/external-for-pipelines" rel="noopener noreferrer nofollow" target="_blank"&gt;external data access for streaming tables and materialized views&lt;/A&gt; for modern clients, or &lt;A href="https://docs.databricks.com/aws/en/external-access/compatibility-mode" rel="noopener noreferrer nofollow" target="_blank"&gt;Compatibility Mode&lt;/A&gt; for older or path-oriented clients. External data access is the nicer option when the client supports the REST APIs because it avoids a full data copy and gives read-after-write consistency, whereas Compatibility Mode creates a read-only copy at a chosen location and is better when broad client compatibility matters more than immediacy or storage efficiency. The high-level guidance is also summarised in &lt;A href="https://docs.databricks.com/aws/en/external-access/" rel="noopener noreferrer nofollow" target="_blank"&gt;Access Databricks data using external systems&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;If you truly need an actual external Delta table as the contract, I'd frame that as a downstream serving pattern rather than as the SDP-owned output. In other words, let SDP own the authoritative managed table, then populate a separate external table from it with a downstream job if you have a hard requirement for CREATE TABLE ... LOCATION. That keeps the pipeline aligned with the documented UC model and isolates the "external table" decision to a serving layer instead of your core transformation layer.&lt;/P&gt;
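&lt;P&gt;A rough sketch of such a downstream job, assuming a full-refresh copy is acceptable (for a large SCD2 dimension an incremental MERGE may be a better fit). The table names and storage path are hypothetical, and the helper only builds SQL text that a scheduled job could pass to spark.sql() after the pipeline update finishes.&lt;/P&gt;

```python
def external_serving_statements(source_table, target_table, external_path):
    """Return SQL statements that copy a managed table into an external Delta table."""
    # Create the external table once, empty, with the same columns as the source.
    create_stmt = (
        f"CREATE TABLE IF NOT EXISTS {target_table} "
        f"USING DELTA LOCATION '{external_path}' "
        f"AS SELECT * FROM {source_table} LIMIT 0"
    )
    # Replace the external table's contents with the current state of the source.
    refresh_stmt = f"INSERT OVERWRITE {target_table} SELECT * FROM {source_table}"
    return [create_stmt, refresh_stmt]


# Hypothetical table names and path, for illustration only.
statements = external_serving_statements(
    "main.gold.dim_customer",
    "main.serving.dim_customer_ext",
    "s3://example-bucket/external/dim_customer",
)
for stmt in statements:
    print(stmt)
```

&lt;P&gt;Keeping the copy in its own job means the SDP pipeline never has to know about the external path.&lt;/P&gt;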
&lt;P class="wnfdntt _1ibi0s3f5 _1ibi0s3ce _1ibi0s3ea"&gt;For a Gold SCD2 dimension, my default preference would be to keep the authoritative Gold table as the native UC-managed SDP output and only add an external-facing representation if there is a concrete downstream need. If users just need to read it externally, external data access is usually the best first choice, with Compatibility Mode as the fallback for older clients. I'd only choose a downstream rewrite into a separate external table when you specifically need cloud-path-based ownership or an external-table contract that other platforms depend on. This will give you the best balance of correctness, governance, and operational simplicity for an SCD2 dimension.&lt;/P&gt;
&lt;P class="wnfdntt _1ibi0s3f5 _1ibi0s3ce _1ibi0s3ea"&gt;Hope this helps.&lt;/P&gt;
</description>
      <pubDate>Thu, 18 Jun 2026 10:20:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-dlt-produce-external-tables-or-only-uc-managed/m-p/159687#M54815</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-06-18T10:20:28Z</dc:date>
    </item>
  </channel>
</rss>

