<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dataflow Gen2 Timeout When Loading Databricks Tables in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142014#M2439</link>
    <description>&lt;P&gt;Our data engineering team has already worked through these actions. It worked when I filtered the tables in Microsoft Fabric using the Power Query below:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Table.SelectRows(&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Table.SelectColumns( #"Navigation 2"{[Name = &lt;/SPAN&gt;&lt;SPAN&gt;"defects"&lt;/SPAN&gt;&lt;SPAN&gt;, Kind = &lt;/SPAN&gt;&lt;SPAN&gt;"Table"&lt;/SPAN&gt;&lt;SPAN&gt;]}[Data], {&lt;/SPAN&gt;&lt;SPAN&gt;"column1"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column2"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column3"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column4"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column5"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column6"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column7"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column8"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column9"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column10"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column11"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column12"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column13"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column14"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column15"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column16"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column17"&lt;/SPAN&gt;&lt;SPAN&gt;} ),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;each&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;[station] = &lt;/SPAN&gt;&lt;SPAN&gt;"North"&lt;/SPAN&gt;&lt;SPAN&gt; or [station] = &lt;/SPAN&gt;&lt;SPAN&gt;"South"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;Do you know why I can't get the whole table?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 16 Dec 2025 17:59:50 GMT</pubDate>
    <dc:creator>viniciusmartins</dc:creator>
    <dc:date>2025-12-16T17:59:50Z</dc:date>
    <item>
      <title>Dataflow Gen2 Timeout When Loading Databricks Tables</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/141986#M2434</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I created a Dataflow Gen2 to get data from Databricks. I can see the preview data very quickly (around 5 seconds). But when I run the dataflow, it takes 8 hours and then cancels with a timeout. I’m trying to get 8 tables with the same schema. Six of them work fine with no problems, but with two of them I’m experiencing the issue I just described. The table sizes are around 50 MB.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What can I do to solve this issue?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Dec 2025 13:32:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/141986#M2434</guid>
      <dc:creator>viniciusmartins</dc:creator>
      <dc:date>2025-12-16T13:32:42Z</dc:date>
    </item>
    <item>
      <title>Re: Dataflow Gen2 Timeout When Loading Databricks Tables</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142006#M2438</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Here is a list of the likely causes and some steps to remediate them.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;1. Table-Specific Data and File Layout Issues&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Small file problem: If the two problematic tables consist of many very small underlying Parquet files, Databricks may spend most of the execution time opening and closing files, adding latency that is not observable in preview mode.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Non-Delta/Parquet format: Unity Catalog documentation confirms non-Delta tables often suffer from slow partition discovery—simple queries like &lt;CODE class="p8i6j0f"&gt;count(*)&lt;/CODE&gt; take significantly longer due to metastore and partition lookups.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Partitioning scheme: If these tables are heavily or inefficiently partitioned—or use partition columns with high cardinality or many small partitions—scan times increase dramatically.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;2. Connector/Networking Misconfiguration&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Dataflow tools often rely on optimized connectors and correct network permissions. Errors like “connection reset,” “connection timeout,” or firewall rules blocking VNET access can cause timeouts at runtime but may not impact the lightweight preview fetch.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;3. Access Controls and Storage Account Settings&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;If underlying ADLS Gen2 (Azure Data Lake Storage Gen2) permissions, hierarchical namespace, or managed identity setups differ between tables/schemas, data access retries may cause long delays. The preview typically samples small chunks and doesn’t trigger full scans, so permission or path issues may only appear on larger reads.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;4. Databricks Runtime or Cluster Configuration&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Outdated Databricks Runtime versions, insufficient compute, or missing Photon acceleration can impede processing, especially for queries involving many files or requiring shuffling data.
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;5. Table Statistics and Query Plan Optimization&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;If statistics on the two tables are missing or stale, the Databricks optimizer may not generate an efficient query plan. Lack of statistics especially impacts read speed for larger tables.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;6. Metadata/Partition Discovery Overhead&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;When using non-Delta tables or tables with legacy formats, Unity Catalog incurs significant overhead fetching partition information and metadata, which can cripple runtime performance on partitioned tables.&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H4 class="_9k2iva0 p8i6j0c _1ibi0s312 heading4 _9k2iva1"&gt;Targeted Solutions and Recommendations&lt;/H4&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;A. Convert Tables to Delta Format and Optimize File Sizes&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Convert the two tables to Delta format if they’re not already; Delta offers substantial query speed-ups by limiting metadata scan and supporting auto-compaction.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Run the &lt;CODE class="p8i6j0f"&gt;OPTIMIZE&lt;/CODE&gt; command regularly on Delta tables to merge small files and use &lt;CODE class="p8i6j0f"&gt;ZORDER BY&lt;/CODE&gt; on frequently filtered/joined columns.&lt;/LI&gt;
&lt;/UL&gt;
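&lt;P class="p8i6j01 paragraph"&gt;As a rough sketch of the two steps above, assuming the slow table is registered as &lt;CODE class="p8i6j0f"&gt;main.quality.defects&lt;/CODE&gt; (the catalog and schema names are hypothetical; only the table name and the &lt;CODE class="p8i6j0f"&gt;station&lt;/CODE&gt; column come from this thread), the statements might look like the following. Skip the conversion if the table is already Delta, and note that a partitioned Parquet directory may need a &lt;CODE class="p8i6j0f"&gt;PARTITIONED BY&lt;/CODE&gt; clause on the conversion.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="p8i6j0f"&gt;-- Hypothetical three-level name; replace with your own catalog and schema.
-- Only needed if the table is still plain Parquet.
CONVERT TO DELTA main.quality.defects;

-- Merge small files and co-locate rows on a frequently filtered column.
-- Assumes the table does not use liquid clustering, which replaces ZORDER BY.
OPTIMIZE main.quality.defects
ZORDER BY (station);&lt;/CODE&gt;&lt;/PRE&gt;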
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;B. Review Partitioning and File Layout&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Check the number, size, and distribution of files underlying these tables. Ideally, files should be between 16 MB and 1 GB.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Review partition columns: use low-cardinality columns, and avoid partitioning tables smaller than 1 TB, per Databricks guidance.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;If partition count is very high, consider repartitioning with more appropriate column(s).&lt;/LI&gt;
&lt;/UL&gt;
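&lt;P class="p8i6j01 paragraph"&gt;One way to check the file layout, assuming the tables are Delta and using a hypothetical catalog and schema (&lt;CODE class="p8i6j0f"&gt;main.quality&lt;/CODE&gt;), is to inspect the table detail. Dividing &lt;CODE class="p8i6j0f"&gt;sizeInBytes&lt;/CODE&gt; by &lt;CODE class="p8i6j0f"&gt;numFiles&lt;/CODE&gt; gives the average file size. As a purely illustrative figure, 50 MB spread over 5,000 files averages about 10 KB per file, far below the range above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="p8i6j0f"&gt;-- Hypothetical three-level name; replace with your own catalog and schema.
-- Look at format, numFiles, sizeInBytes, and partitionColumns in the result.
DESCRIBE DETAIL main.quality.defects;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p8i6j01 paragraph"&gt;Running the same statement against one of the six working tables gives a baseline to compare with.&lt;/P&gt;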
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;C. Update Table Statistics&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Run &lt;CODE class="p8i6j0f"&gt;ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS;&lt;/CODE&gt; after any large table update to aid query planning and reduce scan times.&lt;/LI&gt;
&lt;/UL&gt;
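&lt;P class="p8i6j01 paragraph"&gt;A minimal sketch of the statistics refresh, again with a hypothetical catalog and schema (&lt;CODE class="p8i6j0f"&gt;main.quality&lt;/CODE&gt;) and the &lt;CODE class="p8i6j0f"&gt;station&lt;/CODE&gt; column mentioned in this thread. The second statement is only a way to confirm that column-level statistics were recorded.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="p8i6j0f"&gt;-- Hypothetical three-level name; replace with your own catalog and schema.
ANALYZE TABLE main.quality.defects COMPUTE STATISTICS FOR ALL COLUMNS;

-- Show the collected statistics for a single column.
DESCRIBE EXTENDED main.quality.defects station;&lt;/CODE&gt;&lt;/PRE&gt;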
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;D. Check Dataflow and Connector Configuration&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Ensure your Dataflow Gen2 uses the most efficient connector for Databricks (preferably with native Delta support).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Confirm networking, firewall, and VNET configurations allow for rapid access to all tables (including paths, storage credentials, and managed identity assignments). Preview requests may not exercise all network paths or permissions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;E. Upgrade Databricks Runtime and Use Photon&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Use the latest Databricks Runtime for all jobs/clusters. Photon acceleration significantly boosts scan performance for SQL read and join operations.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;F. Compare Schema, Partitioning, and File Layout with “Working” Tables&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Investigate the six working tables versus the two slow ones. Differences in file counts, partition schemes, table format (Delta vs Parquet), or statistics can reveal root causes.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 16 Dec 2025 16:49:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142006#M2438</guid>
      <dc:creator>emma_s</dc:creator>
      <dc:date>2025-12-16T16:49:48Z</dc:date>
    </item>
    <item>
      <title>Re: Dataflow Gen2 Timeout When Loading Databricks Tables</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142014#M2439</link>
      <description>&lt;P&gt;Our data engineering team has already worked through these actions. It worked when I filtered the tables in Microsoft Fabric using the Power Query below:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Table.SelectRows(&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Table.SelectColumns( #"Navigation 2"{[Name = &lt;/SPAN&gt;&lt;SPAN&gt;"defects"&lt;/SPAN&gt;&lt;SPAN&gt;, Kind = &lt;/SPAN&gt;&lt;SPAN&gt;"Table"&lt;/SPAN&gt;&lt;SPAN&gt;]}[Data], {&lt;/SPAN&gt;&lt;SPAN&gt;"column1"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column2"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column3"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column4"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column5"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column6"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column7"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column8"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column9"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column10"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column11"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column12"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column13"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column14"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column15"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column16"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"column17"&lt;/SPAN&gt;&lt;SPAN&gt;} ),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;each&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;[station] = &lt;/SPAN&gt;&lt;SPAN&gt;"North"&lt;/SPAN&gt;&lt;SPAN&gt; or [station] = &lt;/SPAN&gt;&lt;SPAN&gt;"South"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;Do you know why I can't get the whole table?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 16 Dec 2025 17:59:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142014#M2439</guid>
      <dc:creator>viniciusmartins</dc:creator>
      <dc:date>2025-12-16T17:59:50Z</dc:date>
    </item>
    <item>
      <title>Re: Dataflow Gen2 Timeout When Loading Databricks Tables</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142016#M2440</link>
      <description>&lt;P&gt;My suspicion is that it's timing out because the data is not well optimized or is too big to retrieve. Filtering reduces the amount of data that has to be read, which makes the load easier to complete.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Dec 2025 18:03:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142016#M2440</guid>
      <dc:creator>emma_s</dc:creator>
      <dc:date>2025-12-16T18:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Dataflow Gen2 Timeout When Loading Databricks Tables</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142018#M2441</link>
      <description>&lt;P&gt;Do you think the Databricks cluster that Microsoft Fabric is connected to needs more capacity?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Dec 2025 18:55:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/dataflow-gen2-timeout-when-loading-databricks-tables/m-p/142018#M2441</guid>
      <dc:creator>viniciusmartins</dc:creator>
      <dc:date>2025-12-16T18:55:57Z</dc:date>
    </item>
  </channel>
</rss>

