3 weeks ago
I created a Dataflow Gen2 to get data from Databricks. The data preview loads very quickly (around 5 seconds), but when I run the dataflow it runs for 8 hours and then cancels with a timeout. I'm trying to get 8 tables with the same schema. Six of them work fine, but two of them show the issue I just described. The table sizes are around 50 MB.
What can I do to solve this issue?
3 weeks ago
Hi,
Here is a list of the likely causes and some steps to remediate them.
1. Table-Specific Data and File Layout Issues
Tables with many small files or heavy partitioning can make even a simple count(*) take significantly longer due to metastore and partition lookups.
2. Connector/Networking Misconfiguration
3. Access Controls and Storage Account Settings
4. Databricks Runtime or Cluster Configuration
5. Table Statistics and Query Plan Optimization
6. Metadata/Partition Discovery Overhead
A. Convert Tables to Delta Format and Optimize File Sizes
Run the OPTIMIZE command regularly on Delta tables to merge small files, and use ZORDER BY on frequently filtered/joined columns (see the SQL sketch after this list).
B. Review Partitioning and File Layout
C. Update Table Statistics
Run ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS; after any large table update to aid query planning and reduce scan times.
D. Check Dataflow and Connector Configuration
E. Upgrade Databricks Runtime and Use Photon
F. Compare Schema, Partitioning, and File Layout with "Working" Tables
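For steps A and C, the Databricks-side maintenance commands look roughly like the following. This is a minimal sketch: my_table and the ZORDER columns are placeholders for your actual table and its frequently filtered/joined columns.

```sql
-- Step A/B: compact small files and co-locate rows on commonly filtered columns.
-- my_table, customer_id, and event_date are placeholder names.
OPTIMIZE my_table
ZORDER BY (customer_id, event_date);

-- Step C: refresh statistics after large updates so the optimizer plans scans accurately.
ANALYZE TABLE my_table COMPUTE STATISTICS FOR ALL COLUMNS;
```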
3 weeks ago
Our data engineering team has already worked on these actions. It worked when I filtered the tables in Microsoft Fabric using the Power Query below:
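(The query itself did not survive in this thread. The sketch below shows the general shape of such a filter; the workspace URL, warehouse HTTP path, catalog/schema/table names, and the TransactionDate column are all hypothetical, and the navigator-generated steps in a real dataflow may look slightly different.)

```m
let
    // Connect through the Azure Databricks connector (host and HTTP path are placeholders)
    Source = Databricks.Catalogs(
        "adb-1234567890123456.7.azuredatabricks.net",
        "/sql/1.0/warehouses/abcdef1234567890",
        [Catalog = null, Database = null]
    ),
    // Navigate catalog -> schema -> table (hypothetical names)
    Catalog   = Source{[Name = "main", Kind = "Database"]}[Data],
    Schema    = Catalog{[Name = "gold", Kind = "Schema"]}[Data],
    FactTable = Schema{[Name = "fact_sales", Kind = "Table"]}[Data],
    // Filter early so the predicate can fold back to Databricks instead of pulling the full table
    Filtered  = Table.SelectRows(FactTable, each [TransactionDate] >= #date(2024, 1, 1))
in
    Filtered
```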
3 weeks ago
My suspicion is that it's timing out because the data is not well optimized or is too large to retrieve in one go. When you filter it down, the data becomes easier to read.
3 weeks ago
Do you think the Databricks cluster that Microsoft Fabric is connected to needs more capacity?