07-03-2025 05:33 AM
Hi,
We recently moved from GKE to GCE, and loading the sample data into the managed Delta tables now takes forever.
Even simple SQL SELECT statements take forever. We are totally clueless here; any help would be appreciated.
Thanks
07-03-2025 06:19 AM
I recommend checking the Spark UI to identify where the bottleneck is, or, if you are using Databricks SQL, reviewing the query profile.
Lou
07-03-2025 06:41 AM - edited 07-03-2025 06:45 AM
Immediate Things to Check:
Resource Allocation: GCE instances might be under-provisioned compared to your GKE cluster. Check if your current GCE instances have sufficient CPU, memory, and disk I/O capacity. Delta Lake operations are typically memory-intensive.
Network Configuration: Ensure your GCE instances are in the same region/zone as your data storage (Cloud Storage buckets). Cross-region data access can dramatically slow performance.
System Monitoring: Watch CPU, memory, and disk I/O on the instances while a query is executing.
Optimize Table Structure: Run OPTIMIZE on your Delta tables to consolidate small files:
OPTIMIZE your_table_name;
Update Statistics: Ensure table statistics are current:
ANALYZE TABLE your_table_name COMPUTE STATISTICS;
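Before running OPTIMIZE, it can help to get a rough idea of whether small files are really the problem. The sketch below is only an illustration: it assumes you read `numFiles` and `sizeInBytes` from `DESCRIBE DETAIL your_table_name`, and the 32 MiB cutoff and the function names are my own placeholders, not Databricks defaults.

```python
from typing import Optional

# Assumption: illustrative cutoff only, not a Databricks or Delta Lake default.
SMALL_FILE_THRESHOLD_MB = 32.0


def average_file_size_mb(num_files: int, size_in_bytes: int) -> Optional[float]:
    """Return the mean data-file size in MiB, or None for a table with no files."""
    if num_files <= 0:
        return None
    return size_in_bytes / num_files / (1024 * 1024)


def looks_fragmented(num_files: int, size_in_bytes: int,
                     threshold_mb: float = SMALL_FILE_THRESHOLD_MB) -> bool:
    """True when the average file is smaller than the chosen threshold."""
    avg = average_file_size_mb(num_files, size_in_bytes)
    return avg is not None and avg < threshold_mb


def check_table(spark, table_name: str) -> bool:
    """Hypothetical helper: needs a live Spark session on the cluster.

    DESCRIBE DETAIL returns one row that includes numFiles and sizeInBytes.
    """
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]
    return looks_fragmented(row["numFiles"], row["sizeInBytes"])
```

In a notebook you would call `check_table(spark, "your_table_name")`. If it returns True, OPTIMIZE is more likely to make a difference; if it returns False, the slowdown probably lies elsewhere (resources or network).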
07-03-2025 06:45 AM
07-03-2025 06:58 AM
Hi @MBV3 ,
Could you also attach compute logs? Go to Compute -> Click on your compute -> Driver logs
07-03-2025 07:50 AM
Hi szymon_dybczak,
Thanks for looking into this. Please find attached the various driver logs.
07-04-2025 11:48 AM
Hi All,
Strangely, after struggling for 2 days, we figured out that we can't run the cluster in scalable mode; after selecting single node mode, we are able to execute queries and jobs. It seems there is a bug in Databricks' GKE to GCE migration. Has anyone else faced this kind of issue?
Thanks