
Unable to see sample data in Hive Metastore after moving to GCE

MBV3
Contributor

Hi,

We recently moved from GKE to GCE, and it is taking forever to load the sample data in the managed Delta tables.

Even simple SQL SELECT statements are taking forever. We are totally clueless here; any help will be appreciated.

Thanks

6 REPLIES

Louis_Frolio
Databricks Employee

I recommend checking the Spark UI to identify where the bottleneck is or, if you are using Databricks SQL, reviewing the query profile.

Lou

nayan_wylde
Honored Contributor III

Immediate Things to Check:  

Resource Allocation: GCE instances might be under-provisioned compared to your GKE cluster. Check if your current GCE instances have sufficient CPU, memory, and disk I/O capacity. Delta Lake operations are typically memory-intensive.

Network Configuration: Ensure your GCE instances are in the same region/zone as your data storage (Cloud Storage buckets). Cross-region data access can dramatically slow performance.
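
As a rough illustration of the region check above, here is a minimal Python sketch. It assumes the usual GCE zone naming (region name plus a one-letter suffix, e.g. us-central1-a) and that you have already looked up the bucket's location string yourself. The function names are hypothetical, and multi-region or dual-region bucket locations such as US are deliberately not matched and need a manual look.

```python
def zone_to_region(zone: str) -> str:
    """Derive the region from a GCE zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    # Assumption: the zone is the region plus a '-<letter>' suffix.
    return zone.rsplit("-", 1)[0].lower()


def same_region(instance_zone: str, bucket_location: str) -> bool:
    """Return True only when the bucket is a single-region bucket in the instance's region.

    Multi-region locations such as 'US' or 'EU' never compare equal here,
    so a False result for those means "check by hand", not "definitely remote".
    """
    return zone_to_region(instance_zone) == bucket_location.strip().lower()


if __name__ == "__main__":
    # Hypothetical values; substitute the zone and bucket location of your workspace.
    print(same_region("us-central1-a", "US-CENTRAL1"))
    print(same_region("europe-west1-b", "US-CENTRAL1"))
```

A False result only says the two strings differ; whether that explains the slowdown still has to be confirmed from the Spark UI or the cluster metrics.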

Monitor system resources during query execution:

  • CPU utilization
  • Memory usage
  • Disk I/O wait times
  • Network throughput
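
If the cluster metrics page is not enough, the disk I/O wait item in the list above can be sampled by hand. The sketch below is one possible way to do it in Python on a Linux node: it reads the aggregate cpu line of /proc/stat twice and reports the share of CPU time spent in iowait. The helper names and the 5-second interval are assumptions, and in a notebook this only measures the driver node.

```python
import time

# Order of the first eight counters on the aggregate 'cpu' line of /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def iowait_fraction(before, after):
    """Fraction of CPU time spent waiting on I/O between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return deltas["iowait"] / total if total > 0 else 0.0


def sample_iowait(interval_s=5.0, path="/proc/stat"):
    """Take two samples interval_s seconds apart and return the iowait fraction."""
    def read():
        with open(path) as f:
            # The first line of /proc/stat is the aggregate over all CPUs.
            return parse_cpu_line(f.readline())

    before = read()
    time.sleep(interval_s)
    return iowait_fraction(before, read())
```

Calling sample_iowait() while a slow query is running gives a number between 0 and 1; a value that stays high would point toward disk rather than CPU or memory.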

Quick Wins

Optimize Table Structure: Run OPTIMIZE on your Delta tables to consolidate small files:

    OPTIMIZE your_table_name;

Update Statistics: Ensure table statistics are current:

    ANALYZE TABLE your_table_name COMPUTE STATISTICS;
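
Before running OPTIMIZE, it can help to check whether the table really consists of many small files. DESCRIBE DETAIL on a Delta table reports numFiles and sizeInBytes, and the Python sketch below turns those two numbers into an average file size. The function names and the thresholds (32 MB average, at least 100 files) are illustrative assumptions, not Databricks recommendations.

```python
def average_file_size_mb(num_files, size_in_bytes):
    """Average data file size in MiB; returns 0.0 for an empty table."""
    if num_files <= 0:
        return 0.0
    return size_in_bytes / num_files / (1024 * 1024)


def looks_fragmented(num_files, size_in_bytes, min_avg_mb=32.0, min_files=100):
    """Heuristic: many files with a small average size suggests OPTIMIZE may help.

    Both thresholds are hypothetical defaults chosen for illustration only.
    """
    return num_files >= min_files and average_file_size_mb(num_files, size_in_bytes) < min_avg_mb


# In a Databricks notebook, where `spark` is predefined, the inputs could come from:
#   row = spark.sql("DESCRIBE DETAIL your_table_name").first()
#   looks_fragmented(row["numFiles"], row["sizeInBytes"])
```

Small-file fragmentation is only one possible cause of slow reads, so a False result here does not rule out the other checks in this reply.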



MBV3
Contributor

Hi BigRoux,

Thanks for your reply.

The Spark UI stage page shows the following message, although the query has been running for over 5 minutes:

 

Details for Stage 17 (Attempt 0)

Summary Metrics

No tasks have started yet

Tasks

No tasks have started yet
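
One possible reason for a stage that stays at "No tasks have started yet" is that the scheduler has no worker executors to place tasks on. As a hedged sketch, the Python below counts active non-driver executors from the JSON list returned by the executors endpoint of Spark's monitoring REST API (/api/v1/applications/[app-id]/executors). How to reach that endpoint on a given workspace is not covered here, and the sample payload is made up for illustration.

```python
import json


def active_worker_executors(executors):
    """Return the executor entries that are active and are not the driver."""
    return [
        e for e in executors
        if e.get("id") != "driver" and e.get("isActive", False)
    ]


def total_worker_cores(executors):
    """Sum the cores of all active worker executors (0 means no task slots)."""
    return sum(e.get("totalCores", 0) for e in active_worker_executors(executors))


# Made-up payload with only a driver entry, i.e. no workers have registered.
sample = json.loads('[{"id": "driver", "isActive": true, "totalCores": 0}]')

if __name__ == "__main__":
    print(len(active_worker_executors(sample)), total_worker_cores(sample))
```

If the count is zero on a multi-node cluster, the Executors tab of the Spark UI and the cluster event log are the next places to look.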

szymon_dybczak
Esteemed Contributor III

Hi @MBV3 ,

Could you also attach compute logs? Go to Compute -> Click on your compute -> Driver logs

MBV3
Contributor

Hi szymon_dybczak,

Thanks for looking into this. Please find attached various driver logs.

MBV3
Contributor

Hi All,

Strangely, after struggling for 2 days, we figured out that we can't run the cluster in scalable mode. After selecting single node mode, we are able to execute queries and jobs. It seems there is a bug in the Databricks GKE-to-GCE migration. Has anyone else faced this kind of issue?

Thanks
