Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Optimising Clusters in Databricks on GCP

ashraf1395
Contributor II

Hi there everyone,

We are trying to get hands-on with Databricks Lakehouse for a prospective client's project.

Our main aim for the project is to compare a data lakehouse on Databricks with a BigQuery data warehouse in terms of cost and the time needed to set up and run queries.

We have created projects and tested with multiple data sizes (250 GB and 1.3 TB). We had a great experience and are looking to build our expertise around Databricks Lakehouse.

We have some questions regarding cluster configuration. While working with the 1.3 TB dataset on a Personal Compute cluster (32 GB memory, 4 cores), reading the data (Parquet) from a GCP bucket and converting it into a Delta table took 5+ hours. We then optimised the code, partitioned the data and read it in multiple chunks, which reduced the time to 3.5 hours. That is still a huge difference compared to BigQuery, which takes 15 minutes.
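To put those timings on a common scale, here is a small back-of-the-envelope sketch that turns them into effective throughput. It only uses the figures quoted above; treating 1 TB as 10**12 bytes is an assumption, and since the first run was "5+ hours", its throughput is an upper bound.

```python
# Figures quoted in the post. Decimal units (1 TB = 10**12 bytes) are an assumption.
DATA_BYTES = 1.3e12
CLUSTER_CORES = 4

# Wall-clock time per run, in hours.
RUN_HOURS = {
    "databricks_initial": 5.0,       # reported as "5+ hours", so at least this long
    "databricks_partitioned": 3.5,
    "bigquery": 0.25,                # 15 minutes
}


def throughput_mb_per_s(data_bytes: float, hours: float) -> float:
    """Effective end-to-end throughput in MB/s (1 MB = 10**6 bytes)."""
    return data_bytes / (hours * 3600) / 1e6


if __name__ == "__main__":
    for name, hours in RUN_HOURS.items():
        mbps = throughput_mb_per_s(DATA_BYTES, hours)
        print(f"{name}: {mbps:.1f} MB/s")

    # Per-core rate of the 4-core cluster after the code optimisations.
    per_core = throughput_mb_per_s(DATA_BYTES, RUN_HOURS["databricks_partitioned"]) / CLUSTER_CORES
    print(f"per core (partitioned run): {per_core:.1f} MB/s")
```

Seen this way, the 3.5-hour run is about 14 times slower than BigQuery, and the cluster is moving roughly 100 MB/s in total across its 4 cores.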

We figured out that BigQuery uses serverless compute, while in Databricks we are using a very small cluster. So, is there any way:

- to find the correct cluster configuration for a specific amount of data (such as calculators or rough estimates)?

- to find technical blogs where we can learn more about this?

- or are there any other tips for reducing the time?
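As an illustration of the kind of rough estimate we mean, here is a minimal sketch. It is not an official Databricks calculator. It assumes Spark's default `spark.sql.files.maxPartitionBytes` of 128 MB for file scans, decimal terabytes, and perfectly linear scaling with core count, which real workloads rarely achieve. The function names and the 25 MB/s per-core figure (rounded from our 3.5-hour run on 4 cores) are our own placeholders.

```python
import math

# Spark's default value for spark.sql.files.maxPartitionBytes (128 MB).
MAX_PARTITION_BYTES = 128 * 1024 * 1024


def estimate_scan_tasks(data_bytes: float, partition_bytes: int = MAX_PARTITION_BYTES) -> int:
    """Rough number of read tasks, ignoring file boundaries and small files."""
    return math.ceil(data_bytes / partition_bytes)


def estimate_waves(tasks: int, total_cores: int) -> int:
    """How many rounds of tasks the cluster runs if each core takes one task at a time."""
    return math.ceil(tasks / total_cores)


def cores_for_target(data_bytes: float, target_minutes: float, per_core_mb_per_s: float) -> int:
    """Cores needed to finish in target_minutes, ASSUMING linear scaling per core."""
    required_mb_per_s = data_bytes / (target_minutes * 60) / 1e6
    return math.ceil(required_mb_per_s / per_core_mb_per_s)


if __name__ == "__main__":
    data_bytes = 1.3e12  # 1.3 TB, decimal units assumed
    tasks = estimate_scan_tasks(data_bytes)
    print("scan tasks:", tasks)
    print("waves on 4 cores:", estimate_waves(tasks, 4))
    print("cores for a 15 min target:", cores_for_target(data_bytes, 15, 25.0))
```

On these assumptions, a 4-core cluster has to work through thousands of task waves, and matching a 15-minute target would need several dozen cores rather than 4. The actual numbers depend on file sizes, the write side of the job and the instance types, so this is only a starting point.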

We found out about serverless Databricks compute for both SQL and notebooks, but I think it is only supported on paid accounts, and we are still in our trial period.


1 REPLY

Thank you so much, Kaniz.
These resources will really help me optimise my clusters. I will reach out if I face any issues.
