05-06-2024 01:18 AM
Hi there everyone,
We are trying to get hands-on experience with Databricks Lakehouse for a prospective client's project.
Our major aim for the project is to compare a Lakehouse on Databricks with a BigQuery data warehouse in terms of cost and the time needed to set up and run queries.
We have created projects and tested them with multiple data sizes (250 GB and 1.3 TB). We had a great experience and are looking to build our expertise around Databricks Lakehouse.
We had some questions about cluster configuration. While working with the 1.3 TB dataset on a Personal Compute cluster with 32 GB of memory and 4 cores, reading the data (Parquet) from a GCP bucket and converting it into a Delta table took 5+ hours. We then optimised the code, partitioned the data and read it in multiple chunks, which reduced the time to 3.5 hours. That is still a huge difference compared with BigQuery, which takes 15 minutes.
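For a sense of scale, the figures above can be turned into average throughput. This is a back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that each quoted time is the wall-clock duration of the whole read-and-convert job. The names in it are purely illustrative.

```python
# Back-of-the-envelope throughput for the 1.3 TB Parquet -> Delta load.
# Assumptions: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes (decimal units);
# each time is the wall-clock duration of the whole job.

DATA_BYTES = 1.3e12
CORES = 4  # the Personal Compute cluster described above


def throughput_mb_per_s(num_bytes: float, seconds: float) -> float:
    """Average throughput in MB/s."""
    return num_bytes / seconds / 1e6


runs = {
    "Databricks, initial": 5 * 3600,  # "5+ hours", so this is an upper bound on throughput
    "Databricks, chunked": 3.5 * 3600,
    "BigQuery": 15 * 60,
}

for name, seconds in runs.items():
    print(f"{name:22s} {throughput_mb_per_s(DATA_BYTES, seconds):8.1f} MB/s")

# Per-core rate on the 4-core cluster after the chunking optimisation
per_core = throughput_mb_per_s(DATA_BYTES, runs["Databricks, chunked"]) / CORES
print(f"Per core (chunked run): {per_core:.1f} MB/s")

# How many times faster BigQuery finished than the optimised Databricks run
print(f"Speed gap: {runs['Databricks, chunked'] / runs['BigQuery']:.0f}x")
```

Under those assumptions, the optimised run averages about 103 MB/s (roughly 26 MB/s per core) versus about 1.4 GB/s for BigQuery, a 14x gap. Numbers like these suggest the difference is largely about how much parallel compute each system applies to the load, which is why cluster sizing is the natural next question.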
We figured out that BigQuery uses serverless compute, while in Databricks we are using a very small cluster. So we would like to know:
- how to find the right cluster configuration for a specific amount of data (for example, calculators or rough estimates);
- any technical blogs where we can learn more about this;
- any other tips for reducing the time.
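As a rough way to reason about cluster size (not an official calculator), one common mental model for a scan-heavy Spark job is: the input is split into tasks of about `spark.sql.files.maxPartitionBytes` (128 MB by default), and each core processes one task at a time, so the job runs in "waves". The sketch below is a hypothetical helper built on that model: it calibrates an average per-task time from the measured 3.5-hour run on the 4-core cluster and projects it to larger core counts. All function names are made up for illustration, and the model ignores shuffles, skew, memory pressure and storage throughput, so read the output as an order-of-magnitude guide only.

```python
import math

# HYPOTHETICAL sizing heuristic, not an official Databricks calculator.
# Model: the input is split into tasks of ~128 MiB (Spark's default
# spark.sql.files.maxPartitionBytes) and each core runs one task at a time,
# so the job proceeds in "waves" of `total_cores` tasks.
# Ignores shuffles, skew, memory pressure, autoscaling and storage limits.

SPLIT_BYTES = 128 * 1024 * 1024  # 134,217,728 bytes


def num_tasks(data_bytes: float, split_bytes: int = SPLIT_BYTES) -> int:
    """Approximate number of input tasks for a scan of `data_bytes`."""
    return math.ceil(data_bytes / split_bytes)


def num_waves(data_bytes: float, total_cores: int) -> int:
    """Rounds of tasks needed if each core takes one task per round."""
    return math.ceil(num_tasks(data_bytes) / total_cores)


def calibrate_seconds_per_task(data_bytes: float, total_cores: int,
                               observed_seconds: float) -> float:
    """Back out an average per-task time from one measured run."""
    return observed_seconds / num_waves(data_bytes, total_cores)


def estimate_seconds(data_bytes: float, total_cores: int,
                     seconds_per_task: float) -> float:
    """Wall-clock estimate under the wave model."""
    return num_waves(data_bytes, total_cores) * seconds_per_task


if __name__ == "__main__":
    data = 1.3e12  # the 1.3 TB dataset from the post (decimal TB assumed)
    # Calibrate on the measured 3.5 h run on the 4-core cluster.
    per_task = calibrate_seconds_per_task(data, 4, 3.5 * 3600)
    print(f"tasks: {num_tasks(data)}, ~{per_task:.1f} s per task")
    for cores in (4, 16, 32, 64):
        hours = estimate_seconds(data, cores, per_task) / 3600
        print(f"{cores:3d} cores -> ~{hours:.2f} h")
```

Real task durations are visible in the Spark UI, and calibrating on a small sample run (rather than the full 1.3 TB) would be a cheaper way to feed this kind of estimate. In practice the speedup would likely flatten once the bottleneck moves from CPU to reading from the GCS bucket or writing the Delta table.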
We found out about serverless Databricks clusters for both SQL and notebooks, but I think they are only supported on paid accounts, and we are still in our trial period.
05-06-2024 01:30 AM
Hi @ashraf1395, comparing Databricks Lakehouse and Google BigQuery is essential to make an informed decision for your project.
Let’s address your questions:
Cluster Configurations for Databricks:
Technical Blogs and Resources:
Reducing Query Time:
Serverless Databricks Clusters:
Remember that Databricks and BigQuery have different architectures and trade-offs. Databricks emphasizes flexibility, while BigQuery prioritizes ease of use and performance. Consider your specific use case and requirements when making your decision.
Good luck with your project! 🚀
05-06-2024 06:18 AM
Thank you so much, Kaniz.
These resources will really help me optimise my clusters. I will reach out if I face any issues.
05-06-2024 06:30 AM
Hi @ashraf1395, You're welcome! I'm glad the resources are helpful for optimizing your clusters. If you encounter any issues or have any questions in the future, feel free to reach out. I'm here to help.