Once you've selected a cluster that makes sense, run it and check your Ganglia metrics to see whether you need a compute-, memory-, or storage-optimized cluster, then iterate from there.
To just see if your code works, starting with a small set of data on a single node is best practice.
Personally, once my data processing is optimized, I benchmark different setups to find the one that meets my processing-time goal for the fewest DBUs.
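For anyone who wants to script that comparison, here is a minimal sketch in plain Python. It assumes you have already run the job on each setup and recorded the runtime and the cluster's DBU rate. The `BenchmarkRun` class, the `cheapest_within_goal` helper, the setup names, and all numbers are hypothetical and only for illustration. DBU usage is approximated as runtime times DBU rate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkRun:
    """One benchmark result for a cluster setup (all fields are illustrative)."""
    name: str             # hypothetical label for the cluster setup
    runtime_hours: float  # measured wall-clock time of the job
    dbu_per_hour: float   # DBU rate of the whole cluster, assumed known

    @property
    def dbu_cost(self) -> float:
        # Approximate DBUs consumed by the run: hours * DBU rate.
        return self.runtime_hours * self.dbu_per_hour


def cheapest_within_goal(
    runs: List[BenchmarkRun], max_runtime_hours: float
) -> Optional[BenchmarkRun]:
    """Return the lowest-DBU run that meets the time goal, or None if none do."""
    eligible = [r for r in runs if r.runtime_hours <= max_runtime_hours]
    return min(eligible, key=lambda r: r.dbu_cost, default=None)


# Made-up benchmark numbers, not real measurements.
runs = [
    BenchmarkRun("small", runtime_hours=2.0, dbu_per_hour=4.0),
    BenchmarkRun("medium", runtime_hours=1.0, dbu_per_hour=10.0),
    BenchmarkRun("large", runtime_hours=0.5, dbu_per_hour=24.0),
]

best = cheapest_within_goal(runs, max_runtime_hours=1.5)
if best is not None:
    print(f"{best.name}: {best.dbu_cost} DBU")
```

With these made-up numbers, the smallest setup uses the fewest DBUs but misses the 1.5-hour goal, so the helper picks the cheapest of the remaining setups. Filtering on the time goal first and then minimizing DBUs mirrors the approach described above.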
Great article. In the future, the serverless option will make it easier for newbies.