
Job running on Ataccama Profiler takes long to complete or crashes

anardinelli
Databricks Employee

If you're running the Ataccama profiler on very large tables, or on tables built from multiple joins, please note that some of the profiler's processes can perform poorly.

If your jobs are crashing or running for several hours, please check the following:

1. Running aggregation profiles can group data into single partitions, which can shuffle a lot of data across the worker nodes. Operations such as groupByKey and sortByKey are costly and are not optimized in the Ataccama tool. If the Stages tab of the Spark UI shows too much data being shuffled, please increase the worker memory size.

2. Run OPTIMIZE on your Delta tables that are being profiled.

3. If the profiling process involves multiple joins, please join the tables first, outside the profiling data flow. Then run OPTIMIZE on the resulting Delta table and run the profile on that joined table.

4. Please check the cluster "Event Log" page for spot instance terminations. If instances are being terminated, please use another instance type or switch them to on-demand.

5. Disable some of the profiling processes in the Ataccama tool to reduce the work each job has to do.
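
As a rough illustration of items 2 and 3, the sketch below builds the SQL that materializes a join as a Delta table and then compacts it, so the profiler can be pointed at a single pre-joined table. The table names (`sales.orders`, `sales.customers`, `profiling.orders_joined`), the join itself, and both helper functions are hypothetical and only there for illustration. It also assumes it runs in a Databricks notebook, where a `spark` session is already available. Treat it as a starting point rather than a recommended implementation.

```python
from typing import Iterable, List


def build_prejoin_statements(target_table: str, select_sql: str) -> List[str]:
    """Return SQL that materializes a join as a Delta table and compacts it.

    Hypothetical helper: target_table and select_sql are supplied by the caller.
    """
    return [
        # Materialize the join once, outside the profiling data flow.
        f"CREATE OR REPLACE TABLE {target_table} USING DELTA AS {select_sql}",
        # Compact the files of the joined table before profiling it.
        f"OPTIMIZE {target_table}",
    ]


def run_statements(spark, statements: Iterable[str]) -> None:
    """Execute each statement in order on the given Spark session."""
    for statement in statements:
        spark.sql(statement)


# Hypothetical tables and join, for illustration only.
JOIN_SQL = (
    "SELECT o.*, c.segment "
    "FROM sales.orders AS o "
    "JOIN sales.customers AS c ON o.customer_id = c.customer_id"
)
STATEMENTS = build_prejoin_statements("profiling.orders_joined", JOIN_SQL)

# In a Databricks notebook, where `spark` is predefined:
# run_statements(spark, STATEMENTS)
```

Once the statements have run, the profile would be configured against `profiling.orders_joined` instead of the individual source tables.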
