If you are running the Ataccama profiler on very large tables, or on tables built from multiple joins, note that some of the profiler's operations can cause severe performance issues.
If your jobs are crashing or running for multiple hours, check the following:
1. Aggregation profiles can group data into single partitions, which shuffles large amounts of data across the worker nodes. Operations such as groupByKey and sortByKey are costly and are not optimized in the Ataccama tool. If the Stages tab of the Spark UI shows excessive data being shuffled, increase the worker memory size.
2. Run OPTIMIZE on the Delta tables being profiled.
3. If the profiling process involves multiple joins, join the tables first, outside the profiling data flow, run OPTIMIZE on the resulting Delta table, and then profile the joined table.
4. Check for spot instance terminations on the cluster's Event Log page. If instances are being terminated, switch to another instance type or convert them to on-demand.
5. Disable some of the profiling processes in the Ataccama tool.
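To see why the aggregations in point 1 are so expensive, the toy model below (plain Python, no Spark needed; the partition layout and record counts are invented for illustration) compares how many records cross the network when every raw value is shuffled to its reducer (groupByKey-style) versus when each partition first combines values locally and ships only one partial result per key (reduceByKey-style map-side combine):

```python
from collections import Counter

# Hypothetical cluster: 4 partitions, each holding raw (key, value) records.
partitions = [
    [("dept_a", 1)] * 500 + [("dept_b", 1)] * 300,
    [("dept_a", 1)] * 400 + [("dept_c", 1)] * 200,
    [("dept_b", 1)] * 600,
    [("dept_c", 1)] * 700 + [("dept_a", 1)] * 100,
]

# groupByKey: every raw record is shuffled to the reducer partition.
group_by_key_shuffled = sum(len(p) for p in partitions)

# reduceByKey: each partition pre-aggregates locally and ships only one
# partial sum per distinct key it holds, not every raw record.
reduce_by_key_shuffled = sum(len(Counter(k for k, _ in p)) for p in partitions)

print(group_by_key_shuffled)   # 2800 raw records shuffled
print(reduce_by_key_shuffled)  # 7 partial aggregates shuffled
```

The gap grows with table size, which is why profiling jobs built on groupByKey/sortByKey over very large tables tend to overwhelm worker memory during the shuffle stage.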