Hi Alessandro,
Thank you for your help and suggestion!
For the second point, I'm looking to analyze memory utilization over the duration of the job. Specifically, I want to know the average and total memory used during a single job run compared to the total memory available in that specific cluster, as set by its configuration. Any additional useful metrics (such as per-worker figures) that I can access in the notebook would also be appreciated.
I'm thinking of creating a Delta table to save these statistics to. I'd like to run performance tests for specific use cases and see how certain metrics change across different cluster types for a given number of records, so that we have a baseline. Later, we plan to find a way to integrate this into our CI/CD pipeline to optionally track, at an approximate level, how much our changes affect the baseline performance.
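
To make the idea more concrete, here is a rough sketch of the kind of record I have in mind for one job run. It assumes I can somehow collect periodic samples of used memory (in GB) during the run, which is exactly the part I'm unsure how to obtain. The field names, the `summarize_memory` helper, the sample values, and the table name are all hypothetical, and I'm treating "total" as the peak value observed, which may not be the right definition.

```python
from dataclasses import dataclass, asdict
from statistics import mean


@dataclass
class MemorySummary:
    # All field names are hypothetical; adjust to whatever schema fits.
    run_id: str
    cluster_type: str
    record_count: int
    avg_used_gb: float
    peak_used_gb: float
    total_available_gb: float
    avg_utilization_pct: float


def summarize_memory(run_id, cluster_type, record_count,
                     used_gb_samples, total_available_gb):
    """Reduce memory samples from one job run to a single summary row."""
    if not used_gb_samples:
        raise ValueError("at least one memory sample is required")
    if total_available_gb <= 0:
        raise ValueError("total_available_gb must be positive")

    avg_used = mean(used_gb_samples)
    return MemorySummary(
        run_id=run_id,
        cluster_type=cluster_type,
        record_count=record_count,
        avg_used_gb=avg_used,
        peak_used_gb=max(used_gb_samples),
        total_available_gb=total_available_gb,
        # Average usage as a percentage of the cluster's configured memory.
        avg_utilization_pct=100.0 * avg_used / total_available_gb,
    )


# Made-up sample values, purely for illustration.
summary = summarize_memory(
    run_id="run-001",
    cluster_type="hypothetical-cluster",
    record_count=1_000_000,
    used_gb_samples=[10.0, 20.0, 30.0],
    total_available_gb=64.0,
)
row = asdict(summary)

# In a Databricks notebook, where `spark` is predefined, a row like this
# could then be appended to a Delta table (the table name is an assumption):
# spark.createDataFrame([row]).write.format("delta") \
#     .mode("append").saveAsTable("perf_baseline.memory_stats")
```

One row per run, keyed by cluster type and record count, should be enough to compare runs against a baseline later on.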