During office hours a year ago, I asked for the ability to purge cluster logs to be added to the API, and it was not even considered. I was thinking of setting up Selenium to automate it instead.
You can limit logging for the cluster by adjusting log4j. For example, put a shell script like the one below on DBFS and configure it as the cluster's init script (you still need to fill in the log4j properties you want adjusted; the driver and the executors each have their own log4j.properties file):
#!/bin/bash
echo "Executing on Driver: $DB_IS_DRIVER"
# Pick the log4j.properties file for the node type this script runs on.
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties"
else
  LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties"
fi
echo "Adjusting log4j.properties here: ${LOG4J_PATH}"
# Append the properties you want to override (keep the placeholders until you
# know the exact property names and values).
echo "log4j.<custom-prop>=<value>" >> "${LOG4J_PATH}"
In a notebook, you can disable logging with:
sc.setLogLevel("OFF");
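If turning logging off entirely is too aggressive, the same call accepts the standard log4j levels; a quick sketch in a Python notebook (the level shown is just an example, and sc is the SparkContext that Databricks notebooks predefine):

# Raise the threshold instead of silencing everything;
# valid levels include ALL, DEBUG, INFO, WARN, ERROR, FATAL, OFF.
sc.setLogLevel("WARN")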
Additionally, in the cluster's Spark config, you can shorten the retention periods for Delta files:
spark.databricks.delta.logRetentionDuration 3 days
spark.databricks.delta.deletedFileRetentionDuration 3 days
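These cluster-wide settings apply to every Delta table the cluster writes. If you only need shorter retention on specific tables, an alternative sketch is to set the equivalent table properties and then vacuum; here in Python, with events as a hypothetical table name:

# Set retention at the table level instead of the cluster level.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 3 days',
        'delta.deletedFileRetentionDuration' = 'interval 3 days'
    )
""")

# Physically remove data files older than the new threshold (3 days = 72 hours).
spark.sql("VACUUM events RETAIN 72 HOURS")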