Hi databricks experts. I am currently facing a problem with a submitted job run on Azure Databricks. Any help on this is very welcome. See below for details:
Problem Description:
I submitted a Python Spark task via the Databricks CLI (v0.16.4) to the Azure Databricks REST API (v2.0) to run on a new job cluster; see the attached job.json for the cluster configuration. The job runs successfully and all outputs are generated as expected. Despite that, the run is marked as failed with the error message "The output of the notebook is too large".
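For reference, the submission looked roughly like this (a sketch using the legacy databricks-cli `runs submit` command; the `<run_id>` placeholder is illustrative and comes from the submit response):

```shell
# Submit the one-time run described in job.json (new_cluster +
# spark_python_task) to the Runs API 2.0 via the legacy CLI.
databricks runs submit --json-file job.json

# Inspect the run's life cycle and result state afterwards;
# <run_id> is taken from the JSON returned by the submit call.
databricks runs get --run-id <run_id>
```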
My questions regarding this problem are:
- Why does a job submitted as a Spark Python task fail with an error message about notebook output?
- Why is the job failing even though the log output does not exceed the limit? (See below for details)
What did I expect to see:
Successful completion of the job with no errors
What did I see:
The job failed with the error message "Run result unavailable: job failed with error message The output of the notebook is too large."
Already done steps:
1. Consulted the Azure and Databricks documentation for possible causes.
According to the documentation, this error occurs if the stdout logs exceed 20 MB.
Actual stdout log output size: 1.8 MB
2. Increased py4j log level to reduce stdout log output
import logging
logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)
Reduced stdout log output size: 390 KB
3. Used log4j to write application logs
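Step 2 can be reproduced in isolation with the standard library alone: raising a logger's level drops INFO/DEBUG records before they reach any handler, which is exactly what shrinks the stdout output (the log messages below are made up for the demo):

```python
import io
import logging

# Capture the logger's output in a buffer instead of stdout so the
# filtering effect is easy to observe.
buf = io.StringIO()
logger = logging.getLogger("py4j.java_gateway")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.ERROR)

logger.info("Received command c on object id p0")  # below ERROR: dropped
logger.error("gateway failure")                    # at ERROR: kept

print(buf.getvalue().strip())  # → gateway failure
```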
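Step 3 looked roughly like this (a sketch, not the exact code: it assumes a PySpark driver with an active SparkSession named `spark`, and the logger name `my_app` is illustrative):

```python
# Route application logs through the JVM's log4j (reached via the py4j
# gateway) so they land in the cluster's log4j output rather than in
# the stdout logs that count against the size limit.
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my_app")  # logger name is illustrative

logger.info("job started")
logger.warn("something worth flagging")
```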