02-03-2023 06:15 AM
What is the problem?
I am getting this error every time I run a Python notebook from my Repo in Databricks.
Background
The notebook that fails creates a DataFrame, and its last step writes that DataFrame to a Delta table that already exists in Databricks.
The dataframe created has approximately 16,000,000 records.
The notebook contains no display(), print() or similar commands; it only builds this DataFrame from other intermediate DataFrames.
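For reference, here is a minimal PySpark sketch of that last step. It assumes a Databricks notebook where `df` holds the final DataFrame. The helper names and the table name `my_schema.my_table` are hypothetical and are not the actual notebook code. The second helper uses Spark's built-in `noop` sink (Spark 3.0+). That sink runs every stage of the computation but writes nothing, which is one way to test the DataFrame construction separately from the Delta write.

```python
# Minimal sketch. Assumes a Databricks notebook where `df` is the final
# DataFrame; the helper names and "my_schema.my_table" are hypothetical.


def append_to_delta_table(df, table_name):
    """Append df to an existing Delta table (Delta is available on Databricks)."""
    (df.write
       .format("delta")
       .mode("append")  # keep existing rows; "overwrite" would replace them
       .saveAsTable(table_name))


def materialize_without_writing(df):
    """Execute the full DataFrame computation but discard the output.

    Spark evaluates lazily, so if the write is removed and no other action
    remains, nothing is computed. The built-in "noop" sink (Spark 3.0+)
    forces every stage to run without writing any data.
    """
    df.write.format("noop").mode("overwrite").save()


# Usage in the notebook:
# append_to_delta_table(df, "my_schema.my_table")
# materialize_without_writing(df)
```

Keeping the write in one place makes it easy to swap the Delta sink for the `noop` sink when checking whether the failure comes from building the DataFrame or from writing it.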
A few days ago this notebook ran successfully on the same number of records, but now it fails with this error. From other discussions here I read that it could be a memory problem, so I have taken the following steps:
- I changed the configuration of the cluster I run it on. The configuration is now:
  - Worker type: Standard_DS4_V2 (28 GB memory, 8 cores)
  - Driver type: Standard_DS5_V2 (56 GB memory, 16 cores)
  - Min workers: 2, max workers: 8
  - spark.databricks.io.cache.enabled true
  - spark.databricks.driver.disableScalaOutput true
- I ran the notebook as part of a job so that it uses a job cluster.
- To rule out the write step, I removed the code that writes the data into the existing Delta table, and I still got the same error.
- I tried restarting the cluster, and detaching it from and reattaching it to my notebook.
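As a rough sketch, the cluster settings listed above would look like this when run as a job cluster, expressed as a Jobs API 2.1 request body (POST /api/2.1/jobs/create). The job name, task key and notebook path are placeholders. The node type IDs use Azure's lowercase `_v2` spelling. The `spark_version` field, which a real request also needs, is omitted because the post does not state the runtime version.

```python
import json

# Hypothetical Jobs API 2.1 request body mirroring the cluster settings above.
# Placeholders: job name, task key, notebook path. "spark_version" is omitted
# because the runtime version is not stated; a real request must include it.
new_cluster = {
    "node_type_id": "Standard_DS4_v2",         # workers: 28 GB memory, 8 cores
    "driver_node_type_id": "Standard_DS5_v2",  # driver: 56 GB memory, 16 cores
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "spark_conf": {
        "spark.databricks.io.cache.enabled": "true",
        "spark.databricks.driver.disableScalaOutput": "true",
    },
}

job_spec = {
    "name": "delta-load-job",  # hypothetical
    "tasks": [
        {
            "task_key": "build_and_write",  # hypothetical
            "notebook_task": {"notebook_path": "/Repos/<user>/<repo>/<notebook>"},  # placeholder
            "new_cluster": new_cluster,
        }
    ],
}

if __name__ == "__main__":
    # Print the body that would be sent to the Jobs API.
    print(json.dumps(job_spec, indent=2))
```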
Could you help me? I don't know whether the problem comes from the cluster configuration or from somewhere else, because a few days ago the notebook ran without any problem.
Thank you so much in advance. I look forward to hearing from you.
Labels: Memory Size