Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Limitations of committing ipynb notebook output with Repos

adrianna2942842
New Contributor III

During my experimentation with the latest feature that allows including notebook output in a commit, I ran into a specific issue. While attempting to commit my recent changes, I encountered an error message stating "Error fetching Git status." Interestingly, this error vanishes when I reduce the number of tables generated using the "display(df)" function or limit the number of columns and rows within a table. It appears that there might be a restriction on the size of the output, but I couldn't locate any specific information about this limitation, apart from the file size limit (which is not applicable in this case).

I would greatly appreciate it if anyone who has faced a similar issue could provide further insights or additional information on this matter.

 

1 ACCEPTED SOLUTION

Accepted Solutions

adrianna2942842
New Contributor III

I've found that the restriction I've encountered isn't related to the file size within Repos, but rather the maximum file size that can be shown in the Azure Databricks UI. You can find this limitation documented at https://learn.microsoft.com/en-us/azure/databricks/repos/limits, and it's set at 10 MB.
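Since the 10 MB figure is an on-disk file size, it can be checked before committing. This is a minimal sketch using only the standard library; the helper name and path are hypothetical, and the 10 MB constant comes from the linked Azure Databricks docs:

```python
import os

# 10 MB: the maximum notebook file size the Azure Databricks UI will render,
# per learn.microsoft.com/en-us/azure/databricks/repos/limits.
UI_RENDER_LIMIT_BYTES = 10 * 1024 * 1024

def exceeds_ui_limit(path: str) -> bool:
    """Return True if the notebook file is larger than the UI render limit."""
    return os.path.getsize(path) > UI_RENDER_LIMIT_BYTES
```

Running this on the exported .ipynb before a commit gives an early warning that the UI (and Git status view) may fail to render the file.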


3 REPLIES

Kaniz_Fatma
Community Manager

Hi, @adrianna2942842. The error message you are encountering, "Error fetching Git status," could be related to the size of the notebook output.

Databricks has a limit on notebook output size: job clusters cap notebook output at 20 MB, and anything larger results in an error. This limit is likely why reducing the number of tables generated with the "display(df)" function, or limiting the number of columns and rows within a table, mitigates the error.

Each display() command in a notebook adds to the total output; once that total exceeds 20 MB, the error occurs.

To avoid this, you can remove any unnecessary display() commands in your notebook. These can be useful for debugging but are not recommended for production jobs.
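One way to find which display() outputs dominate is to measure the serialized size of each cell's outputs directly in the .ipynb file. This is a standard-library sketch, not a Databricks API; the helper name is hypothetical:

```python
import json

def cell_output_sizes(ipynb_path: str):
    """Return (cell_index, output_bytes) pairs, largest first, so the
    cells with oversized display() results are easy to spot."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    sizes = []
    for i, cell in enumerate(nb.get("cells", [])):
        # Non-code cells have no "outputs" key; treat them as empty.
        outputs = cell.get("outputs", [])
        size = len(json.dumps(outputs).encode("utf-8"))
        sizes.append((i, size))
    return sorted(sizes, key=lambda t: t[1], reverse=True)
```

The cells at the top of the returned list are the best candidates for removing a display() call or trimming rows and columns before committing.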

However, it's important to note that when you commit changes to a notebook, Databricks does not include the results with the notebook commit. All results are cleared before the commit is made. Therefore, the output size should not affect the commit process directly.

adrianna2942842
New Contributor III

I appreciate your quick answer! I am indeed aware of the 20 MB size limit, although I initially believed it didn't apply here, given that my file is only 13.2 MB in size.

