Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managing IPYNB cell timestamps in source control

emorgoch
New Contributor II

We're in the process of converting our Databricks notebooks from .py files to .ipynb. We have disabled storing notebook output in source control at the workspace level.

However, what we're discovering is that every cell in our notebooks has 3 timestamp fields that are being included as part of the cell metadata in the notebook code: startTime, finishTime, and submitTime. These 3 values get updated any time a cell is executed during development.

This is presenting an issue with source control and code reviews, as these fields are getting marked as updated lines that need to be reviewed even though the code of the cell itself may not have been changed.

Is there any way that these fields can either be excluded or wiped as part of the commit process? We're primarily using the web IDE, but the same question applies to scenarios using VSCode as well.


1 REPLY

Ashwin_DSA
Databricks Employee

Hi @emorgoch,

Thanks for raising this. This appears to be a regression rather than expected behaviour. Internally, the issue has been identified around .ipynb handling in Git folders, and the intended fix is to stop serialising these execution timestamp fields when outputs are not being exported. So the behaviour you’re seeing is being addressed and is not the intended long-term state. I don't have an ETA for this though.

At the moment, the documented controls focus on the notebook format itself and whether notebook outputs are committed, but there isn’t a documented setting in the web UI that strips only specific cell metadata fields, such as startTime, finishTime, and submitTime, during commit.

If your main goal is to reduce review noise, the two supported options today are either to switch those notebooks to Databricks source format, which is more lightweight for version control, or to stay on .ipynb and manage whether outputs are included through the commit_outputs configuration. You can see the notebook format options here in the Manage notebook format docs, and the broader Git folders behaviour here in Databricks Git folders and Create and manage Git folders.

If you need to stay with .ipynb because you want richer notebook fidelity, dashboards, or visualisations, that format does support those features better than the source format. The tradeoff is that .ipynb is a richer representation, so it can be less clean in source control than plain source notebooks. The docs call out that source format is the simpler code-only representation, while .ipynb captures notebook structure and optional outputs.

For teams working primarily in the web IDE, there isn’t a documented pre-commit hook mechanism in the standard Git folders flow to automatically wipe just those timestamp fields before commit. If you are using a local clone or Git CLI-based workflow, then a custom pre-commit hook outside Databricks could sanitise those fields before pushing, but that would be a Git-side workaround rather than a built-in Databricks setting.
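
As a rough illustration of that Git-side workaround, here is a minimal sketch of a script that a local pre-commit hook could call with the staged .ipynb paths. It is not a Databricks feature. It assumes the three fields sit somewhere under each cell's "metadata" object (the exact nesting isn't confirmed in this thread, so the script searches recursively), and the function names strip_keys, clean_notebook, and main are made up for this example. It also assumes that rewriting the file with one-space JSON indentation is acceptable; if your notebooks are serialised differently, that alone could produce a one-off formatting diff.

```python
import json
import sys

# Field names taken from the question; everything else here is illustrative.
TIMESTAMP_KEYS = {"startTime", "finishTime", "submitTime"}


def strip_keys(node):
    """Recursively delete TIMESTAMP_KEYS from nested dicts and lists.

    Returns True if at least one key was removed.
    """
    changed = False
    if isinstance(node, dict):
        for key in list(node):
            if key in TIMESTAMP_KEYS:
                del node[key]
                changed = True
            elif strip_keys(node[key]):
                changed = True
    elif isinstance(node, list):
        for item in node:
            if strip_keys(item):
                changed = True
    return changed


def clean_notebook(notebook):
    """Strip the timestamp fields from every cell's metadata, in place."""
    changed = False
    for cell in notebook.get("cells", []):
        if strip_keys(cell.get("metadata", {})):
            changed = True
    return changed


def main(paths):
    """Clean each notebook file; return 1 if any file was rewritten, else 0."""
    modified = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            notebook = json.load(f)
        if clean_notebook(notebook):
            with open(path, "w", encoding="utf-8") as f:
                # Assumption: one-space indentation, as Jupyter tooling commonly writes.
                json.dump(notebook, f, indent=1, ensure_ascii=False)
                f.write("\n")
            modified.append(path)
    for path in modified:
        print(f"stripped execution timestamps: {path}")
    return 1 if modified else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

The non-zero exit when a file was rewritten is deliberate: it stops the commit so the cleaned file can be re-staged, which is the usual convention for hooks that modify files. Only cell metadata is touched, so cell source and notebook-level metadata are left as they were.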

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***