
workflow job

chari
Contributor

Hi,

When I create a job for a machine learning notebook and run it, the cell outputs in the notebook do not get updated, even though the model variables have been updated. I need the notebook to keep up-to-date cell outputs whenever the job runs.

Could anybody suggest any tips?

1 ACCEPTED SOLUTION


Kaniz_Fatma
Community Manager

Hi @chari! It sounds like you're encountering a common issue when running machine learning jobs in notebooks. Let's explore some tips to address this:

Cell Outputs Not Updating:

  • When you run a cell interactively, the output is displayed below the cell. When the notebook is executed by a scheduled job, however, the outputs shown in the source notebook are not necessarily refreshed.
  • To ensure that cell outputs update correctly, consider the following (a small sketch follows this list):
    • Clear Output: Before re-running a cell, manually clear its output by right-clicking the cell and selecting "Clear Outputs."
    • Re-Run All Cells: Dependencies between cells can cause stale or inconsistent results. Re-run all cells in the notebook (e.g., via "Run All" in the menu) to ensure consistency.
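If you prefer to clear a cell's previous output programmatically rather than through the menu, IPython's display utilities can do it from within the cell. A minimal sketch, assuming an IPython-based Python notebook kernel (as Databricks Python notebooks use); the loop and messages are illustrative only:

```python
import time
from IPython.display import clear_output

# Re-render a status readout in place instead of stacking outputs below the cell.
for epoch in range(3):
    clear_output(wait=True)  # wait=True: clear only once new output is ready
    print(f"epoch {epoch}: training...")
    time.sleep(1)
```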

Keeping Notebook Outputs Updated:

  • If you want to keep the notebook outputs updated every time you run a job, consider the following strategies (a combined sketch follows this list):
    • Logging: Explicitly log relevant information (such as model performance metrics, intermediate results, or debugging messages) using a logging library (e.g., Python's logging module). This way, you can review the logs even after re-running the notebook.
    • Markdown Cells: Use Markdown cells to document important details about the job, including any relevant outputs. You can add explanatory text, tables, or visualizations directly in Markdown cells.
    • Widgets and Interactive Outputs: Consider using interactive widgets (e.g., ipywidgets) to display dynamic outputs. Widgets allow you to create interactive elements (like sliders, dropdowns, or buttons) that update based on user input or changes in variables.
    • Save Outputs to Files: If your outputs are large or complex (e.g., model predictions), save them to files (e.g., CSV, JSON, or images). Then, load these files when needed, ensuring that the most recent results are available.
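Here is a minimal combined sketch of the logging and save-to-file strategies above; the metric values, predictions, and output path are hypothetical placeholders:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("training_job")  # hypothetical logger name

# Hypothetical results from a training run.
metrics = {"accuracy": 0.93, "loss": 0.21}
predictions = [0, 1, 1, 0]

# Log metrics so they survive in the job run's logs even when cell outputs don't.
logger.info("run metrics: %s", metrics)

# Persist outputs to a file so the latest results can be reloaded after any run.
# The path is an assumption; on Databricks you might write to DBFS or a volume.
with open("/tmp/run_outputs.json", "w") as f:
    json.dump({"metrics": metrics, "predictions": predictions}, f)
```

On a later run (or from another notebook) you can `json.load` the same file to retrieve the most recent results.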

Version Control and Notebooks:

  • If you're working collaboratively or need to track changes over time, consider using version control (e.g., Git) for your notebooks. Committing changes allows you to compare different versions and understand how outputs evolve.
  • Note that notebooks can have large binary outputs (e.g., plots or images), and storing those outputs in version control repositories may not be ideal due to their size; one option is to strip outputs before committing (see the sketch after this list).
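If your notebooks are stored in .ipynb format, a minimal sketch of stripping outputs using the nbformat library; the notebook path is a hypothetical placeholder:

```python
import nbformat

path = "my_notebook.ipynb"  # hypothetical path to the notebook being committed

nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []            # drop stored outputs (plots, images, text)
        cell.execution_count = None
nbformat.write(nb, path)
```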

Automated Pipelines:

  • For production workflows, consider moving away from manual notebook execution to automated pipelines. Tools like Apache Airflow, Prefect, or Kubeflow Pipelines allow you to define and schedule data processing and ML workflows (a minimal sketch follows).
  • These pipelines can handle data ingestion, preprocessing, model training, and deployment, ensuring consistent results and reproducibility.
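As one concrete illustration, here is a minimal Apache Airflow sketch (assuming Airflow 2.4+ for the `schedule` argument) that runs a training step daily; the DAG id and task logic are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def train_model():
    # Hypothetical training step: load data, fit the model, persist artifacts.
    print("training model...")


# A DAG that runs the training task once a day.
with DAG(
    dag_id="ml_training_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="train_model", python_callable=train_model)
```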

Feel free to share more details about your specific use case (or attach a screenshot), and I'll be happy to provide further insights! 😊

