Community Platform Discussions

When is it time to change from ETL in notebooks to whl/py?

Forsen
New Contributor

Hi!
I would like some input/tips from the community on when it is time to move from a working solution in notebooks to something more "stable", like .whl/.py files.

What are the pros and cons of notebooks compared to .whl/.py files?

The way I have structured things now, I use notebooks as an orchestrator. The code is built as modules in .py files that are simply imported into the notebook. Everything the ETL needs comes from a config file (YAML or JSON), so nothing is hardcoded.
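A minimal sketch of the setup described above, where the notebook only loads a config and calls functions from .py modules. To keep it runnable anywhere it uses plain Python lists of dicts in place of Spark DataFrames, and the step names, config keys, and functions (`drop_nulls`, `rename_columns`, `run_pipeline`) are hypothetical.

```python
import json
from typing import Any, Callable

Row = dict[str, Any]

def drop_nulls(rows: list[Row], column: str) -> list[Row]:
    """Keep only rows where `column` is present and not None."""
    return [row for row in rows if row.get(column) is not None]

def rename_columns(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Rename keys according to `mapping`; unmapped keys are kept as-is."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

# Registry mapping step names used in the config to module functions.
STEPS: dict[str, Callable[..., list[Row]]] = {
    "drop_nulls": drop_nulls,
    "rename_columns": rename_columns,
}

def run_pipeline(rows: list[Row], config: dict[str, Any]) -> list[Row]:
    """Apply each configured step in order; the notebook only calls this."""
    for step in config["steps"]:
        func = STEPS[step["name"]]
        rows = func(rows, **step.get("params", {}))
    return rows

# In the notebook: load the config and run; no logic is hardcoded here.
CONFIG_JSON = """
{
  "steps": [
    {"name": "drop_nulls", "params": {"column": "id"}},
    {"name": "rename_columns", "params": {"mapping": {"amt": "amount"}}}
  ]
}
"""
config = json.loads(CONFIG_JSON)
result = run_pipeline([{"id": 1, "amt": 10}, {"id": None, "amt": 5}], config)
print(result)  # [{'id': 1, 'amount': 10}]
```

Because the notebook holds no logic of its own, swapping it for another orchestrator later only means replacing the few lines at the bottom.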

Thanks in advance 🙂


Isi
New Contributor III

Hey @Forsen,

My advice:

Using .py files and .whl packages is generally more secure and scalable, especially when working in a team. A key advantage is that code reviews and version control are much more efficient with plain .py files, since every change can be tracked and reviewed line by line in a pull request.

While notebooks can have read permissions set and can be put under version control, they are often harder to manage in collaborative environments. A common issue is that people forget to remove leftover display() or collect() calls: these make exploring and debugging easier in a notebook, but they are bad practice in production (collect(), for example, pulls the entire result onto the driver). In addition, a single stray "," accidentally typed into a notebook can make your production job fail.

Advantages of .py and .whl over notebooks:

Better version control & code reviews (easier to track changes and enforce coding standards).
Better modularization & reusability (separating logic into reusable components).
Easier CI/CD integration (you can automate testing, packaging, and deployment).
More structured and maintainable codebase (better organization and scalability).
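To make the testing/CI point concrete: once logic lives in a .py module it can be unit tested outside any notebook, for example with pytest as a step in a CI pipeline. A minimal sketch; the module, test file, and `normalize_country` function are hypothetical.

```python
from typing import Optional

# --- etl_utils.py (hypothetical module) ---
def normalize_country(code: Optional[str]) -> Optional[str]:
    """Trim and upper-case a country code; blank strings become None."""
    if code is None:
        return None
    cleaned = code.strip().upper()
    return cleaned or None

# --- tests/test_etl_utils.py: pytest collects functions named test_* ---
def test_normalize_country() -> None:
    assert normalize_country(" se ") == "SE"
    assert normalize_country("   ") is None
    assert normalize_country(None) is None

if __name__ == "__main__":
    # Quick smoke check when run directly without pytest.
    test_normalize_country()
    print("all tests passed")
```

A stray character that breaks the module then fails the test run in CI instead of the production job.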

Disadvantages:

Harder debugging compared to notebooks (notebooks allow quick testing and visualization).
Steeper learning curve for new users who are used to interactive workflows.

Given your current setup, where you use notebooks only as orchestrators and keep your logic in .py modules, you already have a good balance. The next step could be moving orchestration entirely to a workflow orchestrator (such as Airflow or Databricks Jobs) and packaging your code into .whl files for better maintainability.
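When the code ships as a .whl, the job (for example a Python wheel task in Databricks Jobs, or an Airflow task) needs a function to call instead of a notebook. A minimal sketch of such an entry point, assuming the job passes the config path as a `--config` command-line parameter; `parse_args`, `load_config`, `main`, and the flag are hypothetical, and `main` would be registered as a console script (e.g. under `[project.scripts]` in pyproject.toml).

```python
import argparse
import json
from pathlib import Path
from typing import Any, Optional, Sequence

def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse job parameters; argv=None means read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Run the config-driven ETL job.")
    parser.add_argument("--config", required=True, help="Path to a JSON config file")
    return parser.parse_args(argv)

def load_config(path: str) -> dict[str, Any]:
    """Read the job's JSON config (YAML would need a parser such as PyYAML)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def main(argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical console-script entry point invoked by the job."""
    args = parse_args(argv)
    config = load_config(args.config)
    # Hand `config` to the pipeline code from your modules here.
    print(f"Loaded config with {len(config.get('steps', []))} step(s)")
    return 0
```

Keeping argument parsing separate from `main` lets the same wheel run unchanged from a job, a terminal, or a test.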

🙂
