Databricks Community

MadelynM · ‎11-08-2021

Thanks to everyone who joined the Best Practices for Your Data Architecture session on Getting Workloads to Production using CI/CD. You can access the on-demand session recording here, and the code in the Databricks Labs CI/CD Templates Repo.

Posted below is a subset of the questions asked and answered throughout the session. Please feel free to ask follow-up questions or add comments as threads.

Q: What are examples of scheduling Notebooks with Airflow?

Check out the blog post detailing the integration between Databricks and Airflow, and read the docs with examples (AWS | Azure | GCP). Also take a look at Multitask Jobs, a Databricks-native job scheduler.
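As a rough illustration of what scheduling a notebook from Airflow involves, here is a minimal sketch that builds a run payload in the shape of the Databricks Jobs `runs/submit` request. The helper name `build_notebook_run`, the notebook path, the run name, and the cluster values are all placeholders chosen for this example, not values from the session. The operator call appears only as a comment so the snippet runs without Airflow installed.

```python
import json


def build_notebook_run(notebook_path, parameters=None):
    """Return a dict shaped like a Jobs `runs/submit` request for one notebook."""
    return {
        "run_name": "airflow-notebook-run",  # hypothetical run name
        "new_cluster": {
            "spark_version": "9.1.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",  # placeholder node type (cloud-specific)
            "num_workers": 2,
        },
        "notebook_task": {
            "notebook_path": notebook_path,
            # Parameters passed to the notebook; empty when none are given.
            "base_parameters": parameters or {},
        },
    }


# Hypothetical notebook path and parameter, for illustration only.
payload = build_notebook_run("/Shared/etl/daily_load", {"run_date": "2021-11-08"})
print(json.dumps(payload, indent=2))

# In an Airflow DAG file, a payload like this is typically handed to the
# operator from the Databricks provider package, roughly as follows:
#
#   from airflow.providers.databricks.operators.databricks import (
#       DatabricksSubmitRunOperator,
#   )
#   run_notebook = DatabricksSubmitRunOperator(
#       task_id="run_notebook",
#       databricks_conn_id="databricks_default",
#       json=payload,
#   )
```

Keeping the payload in a plain function makes it easy to check in CI before any DAG is deployed.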

Q: Will AWS MWAA also work with notebooks?

Yes, the docs show that a Databricks connection is available for AWS MWAA.

Q: Unit Testing and Integration testing - are there frameworks for testing notebooks?

The session includes an example that uses Nutter and pytest. Here are links to the documentation for each:

1. https://github.com/microsoft/nutter [integration testing]

2. https://docs.pytest.org/en/6.2.x/ [unit testing]

There are certainly other frameworks, depending on the code you're testing and the kind of tests you're running, but we like these two for their simplicity and open source nature.
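For the unit-testing side, a minimal sketch of the pytest style might look like the following. It assumes the notebook logic has been factored into a plain Python function; `add_revenue` and its field names are hypothetical and not taken from the session code. Nutter, which runs tests against notebooks on a cluster, is not shown here.

```python
def add_revenue(rows):
    """Return new rows with a `revenue` field computed as price * quantity."""
    return [{**row, "revenue": row["price"] * row["quantity"]} for row in rows]


# pytest collects functions whose names start with `test_`
# and reports any failing `assert` statement.
def test_add_revenue_computes_product():
    rows = [{"price": 2.5, "quantity": 4}]
    assert add_revenue(rows) == [{"price": 2.5, "quantity": 4, "revenue": 10.0}]


def test_add_revenue_leaves_input_unchanged():
    rows = [{"price": 1.0, "quantity": 3}]
    add_revenue(rows)
    # The function builds new dicts, so the caller's data is not mutated.
    assert rows == [{"price": 1.0, "quantity": 3}]
```

Saved as a file such as `test_transform.py`, this can be run with `pytest test_transform.py` on a laptop or in a CI job, with no cluster required.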

Q: Is it possible to integrate MLflow to deploy model artifacts within this CI/CD process?

Yes, please take a look at this blog, Using MLOps with MLflow and Azure.

Add your follow-up questions to threads!

Chris_Shehu · ‎11-12-2021

Would it be possible to get the PowerPoint that was used for this? There are several embedded links that would be beneficial but cannot be accessed from a video. Thanks!

MadelynM · ‎11-18-2021

Here's the list of embedded links!

Jobs scheduling and orchestration

Built-in job scheduling: https://docs.databricks.com/jobs.html#schedule-a-job
- Periodic scheduling of jobs
- Execute notebook / JAR / Python script / spark-submit
Multitask Jobs
- Execute notebook / JAR / Python script / spark-submit
Contrib module in Airflow
- Execute notebook / JAR / Python script
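
To make the Multitask Jobs option more concrete, the sketch below assembles a scheduled job definition with two dependent notebook tasks, using field names from the Jobs API 2.1 (`tasks`, `task_key`, `depends_on`, `schedule`). The helper `build_multitask_job`, the job name, and the notebook paths are hypothetical, and cluster settings are omitted for brevity, so treat this as an outline of the shape and not a complete request.

```python
def build_multitask_job(name, cron, tasks):
    """Build a job definition dict.

    tasks: list of (task_key, notebook_path, [upstream task_keys]) tuples.
    """
    known = {key for key, _, _ in tasks}
    job_tasks = []
    for key, path, upstream in tasks:
        # Fail early if a task depends on a key that is not defined in this job.
        missing = [u for u in upstream if u not in known]
        if missing:
            raise ValueError(f"{key} depends on unknown tasks: {missing}")
        job_tasks.append({
            "task_key": key,
            "depends_on": [{"task_key": u} for u in upstream],
            "notebook_task": {"notebook_path": path},
            # A real request also needs a cluster spec per task (omitted here).
        })
    return {
        "name": name,
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
        "tasks": job_tasks,
    }


job = build_multitask_job(
    "nightly-pipeline",  # hypothetical job name
    "0 0 2 * * ?",  # Quartz cron syntax: 02:00 every day
    [
        ("ingest", "/Shared/pipeline/ingest", []),
        ("transform", "/Shared/pipeline/transform", ["ingest"]),
    ],
)
print([task["task_key"] for task in job["tasks"]])
```

Generating the definition from code like this lets a CI/CD pipeline validate task dependencies before the job is created or updated.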

Development interface resources