Thanks to everyone who joined the Best Practices for Your Data Architecture session on Getting Workloads to Production using CI/CD. You can access the on-demand session recording here, and the code in the Databricks Labs CI/CD Templates Repo.
Posted below is a subset of the questions asked and answered throughout the session. Please feel free to ask follow-up questions or add comments as threads.
Q: What are examples of scheduling Notebooks with Airflow?
Check out the blog detailing the integration between Databricks and Airflow, and read the docs with examples (AWS | Azure | GCP). Also take a look at the multi-task jobs capability, which is a Databricks-native job scheduler.
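To make the Airflow integration concrete, here is a minimal, hedged sketch of a DAG that runs a Databricks notebook on a schedule. It assumes the `apache-airflow-providers-databricks` package is installed and that an Airflow connection named `databricks_default` points at your workspace; the DAG name, notebook path, and cluster spec are placeholders, not values from the session.

```python
# Sketch of an Airflow DAG that schedules a Databricks notebook run.
# Assumes apache-airflow-providers-databricks is installed and a
# "databricks_default" connection is configured. All names below are
# hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="daily_notebook_run",          # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_etl_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={                     # ephemeral job cluster for the run
            "spark_version": "9.1.x-scala2.12",
            "node_type_id": "i3.xlarge",  # AWS node type; adjust per cloud
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Production/etl_notebook"},
    )
```

The operator submits a one-time run against an ephemeral job cluster, which keeps scheduled production runs isolated from interactive clusters.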
Q: Will AWS MWAA also work with notebooks?
Yes. The AWS MWAA documentation shows that a Databricks connection is available, so notebooks can be scheduled from MWAA the same way.
Q: Unit Testing and Integration testing - are there frameworks for testing notebooks?
The session includes an example that uses Nutter together with pytest. Here are links to the documentation for each:
1. https://github.com/microsoft/nutter [integration testing]
2. https://docs.pytest.org/en/6.2.x/ [unit testing]
Other frameworks certainly exist, and the right choice depends on the code you're testing and the nature of your tests, but we like these two for their simplicity and open source nature.
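As a small illustration of the pytest side: if notebook logic is factored into plain importable functions, it can be unit-tested outside Databricks with a standard `pytest` run. The function and test below are hypothetical examples, not code from the session.

```python
# Hypothetical example: notebook logic factored into a plain function so
# pytest can exercise it outside Databricks (run with `pytest test_etl.py`).

def normalize_amounts(rows):
    """Drop missing amounts and round the rest to 2 decimal places."""
    return [round(r, 2) for r in rows if r is not None]

def test_normalize_amounts():
    # None values are dropped; remaining amounts are rounded.
    assert normalize_amounts([1.234, None, 2.0]) == [1.23, 2.0]
```

Nutter complements this by running whole notebooks on a cluster as integration tests, while pytest covers the extracted functions as fast local unit tests.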
Q: Is it possible to integrate MLflow to deploy model artifacts within this CI/CD process?
Yes, please take a look at this blog, Using MLOps with MLflow and Azure.
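As a hedged sketch of what that integration can look like, the snippet below logs a model artifact from a CI job and registers it in the MLflow Model Registry so a later release stage can promote it. The experiment name, artifact filename, metric, and registered model name are all placeholders; treat this as an outline under those assumptions, not the exact pipeline from the blog.

```python
# Sketch: log and register a model artifact from a CI/CD job using MLflow.
# Assumes mlflow is installed and a tracking server is configured; every
# name below (experiment, file, metric, model) is a hypothetical placeholder.
import mlflow

mlflow.set_experiment("/Shared/ci-cd-demo")       # hypothetical experiment
with mlflow.start_run() as run:
    mlflow.log_metric("accuracy", 0.92)           # placeholder test metric
    mlflow.log_artifact("model.pkl")              # artifact built by the CI job
    # Register the artifact so the release stage can promote it by stage.
    mlflow.register_model(
        f"runs:/{run.info.run_id}/model.pkl",
        "ci_cd_demo_model",                       # hypothetical registry name
    )
```

A deployment stage can then transition the registered version (e.g., Staging to Production) rather than shipping raw files, which keeps model promotion auditable within the same CI/CD flow.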
Add your follow-up questions to threads!