Databricks regression test suite

_deepak_
New Contributor II

Hi, I am new to Databricks and am setting up a non-prod environment. Is there a way to run a regression suite so that the existing setup does not break when a feature is added? Also, how can I make prod data available in non-prod? Shallow copy?

4 REPLIES

Anonymous
Not applicable

@deepak prasad​ :

Yes, you can run regression tests to ensure that your changes do not break existing functionality. Databricks supports a number of testing frameworks like PyTest, which can be used to automate regression testing. You can write test cases that cover different scenarios and use cases of your application and run them automatically after each code change.
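
As a rough illustration of what such a test case could look like, here is a minimal pytest-style sketch. The transformation `add_order_total`, the sample rows, and the expected baseline are all hypothetical and only stand in for your own pipeline logic. pytest collects any function whose name starts with `test_`, so no extra imports are needed for this example.

```python
def add_order_total(rows):
    """Hypothetical transformation under test: total = quantity * unit_price."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


# Small, fixed input and the output that the current (known-good) code produces.
SAMPLE_INPUT = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0},
    {"order_id": 2, "quantity": 1, "unit_price": 12.5},
]
EXPECTED_OUTPUT = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0, "total": 10.0},
    {"order_id": 2, "quantity": 1, "unit_price": 12.5, "total": 12.5},
]


def test_add_order_total_matches_baseline():
    # Regression check: output must stay identical to the stored baseline.
    assert add_order_total(SAMPLE_INPUT) == EXPECTED_OUTPUT


def test_add_order_total_keeps_row_count():
    # A transformation that only adds a column must not add or drop rows.
    assert len(add_order_total(SAMPLE_INPUT)) == len(SAMPLE_INPUT)
```

If a later code change alters the computed totals or drops rows, one of these tests fails and flags the regression.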

To make production data available in non-production, you can use a number of techniques such as database replication, backup and restore, or data cloning. One approach could be to take regular backups of your production databases and restore them in non-production environments. You can also use data masking and obfuscation techniques to protect sensitive data in non-production environments. Another approach is to use a data virtualization platform that can create a virtualized copy of the production data on demand, without actually copying the data. This can help reduce the storage requirements in non-production environments.
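
On the "Shallow copy?" part of the question: Delta tables on Databricks can be cloned with a `SHALLOW CLONE` statement, which references the source data files instead of copying them. The sketch below only builds that statement as a string and shows one simple way to mask sensitive columns with a salted one-way hash. The table names, the salt, and the helper names are assumptions made for illustration, and hashing alone may not be sufficient protection for low-entropy values.

```python
import hashlib


def mask_value(value: str, salt: str = "non-prod-salt") -> str:
    """Replace a sensitive value with a deterministic one-way token (assumed scheme)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def mask_rows(rows, sensitive_columns):
    """Return copies of the rows with the listed columns masked."""
    return [
        {
            key: mask_value(str(val)) if key in sensitive_columns else val
            for key, val in row.items()
        }
        for row in rows
    ]


def shallow_clone_statement(source_table: str, target_table: str) -> str:
    """Build a Delta SHALLOW CLONE statement; table names are hypothetical."""
    return f"CREATE OR REPLACE TABLE {target_table} SHALLOW CLONE {source_table}"


# In a Databricks notebook the statement could then be run with spark.sql(...),
# assuming a SparkSession named `spark` is available.
statement = shallow_clone_statement("prod.sales.orders", "dev.sales.orders")
masked = mask_rows([{"id": 1, "email": "a@example.com"}], {"email"})
```

Because the token is deterministic, the same input always maps to the same masked value, so joins on a masked column still line up in non-prod.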

_deepak_
New Contributor II

Hi @Suteja Kanuri​ ,

I can create the test cases using any framework, maybe pytest or Great Expectations, but how do I run the regression suite after code changes? Is there any blog or documentation on non-prod setup or on running regression tests? Can you please share some references?

Anonymous
Not applicable

@deepak prasad​ :

Here you go

  1. https://docs.greatexpectations.io/docs/
  2. You can search here - https://www.databricks.com/blog

grkseo7
New Contributor II

Regression testing after code changes can be automated easily. Once you’ve created test cases with Pytest or Great Expectations, you can set up a CI/CD pipeline using tools like Jenkins or GitHub Actions. For a non-prod setup, Docker is great for replicating the environment consistently.
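
As a minimal sketch of the step a Jenkins or GitHub Actions job could execute, the script below runs pytest in a subprocess and returns its exit code. The `tests` directory and the function names are assumptions, and pytest is assumed to be installed in the CI environment.

```python
import subprocess
import sys


def build_pytest_command(test_dir: str = "tests") -> list:
    """Build the command that runs pytest with the current Python interpreter."""
    return [sys.executable, "-m", "pytest", test_dir, "-q"]


def run_regression_suite(test_dir: str = "tests") -> int:
    """Run the suite and return pytest's exit code (0 means all tests passed)."""
    completed = subprocess.run(build_pytest_command(test_dir))
    return completed.returncode
```

A CI step would call `sys.exit(run_regression_suite())` so that a non-zero exit code fails the build whenever a regression test fails.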

If you're looking for more details, this blog might help: Regression Testing and Stat Studio. It explains tools and processes for smoother regression testing.

Hope this helps! Let me know if you have any specific questions.