Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pycharm and Unit Testing

Gubbanoa
New Contributor II

What is currently the best way to run unit tests from PyCharm against Databricks? I have previously used Databricks Connect. However, after recent upgrades, and now that even Unity Catalog has become a requirement, it seems quirky. Is it possible to use the new PyCharm Databricks plugin to run pytest inside Databricks?

 

3 REPLIES

florence023
New Contributor III

@Gubbanoa wrote:

What is currently the best way to run unit tests from PyCharm against Databricks? I have previously used Databricks Connect. However, after recent upgrades, and now that even Unity Catalog has become a requirement, it seems quirky. Is it possible to use the new PyCharm Databricks plugin to run pytest inside Databricks?

 


Hello,

 

Unit testing in Databricks can indeed be a bit tricky, especially with recent updates. Here are some current best practices and options for integrating unit testing from PyCharm into Databricks:

Databricks Connect:
While Databricks Connect has been a popular choice, recent updates and requirements such as Unity Catalog have introduced some quirks. If you still prefer Databricks Connect, make sure you are on the latest version and check the Databricks Connect documentation for any new configuration requirements.
PyCharm Databricks Plugin:
The new PyCharm Databricks plugin can be used to run pytest tests inside Databricks. This plugin lets you connect to your Databricks workspace directly from PyCharm, making it easier to manage and run your tests. You can find more details and setup instructions in the PyCharm Databricks plugin documentation.
Unit Testing for Notebooks:
Databricks provides built-in support for unit testing within notebooks. You can organize your functions and their unit tests within the same notebook or in separate notebooks. This approach is detailed in the Databricks documentation.
Using GitHub Repositories:
There are several GitHub repositories that provide sample setups for unit testing in Databricks. For example, the databricks-unit-testing repository offers sample PySpark functions and pytest unit tests. Another useful tool is Nutter, a testing framework designed specifically for Databricks notebooks.
Best Practices for PySpark:
For PySpark-specific unit testing, you can follow best practices such as creating isolated test environments, using mock data, and integrating tests into CI/CD pipelines. More details can be found in the Databricks documentation.
By leveraging these tools and practices, you can streamline your unit testing process in Databricks and ensure your code remains robust and reliable.
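The "isolated test environments" and "mock data" points above can be sketched roughly as follows. This is only an illustration under stated assumptions: `normalize_country` and `add_country_column` are hypothetical names, not taken from any Databricks repository. The idea is to keep the business logic in a plain Python function that pytest can exercise without a cluster, and to keep the Spark-specific part as a thin wrapper.

```python
def normalize_country(raw):
    """Pure Python logic: trim, upper-case, and map empty or missing values to 'UNKNOWN'."""
    if raw is None:
        return "UNKNOWN"
    cleaned = raw.strip().upper()
    return cleaned or "UNKNOWN"


def add_country_column(df, source_col="country_raw", target_col="country"):
    """Thin Spark wrapper around normalize_country (hypothetical column names).

    pyspark is imported lazily so that the pure function above stays
    testable on a machine without Spark installed.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    normalize_udf = F.udf(normalize_country, StringType())
    return df.withColumn(target_col, normalize_udf(F.col(source_col)))


def test_normalize_country():
    # Mock data: small hand-written inputs instead of production tables.
    assert normalize_country("  de ") == "DE"
    assert normalize_country("") == "UNKNOWN"
    assert normalize_country(None) == "UNKNOWN"
```

Running `pytest` from PyCharm picks up `test_normalize_country` locally; only `add_country_column` needs a Spark session, so it can be covered by a smaller set of integration tests.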

Hope this will help you.
Best regards,

Please, I need links and examples. I have already searched the documentation and come up short. Maybe I just missed it, but I cannot find anything. For example, the documentation for the new plugin seems to have nothing on how to run pytest on the cluster (Databricks | PyCharm Documentation (jetbrains.com)).

You refer to GitHub repositories; can you please provide a link? I can only find jonathanneo/databricks-unit-testing: Unit testing using databricks connect (github.com), which is an old repo that uses Databricks Connect. I would like to see a repo with examples using the new plugin.

You refer to Nutter. Unfortunately, it also seems to be an abandoned project, and relying on that kind of repo for critical infrastructure is not an option.


But anyways: regards!


Gubbanoa
New Contributor II

In fact, I got the following reply from JetBrains:

Mikhail Tarabrikov (IntelliJ)

Aug 30, 2024, 17:22 GMT+2

Hi,
 
Thank you for the clarification!
 

What is the best way to run junit tests with PyCharm in Databricks?

 
Unfortunately, there is currently only limited support for custom run configurations that would allow running specific scripts on a Databricks cluster. This feature is on our list and is planned for upcoming releases.
 
We're constantly improving Databricks support; with each release, many features and improvements are added for a better user experience. If you have more specific suggestions on how we can improve our IDE, you can create a feature request on our tracker. Then our developers and other members of the community can comment/vote.
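Until such run configurations exist, one option already mentioned in this thread is Databricks Connect: pytest runs locally in PyCharm while Spark operations execute on the remote cluster. The following is a minimal sketch, not an official recipe. It assumes the `databricks-connect` package (version 13 or later, which provides `DatabricksSession`) is installed and authentication is already configured; the `SPARK_TEST_TARGET` environment variable and both function names are hypothetical.

```python
import os


def use_remote_cluster(env=None):
    """Return True when the (hypothetical) SPARK_TEST_TARGET variable is 'databricks'."""
    env = os.environ if env is None else env
    return env.get("SPARK_TEST_TARGET", "local").lower() == "databricks"


def create_spark_session(env=None):
    """Create the Spark session that the tests should use."""
    if use_remote_cluster(env):
        # Assumption: databricks-connect is installed and workspace
        # authentication (for example a configuration profile) is set up.
        from databricks.connect import DatabricksSession

        return DatabricksSession.builder.getOrCreate()

    # Assumption: a separate virtual environment with plain pyspark.
    # Databricks Connect conflicts with a standalone pyspark install,
    # so the two branches are not meant to share one environment.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.master("local[1]")
        .appName("unit-tests")
        .getOrCreate()
    )
```

In a `conftest.py`, a session-scoped pytest fixture could simply return `create_spark_session()`, so the same tests run against a local session by default and against the cluster when `SPARK_TEST_TARGET=databricks` is set in the PyCharm run configuration.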

