08-20-2021 10:23 AM
Hi ML folks,
We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland.
Debugging in Databricks is awkward, so we ended up doing all the development on local machines, and only once the code is "mature" do we start playing with Databricks + MLflow to train the model. We use Azure not only for Databricks but also for data, and this approach left us with a "security hole": IT staff want to remove this type of permission from local machines, which will create a challenge for the ML team...
A Google search didn't turn up a good reference that could serve as a guideline for doing "complex" Python-based model development and training in Databricks. Any suggestion is welcome!
Cheers
09-09-2021 01:48 PM
Unfortunately, right now we are limited to notebooks for Python code. We are looking into other options, like hosted IDEs, but there is no release date yet. For now I would suggest using an IDE locally as you have been doing and then syncing to notebooks using the Repos Git functionality. I hope you will be able to develop locally!
09-10-2021 05:10 AM
Hello @MCosta, thanks for posting this question. We are actively looking into how to make this a better experience for you. Can you please drop me a line at bilal dot aslam at databricks dot com and I will be happy to set up a call.
09-12-2021 05:59 AM
@MCosta, if you are using Python, I assume you have a case with a complex UDF where you want to understand what happens on every line, perhaps even in complex call stacks. Is that correct?
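If that is the case, one workaround is to keep the logic as a plain Python function, step through it locally with pdb, and only wrap it as a Spark UDF once it behaves. A minimal sketch (the function name and input value are made up for illustration):

import pdb

def normalize_score(raw):
    # Ordinary Python: this runs on the local interpreter, so pdb works here.
    if raw is None:
        return 0.0
    return max(0.0, min(1.0, raw / 100.0))

pdb.run("normalize_score(42.0)")  # step through the function line by line

# Once the logic is verified, register it on the cluster, e.g.:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# normalize_udf = udf(normalize_score, DoubleType())

This doesn't help once the function is running inside an executor, but it covers the "what happens on every line" part for the pure-Python logic.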
12-01-2022 03:11 PM
I also have questions about debugging, since I don't have the option of using my local machine due to data security. As a seasoned developer, I am very shocked to learn that there are no proper IDE or debugging tools in Databricks, just notebooks. Is this really the case, or am I missing something? If this was designed for developers, shouldn't there be a way to at least step through code? Notebooks are great for EDA but not designed for development.
12-01-2022 07:52 PM
@Emily Heureux have I got a fun surprise for you. We have a LOAD of new features coming out to help with development in an IDE. Can you email saad dot ansari at databricks dot com and he will get you started?
12-02-2022 07:45 AM
Hi @Emily Heureux, please get in touch, would love to get your feedback!
01-10-2023 09:23 PM
I am also looking for a way to debug code with local, IDE-based development. We have a complex and mature ML codebase of Python code (.py files, not notebooks) and use debugging extensively (VS Code). It is critical to be able to step through code, inspect variables, etc. This simply is not possible with notebooks.
We are looking at migrating this codebase onto Databricks but we have concerns regarding IDE based development and debugging capabilities. We want to avoid the situation where we sacrifice software best practices when using Databricks.
While notebook-based development seems suitable for the Databricks example ML projects, which use existing NN architectures / feature extraction / datasets, I would argue that this type of development is almost impossible for a large team of ML researchers / engineers contributing to a mature ML codebase.
I am also confused by the recommended tooling for IDE-based development. databricks-connect is not recommended (deprecated?) and it is recommended that dbx be used instead. dbx does not offer any debugging capabilities, and it also forces you to use either notebooks or Python packages. Ideally we want to execute the codebase as-is from the IDE on a compute cluster, with debugging.
Databricks recommends that you use dbx by Databricks Labs for local development instead of Databricks Connect. See: https://docs.databricks.com/dev-tools/databricks-connect.html#databricks-connect
dbx currently doesn't provide interactive debugging capabilities. If you want to use interactive debugging, you can use Databricks Connect, and then use dbx for deployment operations. See https://dbx.readthedocs.io/en/latest/intro/?h=debug#limitations
It seems that databricks-connect is the closest option for debugging within an IDE. However, from what I understand, only the Spark-related commands are sent to the Databricks compute cluster while the rest of the code is executed on the local machine (see the limitations: https://docs.databricks.com/dev-tools/databricks-connect.html#limitations). This hybrid approach would be challenging to develop with, as the majority of our code does not use Spark APIs, and there is potential for environment mismatch.
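To illustrate that split, here is a minimal sketch of the hybrid model with (legacy) Databricks Connect, assuming the client has already been set up with pip install databricks-connect and databricks-connect configure, and that the table name is just a placeholder:

from pyspark.sql import SparkSession

# The builder returns a session that proxies Spark calls to the remote cluster.
spark = SparkSession.builder.getOrCreate()

df = spark.read.table("samples.nyctaxi.trips")  # placeholder table name
row_count = df.count()  # this action runs on the cluster

# Plain Python like the line below runs on the local interpreter,
# so an IDE breakpoint set here will be hit as usual.
print(f"rows on cluster: {row_count}")

The catch is exactly what I described above: anything that isn't routed through the Spark session, which for us is most of the ML code, runs locally against the local environment.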
Finally, there is the VS Code Databricks extension https://github.com/paiqo/Databricks-VSCode. I believe this allows you to execute notebooks on compute clusters. I have tried using it and cannot find any support for debugging.
Any recommendations would be greatly appreciated!
Thanks
01-13-2023 10:26 PM
@James W you should connect with @Saad Ansari. He can walk you through our roadmap - I think you will like it.
06-20-2023 07:35 AM
Thanks for the advice.
03-04-2024 01:06 PM
Has this been solved yet, i.e. is there a mature way to debug code on Databricks? I'm running into the same kind of issue.
The variable explorer and pdb can be used, but it's not really the same...
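For what it's worth, a minimal sketch of the post-mortem debugging that does work inside a notebook cell today, assuming a runtime recent enough to accept interactive pdb input (load_features is a made-up stand-in for real project code):

import pdb

def load_features(path):
    # Hypothetical helper standing in for your own code.
    if not path:
        raise ValueError("empty path")
    return [path]

try:
    load_features("")
except ValueError:
    pdb.post_mortem()  # drops into pdb at the frame that raised

You can also try the %debug magic in the cell after an exception, or sprinkle breakpoint() calls, but as noted above it is still a console-style pdb session rather than proper IDE debugging.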