Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Debugging!

MCosta
New Contributor III

Hi ML folks,

We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland.

Debugging in Databricks is awkward. We ended up doing all the development on local machines, and when the code is "mature" we start using Databricks + MLflow to train the model. We use Azure not only for Databricks but also for data. However, this approach left us with a "security hole". IT staff want to remove this type of permission from local machines, which will create a challenge for the ML team...

Searching Google did not turn up a good reference that works as a guideline for doing "complex" Python-based model development and training in Databricks. Any suggestions are welcome!

Cheers

1 ACCEPTED SOLUTION

Accepted Solutions

BilalAslamDbrx
Honored Contributor III

Hello @MCosta, thanks for posting this question. We are actively looking into how to make this a better experience for you. Can you please drop me a line at bilal dot aslam at databricks dot com? I will be happy to set up a call.


11 REPLIES

Kaniz_Fatma
Community Manager

Hi @MCosta! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

Dan_Z
Honored Contributor

Unfortunately, right now we are limited to Notebooks for Python code. We are looking into other options, such as hosted IDEs, but there is no release date yet. For now, I would suggest continuing to use an IDE locally, as you have been doing, and then syncing to the notebooks using the Repos Git functionality. I hope you will be able to develop locally!

Vitaliy
New Contributor II

@MCosta, if you are using Python, I assume that you have a case where you have a complex UDF and you want to understand what happens in every line there, perhaps even in complex call stacks. Is that correct?

emily1
New Contributor III

I also have questions about debugging, since I don't have the option of using my local machine due to data security. I am very shocked as a seasoned developer to learn that there is no proper IDE or even debugging tools in Databricks, just notebooks. Is this really the case, or am I missing something? If this was designed for developers, shouldn't there be a way to step through code at least? Notebooks are great for EDA but not designed for development.

BilalAslamDbrx
Honored Contributor III

@Emily Heureux​ have I got a fun surprise for you. We have a LOAD of new features coming out to help with development in an IDE. Can you email saad dot ansari at databricks dot com and he will get you started?

saadansari-db
New Contributor III

Hi @Emily Heureux​ please get in touch, would love to get your feedback!

jamesw
New Contributor II

I am also looking for a way to debug code with local IDE-based development. We have a complex and mature ML codebase of Python code (.py files, not notebooks) and use debugging extensively (VS Code). It is critical to be able to step through code, inspect variables, etc. This simply is not possible with notebooks.

We are looking at migrating this codebase onto Databricks but we have concerns regarding IDE based development and debugging capabilities. We want to avoid the situation where we sacrifice software best practices when using Databricks.

While notebook-based development seems suitable for the Databricks example ML projects, which use existing NN architectures / feature extraction / datasets, I would argue that this type of development is almost impossible for a large team of ML researchers / engineers contributing to a mature ML codebase.

I am also confused by the recommended tooling for IDE-based development. databricks-connect is not recommended (deprecated?), and dbx is recommended instead. dbx does not offer any debugging capabilities. It also forces you to use either notebooks or Python packages. Ideally, we want to execute the codebase as is from the IDE on a compute cluster with debugging.

Databricks recommends that you use dbx by Databricks Labs for local development instead of Databricks Connect. See: https://docs.databricks.com/dev-tools/databricks-connect.html#databricks-connect

dbx currently doesn't provide interactive debugging capabilities. If you want to use interactive debugging, you can use Databricks Connect, and then use dbx for deployment operations. See https://dbx.readthedocs.io/en/latest/intro/?h=debug#limitations

It seems that databricks-connect is the closest option for debugging within an IDE. However, from what I understand, only the Spark-related commands are sent to the Databricks compute cluster, while the rest of the code is executed on the local machine. (See limitations: https://docs.databricks.com/dev-tools/databricks-connect.html#limitations). This hybrid approach would be challenging to develop with, as the majority of our code does not use Spark APIs and there is a potential environment mismatch.
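To make the split described above concrete, here is a minimal sketch. It assumes Databricks Connect is installed and configured so that `SparkSession.builder.getOrCreate()` returns a session backed by the remote cluster; the function names `augment` and `row_count` and the table name are hypothetical. Under that assumption, only the DataFrame call in `row_count` would be sent to the cluster, while `augment` runs in the local interpreter with the local environment.

```python
def augment(values, scale=2.0):
    """Plain Python: runs in the local interpreter, where an IDE debugger can step through it."""
    return [v * scale for v in values]


def row_count(table_name):
    """Spark API call: with Databricks Connect configured, the work runs on the remote cluster."""
    # Imported lazily so the pure-Python part above stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.table(table_name).count()


if __name__ == "__main__":
    print(augment([1.0, 2.5]))  # executes locally
    # print(row_count("my_table"))  # hypothetical table; needs a configured cluster
```

A breakpoint inside `augment` would behave like any local breakpoint, which is why a codebase that is mostly non-Spark Python ends up running mostly on the local machine under this setup.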

Finally, there is the VS Code Databricks extension: https://github.com/paiqo/Databricks-VSCode. I believe this allows you to execute notebooks on compute clusters. I have tried using it and cannot find any support for debugging.

Any recommendations would be greatly appreciated!

Thanks

BilalAslamDbrx
Honored Contributor III

@James W​ you should connect with @Saad Ansari​. He can walk you through our roadmap - I think you will like it.

Thanks for the advice.

petern
New Contributor II

Has this been solved yet? Is there a mature way to debug code on Databricks? I'm running into the same kind of issue.

The variable explorer and pdb can be used, but it's not really the same.
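For readers who have not used pdb this way, here is a minimal sketch of post-mortem debugging with the standard library. The helper `run_with_post_mortem` and the buggy function `mean_loss` are hypothetical names for illustration, and whether the interactive prompt works depends on the notebook runtime in use. With `interactive=False` the helper just returns the local variables of the frame that raised, which is roughly what you would inspect at the `(Pdb)` prompt.

```python
import pdb
import sys


def mean_loss(losses):
    """Hypothetical training helper with a bug: fails on an empty batch."""
    total = sum(losses)
    return total / len(losses)


def run_with_post_mortem(fn, *args, interactive=False):
    """Call fn; on failure, open pdb at the failing frame or return its local variables."""
    try:
        return fn(*args)
    except Exception:
        tb = sys.exc_info()[2]
        if interactive:
            pdb.post_mortem(tb)  # opens a (Pdb) prompt at the frame that raised
            return None
        # Non-interactive fallback: walk to the innermost frame and expose its locals.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return dict(tb.tb_frame.f_locals)
```

Calling `run_with_post_mortem(mean_loss, [], interactive=True)` in an environment with an interactive prompt should stop inside `mean_loss`, where pdb commands such as `p total`, `up`, and `down` can be used. It is still not a substitute for stepping through code in an IDE.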
