
VS Code Python file execution

nefflev1
New Contributor

Hi Everyone,

I'm using the Databricks VS Code Extension to develop and deploy Asset Bundles. Usually we work with notebooks and use the "Run File as Workflow" function. Now I'm trying to use a pure Python file for a new use case and tried the "Upload and Run File" function in the VS Code Extension; however, I get a success message almost immediately saying the job is done, without the job actually executing the code (at least that's how it seems).

In the log4j output I see that the REPL session is stopped after a few milliseconds (even if I add a 10s sleep).

Executing directly on the platform works as expected, and the Databricks Connect execution works as well. The workspace we are using is deployed in our VNet in Azure and has external access disabled; however, I'm in a peered network.

How can I get this to work?

1 REPLY

mark_ott
Databricks Employee

You're encountering a common issue when using the Databricks VS Code Extension's "Upload and Run File" with pure Python files, especially in a secure, VNet-injected Azure Databricks deployment. Here's a direct summary of what's happening and how you can troubleshoot and resolve it:

Issue Summary

  • "Upload and Run File" gives quick job success with no output: This is typically a sign that the Python file wasn't actually executed, or that the session was never properly established between VS Code and the Databricks cluster.

  • log4j shows immediate REPL session stop: This suggests that the interpreter process is starting and immediately stopping, implying a misconfiguration, a communication issue, or a security/networking block.

  • Works via Databricks workspace and Databricks Connect: This indicates the Python code and cluster config are functional, but that the VS Code extension likely has trouble communicating through your networking setup.

Likely Causes

  • Network Restrictions: VNet-injected workspaces with external access disabled limit certain inbound/outbound traffic, which the VS Code extension may rely on to upload, start sessions, or monitor execution.

  • Peered Network Limitations: Even with a peered connection, certain required ports or endpoints may not be accessible between your workstation and the Databricks control plane or cluster nodes.

  • VS Code Extension Limitations: The extension's "Upload and Run File" works differently under the hood from running notebooks or Databricks Connect; it may not be optimized or fully featured for heavily restricted networks.

Steps to Diagnose and Fix

1. Check Workspace Connectivity Mode

  • Ensure your workspace connectivity mode is "No Public IP (VNet-injected, secure cluster connectivity)".

  • If so, confirm that your subnet's NSG (Network Security Group) allows outbound communication to the Databricks control plane (over port 443), and that peering allows your workstation to reach the cluster's private IPs; a quick reachability test is sketched below.
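To rule out basic routing problems, you can first test raw TCP reachability to your workspace host on port 443 from your workstation. A minimal sketch; the hostname is a placeholder for the host part of your own workspace URL:

```python
import socket

# Placeholder: replace with the host part of your workspace URL.
WORKSPACE_HOST = "adb-1234567890123456.7.azuredatabricks.net"

def check_port(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"Connection to {host}:{port} failed: {exc}")
        return False

if __name__ == "__main__":
    print(f"{WORKSPACE_HOST}:443 reachable: {check_port(WORKSPACE_HOST)}")
```

Keep in mind this only proves connectivity to the workspace front end; cluster nodes in a VNet-injected workspace may still be unreachable if the peering routes don't cover their subnet.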

2. Extension Logs and Debugging

  • Open "Output" and "Databricks" output logs in VS Code to check for error messages.

  • Check network, authentication, or session errors specifically.

3. Cluster Library/Environment Issues

  • Ensure your cluster's access mode ("Single User" or "Shared") is configured appropriately for your user.

  • Make sure the Python environment on the cluster matches what your code expects; the diagnostic sketched below can help compare the two.
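A quick way to compare environments is a tiny diagnostic you run both locally and on the cluster, printing the interpreter version and a few package versions. A sketch; the package list is only an example, substitute the libraries your code actually uses:

```python
import importlib.metadata
import sys

# Run this locally and on the cluster, then diff the output.
print("Python:", sys.version)
for pkg in ("pandas", "pyspark", "requests"):  # example packages
    try:
        print(f"{pkg}: {importlib.metadata.version(pkg)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```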

4. Test Simpler Code

  • Try running a very basic Python script (print("hello world")) to rule out issues with the script itself.

  • Add explicit logging or output to confirm execution; see the sketch after this list.
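Something like the following makes execution unambiguous: if the run reports success within milliseconds but neither log line appears in the run output, the file never actually ran. A minimal sketch:

```python
import logging
import time

# Minimal script to verify the file really executes on the cluster.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

logging.info("test script started")
time.sleep(10)  # mirrors the 10s sleep mentioned in the question
logging.info("test script finished after sleep")
```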

5. Manual Upload and Run (as a workaround)

  • Upload the Python file manually through the Databricks workspace UI and run it as a job.

  • Compare behavior: if it works, then the issue is isolated to the VS Code extension's mechanism (a scripted equivalent of this comparison is sketched below).
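If you'd rather script that comparison than click through the UI, a one-off run of the same file can be submitted with the Databricks SDK for Python. A hedged sketch assuming a recent databricks-sdk, with a placeholder cluster ID and workspace path:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # uses the default authentication chain

# Submit the already-uploaded Python file as a one-time run.
run = w.jobs.submit(
    run_name="upload-and-run-comparison",
    tasks=[
        jobs.SubmitTask(
            task_key="main",
            existing_cluster_id="0123-456789-abcdef12",  # placeholder
            spark_python_task=jobs.SparkPythonTask(
                python_file="/Workspace/Users/you@example.com/test_script.py",  # placeholder
            ),
        )
    ],
).result()  # blocks until the run finishes

print(run.state.result_state)
```

If this run executes your code while "Upload and Run File" still reports an instant success, the problem is clearly in the extension's execution path rather than in the workspace.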

6. Databricks CLI & Connect as Alternatives

  • Use the Databricks CLI or Databricks Connect for command-line or programmatic job runs, which may bypass some VS Code-specific issues; a minimal Databricks Connect sketch follows below.
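Since Databricks Connect already works from your machine, running the file through a Connect session is often the most practical bypass. A minimal sketch assuming databricks-connect v13+ and a cluster_id configured in your ~/.databrickscfg profile:

```python
from databricks.connect import DatabricksSession

# Builds a Spark session against the cluster from your default profile;
# Spark operations then execute remotely on the cluster.
spark = DatabricksSession.builder.getOrCreate()

df = spark.range(10)
print(df.count())  # computed on the cluster, printed locally
```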

Networking and Security Considerations

  • Extension needs:

    • Outbound HTTPS access for API calls.

    • Possibly access to storage endpoints (for asset bundle upload).

  • With no external access: The Databricks extension may fail to negotiate an execution session via the control plane, even if the browser and Databricks Connect work due to different routing (an API-level connectivity check is sketched below).
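To confirm that authenticated API calls from your workstation reach the control plane at all, independent of VS Code, you can do a round trip with the Databricks SDK for Python. A sketch assuming credentials are available via environment variables or ~/.databrickscfg:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials from the default authentication chain
# (environment variables, ~/.databrickscfg, Azure CLI, ...).
w = WorkspaceClient()

# A successful response proves HTTPS access to the workspace API from
# this machine; a timeout here points at routing or NSG rules instead.
me = w.current_user.me()
print(f"Authenticated to {w.config.host} as {me.user_name}")
```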

Recommendations

  • Check Databricks Extension Requirements: Refer to the Databricks VS Code Extension documentation for networking prerequisites.

  • Consult Azure Network/Databricks Admin: Confirm that the necessary routes and NSG rules are in place.

  • Raise a Databricks Support Ticket: If all seems configured correctly, raise a ticket citing your setup and observed behavior; they may have visibility into ongoing limitations or bugs with the extension in highly secure network setups.


In summary:
Your code and cluster seem fine, but the Databricks VS Code Extension likely can't establish the proper session for file execution due to network restrictions in your VNet-injected setup. Manual upload methods or Databricks Connect/CLI are currently more reliable for secure environments, unless the required network routing can be confirmed for the extension.


If additional logs or error messages show anything specific in the VS Code "Output" panel, sharing them may help pinpoint the exact handshake or API call that's failing.