Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Hello, everyone. Is there a way to connect to a Databricks cluster through an SSH interpreter in an IDE? I know about Databricks Connect, but I want to execute the entire code on the cluster.

BorislavBlagoev
Valued Contributor III
1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @Florent POUSSEROT and @Borislav Blagoev, in early 2022 databricks-tunnel will be available, which will run your code on Databricks cloud and not directly on the cluster. There will be ready-made extensions for PyCharm and VS Code.


36 REPLIES

Prabakar
Esteemed Contributor III

Hi @Borislav Blagoev, unfortunately it is not possible to connect to the cluster other than through Databricks Connect.

BorislavBlagoev
Valued Contributor III

Is it possible to execute the entire code on the Databricks cluster instead of only the Spark code?

Prabakar
Esteemed Contributor III

For Spark jobs, you can use Databricks Connect.

To run SQL commands from Python code on Databricks clusters and Databricks SQL endpoints, you can use the Databricks SQL Connector for Python.
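A minimal sketch of that second option, assuming the `databricks-sql-connector` package is installed and that the hostname, HTTP path, and token are supplied through environment variables. The variable names and the `fetch_rows` helper are illustrative and not part of the original reply. Note that this sends SQL to Databricks; it does not run arbitrary Python on the cluster.

```python
import os
from contextlib import closing


def open_databricks_connection():
    """Open a connection with the Databricks SQL Connector for Python.

    Assumes `pip install databricks-sql-connector` and three environment
    variables whose names are chosen here for illustration only.
    """
    from databricks import sql  # lazy import so fetch_rows works without the package

    return sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )


def fetch_rows(connection, query):
    """Run one SQL statement on any DB-API 2.0 connection and return all rows."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(query)
        return cursor.fetchall()


# Usage against a real workspace (not executed here):
#   with closing(open_databricks_connection()) as connection:
#       print(fetch_rows(connection, "SELECT 1 + 1"))
```

Because the connector follows the Python DB-API, `fetch_rows` can be tried against any DB-API connection before pointing it at a workspace.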

BorislavBlagoev
Valued Contributor III

I want to execute Python code as well. The entire code (Spark, SQL, Python).

-werners-
Esteemed Contributor III

Hm, I think plain Python code will run with Databricks Connect (if it is a Python program you are writing), and Spark SQL can be done with spark.sql(...).

Is that what you want to do?

Unfortunately, only the Spark code is executed on the cluster!

-werners-
Esteemed Contributor III

dang, not even the spark.sql("...")?

Prabakar
Esteemed Contributor III

As I mentioned earlier, only Spark code will be executed with Databricks Connect. We have an internal feature request to access the Python REPL from the local IDE through Databricks Connect.

I don't know why, but when I try to access that link I get this error: "Unable to sign in". I tried with the same email as here.

@Werner Stinckens, you can execute spark.sql("...") on the cluster, but I want to execute this, for example:

collection = [1, 2, 3, 4, 5]
sum = 0
for x in collection:
    sum += x

stupid example!

Prabakar
Esteemed Contributor III

Hi @Borislav Blagoev, you won't be able to access it. As I mentioned in the previous comment, it's an internal feature request and only available to Databricks employees.

Oh, OK! Sorry, I didn't understand that!

Hi @Borislav Blagoev,

Have you checked the list of limitations for Databricks Connect? The docs are here: https://docs.databricks.com/dev-tools/databricks-connect.html#limitations

Limitations

The following Databricks features and third-party platforms are unsupported:

  1. Structured Streaming.
  2. Running arbitrary code that is not a part of a Spark job on the remote cluster.
  3. Native Scala, Python, and R APIs for Delta table operations (for example, DeltaTable.forPath) are not supported. However, the SQL API (spark.sql(...)) with Delta Lake operations and the Spark API (for example, spark.read.load) on Delta tables are both supported.
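To make the second limitation concrete, here is a hedged sketch of where each piece of code would run under Databricks Connect. The function names are illustrative, and `spark` is assumed to be a `SparkSession` already configured for Databricks Connect.

```python
def local_sum(values):
    """Plain Python: per the limitation above, this runs in the local interpreter."""
    total = 0
    for x in values:
        total += x
    return total


def cluster_sum(spark, values):
    """The same sum expressed as a Spark job, so the work is done on the remote cluster.

    `spark` is assumed to be a SparkSession set up for Databricks Connect.
    """
    from pyspark.sql import functions as F  # lazy import: only needed for the Spark path

    df = spark.createDataFrame([(v,) for v in values], ["x"])
    return df.agg(F.sum("x")).first()[0]
```

Only a computation written in the second style is sent to the cluster; the plain loop stays on the local machine.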

Yes, that's why I want to use something other than Databricks Connect!
