Run PySpark queries from outside Databricks
09-03-2024 08:49 AM - edited 09-03-2024 08:51 AM
I have written a notebook that executes a PySpark query. I then trigger it remotely from outside the Databricks environment using /api/2.1/jobs/run-now, which runs the notebook. I also want to retrieve the results from this job run. How should I do that?
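For context, the flow described above can be sketched like this. It assumes the notebook ends with `dbutils.notebook.exit(<string>)` so that the `runs/get-output` endpoint has a result to return; the host, token, and job ID are placeholders.

```python
import json
import time
import urllib.request


def _api(host, token, path, payload=None):
    """Call a Databricks REST endpoint: GET if payload is None, else POST."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{host}{path}", data=data,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_now_payload(job_id, notebook_params):
    """Request body for POST /api/2.1/jobs/run-now."""
    return {"job_id": job_id, "notebook_params": notebook_params}


def run_job_and_fetch_output(host, token, job_id, notebook_params):
    """Trigger the job, poll until it terminates, then read the notebook output.

    The notebook must call dbutils.notebook.exit(<string>) for
    /api/2.1/jobs/runs/get-output to return a notebook_output.result.
    """
    run = _api(host, token, "/api/2.1/jobs/run-now",
               run_now_payload(job_id, notebook_params))
    run_id = run["run_id"]
    while True:
        state = _api(host, token, f"/api/2.1/jobs/runs/get?run_id={run_id}")
        if state["state"]["life_cycle_state"] in (
                "TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(10)  # poll interval; tune for your job's runtime
    out = _api(host, token, f"/api/2.1/jobs/runs/get-output?run_id={run_id}")
    return out.get("notebook_output", {}).get("result", "")
```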
09-03-2024 09:04 AM - edited 09-03-2024 09:07 AM
Hi @SowmyaDesai ,
For this use case it's much better to use the Statement Execution API, which gives you the ability to run a SQL statement and fetch the results:
https://docs.databricks.com/api/workspace/statementexecution
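A minimal sketch of that API, using only the standard library; the host, token, and warehouse ID are placeholders. With `wait_timeout` set, the API can return the result inline in the same response when the statement finishes in time.

```python
import json
import urllib.request


def statement_payload(warehouse_id, statement):
    """Body for POST /api/2.0/sql/statements/ — wait up to 30s for inline results."""
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",
    }


def execute_statement(host, token, warehouse_id, statement):
    """Run a SQL statement on a SQL warehouse and return rows as lists of strings."""
    req = urllib.request.Request(
        f"{host}/api/2.0/sql/statements/",
        data=json.dumps(statement_payload(warehouse_id, statement)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # For long statements the state may still be PENDING/RUNNING here, in
    # which case you would poll GET /api/2.0/sql/statements/{statement_id}.
    if body["status"]["state"] != "SUCCEEDED":
        raise RuntimeError(
            f"Statement {body['statement_id']} in state {body['status']['state']}")
    return body["result"]["data_array"]
```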
09-03-2024 10:11 PM
Thanks for responding. I did go through this link. It talks about executing on a SQL warehouse, though. Is there a way we can execute queries on Databricks clusters instead?
Databricks has this connector for SQL https://docs.databricks.com/en/dev-tools/python-sql-connector.html , and it supports SQL queries. But I don't see an easy option for running PySpark queries. Any idea how to do that?
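For reference, the connector above also works against all-purpose clusters, not only SQL warehouses: you pass the cluster's HTTP path from its "Connection details" tab. A minimal sketch (hostname, HTTP path, and token are placeholders; the import is deferred so the helper is usable without the package installed):

```python
def connection_args(hostname, http_path, token):
    """Keyword arguments for databricks.sql.connect(); values come from the
    cluster's or warehouse's Connection details tab (placeholders here)."""
    return {
        "server_hostname": hostname,
        "http_path": http_path,
        "access_token": token,
    }


def fetch_rows(hostname, http_path, token, query):
    """Run one SQL query over the connector and return all rows."""
    # Third-party dependency: pip install databricks-sql-connector
    from databricks import sql

    with sql.connect(**connection_args(hostname, http_path, token)) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

Note this still only covers SQL text, not arbitrary PySpark code — for that, see Databricks Connect in the next reply.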
09-03-2024 10:51 PM
Hi @SowmyaDesai ,
So if you want to run queries outside Databricks, you can use Databricks Connect. Databricks Connect lets you connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, and IntelliJ IDEA, as well as notebook servers and other custom applications, to Databricks compute:
Databricks Connect for Python | Databricks on AWS
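A minimal sketch of running actual PySpark code against a remote cluster with Databricks Connect; the host, token, cluster ID, and table name are placeholders, and the import is deferred so the config helper works without the package installed:

```python
def remote_config(host, token, cluster_id):
    """Arguments for DatabricksSession.builder.remote() (placeholder values)."""
    return {"host": host, "token": token, "cluster_id": cluster_id}


def pyspark_row_count(host, token, cluster_id, table):
    """Run a PySpark query from a local script against a Databricks cluster."""
    # Third-party dependency: pip install databricks-connect
    from databricks.connect import DatabricksSession

    spark = (DatabricksSession.builder
             .remote(**remote_config(host, token, cluster_id))
             .getOrCreate())
    # Any DataFrame API code runs here and executes on the remote cluster.
    return spark.read.table(table).count()
```

This is the piece the SQL connector doesn't give you: the `spark` object is a real remote SparkSession, so DataFrame transformations, not just SQL strings, execute on the cluster.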

