Data Engineering

Run PySpark queries from outside Databricks

SowmyaDesai
New Contributor II

I have written a notebook that executes a PySpark query. I run it remotely from outside the Databricks environment by calling /api/2.1/jobs/run-now, which triggers the notebook. I also want to retrieve the results of this job run. How should I do that?

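import requests

# DATABRICKS_INSTANCE, API_TOKEN, JOB_ID and SQL_QUERY1 are defined elsewhere
response = requests.post(
    f"{DATABRICKS_INSTANCE}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "job_id": JOB_ID,
        "notebook_params": {
            "query": SQL_QUERY1
        }
    }
)
response.raise_for_status()
# run-now returns the ID of the triggered run, not the query results
run_id = response.json()["run_id"]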
 
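Notebook that runs the PySpark query:

from pyspark.sql import SparkSession

dbutils.widgets.text("query", "")
query = dbutils.widgets.get("query")

# Execute the query (in a Databricks notebook `spark` already exists;
# getOrCreate() just returns that session)
spark = SparkSession.builder.getOrCreate()
df = spark.sql(query)
df.show()

# Return a value from the notebook: dbutils.notebook.exit() takes a string
# (a bare `return` is not valid at notebook level)
#dbutils.notebook.exit('hello!')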
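A hedged sketch of one way to retrieve the notebook's results through the Jobs API itself: the notebook's last cell passes a string to `dbutils.notebook.exit()` (for example `df.limit(100).toPandas().to_json(orient="records")`), and the caller takes `run_id` from the run-now response (`response.json()["run_id"]`), polls `GET /api/2.1/jobs/runs/get` until the run reaches a terminal state, then reads `notebook_output.result` from `GET /api/2.1/jobs/runs/get-output`. For multi-task format jobs, get-output has to be called with a task run's `run_id` (from the `tasks` list returned by runs/get) rather than the job run ID; this sketch assumes a single-task job and uses the first task. The helper names (`wait_for_run`, `get_notebook_result`) and the injected `get_json` callable are hypothetical, chosen so the logic can be tried without a workspace. The exit value is size-limited, so this only suits small results.

```python
import json
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: get_json(path, params) -> parsed JSON dict. In real use it
# would wrap requests.get(f"{DATABRICKS_INSTANCE}{path}", headers=..., params=params).
GetJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Life-cycle states after which a run will not change any more.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_json: GetJson, run_id: int,
                 poll_seconds: float = 10.0, timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll runs/get until the run reaches a terminal life-cycle state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = get_json("/api/2.1/jobs/runs/get", {"run_id": run_id})
        if run["state"]["life_cycle_state"] in TERMINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still active after {timeout_seconds}s")
        sleep(poll_seconds)


def get_notebook_result(get_json: GetJson, run_id: int, **wait_kwargs: Any) -> Any:
    """Wait for the run, then decode the JSON string passed to dbutils.notebook.exit()."""
    run = wait_for_run(get_json, run_id, **wait_kwargs)
    if run["state"].get("result_state") != "SUCCESS":
        raise RuntimeError(f"run {run_id} did not succeed: {run['state']}")
    # Multi-task format: get-output needs the task run ID (assumes a single task).
    tasks = run.get("tasks") or []
    output_run_id = tasks[0]["run_id"] if tasks else run_id
    output = get_json("/api/2.1/jobs/runs/get-output", {"run_id": output_run_id})
    result: Optional[str] = output.get("notebook_output", {}).get("result")
    return json.loads(result) if result is not None else None
```

Injecting `get_json` and `sleep` keeps the polling logic independent of the HTTP library; a real caller would pass a small wrapper around `requests.get` that adds the `Authorization: Bearer` header and calls `raise_for_status()`.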
3 REPLIES

szymon_dybczak
Contributor III

Hi @SowmyaDesai,

For this use case, it's much better to use the Statement Execution API, which lets you run a SQL statement and fetch its results:

https://docs.databricks.com/api/workspace/statementexecution
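A minimal sketch of calling that API, assuming a SQL warehouse ID and a personal access token are available; the helper name `run_statement` and the injected `post_json`/`get_json` callables are hypothetical. `POST /api/2.0/sql/statements` with `wait_timeout` (0, or 5 to 50 seconds) waits up to that long for the statement to finish; if it is still `PENDING` or `RUNNING`, the caller polls `GET /api/2.0/sql/statements/{statement_id}`. With `INLINE` disposition and `JSON_ARRAY` format, rows come back in `result.data_array` and column names in `manifest.schema.columns`. Larger results are split into chunks, which this sketch does not fetch.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical transport: post_json(path, body) and get_json(path) return parsed JSON;
# in real use they would wrap requests.post/requests.get with the Bearer-token header.
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]
GetJson = Callable[[str], Dict[str, Any]]


def run_statement(post_json: PostJson, get_json: GetJson, warehouse_id: str,
                  statement: str, poll_seconds: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[Dict[str, Any]]:
    """Run one SQL statement on a SQL warehouse; return the first result chunk as dicts."""
    resp = post_json("/api/2.0/sql/statements", {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",   # block up to 30 s before returning a pending state
        "disposition": "INLINE",
        "format": "JSON_ARRAY",
    })
    statement_id = resp["statement_id"]
    # Still executing after wait_timeout: poll until the statement finishes.
    while resp["status"]["state"] in ("PENDING", "RUNNING"):
        sleep(poll_seconds)
        resp = get_json(f"/api/2.0/sql/statements/{statement_id}")
    if resp["status"]["state"] != "SUCCEEDED":
        raise RuntimeError(f"statement {statement_id} ended as {resp['status']}")
    columns = [col["name"] for col in resp["manifest"]["schema"]["columns"]]
    # An empty result may omit data_array entirely.
    rows = resp.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in rows]
```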

SowmyaDesai
New Contributor II

Thanks for responding. I did go through this link, but it covers executing statements on a SQL warehouse. Is there a way to execute queries on Databricks clusters instead?

Databricks has this connector for SQL: https://docs.databricks.com/en/dev-tools/python-sql-connector.html , which supports SQL queries. But I do not see an easy option for running PySpark queries such as the following. Any idea how to do that?

spark = SparkSession.builder.getOrCreate()
df = spark.sql(query)

Hi @SowmyaDesai,

If you want to run queries from outside Databricks, you can use Databricks Connect. It allows you to connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, IntelliJ IDEA, notebook servers, and other custom applications to Databricks compute:

Databricks Connect for Python | Databricks on AWS
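A minimal sketch of the Databricks Connect route, assuming `databricks-connect` (version 13 or later, compatible with the cluster's Databricks Runtime) is installed locally and that the workspace URL, token, and cluster ID are known. `DatabricksSession.builder.remote(...)` returns a Spark session whose queries execute on the remote cluster, so `spark.sql(query)` from the question works as-is, and `collect()` brings the rows back to the client. The helpers `get_remote_spark` and `query_rows` are hypothetical; `query_rows` accepts any Spark-like session so the logic can be tried without a cluster.

```python
from typing import Any, Dict, List


def get_remote_spark(host: str, token: str, cluster_id: str) -> Any:
    """Create a Spark session backed by a Databricks cluster via Databricks Connect."""
    # Imported lazily so the rest of this sketch can run without the package installed.
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.remote(
        host=host, token=token, cluster_id=cluster_id
    ).getOrCreate()


def query_rows(spark: Any, query: str, max_rows: int = 1000) -> List[Dict[str, Any]]:
    """Run a query on the (remote) cluster and return at most max_rows rows locally."""
    df = spark.sql(query)
    # collect() ships rows to the client, so cap the result size first.
    return [row.asDict() for row in df.limit(max_rows).collect()]
```

Typical use would be `spark = get_remote_spark(DATABRICKS_INSTANCE, API_TOKEN, "<cluster-id>")` followed by `rows = query_rows(spark, SQL_QUERY1)`; unlike the jobs-based approach, the results come back directly as Python objects instead of through a notebook exit value.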
