Query Serverless SQL Warehouse from Spark Submit Job

akhileshp
New Contributor III

I am trying to load data from a table in a SQL warehouse using spark.sql("SELECT * FROM <table>") in a spark-submit job, but the job is failing with [TABLE_OR_VIEW_NOT_FOUND] The table or view . The same statement works in a notebook but not in the job. How can I query the SQL warehouse from a Spark job?

6 REPLIES

Kaniz
Community Manager

Hi @akhileshp, when querying a SQL warehouse from a Spark job, there are a few considerations to keep in mind.

Let's break down the steps to address your issue:

  1. Databricks SQL Connector:

    • In your notebook, you successfully queried the data using the Databricks SQL Connector. However, when transitioning to a Spark job, you encountered issues.
    • The Databricks SQL Connector allows you to execute SQL queries against your SQL warehouse (or other supported databases) directly from Python code; a minimal connector sketch appears at the end of this reply.
    • To translate your existing Databricks SQL Connector code to PySpark, follow the steps below.
  2. PySpark Approach:

    • First, ensure that you have the necessary dependencies. You've already included some libraries, but let's refine your PySpark code.
    • Here's an example of how to query your SQL warehouse using PySpark:
#!/usr/bin/env python3
from pyspark.sql import SparkSession

# Set up the SparkSession
spark = SparkSession.builder \
    .appName("SQL Warehouse Query") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.execution.arrow.pyspark.enabled", "true") \
    .getOrCreate()

# Specify your SQL warehouse connection details
server_hostname = "adb-xxx.azuredatabricks.net"
http_path = "/sql/1.0/warehouses/xxx"
access_token = "xxx"

# Build the Databricks JDBC URL; AuthMech=3 means user "token"
# plus a personal access token as the password
jdbc_url = (
    f"jdbc:databricks://{server_hostname}:443;"
    f"httpPath={http_path};AuthMech=3;UID=token;PWD={access_token}"
)

# Read data from your SQL warehouse table
table_name = "democatalog.users.people"
df = spark.read \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("driver", "com.databricks.client.jdbc.Driver") \
    .option("dbtable", table_name) \
    .load()

# Show the first few rows
df.show()
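Note: the jdbc:databricks:// URL format above assumes the Databricks JDBC driver (class com.databricks.client.jdbc.Driver) is available on the cluster, e.g. installed as a library from the Maven coordinate com.databricks:databricks-jdbc; AuthMech=3 selects personal-access-token authentication.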
  3. Adjustments:

    • Replace the placeholders (server_hostname, http_path, and access_token) with your actual connection details.
    • Modify the table_name to match the table you want to query.
    • Ensure that the necessary JDBC driver for your SQL warehouse is available in your Spark environment.
  4. GitHub Examples:

Remember to adapt the code to your specific environment and SQL warehouse configuration. If you encounter any further issues, feel free to ask for additional assistance! 🚀
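For completeness, here is a minimal sketch of the connector approach from point 1, assuming the databricks-sql-connector package is installed (same placeholder hostname, HTTP path, and token as above):

from databricks import sql

# Connect directly to the SQL warehouse (placeholders as above)
with sql.connect(
    server_hostname="adb-xxx.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/xxx",
    access_token="xxx",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM democatalog.users.people LIMIT 10")
        for row in cursor.fetchall():
            print(row)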


akhileshp
New Contributor III

I am running the Spark job in Databricks, so it is running on the Databricks Runtime itself. Why, then, is the behaviour different between the notebook and a spark-submit job?

Wojciech_BUK
Contributor III

Hey,
Is your table in Unity Catalog or in the Hive Metastore?
Just my guess(es):

- Use the full table path catalog_name.schema_name.table_name (maybe you had some context saved when querying the data interactively); see the sketch after this reply.

- If your table is in UC - is your job cluster running in a UC-enabled access mode (single user / shared)?

From your post it is a bit unclear where and what you are running.
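A minimal sketch of that first guess - setting the context explicitly at the start of the job so nothing depends on state saved from interactive sessions (catalog, schema, and table names here are placeholders):

# Pin catalog and schema explicitly instead of relying on session defaults
spark.sql("USE CATALOG hive_metastore")
spark.sql("USE SCHEMA users")

# Or avoid context altogether with the full three-level name
df = spark.sql("SELECT * FROM hive_metastore.users.people")
df.show()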

akhileshp
New Contributor III

The table is in hive_metastore and I am using the full path.

The job compute is created using an instance pool.

Wojciech_BUK
Contributor III

- When you query the table manually and when you run the job, do both of those actions happen in the same Databricks workspace?
- What is the job configuration - who is the job Owner or Run As account? Does this principal/persona have access to the table?

akhileshp
New Contributor III

Yes, both are in the same workspace and querying the same table.

Yes, the principal has all the access.

Also, it's working with a spark python job; the problem is with the spark-submit job only.

My use case is supported only in a submit job since I have multiple scripts in a zip file.
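Since the spark python job works for you, one possible workaround for the multiple-scripts-in-a-zip use case is a Python script task whose entry point ships the zip itself via addPyFile (the path and module name below are examples, not your actual layout):

# main.py - entry point for a Python script (spark_python_task) job
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ship the zip of helper scripts to the cluster and add it to sys.path
# (example path; replace with wherever your zip actually lives)
spark.sparkContext.addPyFile("dbfs:/FileStore/jobs/libs.zip")

from mymodule import run_pipeline  # hypothetical module inside libs.zip

df = spark.sql("SELECT * FROM hive_metastore.users.people")
run_pipeline(df)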
