Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark conf for serverless jobs

seefoods
Valued Contributor

Hello guys,

I use serverless on Databricks Azure, and I have built a decorator which instantiates a SparkSession. My job uses Auto Loader / Kafka with the availableNow trigger mode. Does anyone know which Spark conf is required, because I would like to add it?

Thanks

import logging

from databricks.sdk import WorkspaceClient

from pyspark.sql import SparkSession
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, udf

from pydantic import ValidationError

from typing import Callable
from functools import wraps




def get_spark_session():
    """
    :return: the active SparkSession (created if none exists)
    """
    return SparkSession.builder.getOrCreate()


def provide_spark_session(function):
    """
    Decorator to inject a SparkSession as the 'spark' keyword argument
    :param function: function to decorate
    :return: decorated function
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        # pass the keyword after *args so positional arguments cannot collide with 'spark'
        return function(*args, spark=get_spark_session(), **kwargs)

    return wrapper


def get_workspace_session():
    """
    :return: WorkspaceClient instance
    """
    return WorkspaceClient()


def provide_workspace_session(function):
    """
    Decorator to provide a Databricks WorkspaceClient session
    :param function: Function to decorate
    :return: Decorated function
    """

    @wraps(function)
    def wrapper(*args, **kwargs):
        return function(*args, workspace_session=get_workspace_session(), **kwargs)

    return wrapper
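As a side note, the injection pattern used by these decorators can be sketched in plain Python. The `DummySession` class and `provide_session` factory below are hypothetical stand-ins (so the sketch runs without pyspark); they only illustrate how the session arrives as a keyword argument:

```python
from functools import wraps


class DummySession:
    """Stand-in for SparkSession so the sketch runs without pyspark."""


def provide_session(factory):
    """Decorator factory: inject factory() as the 'session' keyword argument."""
    def decorator(function):
        @wraps(function)
        def wrapper(*args, **kwargs):
            # keyword placed after *args so positional args cannot collide with it
            return function(*args, session=factory(), **kwargs)
        return wrapper
    return decorator


@provide_session(DummySession)
def describe(name, session=None):
    return f"{name} got {type(session).__name__}"


print(describe("job"))  # job got DummySession
```

The decorated function never constructs its own session; swapping `DummySession` for a real `SparkSession` factory gives the behavior of the decorators above.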
1 REPLY

Saritha_S
Databricks Employee

Hi @seefoods 

Please find below my findings for your case.

You don’t need (and can’t meaningfully add) any Spark conf to enable availableNow on Databricks Serverless.

Let me explain clearly, and then show what is safe to do in your decorator.

availableNow is not a Spark conf

For Auto Loader and Kafka, availableNow is a Structured Streaming trigger, not a Spark configuration. Correct usage (this is the only place it belongs):

(spark.readStream.format("cloudFiles")  # or "kafka"
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schema_path)
    .load(input_path)
    .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint)
    .start(output_path)
)

There is no spark.conf.set(...) required (or available) for availableNow.

Why Spark conf won’t work on Serverless

On Databricks Serverless:

  • SparkSession is already created
  • Executor / driver / streaming engine configs are locked
  • SparkSession.builder.getOrCreate():
    • does not create a new session
    • silently ignores most configs

So adding something like:

(SparkSession.builder
    .config("spark.sql.streaming.availableNow", "true")  # this conf does not exist
    .getOrCreate())