Hi,
I am trying to create a SQL UDF whose Python body runs some PySpark code, but I am not able to create a Spark session inside the Python section of the function. Here is how my code looks:
CREATE OR REPLACE FUNCTION test.getValuesFromTable(field1 INT, field2 INT)
RETURNS MAP<STRING, ARRAY<STRING>>
LANGUAGE PYTHON
AS $$
from pyspark.sql.functions import col
import numpy as np
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample").getOrCreate()

def getqueryList():
    query1 = "select distinct(value1) from test.table"
    query2 = "select distinct(value2) from test.table"
    query_list = [query1, query2]
    return query_list

def executeQuery(field1, field2, query):
    query = query + " where field1 = {} and field2 = {}".format(field1, field2)
    return spark.sql(query)

def getValues(field1, field2):
    result_dict = {}
    df_list = [executeQuery(field1, field2, query) for query in getqueryList()]
    for df in df_list:
        fieldName = df.schema.names[0]
        result_list = [row[0] for row in df.select(fieldName).collect()]
        result_dict[fieldName] = np.array(result_list)
    return result_dict

return getValues(field1, field2)
$$
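I then call the function from SQL like this (the literal arguments are just sample values):

SELECT test.getValuesFromTable(1, 2);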
When I execute the function, the Spark session cannot be created inside the Python body and the call fails with:
SystemExit: -1
== Stacktrace ==
  File "<udfbody>", line 6, in main
    spark = SparkSession.builder.appName("sample").getOrCreate()
  File "/databricks/spark/python/pyspark/sql/session.py", line 562, in getOrCreate
    else SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/core/context.py", line 574, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/databricks/spark/python/pyspark/core/context.py", line 206, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/databricks/spark/python/pyspark/core/context.py", line 495, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
                            ^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/java_gateway.py", line 63, in launch_gateway
    SPARK_HOME = _find_spark_home()
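For context, the same logic works fine when I run it directly in a notebook, where Databricks already provides a spark session. Here is a minimal sketch of the equivalent notebook code (the function name get_values and the argument values 1, 2 are just placeholders):

import numpy as np

# In a Databricks notebook the `spark` SparkSession is provided automatically,
# so no session has to be created here.
def get_values(field1, field2):
    queries = [
        "select distinct(value1) from test.table",
        "select distinct(value2) from test.table",
    ]
    result_dict = {}
    for q in queries:
        # Append the filter and run the query against the existing session.
        df = spark.sql(q + " where field1 = {} and field2 = {}".format(field1, field2))
        field_name = df.schema.names[0]
        result_dict[field_name] = np.array([row[0] for row in df.collect()])
    return result_dict

print(get_values(1, 2))  # sample argument values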
Is there a way to run Spark SQL queries inside the Python body of a SQL UDF?