Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

unable to perform modifications on Table while Using Python UDF in query

doremon11
New Contributor

Here, we're trying to use a Python UDF inside a query by:

  • taking the table as function input
  • converting the table into a dataframe
  • performing modifications
  • converting the dataframe back into a table
  • returning the table

How can we create a Spark context inside a UDF in the query?

CREATE FUNCTION fun1(input_table TABLE) RETURNS TABLE
LANGUAGE PYTHON
AS $$
  import pandas as pd
  
  df = spark.sql(f"SELECT * FROM {input_table}")
  def fun(df):
      # Convert table to DataFrame
      df.write.saveAsTable("my_table")
      return my_table
  return fun(input_table)
$$;

1 REPLY

Vidhi_Khaitan
Databricks Employee

Hi team,
I believe you cannot create or access a SparkSession, or run Spark operations like spark.sql(), directly inside a Python UDF. Also, input_table is a table argument, not a string containing a table name; with RETURNS TABLE, you receive it as a pandas DataFrame.

You need to define your logic outside SQL in a notebook and use regular Spark APIs:

def process_table(table_name):
    df = spark.table(table_name)
    # Transform using Spark DataFrame APIs
    df = df.withColumn("new_col", df["existing_col"] + 2)
    df.write.mode("overwrite").saveAsTable("processed_table")

Then call process_table("my_table") in your notebook or job. Hope this helps!
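For anyone who wants to trace the read → transform → write flow without a cluster, here is a minimal pure-Python sketch of the same pattern. The dictionary and helper functions are just stand-ins for spark.table() and saveAsTable() (my_table, existing_col, and processed_table are the placeholder names from the snippet above, not real objects):

```python
# Tiny in-memory stand-ins for spark.table() / df.write.saveAsTable(),
# so the read -> transform -> write pattern is visible without Spark.
tables = {"my_table": [{"existing_col": 1}, {"existing_col": 2}]}

def read_table(name):
    # stand-in for spark.table(name); returns row copies
    return [dict(row) for row in tables[name]]

def save_table(name, rows):
    # stand-in for df.write.mode("overwrite").saveAsTable(name)
    tables[name] = rows

def process_table(table_name):
    rows = read_table(table_name)
    for row in rows:
        # same transform as the Spark version: new_col = existing_col + 2
        row["new_col"] = row["existing_col"] + 2
    save_table("processed_table", rows)

process_table("my_table")
print(tables["processed_table"])
# [{'existing_col': 1, 'new_col': 3}, {'existing_col': 2, 'new_col': 4}]
```

The point of the shape is the same as in the reply: the table read, the transform, and the table write all happen in driver-side code you control, not inside a SQL UDF body.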
