Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

unable to perform modifications on Table while Using Python UDF in query

doremon11
New Contributor

Here, we're trying to use a Python UDF inside a query by:

  • taking the table as function input
  • converting the table into a dataframe
  • performing modifications
  • converting the dataframe back into a table
  • returning the table

How can we create a Spark context inside a UDF in the query?

CREATE FUNCTION fun1(input_table TABLE) RETURNS TABLE
LANGUAGE PYTHON
AS $$
  import pandas as pd
  
  df = spark.sql(f"SELECT * FROM {input_table}")
  def fun(df):
      # Convert table to DataFrame
      df.write.saveAsTable("my_table")
      return my_table
  return fun(input_table)
$$;

1 REPLY

Vidhi_Khaitan
Databricks Employee

Hi team,
I believe you cannot create or access a SparkSession, or run Spark operations like spark.sql(), directly inside a Python UDF. Also, input_table is a table argument, not a string containing a table name; with RETURNS TABLE, you receive it as a pandas DataFrame.

You need to define your logic outside SQL in a notebook and use regular Spark APIs:

def process_table(table_name):
    df = spark.table(table_name)
    # Transform using Spark DataFrame APIs
    df = df.withColumn("new_col", df["existing_col"] + 2)
    df.write.mode("overwrite").saveAsTable("processed_table")

Then call process_table("my_table") in your notebook or job. Hope this helps!
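For anyone who wants to trace the read → transform → write flow without a cluster, here is a minimal pure-Python sketch of the same pattern. The dictionary and helper functions are just stand-ins for spark.table() and saveAsTable() (my_table, existing_col, and processed_table are the placeholder names from the snippet above, not real objects):

```python
# Tiny in-memory stand-ins for spark.table() / df.write.saveAsTable(),
# so the read -> transform -> write pattern is visible without Spark.
tables = {"my_table": [{"existing_col": 1}, {"existing_col": 2}]}

def read_table(name):
    # stand-in for spark.table(name); returns row copies
    return [dict(row) for row in tables[name]]

def save_table(name, rows):
    # stand-in for df.write.mode("overwrite").saveAsTable(name)
    tables[name] = rows

def process_table(table_name):
    rows = read_table(table_name)
    for row in rows:
        # same transform as the Spark version: new_col = existing_col + 2
        row["new_col"] = row["existing_col"] + 2
    save_table("processed_table", rows)

process_table("my_table")
print(tables["processed_table"])
# [{'existing_col': 1, 'new_col': 3}, {'existing_col': 2, 'new_col': 4}]
```

The point of the shape is the same as in the reply: the table read, the transform, and the table write all happen in driver-side code you control, not inside a SQL UDF body.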
