I am trying to pass a column of data from Python/pandas to Spark and then run AI_QUERY on it. However, when I pass modelParameters (such as temperature), the call fails. Below is a minimal example:
import pandas as pd

queries = pd.DataFrame([
    {"request": """{"messages": [{"role": "system", "content": "You are a helpful AI assistant."}, {"role": "user", "content": "Write a short haiku about coffee."}]}"""}
])

# Convert Pandas DataFrame to Spark DataFrame
queries_spark = spark.createDataFrame(queries)

# Create or replace a temporary view
queries_spark.createOrReplaceTempView("queries_view")

model = 'openai-gpt-4o-mini'
temp = 0.2

# Execute the SQL query
spark.sql(f"""
    CREATE OR REPLACE TEMP VIEW responses_view AS
    SELECT
        request,
        AI_QUERY(
            endpoint => '{model}',
            request => request,
            returnType => 'STRING',
            modelParameters => named_struct(
                'temperature', {temp}
            )
        ) as response
    FROM queries_view
""")

# Load the data back into Python
responses_df = spark.table("responses_view")
display(responses_df)
This code results in the following error:
[UNRECOGNIZED_PARAMETER_NAME] Cannot invoke function `ai_query` because the function call included a named argument reference for the argument named `modelParameters`, but this function does not include any signature containing an argument with this name. Did you mean one of the following? [`returnType` `endpoint` `request`]. SQLSTATE: 4274K
File <command-384165507682497>, line 32
     16 spark.sql(f"""
     17 CREATE OR REPLACE TEMP VIEW responses_view AS
     18 SELECT
   (...)
     28 FROM queries_view
     29 """)
     31 # Load the data back into Python
---> 32 responses_df = spark.table("responses_view")
     33 display(responses_df)
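
As a side note on the request column: the JSON string in the example is written by hand, which is easy to get wrong once prompts contain quotes or newlines. Below is a minimal sketch of building the same payload with json.dumps instead. The helper name build_request and its parameters are my own, purely illustrative, and this does not address the modelParameters error itself.

```python
import json


def build_request(user_prompt, system_prompt="You are a helpful AI assistant."):
    """Serialize a chat-style payload to a JSON string (hypothetical helper)."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    }
    # json.dumps takes care of escaping quotes and newlines inside the prompts
    return json.dumps(payload)


# One row per query; pd.DataFrame(rows) would give the same
# single-column "request" frame as in the example above.
rows = [{"request": build_request("Write a short haiku about coffee.")}]
```

The resulting rows list can be passed to pd.DataFrame and then to spark.createDataFrame exactly as before.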