cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Using UDF in an insert command

Dp15
Contributor

Hi,
I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find herewith the function and the my query.

function:

import calendar
from datetime import datetime, date, timedelta


def getMonthEndDate(curr_dt: int😞
    curr_dt = datetime.strptime(str(curr_dt),'%Y%m%d').date()
    print("Provided date is: ",curr_dt)
    me_date=getLastBusinessDateInMonth(curr_dt.year,curr_dt.month)
    print("Last Business Day of the Month is: ",me_date)
    if me_date == curr_dt:
        return True
    else:
        return False

def getLastBusinessDateInMonth(year: int, month: int) -> int:
    return date(year, month, max(calendar.monthcalendar(year, month)[-1][:5]))

spark.udf.register("isMonthEnd1", getMonthEndDate)
 %sql
INSERT INTO test.metadata(id, reporting_date, run_id, run_type, batch_type, is_latest, create_date, create_time,is_month_end)
VALUES(11, 20240409, 00, 0, 'Test', 1, NULL, NULL,isMonthEnd1(20240409));

when I execute this notebook I am getting this following error
Could someone please help?

AnalysisException: [INVALID_INLINE_TABLE.CANNOT_EVALUATE_EXPRESSION_IN_INLINE_TABLE] Invalid inline table. Cannot evaluate the expression "isMonthEnd1(20240409)" in inline table definition.

 

 

2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @Dp15The error you’re encountering is related to using a UDF (User-Defined Function) in an INSERT command.

  • The error message indicates that the expression "isMonthEnd1(20240409)" cannot be evaluated within the inline table definition.
  • This means that the UDF call is not allowed in the specific context of your INSERT statement.
    • The issue arises because the UDF is being used directly in the VALUES clause of the INSERT statement, which is not supported.
    • To work around this, you can create a temporary view or subquery that computes the UDF result before using it in the INSERT statement.
      • Instead of directly using isMonthEnd1(20240409) in the INSERT statement, create a subquery or view that calculates the UDF result first.
      • Then use the result from the subquery or view in the INSERT statement.

Here’s an example of how you can modify your query:

-- Create a temporary view with the UDF result
CREATE OR REPLACE TEMPORARY VIEW temp_view AS
SELECT
    id,
    reporting_date,
    run_id,
    run_type,
    batch_type,
    is_latest,
    NULL AS create_date,
    NULL AS create_time,
    isMonthEnd1(reporting_date) AS is_month_end
FROM
    (SELECT 11 AS id, 20240409 AS reporting_date, 00 AS run_id, 0 AS run_type, 'Test' AS batch_type, 1 AS is_latest);

-- Use the temporary view in the INSERT statement
INSERT INTO test.metadata
SELECT * FROM temp_view;
  1. Explanation:
    • In the modified approach, we create a temporary view called temp_view.
    • The view calculates the UDF result for each row using isMonthEnd1(reporting_date).
    • Then we use the view in the INSERT statement to insert the computed values into the test.metadata table.

Remember to adjust the column names and data types according to your actual schema. This approach should resolve the error you’re encountering. If you have any further questions, feel free to ask! 😊

 

Dp15
Contributor

Thank you @Kaniz_Fatma for your detailed explanation

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group