Hi, I'm trying to create a calendar dimension including a fiscal year with a fiscal start of April 1. I'm using the fiscalyear library and am setting the start to month 4 but it insists on setting April to month 7.
runtime 12.1
My code snipet is:
start_date = '2016-01-01'
end_date = date(date.today().year, 12, 31) + relativedelta(years=5)
fiscalyear.setup_fiscal_calendar(start_month=4)
df = spark.sql(f"SELECT SEQUENCE(TO_TIMESTAMP('{start_date}'), TO_TIMESTAMP('{end_date}'), INTERVAL 1 DAY) AS DATETIME").withColumn("DATETIME", F.explode(F.col("DATETIME")))
# bulk definition of UDFs
udf_f_y = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_year) # fiscal DateTime object f_y
udf_f_m = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_month) # fiscal DateTime object f_m
udf_f_m_str = udf(lambda CELL: 'M' + str(CELL).zfill(2)) # use zfill to turn 2 into 02 and attach "M" to the front
udf_f_q = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_quarter) # fiscal DateTime object f_q
udf_f_d = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_day) # fiscal DateTime object f_d
# Fiscal Columns
df = df.withColumn("FISCAL_YEAR_INTEGER", udf_f_y(F.col("DATETIME")))
df = df.withColumn("FISCAL_QUARTER_INTEGER", udf_f_q(F.col("DATETIME")))
df = df.withColumn("FISCAL_MONTH_INTEGER", udf_f_m(F.col("DATETIME")))
df = df.withColumn("FISCAL_DAY_INTEGER", udf_f_d(F.col("DATETIME")))
df = df.withColumn("FISCAL_YEAR_STRING", F.concat_ws("", F.lit("FY"), F.col("FISCAL_YEAR_INTEGER").cast("STRING").substr(3, 4)))
df = df.withColumn("FISCAL_QUARTER_STRING", F.concat_ws("_", F.col("FISCAL_YEAR_STRING"), F.concat_ws("", F.lit( "Q" ), F.col("FISCAL_QUARTER_INTEGER")) ) )
df = df.withColumn("FISCAL_MONTH_STRING", F.concat_ws("_", F.col("FISCAL_YEAR_STRING"), udf_f_m_str(F.col("FISCAL_MONTH_INTEGER"))))
Not sure if it's something to do with using the udf and the lambda cell because I've tested it manually with the following and I properly get April as fiscal month 1:
import fiscalyear
from pyspark.sql.functions import col
from pyspark.sql.functions import to_timestamp
#### Testing fiscalyear library #####
# Set the fiscal year start month to April (month 4)
fiscalyear.START_MONTH = 4
# # Create a FiscalDateTime object representing a calendar date
df = fiscalyear.FiscalDateTime(2023, 4, 1)
# Convert the calendar date to a fiscal year
fiscal_year = df.fiscal_year
fiscal_month = df.fiscal_month
print(fiscal_year, fiscal_month)
2024 1
Can anyone offer any insights? Thank you for looking.