โ03-22-2023 01:17 PM
Hi, I'm trying to create a calendar dimension including a fiscal year with a fiscal start of April 1. I'm using the fiscalyear library and am setting the start to month 4 but it insists on setting April to month 7.
runtime 12.1
My code snipet is:
start_date = '2016-01-01'
end_date = date(date.today().year, 12, 31) + relativedelta(years=5)
fiscalyear.setup_fiscal_calendar(start_month=4)
df = spark.sql(f"SELECT SEQUENCE(TO_TIMESTAMP('{start_date}'), TO_TIMESTAMP('{end_date}'), INTERVAL 1 DAY) AS DATETIME").withColumn("DATETIME", F.explode(F.col("DATETIME")))
# bulk definition of UDFs
udf_f_y = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_year) # fiscal DateTime object f_y
udf_f_m = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_month) # fiscal DateTime object f_m
udf_f_m_str = udf(lambda CELL: 'M' + str(CELL).zfill(2)) # use zfill to turn 2 into 02 and attach "M" to the front
udf_f_q = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_quarter) # fiscal DateTime object f_q
udf_f_d = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_day) # fiscal DateTime object f_d
# Fiscal Columns
df = df.withColumn("FISCAL_YEAR_INTEGER", udf_f_y(F.col("DATETIME")))
df = df.withColumn("FISCAL_QUARTER_INTEGER", udf_f_q(F.col("DATETIME")))
df = df.withColumn("FISCAL_MONTH_INTEGER", udf_f_m(F.col("DATETIME")))
df = df.withColumn("FISCAL_DAY_INTEGER", udf_f_d(F.col("DATETIME")))
df = df.withColumn("FISCAL_YEAR_STRING", F.concat_ws("", F.lit("FY"), F.col("FISCAL_YEAR_INTEGER").cast("STRING").substr(3, 4)))
df = df.withColumn("FISCAL_QUARTER_STRING", F.concat_ws("_", F.col("FISCAL_YEAR_STRING"), F.concat_ws("", F.lit( "Q" ), F.col("FISCAL_QUARTER_INTEGER")) ) )
df = df.withColumn("FISCAL_MONTH_STRING", F.concat_ws("_", F.col("FISCAL_YEAR_STRING"), udf_f_m_str(F.col("FISCAL_MONTH_INTEGER"))))
Not sure if it's something to do with using the udf and the lambda cell because I've tested it manually with the following and I properly get April as fiscal month 1:
import fiscalyear
from pyspark.sql.functions import col
from pyspark.sql.functions import to_timestamp
#### Testing fiscalyear library #####
# Set the fiscal year start month to April (month 4)
fiscalyear.START_MONTH = 4
# # Create a FiscalDateTime object representing a calendar date
df = fiscalyear.FiscalDateTime(2023, 4, 1)
# Convert the calendar date to a fiscal year
fiscal_year = df.fiscal_year
fiscal_month = df.fiscal_month
print(fiscal_year, fiscal_month)
2024 1
Can anyone offer any insights? Thank you for looking.
โ03-22-2023 09:41 PM
@Jason Johnsonโ :
It seems that the fiscalyear library you are using is actually setting the start month to July (month 7) instead of April (month 4) as you intended. This might be a bug or a known issue with the library.
To work around this, you can manually adjust the month of your date inputs to compensate for the offset. For example, you can subtract 3 from the month of your start date to shift it back to April:
start_date = '2016-01-01'
start_date = f'{start_date[:5]}{int(start_date[5:7])-3:02d}{start_date[7:]}'
# start_date is now '2016-04-01'
Similarly, you can subtract 3 from the month of your fiscal year calculation to get the correct fiscal month:
udf_f_m = udf(lambda CELL: fiscalyear.FiscalDateTime(CELL.year, CELL.month, CELL.day).fiscal_month - 3)
With these adjustments, your code should correctly generate a calendar dimension with a fiscal year starting in April.
โ03-23-2023 07:06 AM
Thanks for the reply. Not sure why I hadn't thought of doing this. I'll add these changes in and see how it turns out.
โ03-26-2023 10:15 PM
Hi @Jason Johnsonโ
Thank you for posting your question in our community! We are happy to assist you.
Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!
โ01-25-2024 07:53 AM - edited โ01-25-2024 07:56 AM
import fiscalyear
import datetime
def get_fiscal_date(year,month,day):
fiscalyear.setup_fiscal_calendar(start_month=4)
v_fiscal_month=fiscalyear.FiscalDateTime(year, month, day).fiscal_month #To get the Fiscal Month
v_fiscal_quarter=fiscalyear.FiscalDateTime(year, month, day).fiscal_quarter #To get the Fiscal Month
v_fiscal_year=fiscalyear.FiscalDateTime(year, month, day).fiscal_year #To get the Fiscal Month
return v_fiscal_month,v_fiscal_quarter,v_fiscal_year
from pyspark.sql.functions import udf,col
from pyspark.sql.types import ArrayType, IntegerType,StringType
# Define the UDF with return type of ArrayType containing three IntegerTypes
get_fiscal_date_udf = udf(lambda year, month, day: get_fiscal_date(year, month, day), ArrayType(StringType()))
# Register the UDF with Spark SQL
spark.udf.register("get_fiscal_date_udf", get_fiscal_date,(ArrayType(StringType())))
%sql
select get_fiscal_date_udf(2024,4,24), get_fiscal_date_udf(2023,12,24), get_fiscal_date_udf(2024,1,24), get_fiscal_date_udf(2024,8,24)
In My case, April is the Financial start month, so in UDF function, we have to set
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group