<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic An Equivalent Implementation of XIRR in Excel within PySpark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/an-equivalent-implementation-of-xirr-in-excel-within-pyspark/m-p/120091#M46055</link>
    <description>&lt;H1 id="2035"&gt;Introduction&lt;/H1&gt;&lt;P class=""&gt;IRR is a common method used by financial personnel to evaluate the financial benefits of project investment, and its formula is as follows.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_0-1748015758470.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17088iDB95BC4BFA212A1F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_0-1748015758470.png" alt="YoyoHappy_0-1748015758470.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In excel, you can use the XIRR function to calculate the IRR of non-periodic cash flows easily.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_1-1748015783803.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17089i34FDE772B78516FD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_1-1748015783803.png" alt="YoyoHappy_1-1748015783803.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But if we want to compute IRRs for thousands of investments and portfolios, we need something like PySpark to do it, which is what I’ve been working on recently.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1 id="a518"&gt;PyXIRR&lt;/H1&gt;&lt;P&gt;&lt;SPAN&gt;In pypi, there is a special library called pyxirr that can help us calculate IRR&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# pip install pyxirr

from datetime import date
from pyxirr import xirr

dates = [
    date(2021, 1, 1),
    date(2021, 12, 31),
    date(2022, 12, 31),
    date(2023, 12, 31),
    date(2024, 12, 31),
]
cashflows = [-10, 2, 3, 4, 8]

irr = xirr(dates, cashflows)

print(f"IRR: {irr}")
# IRR: 0.19686311278122343&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;Based on this, we can write a UDF to use in PySpark, let us demonstrate it next&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.createDataFrame([
   ("p1",   date(2021, 1, 1), -10)
  ,("p1", date(2021, 12, 31), 2)
  ,("p1", date(2022, 12, 31), 3)
  ,("p1", date(2023, 12, 31), 4)
  ,("p1", date(2024, 12, 31), 8)
], ("project", "accounting_date", "cashflow"))

df.createOrReplaceTempView("demo")

def xirr_udf(dates: list, cashflows:list) -&amp;gt; float: 
  return xirr(dates, cashflows)
spark.udf.register("xirr_udf", xirr_udf)&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;Since the data type received by the UDF is an array, we need group by and collect_list&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql(f"""
with tmp as (
    select
        project
        ,collect_list(accounting_date) as dates
        ,collect_list(cashflow) as cashflows
    from demo
    group by 
        project
)
select
     project
    ,xirr_udf(dates, cashflows)
from tmp           
          """).display()&lt;/LI-CODE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_2-1748016013807.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17090i5E5C2BF1E5E765F6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_2-1748016013807.png" alt="YoyoHappy_2-1748016013807.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But due to some limitations of the cluster, I can’t install any package I want, I have to stick with the existing ones, like numpy, scipy, the classics.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1 id="9573"&gt;The Roots&lt;/H1&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_3-1748016041346.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17091iDAD194578444B787/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_3-1748016041346.png" alt="YoyoHappy_3-1748016041346.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Looking back at this equation, we can see that the IRR value is the solution that makes it equal to zero. So we can express it like this.&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def xnpv(rate):
  """Calculate the NPV for a given rate."""
  return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;With the expression of this equation, we can solve its root using Newton’s method. With the help of scipy.optimize&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def xirr(cashflows, dates):
    days = np.array([(date - dates[0]).days for date in dates])

    def xnpv(rate):
        """Calculate the NPV for a given rate."""
        return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))

    try:
        return newton(func=xnpv, x0=0.1)
    except RuntimeError:
        raise ValueError("XIRR calculation did not converge. Try different initial guesses or check inputs.")&lt;/LI-CODE&gt;&lt;H1 id="c456"&gt;Bonus&lt;/H1&gt;&lt;P class=""&gt;This function is a simple power function, you can write its derivative, and if you provide a derivative for NetWon, it will calculate the solution faster.&lt;/P&gt;&lt;P class=""&gt;&lt;A href="https://en.wikipedia.org/wiki/Power_rule" target="_self"&gt;https://en.wikipedia.org/wiki/Power_rule&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import numpy as np
from scipy.optimize import newton


def xirr(cashflows, dates):
    days = np.array([(date - dates[0]).days for date in dates])

    def xnpv(rate):
        """Calculate the NPV for a given rate."""
        return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))

    def xnpv_derivative(rate):
        """Calculate the derivative of NPV with respect to the rate."""
        return sum(-cf * (day / 365.0) / (1 + rate) ** ((day / 365.0) + 1) for cf, day in zip(cashflows, days))

    try:
        return newton(func=xnpv, x0=0.1, fprime=xnpv_derivative)
    except RuntimeError:
        raise ValueError("XIRR calculation did not converge. Try different initial guesses or check inputs.")&lt;/LI-CODE&gt;</description>
    <pubDate>Fri, 23 May 2025 16:06:35 GMT</pubDate>
    <dc:creator>YoyoHappy</dc:creator>
    <dc:date>2025-05-23T16:06:35Z</dc:date>
    <item>
      <title>An Equivalent Implementation of XIRR in Excel within PySpark</title>
      <link>https://community.databricks.com/t5/data-engineering/an-equivalent-implementation-of-xirr-in-excel-within-pyspark/m-p/120091#M46055</link>
      <description>&lt;H1 id="2035"&gt;Introduction&lt;/H1&gt;&lt;P class=""&gt;IRR is a common method used by financial personnel to evaluate the financial benefits of project investment, and its formula is as follows.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_0-1748015758470.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17088iDB95BC4BFA212A1F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_0-1748015758470.png" alt="YoyoHappy_0-1748015758470.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In excel, you can use the XIRR function to calculate the IRR of non-periodic cash flows easily.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_1-1748015783803.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17089i34FDE772B78516FD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_1-1748015783803.png" alt="YoyoHappy_1-1748015783803.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But if we want to compute IRRs for thousands of investments and portfolios, we need something like PySpark to do it, which is what I’ve been working on recently.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1 id="a518"&gt;PyXIRR&lt;/H1&gt;&lt;P&gt;&lt;SPAN&gt;In pypi, there is a special library called pyxirr that can help us calculate IRR&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# pip install pyxirr

from datetime import date
from pyxirr import xirr

dates = [
    date(2021, 1, 1),
    date(2021, 12, 31),
    date(2022, 12, 31),
    date(2023, 12, 31),
    date(2024, 12, 31),
]
cashflows = [-10, 2, 3, 4, 8]

irr = xirr(dates, cashflows)

print(f"IRR: {irr}")
# IRR: 0.19686311278122343&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;Based on this, we can write a UDF to use in PySpark, let us demonstrate it next&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.createDataFrame([
   ("p1",   date(2021, 1, 1), -10)
  ,("p1", date(2021, 12, 31), 2)
  ,("p1", date(2022, 12, 31), 3)
  ,("p1", date(2023, 12, 31), 4)
  ,("p1", date(2024, 12, 31), 8)
], ("project", "accounting_date", "cashflow"))

df.createOrReplaceTempView("demo")

def xirr_udf(dates: list, cashflows:list) -&amp;gt; float: 
  return xirr(dates, cashflows)
spark.udf.register("xirr_udf", xirr_udf)&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;Since the data type received by the UDF is an array, we need group by and collect_list&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql(f"""
with tmp as (
    select
        project
        ,collect_list(accounting_date) as dates
        ,collect_list(cashflow) as cashflows
    from demo
    group by 
        project
)
select
     project
    ,xirr_udf(dates, cashflows)
from tmp           
          """).display()&lt;/LI-CODE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_2-1748016013807.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17090i5E5C2BF1E5E765F6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_2-1748016013807.png" alt="YoyoHappy_2-1748016013807.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But due to some limitations of the cluster, I can’t install any package I want, I have to stick with the existing ones, like numpy, scipy, the classics.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1 id="9573"&gt;The Roots&lt;/H1&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="YoyoHappy_3-1748016041346.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17091iDAD194578444B787/image-size/medium?v=v2&amp;amp;px=400" role="button" title="YoyoHappy_3-1748016041346.png" alt="YoyoHappy_3-1748016041346.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Looking back at this equation, we can see that the IRR value is the solution that makes it equal to zero. So we can express it like this.&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def xnpv(rate):
  """Calculate the NPV for a given rate."""
  return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;With the expression of this equation, we can solve its root using Newton’s method. With the help of scipy.optimize&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def xirr(cashflows, dates):
    days = np.array([(date - dates[0]).days for date in dates])

    def xnpv(rate):
        """Calculate the NPV for a given rate."""
        return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))

    try:
        return newton(func=xnpv, x0=0.1)
    except RuntimeError:
        raise ValueError("XIRR calculation did not converge. Try different initial guesses or check inputs.")&lt;/LI-CODE&gt;&lt;H1 id="c456"&gt;Bonus&lt;/H1&gt;&lt;P class=""&gt;This function is a simple power function, you can write its derivative, and if you provide a derivative for NetWon, it will calculate the solution faster.&lt;/P&gt;&lt;P class=""&gt;&lt;A href="https://en.wikipedia.org/wiki/Power_rule" target="_self"&gt;https://en.wikipedia.org/wiki/Power_rule&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import numpy as np
from scipy.optimize import newton


def xirr(cashflows, dates):
    days = np.array([(date - dates[0]).days for date in dates])

    def xnpv(rate):
        """Calculate the NPV for a given rate."""
        return sum(cf / (1 + rate) ** (day / 365.0) for cf, day in zip(cashflows, days))

    def xnpv_derivative(rate):
        """Calculate the derivative of NPV with respect to the rate."""
        return sum(-cf * (day / 365.0) / (1 + rate) ** ((day / 365.0) + 1) for cf, day in zip(cashflows, days))

    try:
        return newton(func=xnpv, x0=0.1, fprime=xnpv_derivative)
    except RuntimeError:
        raise ValueError("XIRR calculation did not converge. Try different initial guesses or check inputs.")&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 23 May 2025 16:06:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/an-equivalent-implementation-of-xirr-in-excel-within-pyspark/m-p/120091#M46055</guid>
      <dc:creator>YoyoHappy</dc:creator>
      <dc:date>2025-05-23T16:06:35Z</dc:date>
    </item>
  </channel>
</rss>

