<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PySpark Lazy Evaluation in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-lazy-evaluation/m-p/127693#M48057</link>
    <description>&lt;P&gt;PySpark Lazy Evaluation - Why does my logging function seem to execute without an explicit action in Databricks?&lt;/P&gt;&lt;P&gt;Hello everyone,&lt;BR /&gt;I was scrolling and found a Medium post on PySpark (&lt;A href="https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-about-but-every-data-engineer-should-know-876a0202864a" target="_blank"&gt;https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-about-but-every-data-engineer-should-know-876a0202864a&lt;/A&gt;) and have a question about lazy evaluation. It says that transformations are lazy and will not execute until an action is called (which I know).&lt;BR /&gt;The article has some code that I tried to run. According to the article, it should not print &lt;STRONG&gt;'Logging something...'&lt;/STRONG&gt;. &lt;STRONG&gt;However, it is printing.&lt;/STRONG&gt;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="100%"&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;def log_step(df):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Logging something...")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df&lt;/P&gt;&lt;P&gt;df = spark.sql("select * from delta.`s3://abc` limit 10")&lt;BR /&gt;log_step(df.filter(col("flag_source_system_delete") == "yes"))&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;BR /&gt;Even without an explicit action like .show() or .count() on the final line, the print() statement inside log_step executes and I see the output in my notebook. My understanding is that the filter is a transformation and should not trigger the code.&lt;BR /&gt;Can someone please explain why this is happening? Is there an implicit action being triggered by the &lt;STRONG&gt;Databricks notebook environment, or am I fundamentally misunderstanding something about lazy evaluation with functions&lt;/STRONG&gt;?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Thu, 07 Aug 2025 16:02:42 GMT</pubDate>
    <dc:creator>joggiri</dc:creator>
    <dc:date>2025-08-07T16:02:42Z</dc:date>
    <item>
      <title>PySpark Lazy Evaluation</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-lazy-evaluation/m-p/127693#M48057</link>
      <description>&lt;P&gt;PySpark Lazy Evaluation - Why does my logging function seem to execute without an explicit action in Databricks?&lt;/P&gt;&lt;P&gt;Hello everyone,&lt;BR /&gt;I was scrolling and found a Medium post on PySpark (&lt;A href="https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-about-but-every-data-engineer-should-know-876a0202864a" target="_blank"&gt;https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-about-but-every-data-engineer-should-know-876a0202864a&lt;/A&gt;) and have a question about lazy evaluation. It says that transformations are lazy and will not execute until an action is called (which I know).&lt;BR /&gt;The article has some code that I tried to run. According to the article, it should not print &lt;STRONG&gt;'Logging something...'&lt;/STRONG&gt;. &lt;STRONG&gt;However, it is printing.&lt;/STRONG&gt;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="100%"&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;def log_step(df):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Logging something...")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return df&lt;/P&gt;&lt;P&gt;df = spark.sql("select * from delta.`s3://abc` limit 10")&lt;BR /&gt;log_step(df.filter(col("flag_source_system_delete") == "yes"))&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;BR /&gt;Even without an explicit action like .show() or .count() on the final line, the print() statement inside log_step executes and I see the output in my notebook. My understanding is that the filter is a transformation and should not trigger the code.&lt;BR /&gt;Can someone please explain why this is happening? Is there an implicit action being triggered by the &lt;STRONG&gt;Databricks notebook environment, or am I fundamentally misunderstanding something about lazy evaluation with functions&lt;/STRONG&gt;?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 07 Aug 2025 16:02:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-lazy-evaluation/m-p/127693#M48057</guid>
      <dc:creator>joggiri</dc:creator>
      <dc:date>2025-08-07T16:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark Lazy Evaluation</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-lazy-evaluation/m-p/127833#M48098</link>
      <description>&lt;P&gt;I don't have full access to that article, but here's something that might help clarify things! &lt;/P&gt;
&lt;P&gt;While Spark uses lazy evaluation (meaning it waits to execute until an action needs a result), Python works with eager evaluation. This means that when you call &lt;EM&gt;log_step&lt;/EM&gt;, Python runs the function body, print() included, right away. Only the Spark filter is deferred until an action is called.&lt;/P&gt;
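&lt;P&gt;A minimal sketch of that difference, written in plain Python so it runs without a cluster. LazyFrame, is_deleted, and evaluated are hypothetical stand-ins, not Spark's API. They only mimic how a transformation is recorded first and run later by an action.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical stand-in for a Spark DataFrame: it records transformations
# and runs them only when an action (collect) is called. Not Spark's API.
class LazyFrame:
    def __init__(self, rows, steps=()):
        self.rows = rows
        self.steps = tuple(steps)

    def filter(self, predicate):
        # Transformation: extend the plan, touch no data.
        return LazyFrame(self.rows, self.steps + (predicate,))

    def collect(self):
        # Action: only now are the recorded predicates evaluated.
        result = list(self.rows)
        for predicate in self.steps:
            result = [row for row in result if predicate(row)]
        return result


evaluated = []  # records every row the predicate actually inspects


def is_deleted(row):
    evaluated.append(row)
    return row["flag_source_system_delete"] == "yes"


def log_step(df):
    # Ordinary Python: this body runs as soon as log_step is called.
    print("Logging something...")
    return df


rows = [
    {"id": 1, "flag_source_system_delete": "yes"},
    {"id": 2, "flag_source_system_delete": "no"},
]

pending = log_step(LazyFrame(rows).filter(is_deleted))  # prints immediately
rows_seen_before_action = len(evaluated)  # no rows inspected yet
deleted = pending.collect()  # the action evaluates the filter
```
&lt;/PRE&gt;
&lt;P&gt;In this sketch the message prints as soon as log_step is called, while no row is inspected by the filter until collect() runs.&lt;/P&gt;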
</description>
      <pubDate>Fri, 08 Aug 2025 16:16:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-lazy-evaluation/m-p/127833#M48098</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2025-08-08T16:16:31Z</dc:date>
    </item>
  </channel>
</rss>

