PySpark Lazy Evaluation
08-07-2025 09:02 AM
PySpark Lazy Evaluation - Why does my logging function seem to execute without an explicit action in Databricks?
Hello everyone,
I came across a Medium post on PySpark (https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-about-but-every-data-engineer-should-k...) and have a question about lazy evaluation. It says that transformations are lazy and will not execute until an action is called (which I already knew).
The article includes some code that I tried to run. According to the article it should not print 'Logging something...'. However, it does print.
from pyspark.sql.functions import col

def log_step(df):
    print("Logging something...")
    return df

df = spark.sql("select * from delta.`s3://abc` limit 10")
Even without an explicit action like .show() or .count() on the final line, the print() statement inside log_step executes and I see the output in my notebook. My understanding is that filter is a transformation, so chaining it after log_step should not trigger any execution.
Can someone please explain why this is happening? Is there an implicit action being triggered by the Databricks notebook environment, or am I fundamentally misunderstanding something about lazy evaluation with functions?
Thank you!
08-08-2025 09:16 AM
I don't have full access to that article, but here's something that might help clarify things!
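While Spark uses lazy evaluation (it defers DataFrame transformations until an action needs a result), Python itself evaluates eagerly. When you call log_step, the Python code in its body, including the print(), runs right away. Only the Spark transformations, such as filter, are deferred: they just add steps to the query plan, and nothing is computed until an action runs.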
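To make the difference concrete, here is a minimal sketch in plain Python that runs without Spark. LazyFrame is a hypothetical toy class I made up for illustration, not a PySpark API, and the events list is only there to record what runs when. It imitates how a DataFrame records transformations and defers the work until an action, while the function wrapped around it runs immediately.

```python
events = []  # records the order in which things actually run


class LazyFrame:
    """Toy stand-in for a Spark DataFrame (hypothetical, not a PySpark class)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # pending filter predicates, not yet applied

    def filter(self, predicate):
        # Transformation: only extends the plan; no rows are touched here.
        return LazyFrame(self._rows, self._plan + (predicate,))

    def collect(self):
        # Action: the recorded plan is applied to the data only now.
        events.append("plan executed")
        result = list(self._rows)
        for predicate in self._plan:
            result = [row for row in result if predicate(row)]
        return result


def log_step(df):
    # Ordinary Python: this body runs as soon as log_step is called.
    print("Logging something...")
    events.append("log_step ran")
    return df


df = LazyFrame([1, 2, 3, 4])
pending = log_step(df).filter(lambda x: x > 2)  # prints, but filters nothing yet
events_before_action = list(events)

rows = pending.collect()  # the action triggers the deferred filter
```

After the line that builds pending, the message has already been printed, yet the filter has not touched any rows. The filtering happens only inside collect(). The same split applies in the notebook: print() belongs to Python, while filter belongs to Spark's plan.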