05-24-2022 11:42 PM
Hi,
I am facing a problem that I hope to get some help understanding.
I have created a function that is supposed to check if the input data already exists in a saved delta table; if not, it should perform some calculations and append the new data to the table.
When I run the code without saving the data, I can display the dataframe (display(dataframe)). However, after appending the data from the dataframe to the delta table, a new run of display(dataframe) suddenly indicates that the dataframe is empty. Can somebody help me understand why the dataframe is displayed as empty, when the only change is that the data has been saved to the delta table? Does "display" somehow run the join again?
Thank you!
Simplified code
#Load existing delta table
deltaDF = spark.read.format('delta').load(filePath)
#Remove any row that is identical to already existing data
condition = [<relevant column comparisons>]
noexistingDF = DF.join(deltaDF, on=condition, how="left_anti")
#Adding some additional columns to the data based on the already present data
display(noexistingDF)  #successfully displays data
#Saving data to delta table
noexistingDF.write.format("delta").mode("append").save(fileDestination)
display(noexistingDF)  #Suddenly the dataframe is empty
05-25-2022 12:02 AM
Yes. Spark is lazily evaluated, meaning it only executes code when an action is triggered. display() is such an action, and so is write.
So Spark first executes the query for the write (read, transform, write), and then does the same a second time for the display (read, transform, display). By then the new rows have been appended to the delta table, so the left_anti join returns nothing.
If you read from and wrote to different tables, the df would not be empty (as the read table would not have changed).
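For illustration, here is a minimal sketch of that re-execution; incomingDF, condition and the path "/tmp/demo_delta" are hypothetical placeholders standing in for the variables in the original post:
#Hypothetical placeholders: incomingDF is the new input data, condition the join keys,
#and "/tmp/demo_delta" a delta table used for both the read and the write
deltaDF = spark.read.format("delta").load("/tmp/demo_delta")
newDF = incomingDF.join(deltaDF, on=condition, how="left_anti")
newDF.count()  #action 1: reads the table and runs the anti-join
newDF.write.format("delta").mode("append").save("/tmp/demo_delta")  #action 2: re-reads, re-joins, appends
newDF.count()  #action 3: re-reads the now-updated table, every row matches, 0 rows remain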
05-25-2022 12:25 AM
Okay, thank you! Do you know if there is a way to copy the table to work around this, so that display does not re-run the transformation against the read table but shows the data as it was before saving?
05-25-2022 12:33 AM
There are several ways.
But they all come down to the same thing: writing the df to disk.
So if you write noexistingDF to disk (via spark.write or checkpoint) and then read it back, you're there.
Copying the delta table itself seems like overkill (although it can be done).
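For example, here is a sketch of the checkpoint route; the checkpoint directory below is a hypothetical location, adjust it to your workspace:
#Pick a reliable location for checkpoints (hypothetical path)
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
#checkpoint() is eager by default: it materializes the dataframe to disk and returns
#a new DataFrame whose plan reads from the checkpoint instead of re-running the join
pinnedDF = noexistingDF.checkpoint()
pinnedDF.write.format("delta").mode("append").save(fileDestination)
display(pinnedDF)  #still shows the pre-append rows, since nothing re-reads the delta table
The same effect can be had by writing noexistingDF to a temporary location with spark.write and reading it back before the append.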
05-09-2024 11:18 AM
But what if I want to do some transformations after writing noexistingDF into the table, and then use that df later in my code?
05-31-2022 11:57 PM
Hi @Christine Pedersen, please let us know if @Werner Stinckens' answer helped you mitigate the issue, or if you need any further help on this.
06-01-2022 06:49 AM
Hi @Chetan Kardekar, the replies did answer my question, so I do not need more information, thank you.
07-22-2022 08:09 AM
Hey there @Christine Pedersen
Hope everything is going great!
Would you be happy to circle back and mark an answer as best? It would be really helpful for the other members to find the solution more quickly.
Cheers!
07-31-2022 11:45 PM
Hey @Vartika Nain
Yes of course.
Cheers!
09-23-2023 11:04 AM
Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?