etsyal1e2r3
Honored Contributor

You can add a column that records the timestamp of when each batch of newly added data was ingested, using the selectExpr() function in Auto Loader. It'd look something like this...

# No pyspark.sql.functions import is needed here: current_timestamp()
# is the Spark SQL function evaluated inside the selectExpr() string.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # The schema location directory keeps track of your data schema over time
    .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
    .load("<source-data-with-nested-json>")
    .selectExpr(
        "*",
        "current_timestamp() as `Date_Pulled`",
    )
)