Resolved! Pandas.spark.checkpoint() doesn't broke lineage
Hi, I'm doing some something simple on Databricks notebook:spark.sparkContext.setCheckpointDir("/tmp/") import pyspark.pandas as ps sql=("""select field1, field2 From table Where date>='2021-01.01""") df = ps.sql(sql) df.spark.checkpoint()That...
- 5577 Views
- 8 replies
- 9 kudos
Latest Reply
checkpoint() returns a checkpointed DataFrame, so you need to assign it to a new variable:checkpointedDF = df.checkpoint()
- 9 kudos