Resolved! Pandas.spark.checkpoint() doesn't break lineage
Hi, I'm doing something simple in a Databricks notebook:
spark.sparkContext.setCheckpointDir("/tmp/")
import pyspark.pandas as ps
sql = """select field1, field2 From table Where date >= '2021-01-01'"""
df = ps.sql(sql)
df.spark.checkpoint()
That...
- 2877 Views
- 7 replies
- 8 kudos
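If you need checkpointing, please try the code below. Thanks to persist, you will avoid reprocessing. Note that persist and checkpoint are reached through the .spark accessor, and that checkpoint() returns a new DataFrame rather than changing df in place, so assign the result:
df = ps.sql(sql).spark.persist()
df = df.spark.checkpoint()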
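For anyone landing here later, here is a minimal sketch of the persist-then-checkpoint pattern with the result re-bound. It assumes the pandas API on Spark (`pyspark.pandas`), where `spark.persist()` and `spark.checkpoint()` each return a new DataFrame. The helper names `checkpoint_with_cache` and `run_example` are made up for illustration, and the query is the placeholder from the question, so adjust both to your own table.

```python
def checkpoint_with_cache(psdf, eager=True):
    """Cache a pandas-on-Spark DataFrame and return its checkpointed copy.

    Hypothetical helper. checkpoint() does not modify `psdf` in place:
    the truncated plan belongs to the returned DataFrame, so the caller
    has to keep the return value.
    """
    # Cache first so the eager checkpoint can reuse the cached data
    # instead of running the query a second time.
    cached = psdf.spark.persist()
    return cached.spark.checkpoint(eager=eager)


def run_example():
    """Sketch of a notebook session; needs a running Spark environment."""
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/")

    # Placeholder query from the question (table and field names are not real).
    sql = "select field1, field2 from table where date >= '2021-01-01'"

    df = ps.sql(sql)
    df = checkpoint_with_cache(df)  # re-bind: the old `df` keeps its full lineage

    # Print the physical plan to check that it now starts from the
    # checkpointed data rather than from the original query.
    df.spark.explain()
    return df
```

In a notebook you would call `df = run_example()` and compare the printed plan with the one from the un-checkpointed DataFrame. If the plan still shows the full query, the usual cause is that the return value of `checkpoint()` was discarded.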