Resolved! pyspark.pandas df.spark.checkpoint() doesn't break lineage
Hi, I'm doing something simple in a Databricks notebook:

    spark.sparkContext.setCheckpointDir("/tmp/")
    import pyspark.pandas as ps
    sql = ("""select field1, field2 From table Where date>='2021-01-01'""")
    df = ps.sql(sql)
    df.spark.checkpoint()

That...
Latest Reply
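If you need checkpointing, please try the code below. Persisting first means the query is not computed a second time when the checkpoint is written. Also note that checkpoint() returns a new DataFrame instead of changing df in place, so assign its result:

    df = ps.sql(sql).spark.persist()
    df = df.spark.checkpoint()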